Commit Graph

12176 Commits

Author SHA1 Message Date
Linus Torvalds
2355e42903 Merge git://github.com/rustyrussell/linux
* git://github.com/rustyrussell/linux:
  params: make dashes and underscores in parameter names truly equal
  kmod: prevent kmod_loop_msg overflow in __request_module()
2011-10-26 14:39:47 +02:00
Michal Schmidt
b1e4d20cbf params: make dashes and underscores in parameter names truly equal
The user may use "foo-bar" for a kernel parameter defined as "foo_bar".
Make sure it works the other way around too.

Apply the equality of dashes and underscores on early_params and __setup
params as well.

The example given in Documentation/kernel-parameters.txt indicates that
this is the intended behaviour.

With the patch the kernel accepts "log-buf-len=1M" as expected.
https://bugzilla.redhat.com/show_bug.cgi?id=744545

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (neatened implementations)
2011-10-26 13:10:39 +10:30
Jiri Kosina
37252db6aa kmod: prevent kmod_loop_msg overflow in __request_module()
Due to post-increment in condition of kmod_loop_msg in __request_module(),
the system log can be spammed by much more than 5 instances of the 'runaway
loop' message if the number of events triggering it makes the kmod_loop_msg
to overflow.

Fix that by making sure we never increment it past the threshold.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
CC: stable@kernel.org
2011-10-26 13:10:39 +10:30
Linus Torvalds
7e0bb71e75 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits)
  PM / Clocks: Remove redundant NULL checks before kfree()
  PM / Documentation: Update docs about suspend and CPU hotplug
  ACPI / PM: Add Sony VGN-FW21E to nonvs blacklist.
  ARM: mach-shmobile: sh7372 A4R support (v4)
  ARM: mach-shmobile: sh7372 A3SP support (v4)
  PM / Sleep: Mark devices involved in wakeup signaling during suspend
  PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image
  PM / Hibernate: Do not initialize static and extern variables to 0
  PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
  PM / Hibernate: Add resumedelay kernel param in addition to resumewait
  MAINTAINERS: Update linux-pm list address
  PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs
  PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs
  PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
  PM / Hibernate: Fix typo in a kerneldoc comment
  PM / Hibernate: Freeze kernel threads after preallocating memory
  PM: Update the policy on default wakeup settings
  PM / VT: Cleanup #if defined uglyness and fix compile error
  PM / Suspend: Off by one in pm_suspend()
  PM / Hibernate: Include storage keys in hibernation image on s390
  ...
2011-10-25 15:18:39 +02:00
Linus Torvalds
8a9ea3237e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1745 commits)
  dp83640: free packet queues on remove
  dp83640: use proper function to free transmit time stamping packets
  ipv6: Do not use routes from locally generated RAs
  |PATCH net-next] tg3: add tx_dropped counter
  be2net: don't create multiple RX/TX rings in multi channel mode
  be2net: don't create multiple TXQs in BE2
  be2net: refactor VF setup/teardown code into be_vf_setup/clear()
  be2net: add vlan/rx-mode/flow-control config to be_setup()
  net_sched: cls_flow: use skb_header_pointer()
  ipv4: avoid useless call of the function check_peer_pmtu
  TCP: remove TCP_DEBUG
  net: Fix driver name for mdio-gpio.c
  ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT
  rtnetlink: Add missing manual netlink notification in dev_change_net_namespaces
  ipv4: fix ipsec forward performance regression
  jme: fix irq storm after suspend/resume
  route: fix ICMP redirect validation
  net: hold sock reference while processing tx timestamps
  tcp: md5: add more const attributes
  Add ethtool -g support to virtio_net
  ...

Fix up conflicts in:
 - drivers/net/Kconfig:
	The split-up generated a trivial conflict with removal of a
	stale reference to Documentation/networking/net-modules.txt.
	Remove it from the new location instead.
 - fs/sysfs/dir.c:
	Fairly nasty conflicts with the sysfs rb-tree usage, conflicting
	with Eric Biederman's changes for tagged directories.
2011-10-25 13:25:22 +02:00
Linus Torvalds
1be025d3cb Merge branch 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
* 'usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (260 commits)
  usb: renesas_usbhs: fixup inconsistent return from usbhs_pkt_push()
  usb/isp1760: Allow to optionally trigger low-level chip reset via GPIOLIB.
  USB: gadget: midi: memory leak in f_midi_bind_config()
  USB: gadget: midi: fix range check in f_midi_out_open()
  QE/FHCI: fixed the CONTROL bug
  usb: renesas_usbhs: tidyup for smatch warnings
  USB: Fix USB Kconfig dependency problem on 85xx/QoirQ platforms
  EHCI: workaround for MosChip controller bug
  usb: gadget: file_storage: fix race on unloading
  USB: ftdi_sio.c: Use ftdi async_icount structure for TIOCMIWAIT, as in other drivers
  USB: ftdi_sio.c:Fill MSR fields of the ftdi async_icount structure
  USB: ftdi_sio.c: Fill LSR fields of the ftdi async_icount structure
  USB: ftdi_sio.c:Fill TX field of the ftdi async_icount structure
  USB: ftdi_sio.c: Fill the RX field of the ftdi async_icount structure
  USB: ftdi_sio.c: Basic icount infrastructure for ftdi_sio
  usb/isp1760: Let OF bindings depend on general CONFIG_OF instead of PPC_OF .
  USB: ftdi_sio: Support TI/Luminary Micro Stellaris BD-ICDI Board
  USB: Fix runtime wakeup on OHCI
  xHCI/USB: Make xHCI driver have a BOS descriptor.
  usb: gadget: add new usb gadget for ACM and mass storage
  ...
2011-10-25 12:23:15 +02:00
Linus Torvalds
59e5253417 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
  MAINTAINERS: linux-m32r is moderated for non-subscribers
  linux@lists.openrisc.net is moderated for non-subscribers
  Drop default from "DM365 codec select" choice
  parisc: Kconfig: cleanup Kernel page size default
  Kconfig: remove redundant CONFIG_ prefix on two symbols
  cris: remove arch/cris/arch-v32/lib/nand_init.S
  microblaze: add missing CONFIG_ prefixes
  h8300: drop puzzling Kconfig dependencies
  MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
  tty: drop superfluous dependency in Kconfig
  ARM: mxc: fix Kconfig typo 'i.MX51'
  Fix file references in Kconfig files
  aic7xxx: fix Kconfig references to READMEs
  Fix file references in drivers/ide/
  thinkpad_acpi: Fix printk typo 'bluestooth'
  bcmring: drop commented out line in Kconfig
  btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
  doc: raw1394: Trivial typo fix
  CIFS: Don't free volume_info->UNC until we are entirely done with it.
  treewide: Correct spelling of successfully in comments
  ...
2011-10-25 12:11:02 +02:00
Linus Torvalds
36b8d186e6 Merge branch 'next' of git://selinuxproject.org/~jmorris/linux-security
* 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits)
  TOMOYO: Fix incomplete read after seek.
  Smack: allow to access /smack/access as normal user
  TOMOYO: Fix unused kernel config option.
  Smack: fix: invalid length set for the result of /smack/access
  Smack: compilation fix
  Smack: fix for /smack/access output, use string instead of byte
  Smack: domain transition protections (v3)
  Smack: Provide information for UDS getsockopt(SO_PEERCRED)
  Smack: Clean up comments
  Smack: Repair processing of fcntl
  Smack: Rule list lookup performance
  Smack: check permissions from user space (v2)
  TOMOYO: Fix quota and garbage collector.
  TOMOYO: Remove redundant tasklist_lock.
  TOMOYO: Fix domain transition failure warning.
  TOMOYO: Remove tomoyo_policy_memory_lock spinlock.
  TOMOYO: Simplify garbage collector.
  TOMOYO: Fix make namespacecheck warnings.
  target: check hex2bin result
  encrypted-keys: check hex2bin result
  ...
2011-10-25 09:45:31 +02:00
David S. Miller
1805b2f048 Merge branch 'master' of ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2011-10-24 18:18:09 -04:00
Peter Zijlstra
bcd5cff721 cputimer: Cure lock inversion
There's a lock inversion between the cputimer->lock and rq->lock;
notably the two callchains involved are:

 update_rlimit_cpu()
   sighand->siglock
   set_process_cpu_timer()
     cpu_timer_sample_group()
       thread_group_cputimer()
         cputimer->lock
         thread_group_cputime()
           task_sched_runtime()
             ->pi_lock
             rq->lock

 scheduler_tick()
   rq->lock
   task_tick_fair()
     update_curr()
       account_group_exec()
         cputimer->lock

Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
the second one is keeping up-to-date.

This problem was introduced by e8abccb719 ("posix-cpu-timers: Cure
SMP accounting oddities").

Cure the problem by removing the cputimer->lock and rq->lock nesting,
this leaves concurrent enablers doing duplicate work, but the time
wasted should be on the same order otherwise wasted spinning on the
lock and the greater-than assignment filter should ensure we preserve
monotonicity.

Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-10-18 11:36:59 +02:00
Linus Torvalds
a84a79e4d3 Avoid using variable-length arrays in kernel/sys.c
The size is always valid, but variable-length arrays generate worse code
for no good reason (unless the function happens to be inlined and the
compiler sees the length for the simple constant it is).

Also, there seems to be some code generation problem on POWER, where
Henrik Bakken reports that register r28 can get corrupted under some
subtle circumstances (interrupt happening at the wrong time?).  That all
indicates some seriously broken compiler issues, but since variable
length arrays are bad regardless, there's little point in trying to
chase it down.

"Just don't do that, then".

Reported-by: Henrik Grindal Bakken <henribak@cisco.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-17 08:24:24 -07:00
Bojan Smojver
081a9d043c PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image
Use threads for LZO compression/decompression on hibernate/thaw.
Improve buffering on hibernate/thaw.
Calculate/verify CRC32 of the image pages on hibernate/thaw.

In my testing, this improved write/read speed by a factor of about two.

Signed-off-by: Bojan Smojver <bojan@rexursive.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:30:38 +02:00
Barry Song
d231ff1af7 PM / Hibernate: Do not initialize static and extern variables to 0
Static and extern variables in kernel/power/hibernate.c need not be
initialized to 0 explicitly, so remove those initializations.

[rjw: Modified subject, added changelog.]

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:30:38 +02:00
Jeff Layton
27920651fe PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
TASK_KILLABLE is often used to put tasks to sleep for quite some time.
One of the most common uses is to put tasks to sleep while waiting for
replies from a server on a networked filesystem (such as CIFS or NFS).

Unfortunately, fake_signal_wake_up does not currently wake up tasks
that are sleeping in TASK_KILLABLE state. This means that even if the
code were in place to allow them to freeze while in this sleep, it
wouldn't work anyway.

This patch changes this function to wake tasks in this state as well.
This should be harmless -- if the code doing the sleeping doesn't have
handling to deal with freezer events, it should just go back to sleep.
If it does, then this will allow that code to do the right thing.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:30:37 +02:00
Barry Song
f126f7334f PM / Hibernate: Add resumedelay kernel param in addition to resumewait
Patch "PM / Hibernate: Add resumewait param to support MMC-like
devices as resume file" added the resumewait kernel command line
option.  The present patch adds resumedelay so that
resumewait/delay were analogous to rootwait/delay.

[rjw: Modified the subject and changelog slightly.]

Signed-off-by: Barry Song <baohua.song@csr.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:30:37 +02:00
Barry Song
6f8d7022a8 PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
Some devices like MMC are async detected very slow. For example,
drivers/mmc/host/sdhci.c launches a 200ms delayed work to detect
MMC partitions then add disk.

We have wait_for_device_probe() and scsi_complete_async_scans()
before calling swsusp_check(), but it is not enough to wait for MMC.

This patch adds resumewait kernel param just like rootwait so
that we have enough time to wait until MMC is ready. The difference is
that we wait for resume partition whereas rootwait waits for rootfs
partition (which may be on a different device).

This patch will make hibernation support many embedded products
without SCSI devices, but with devices like MMC.

[rjw: Modified the changelog slightly.]

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:30:36 +02:00
Barry Song
21e82808fc PM / Hibernate: Fix typo in a kerneldoc comment
Fix a typo in a function name in the kerneldoc comment next to
resume_target_kernel().

[rjw: Changed the subject slightly, added the changelog.]

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:52 +02:00
Rafael J. Wysocki
2aede851dd PM / Hibernate: Freeze kernel threads after preallocating memory
There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers.  Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out.  Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation.  If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).

Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.

Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.

Reported-by: Christoph <cr2005@u-club.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:52 +02:00
H Hartley Sweeten
37cce26b32 PM / VT: Cleanup #if defined uglyness and fix compile error
Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup
the #if defined ugliness for the vt suspend support functions. Note that
CONFIG_VT_CONSOLE is already dependant on CONFIG_VT.

The function pm_set_vt_switch is actually dependant on CONFIG_VT and not
CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
not set:

drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here

Also, remove the incorrect path from the comment in console.c.

[rjw: Replaced #if defined() with #ifdef in suspend.h.]

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:28:51 +02:00
Dan Carpenter
528f7ce6e4 PM / Suspend: Off by one in pm_suspend()
In enter_state() we use "state" as an offset for the pm_states[]
array.  The pm_states[] array only has PM_SUSPEND_MAX elements so
this test is off by one.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
2011-10-16 23:27:46 +02:00
Martin Schwidefsky
85055dd805 PM / Hibernate: Include storage keys in hibernation image on s390
For s390 there is one additional byte associated with each page,
the storage key. This byte contains the referenced and changed
bits and needs to be included into the hibernation image.
If the storage keys are not restored to their previous state all
original pages would appear to be dirty. This can cause
inconsistencies e.g. with read-only filesystems.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:46 +02:00
Rafael J. Wysocki
ca123102f6 PM: Fix build issue in main.c for CONFIG_PM_SLEEP unset
Suspend statistics should depend on CONFIG_PM_SLEEP, so make that
happen.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:46 +02:00
ShuoX Liu
2a77c46de1 PM / Suspend: Add statistics debugfs file for suspend to RAM
Record S3 failure time about each reason and the latest two failed
devices' names in S3 progress.
We can check it through 'suspend_stats' entry in debugfs.

The motivation of the patch:

We are enabling power features on Medfield. Comparing with PC/notebook,
a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far
more frequently. If it can't enter suspend-2-ram in time, the power
might be used up soon.

We often find sometimes, a device suspend fails. Then, system retries
s3 over and over again. As display is off, testers and developers
don't know what happens.

Some testers and developers complain they don't know if system
tries suspend-2-ram, and what device fails to suspend. They need
such info for a quick check. The patch adds suspend_stats under
debugfs for users to check suspend to RAM statistics quickly.

If not using this patch, we have other methods to get info about
what device fails. One is to turn on  CONFIG_PM_DEBUG, but users
would get too much info and testers need recompile the system.

In addition, dynamic debug is another good tool to dump debug info.
But it still doesn't match our utilization scenario closely.
1) user need write a user space parser to process the syslog output;
2) Our testing scenario is we leave the mobile for at least hours.
   Then, check its status. No serial console available during the
   testing. One is because console would be suspended, and the other
   is serial console connecting with spi or HSU devices would consume
   power. These devices are powered off at suspend-2-ram.

Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-10-16 23:27:45 +02:00
Rafael J. Wysocki
9696cc9007 Merge branch 'pm-qos' into pm-for-linus
* pm-qos:
  PM / QoS: Update Documentation for the pm_qos and dev_pm_qos frameworks
  PM / QoS: Add function dev_pm_qos_read_value() (v3)
  PM QoS: Add global notification mechanism for device constraints
  PM QoS: Implement per-device PM QoS constraints
  PM QoS: Generalize and export constraints management code
  PM QoS: Reorganize data structs
  PM QoS: Code reorganization
  PM QoS: Minor clean-ups
  PM QoS: Move and rename the implementation files
2011-10-07 23:17:07 +02:00
Rafael J. Wysocki
d727b60659 Merge branch 'pm-runtime' into pm-for-linus
* pm-runtime:
  PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
  PM / Runtime: Replace dev_dbg() with trace_rpm_*()
  PM / Runtime: Introduce trace points for tracing rpm_* functions
  PM / Runtime: Don't run callbacks under lock for power.irq_safe set
  USB: Add wakeup info to debugging messages
  PM / Runtime: pm_runtime_idle() can be called in atomic context
  PM / Runtime: Add macro to test for runtime PM events
  PM / Runtime: Add might_sleep() to runtime PM functions
2011-10-07 23:16:55 +02:00
David S. Miller
88c5100c28 Merge branch 'master' of github.com:davem330/net
Conflicts:
	net/batman-adv/soft-interface.c
2011-10-07 13:38:43 -04:00
Vasily Averin
349d2895cc ipv4: NET_IPV4_ROUTE_GC_INTERVAL removal
removing obsoleted sysctl,
ip_rt_gc_interval variable no longer used since 2.6.38

Signed-off-by: Vasily Averin <vvs@sw.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03 14:13:01 -04:00
Linus Torvalds
f72a209a3e Merge branches 'irq-urgent-for-linus', 'x86-urgent-for-linus' and 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  irq: Fix check for already initialized irq_domain in irq_domain_add
  irq: Add declaration of irq_domain_simple_ops to irqdomain.h

* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86/rtc: Don't recursively acquire rtc_lock

* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  posix-cpu-timers: Cure SMP wobbles
  sched: Fix up wchan borkage
  sched/rt: Migrate equal priority tasks to available CPUs
2011-10-01 08:37:25 -07:00
Peter Zijlstra
d670ec1317 posix-cpu-timers: Cure SMP wobbles
David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$ 

  The diff of 'process' should always be >= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <fcntl.h>
  #include <string.h>
  #include <errno.h>
  #include <pthread.h>

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&barrier, NULL, 2);
	  err = pthread_create(&th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&barrier);

	  err = clock_gettime(process_clock, &process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&sleeptime, NULL);

	  err = clock_gettime(th_clock, &th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.

Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-30 14:07:06 +02:00
Ram Pai
47ea91b405 Resource: fix wrong resource window calculation
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window.  This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.

__find_resource() looks for gaps in resource allocation among the
children resource windows.  When it encounters the last child window it
blindly tries the range next to one allocated to the last child.  Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation.  This leads to a conflicting
window allocation.

Michal Ludvig reported this issue seen on his platform.  The following
patch fixes the problem and has been verified by Michal.  I believe this
bug has been there for ages.  It got exposed by git commit 2bbc694227
("PCI : ability to relocate assigned pci-resources")

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michal Ludvig <mludvig@logix.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-29 20:04:34 -07:00
Serge Hallyn
d178bc3a70 user namespace: usb: make usb urbs user namespace aware (v2)
Add to the dev_state and alloc_async structures the user namespace
corresponding to the uid and euid.  Pass these to kill_pid_info_as_uid(),
which can then implement a proper, user-namespace-aware uid check.

Changelog:
Sep 20: Per Oleg's suggestion: Instead of caching and passing user namespace,
	uid, and euid each separately, pass a struct cred.
Sep 26: Address Alan Stern's comments: don't define a struct cred at
	usbdev_open(), and take and put a cred at async_completed() to
	ensure it lasts for the duration of kill_pid_info_as_cred().

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-09-29 13:13:08 -07:00
Ming Lei
2a5306cc5f PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
Do not build kernel/trace/rpm-traces.c if CONFIG_PM_RUNTIME is not
set, which avoids a build failure.

[rjw: Added the changelog and modified the subject slightly.]

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-09-29 22:07:23 +02:00
Vladimir Zapolskiy
f786ecba41 connector: add comm change event report to proc connector
Add an event to monitor comm value changes of tasks.  Such an event
becomes vital, if someone desires to control threads of a process in
different manner.

A natural characteristic of threads is its comm value, and helpfully
application developers have an opportunity to change it in runtime.
Reporting about such events via proc connector allows to fine-grain
monitoring and control potentials, for instance a process control daemon
listening to proc connector and following comm value policies can place
specific threads to assigned cgroup partitions.

It might be possible to achieve a pale partial one-shot likeness without
this update, if an application changes comm value of a thread generator
task beforehand, then a new thread is cloned, and after that proc
connector listener gets the fork event and reads new thread's comm value
from procfs stat file, but this change visibly simplifies and extends the
matter.

Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-28 13:41:50 -04:00
Ming Lei
53b615ccca PM / Runtime: Introduce trace points for tracing rpm_* functions
This patch introduces 3 trace points to prepare for tracing
rpm_idle/rpm_suspend/rpm_resume functions, so we can use these
trace points to replace the current dev_dbg().

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
2011-09-27 22:53:27 +02:00
Joe Perches
bfb9035c98 treewide: Correct spelling of successfully in comments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27 18:08:04 +02:00
Simon Kirby
6ebbe7a07b sched: Fix up wchan borkage
Commit c259e01a1e ("sched: Separate the scheduler entry for
preemption") contained a boo-boo wrecking wchan output. It forgot to
put the new schedule() function in the __sched section and thereby
doesn't get properly ignored for things like wchan.

Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-26 12:51:08 +02:00
Oleg Nesterov
f9d81f61c8 ptrace: PTRACE_LISTEN forgets to unlock ->siglock
If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop ->siglock.

Reported-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-25 11:02:00 -07:00
Rob Herring
eef24afb28 irq: Fix check for already initialized irq_domain in irq_domain_add
The sanity check in irq_domain_add() tests desc->irq_data != NULL or
irq_data->domain != NULL. This prevents adding an irq_domain to a irq
descriptor when irq_data exists, which true when the irq descriptor
exists.

This went unnoticed so far as the simple domain code did not enter
this code path because domain->nr_irqs is always 0 for the simple domains.

Split the check for irq_data == NULL out and have a separate warning
for it.

[ tglx: Made the check for irq_data == NULL separate ]

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: marc.zyngier@arm.com
Cc: thomas.abraham@linaro.org
Cc: jamie@jamieiles.com
Cc: b-cousson@ti.com
Cc: shawn.guo@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree-discuss@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1316017900-19918-3-git-send-email-robherring2@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-20 12:16:22 +02:00
Linus Torvalds
9d037a7776 Merge branch 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  x86, iommu: Mark DMAR IRQ as non-threaded
  genirq: Make irq_shutdown() symmetric vs. irq_startup again
2011-09-19 17:23:41 -07:00
Linus Torvalds
58c3c3aa01 Make taskstats round statistics down to nearest 1k bytes/events
Even with just the interface limited to admin, there really is little to
reason to give byte-per-byte counts for taskstats.  So round it down to
something less intrusive.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-19 17:10:57 -07:00
Linus Torvalds
1a51410abe Make TASKSTATS require root access
Ok, this isn't optimal, since it means that 'iotop' needs admin
capabilities, and we may have to work on this some more.  But at the
same time it is very much not acceptable to let anybody just read
anybody elses IO statistics quite at this level.

Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative
to checking the capabilities by hand.

Reported-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-19 17:04:37 -07:00
Shawn Bohrer
3be209a8e2 sched/rt: Migrate equal priority tasks to available CPUs
Commit 43fa5460fe ("sched: Try not to
migrate higher priority RT tasks") also introduced a change in behavior
which keeps RT tasks on the same CPU if there is an equal priority RT
task currently running even if there are empty CPUs available.

This can cause unnecessary wakeup latencies, and can prevent the
scheduler from balancing all RT tasks across available CPUs.

This change causes an RT task to search for a new CPU if an equal
priority RT task is already running on wakeup.  Lower priority tasks
will still have to wait on higher priority tasks, but the system should
still balance out because there is always the possibility that if there
are both a high and low priority RT tasks on a given CPU that the high
priority task could wakeup while the low priority task is running and
force it to search for a better runqueue.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org # 37+
Link: http://lkml.kernel.org/r/1315837684-18733-1-git-send-email-sbohrer@rgmadvisors.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-09-18 13:48:56 +02:00
Jiri Kosina
e060c38434 Merge branch 'master' into for-next
Fast-forward merge with Linus to be able to merge patches
based on more recent version of the tree.
2011-09-15 15:08:18 +02:00
Bart Van Assche
ca4a04cf3d futex: Fix spelling in a source code comment
Change a single occurrence of "unlcoked" into "unlocked".

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:37:17 +02:00
Vitaliy Ivanov
7cfdaf38d4 futex: uninitialized warning corrections
The variables here are really not used uninitialized.

kernel/futex.c: In function 'fixup_pi_state_owner.clone.17':
kernel/futex.c:1582:6: warning: 'curval' may be used uninitialized in this function
kernel/futex.c: In function 'handle_futex_death':
kernel/futex.c:2486:6: warning: 'nval' may be used uninitialized in this function
kernel/futex.c: In function 'do_futex':
kernel/futex.c:863:11: warning: 'curval' may be used uninitialized in this function
kernel/futex.c:828:6: note: 'curval' was declared here
kernel/futex.c:898:5: warning: 'oldval' may be used uninitialized in this function
kernel/futex.c:890:6: note: 'oldval' was declared here

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:23:07 +02:00
Vitaliy Ivanov
124ff4e53a async: uninitialized warning corrections
The variables here are really not used uninitialized.

kernel/async.c: In function 'async_synchronize_cookie_domain':
kernel/async.c:270:10: warning: 'starttime.tv64' may be used uninitialized in this function
kernel/async.c: In function 'async_run_entry_fn':
kernel/async.c:122:10: warning: 'calltime.tv64' may be used uninitialized in this function

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-15 14:22:28 +02:00
Thomas Tuttle
fa2563e41c workqueue: lock cwq access in drain_workqueue
Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to
make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing
and then incrementing nr_active when it activates a delayed work.

We discovered this when a corner case in one of our drivers resulted in
us trying to destroy a workqueue in which the remaining work would
always requeue itself again in the same workqueue.  We would hit this
race condition and trip the BUG_ON on workqueue.c:3080.

Signed-off-by: Thomas Tuttle <ttuttle@chromium.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-14 18:09:38 -07:00
Geert Uytterhoeven
ed585a6516 genirq: Make irq_shutdown() symmetric vs. irq_startup again
If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or
.irq_mask(), free_irq() crashes when jumping to NULL.
Fix this by only trying .irq_disable() and .irq_mask() if there's no
.irq_shutdown() provided.

This revives the symmetry with irq_startup(), which tries .irq_startup(),
.irq_enable(), and irq_unmask(), and makes it consistent with the comment for
irq_chip.irq_shutdown() in <linux/irq.h>, which says:

 * @irq_shutdown:	shut down the interrupt (defaults to ->disable if NULL)

This is also how __free_irq() behaved before the big overhaul, cfr. e.g.
3b56f0585f ("genirq: Remove bogus conditional"),
where the core interrupt code always overrode .irq_shutdown() to
.irq_disable() if .irq_shutdown() was NULL.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-09-12 09:38:53 +02:00
Linus Torvalds
79016f6488 Merge branch 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  rtc: twl: Fix registration vs. init order
  rtc: Initialized rtc_time->tm_isdst
  rtc: Fix RTC PIE frequency limit
  rtc: rtc-twl: Remove lockdep related local_irq_enable()
  rtc: rtc-twl: Switch to using threaded irq
  rtc: ep93xx: Fix 'rtc' may be used uninitialized warning
  alarmtimers: Avoid possible denial of service with high freq periodic timers
  alarmtimers: Memset itimerspec passed into alarm_timer_get
  alarmtimers: Avoid possible null pointer traversal
2011-09-07 13:03:48 -07:00
Linus Torvalds
e81b693c01 Merge branch 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip
* 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
  sched: Fix a memory leak in __sdt_free()
  sched: Move blk_schedule_flush_plug() out of __schedule()
  sched: Separate the scheduler entry for preemption
2011-09-07 13:01:34 -07:00