linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-14 03:06:42 +07:00

Author	SHA1	Message	Date
Kay Sievers	eb02dac937	kmsg: /proc/kmsg - support reading of partial log records Restore support for partial reads of any size on /proc/kmsg, in case the supplied read buffer is smaller than the record size. Some people seem to think is is ia good idea to run: $ dd if=/proc/kmsg bs=1 of=... as a klog bridge. Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44211 Reported-by: Jukka Ollila <jiiksteri@gmail.com> Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-09 10:05:10 -07:00
Tejun Heo	5db9a4d99b	cgroup: fix cgroup hierarchy umount race `48ddbe1946` "cgroup: make css->refcnt clearing on cgroup removal optional" allowed a css to linger after the associated cgroup is removed. As a css holds a reference on the cgroup's dentry, it means that cgroup dentries may linger for a while. Destroying a superblock which has dentries with positive refcnts is a critical bug and triggers BUG() in vfs code. As each cgroup dentry holds an s_active reference, any lingering cgroup has both its dentry and the superblock pinned and thus preventing premature release of superblock. Unfortunately, after `48ddbe1946`, there's a small window while releasing a cgroup which is directly under the root of the hierarchy. When a cgroup directory is released, vfs layer first deletes the corresponding dentry and then invokes dput() on the parent, which may recurse further, so when a cgroup directly below root cgroup is released, the cgroup is first destroyed - which releases the s_active it was holding - and then the dentry for the root cgroup is dput(). This creates a window where the root dentry's refcnt isn't zero but superblock's s_active is. If umount happens before or during this window, vfs will see the root dentry with non-zero refcnt and trigger BUG(). Before `48ddbe1946`, this problem didn't exist because the last dentry reference was guaranteed to be put synchronously from rmdir(2) invocation which holds s_active around the whole process. Fix it by holding an extra superblock->s_active reference across dput() from css release, which is the dput() path added by `48ddbe1946` and the only one which doesn't hold an extra s_active ref across the final cgroup dput(). Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <4FEEA5CB.8070809@huawei.com> Reported-by: shyju pv <shyju.pv@huawei.com> Tested-by: shyju pv <shyju.pv@huawei.com> Cc: Sasha Levin <levinsasha928@gmail.com> Acked-by: Li Zefan <lizefan@huawei.com>	2012-07-07 16:08:18 -07:00
Tejun Heo	7db5b3ca0e	Revert "cgroup: superblock can't be released with active dentries" This reverts commit `fa980ca87d`. The commit was an attempt to fix a race condition where a cgroup hierarchy may be unmounted with positive dentry reference on root cgroup. While the commit made the race condition slightly more difficult to trigger, the race was still there and could be reliably triggered using a different test case. Revert the incorrect fix. The next commit will describe the race and fix it correctly. Signed-off-by: Tejun Heo <tj@kernel.org> LKML-Reference: <4FEEA5CB.8070809@huawei.com> Reported-by: shyju pv <shyju.pv@huawei.com> Cc: Sasha Levin <levinsasha928@gmail.com> Acked-by: Li Zefan <lizefan@huawei.com>	2012-07-07 15:55:47 -07:00
Kay Sievers	68b6507dc5	kmsg: make sure all messages reach a newly registered boot console We suppress printing kmsg records to the console, which are already printed immediately while we have received their fragments. Newly registered boot consoles print the entire kmsg buffer during registration. Clear the console-suppress flag after we skipped the record during its first storage, so any later print will see these records as usual. Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-06 09:50:09 -07:00
Kay Sievers	cb424ffe9f	kmsg: properly handle concurrent non-blocking read() from /proc/kmsg The /proc/kmsg read() interface is internally simply wired up to a sequence of syslog() syscalls, which might are racy between their checks and actions, regarding concurrency. In the (very uncommon) case of concurrent readers of /dev/kmsg, relying on usual O_NONBLOCK behavior, the recently introduced mutex might block an O_NONBLOCK reader in read(), when poll() returns for it, but another process has already read the data in the meantime. We've seen that while running artificial test setups and tools that "fight" about /proc/kmsg data. This restores the original /proc/kmsg behavior, where in case of concurrent read()s, poll() might wake up but the read() syscall will just return 0 to the caller, while another process has "stolen" the data. This is in the general case not the expected behavior, but it is the exact same one, that can easily be triggered with a 3.4 kernel, and some tools might just rely on it. The mutex is not needed, the original integrity issue which introduced it, is in the meantime covered by: "fill buffer with more than a single message for SYSLOG_ACTION_READ" `116e90b23f` Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-06 09:50:09 -07:00
Kay Sievers	43a73a50b3	kmsg: add the facility number to the syslog prefix After the recent split of facility and level into separate variables, we miss the facility value (always 0 for kernel-originated messages) in the syslog prefix. On Tue, Jul 3, 2012 at 12:45 PM, Dan Carpenter <dan.carpenter@oracle.com> wrote: > Static checkers complain about the impossible condition here. > > In `084681d14e` ('printk: flush continuation lines immediately to > console'), we changed msg->level from being a u16 to being an unsigned > 3 bit bitfield. Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-06 09:50:09 -07:00
Kay Sievers	e3f5a5f271	kmsg: escape the backslash character while exporting data Non-printable characters in the log data are hex-escaped to ensure safe post processing. We need to escape a backslash we find in the data, to be able to distinguish it from a backslash we add for the escaping. Also escape the non-printable character 127. Thanks to Miloslav Trmac for the heads up. Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-06 09:50:09 -07:00
liu chuansheng	5c53d819c7	printk: replacing the raw_spin_lock/unlock with raw_spin_lock/unlock_irq In function devkmsg_read/writev/llseek/poll/open()..., the function raw_spin_lock/unlock is used, there is potential deadlock case happening. CPU1: thread1 doing the cat /dev/kmsg: raw_spin_lock(&logbuf_lock); while (user->seq == log_next_seq) { when thread1 run here, at this time one interrupt is coming on CPU1 and running based on this thread,if the interrupt handle called the printk which need the logbuf_lock spin also, it will cause deadlock. So we should use raw_spin_lock/unlock_irq here. Acked-by: Kay Sievers <kay@vrfy.org> Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-06 09:50:08 -07:00
Ingo Molnar	5c09d127a1	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the RCU tree from Paul E. McKenney: "The major features of this series are: 1. Preventing latency spikes of more than 200 microseconds for kernels built with NR_CPUS=4096, which is reportedly becoming the default for some distros. This is a first step, as it does not help with systems that actually -have- 4096 CPUs (work on this case is in progress, but is not yet ready for mainline). This category also includes improving concurrency of rcu_barrier(), placed here due to conflicts. Posted to LKML at: https://lkml.org/lkml/2012/6/22/381. Note that patches 18-22 of that series have been defered to 3.7, as they have not yet proven themselves to be mainline-ready (and yes, these are the ones intended to get rid of RCU's latency spikes for systems that actually have 4096 CPUs). 2. Updates to documentation and rcutorture fixes, the latter category including improvements to rcu_barrier() testing. Posted to LKML at: http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/04094.html. 3. Miscellaneous fixes posted to LKML at: https://lkml.org/lkml/2012/6/22/500, with the exception of the last commit, which was posted here: http://www.gossamer-threads.com/lists/linux/kernel/1561830 4. RCU_FAST_NO_HZ fixes and improvements. Posted to LKML at: http://lkml.indiana.edu/hypermail/linux/kernel/1206.1/00006.html and http://www.gossamer-threads.com/lists/linux/kernel/1561833. The first four patches of the first series went into 3.5 to fix a regression. 5. Code-style fixes. These were posted to LKML at http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01180.html and http://lkml.indiana.edu/hypermail/linux/kernel/1205.2/01181.html. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-06 16:13:58 +02:00
Paul E. McKenney	5cf05ad758	rcu: Fix broken strings in RCU's source code. Although the C language allows you to break strings across lines, doing this makes it hard for people to find the Linux kernel code corresponding to a given console message. This commit therefore fixes broken strings throughout RCU's source code. Suggested-by: Josh Triplett <josh@joshtriplett.org> Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-06 06:01:49 -07:00
Paul E. McKenney	c701d5d9b3	rcu: Fix code-style issues involving "else" The Linux kernel coding style says that single-statement blocks should omit curly braces unless the other leg of the "if" statement has multiple statements, in which case the curly braces should be included. This commit fixes RCU's violations of this rule. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-06 06:01:48 -07:00
Paul E. McKenney	02a0677b0b	Merge branches 'bigrtm.2012.07.04a', 'doctorture.2012.07.02a', 'fixes.2012.07.06a' and 'fnh.2012.07.02a' into HEAD bigrtm: First steps towards getting RCU out of the way of tens-of-microseconds real-time response on systems compiled with NR_CPUS=4096. Also cleanups for and increased concurrency of rcu_barrier() family of primitives. doctorture: rcutorture and documentation improvements. fixes: Miscellaneous fixes. fnh: RCU_FAST_NO_HZ fixes and improvements.	2012-07-06 05:59:30 -07:00
Paul E. McKenney	cfca927972	rcu: Introduce check for callback list/count mismatch The recent bug that introduced the RCU callback list/count mismatch showed the need for a diagnostic to check for this, which this commit adds. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-06 05:55:16 -07:00
Grant Likely	74a7f08448	devicetree: add helper inline for retrieving a node's full name The pattern (np ? np->full_name : "<none>") is rather common in the kernel, but can also make for quite long lines. This patch adds a new inline function, of_node_full_name() so that the test for a valid node pointer doesn't need to be open coded at all call sites. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rob Herring <rob.herring@calxeda.com>	2012-07-06 07:16:34 -05:00
Ingo Molnar	40b3c43f04	Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull low probability CONFIG_RCU_BOOST=y deadlock fix from Paul E. McKenney. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-06 11:18:50 +02:00
Ingo Molnar	35c2f48c66	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull tracing updates from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-06 11:12:17 +02:00
Ingo Molnar	90574ebb7e	Merge branch 'perf/urgent' into perf/core Merge this branch to pick up a fixlet and to update to a more recent base. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 21:10:23 +02:00
Peter Zijlstra	5167e8d541	sched/nohz: Rewrite and fix load-avg computation -- again Thanks to Charles Wang for spotting the defects in the current code: - If we go idle during the sample window -- after sampling, we get a negative bias because we can negate our own sample. - If we wake up during the sample window we get a positive bias because we push the sample to a known active period. So rewrite the entire nohz load-avg muck once again, now adding copious documentation to the code. Reported-and-tested-by: Doug Smythies <dsmythies@telus.net> Reported-and-tested-by: Charles Wang <muming.wq@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins [ minor edits ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 20:58:13 +02:00
Salman Qazi	164c33c6ad	sched: Fix fork() error path to not crash In dup_task_struct(), if arch_dup_task_struct() fails, the clean up code fails to clean up correctly. That's because the clean up code depends on unininitalized ti->task pointer. We fix this by making sure that the task and thread_info know about each other before we attempt to take the error path. Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120626011815.11323.5533.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-05 20:57:32 +02:00
David S. Miller	c90a9bb907	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-07-05 03:44:25 -07:00
Linus Torvalds	a3da2c6913	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block bits from Jens Axboe: "As vacation is coming up, thought I'd better get rid of my pending changes in my for-linus branch for this iteration. It contains: - Two patches for mtip32xx. Killing a non-compliant sysfs interface and moving it to debugfs, where it belongs. - A few patches from Asias. Two legit bug fixes, and one killing an interface that is no longer in use. - A patch from Jan, making the annoying partition ioctl warning a bit less annoying, by restricting it to !CAP_SYS_RAWIO only. - Three bug fixes for drbd from Lars Ellenberg. - A fix for an old regression for umem, it hasn't really worked since the plugging scheme was changed in 3.0. - A few fixes from Tejun. - A splice fix from Eric Dumazet, fixing an issue with pipe resizing." * 'for-linus' of git://git.kernel.dk/linux-block: scsi: Silence unnecessary warnings about ioctl to partition block: Drop dead function blk_abort_queue() block: Mitigate lock unbalance caused by lock switching block: Avoid missed wakeup in request waitqueue umem: fix up unplugging splice: fix racy pipe->buffers uses drbd: fix null pointer dereference with on-congestion policy when diskless drbd: fix list corruption by failing but already aborted reads drbd: fix access of unallocated pages and kernel panic xen/blkfront: Add WARN to deal with misbehaving backends. blkcg: drop local variable @q from blkg_destroy() mtip32xx: Create debugfs entries for troubleshooting mtip32xx: Remove 'registers' and 'flags' from sysfs blkcg: fix blkg_alloc() failure path block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED block: fix return value on cfq_init() failure mtip32xx: Remove version.h header file inclusion xen/blkback: Copy id field when doing BLKIF_DISCARD.	2012-07-03 15:45:10 -07:00
Paul E. McKenney	9d2ad24306	rcu: Make RCU_FAST_NO_HZ respect nohz= boot parameter If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to also disable itself. This commit therefore checks for tick_nohz_enabled being zero, disabling rcu_prepare_for_idle() if so. This commit assumes that tick_nohz_enabled can change at runtime: If this is not the case, then a simpler approach suffices. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:43 -07:00
Paul E. McKenney	e84c48ae30	rcu: Round FAST_NO_HZ lazy timeout to nearest second Currently, if several CPUs in the same package have all lazy RCU callbacks, their wakeups will be uncorrelated. If all the CPUs are in the same power domain (as is often the case), this will result in unnecessary power-ups of the package. This commit therefore uses round_jiffies() to round the timeouts to a second boundary, increasing the odds that they can be coalesced with each other or with other timeouts. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:42 -07:00
Paul E. McKenney	28f8555364	rcu: The rcu_needs_cpu() function is not a quiescent state The TINY_PREEMPT_RCU() function rcu_preempt_needs_cpu(), which is called from rcu_needs_cpu(), assumes that it is in a quiescent state with respect to the CPU. This is no longer the case. This commit therefore updates rcu_preempt_needs_cpu() to make it aware that it is not running in a quiescent state. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>	2012-07-02 12:34:42 -07:00
Paul E. McKenney	bf1304e9cd	rcu: Dump only the current CPU's buffers for idle-entry/exit warnings Problems in RCU idle entry and exit are almost always confined to the offending CPU. This commit therefore switches ftrace_dump() from DUMP_ALL to DUMP_ORIG. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>	2012-07-02 12:34:42 -07:00
Paul E. McKenney	cf01537ecf	rcu: Add check for CPUs going offline with callbacks queued If a CPU goes offline with callbacks queued, those callbacks might be indefinitely postponed, which can result in a system hang. This commit therefore inserts warnings for this condition. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:25 -07:00
Paul E. McKenney	95f0c1de3e	rcu: Disable preemption in rcu_blocking_is_gp() It is time to optimize CONFIG_TREE_PREEMPT_RCU's synchronize_rcu() for uniprocessor optimization, which means that rcu_blocking_is_gp() can no longer rely on RCU read-side critical sections having disabled preemption. This commit therefore disables preemption across rcu_blocking_is_gp()'s scan of the cpu_online_mask. (Updated from previous version to fix embarrassing bug spotted by Wu Fengguang.) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:25 -07:00
Carsten Emde	1c17e4d443	rcu: Prevent uninitialized string in RCU CPU stall info An uninitialized string may be displayed at the end of the rcu_preempt detected stall info such as 0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D ^^^^^^^^^^ if CONFIG_RCU_FAST_NO_HZ is not defined. This trivial patch clears the string in this case. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:25 -07:00
Paul E. McKenney	d7118175cc	rcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCU The rcu_is_cpu_idle() function is used if CONFIG_DEBUG_LOCK_ALLOC, but TINY_RCU defines it only when CONFIG_PROVE_RCU. This causes build failures when CONFIG_DEBUG_LOCK_ALLOC=y but CONFIG_PROVE_RCU=n. This commit therefore adjusts the #ifdefs for rcu_is_cpu_idle() so that it is defined when CONFIG_DEBUG_LOCK_ALLOC=y. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:25 -07:00
Paul E. McKenney	29154c57e3	rcu: Split RCU core processing out of __call_rcu() The __call_rcu() function is a bit overweight, so this commit splits it into actual enqueuing of and accounting for the callback (__call_rcu()) and associated RCU-core processing (__call_rcu_core()). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:24 -07:00
Paul E. McKenney	a16b7a6934	rcu: Prevent __call_rcu() from invoking RCU core on offline CPUs The __call_rcu() function will invoke the RCU core, for example, if it detects that the current CPU has too many callbacks. However, this can happen on an offline CPU that is on its way to the idle loop, in which case it is an error to invoke the RCU core, and the excess callbacks will be adopted in any case. This commit therefore adds checks to __call_rcu() for running on an offline CPU, refraining from invoking the RCU core in this case. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:24 -07:00
Paul E. McKenney	62fde6edf1	rcu: Make __call_rcu() handle invocation from idle Although __call_rcu() is handled correctly when called from a momentary non-idle period, if it is called on a CPU that RCU believes to be idle on RCU_FAST_NO_HZ kernels, the callback might be indefinitely postponed. This commit therefore ensures that RCU is aware of the new callback and has a chance to force the CPU out of dyntick-idle mode when a new callback is posted. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:24 -07:00
Paul E. McKenney	2a3fa843b5	rcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementations The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of __rcu_read_lock() and __rcu_read_unlock() are identical, so this commit consolidates them into kernel/rcupdate.h. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:23 -07:00
Paul E. McKenney	1d1fb395f6	rcu: Add ACCESS_ONCE() to ->qlen accesses The _rcu_barrier() function accesses other CPUs' rcu_data structure's ->qlen field without benefit of locking. This commit therefore adds the required ACCESS_ONCE() wrappers around accesses and updates that need it. ACCESS_ONCE() is not needed when a CPU accesses its own ->qlen, or in code that cannot run while _rcu_barrier() is sampling ->qlen fields. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:22 -07:00
Paul E. McKenney	3f5d3ea64f	rcu: Consolidate duplicate callback-list initialization There are a couple of open-coded initializations of the rcu_data structure's RCU callback list. This commit therefore consolidates them into a new init_callback_list() function. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:21 -07:00
Paul E. McKenney	285fe29481	rcu: Fix detection of abruptly-ending stall The code that attempts to identify stalls that end just as we detect them is broken by both flavors of initialization failure. This commit therefore properly initializes and computes the count of the number of reasons why the RCU grace period is stalled. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:21 -07:00
Paul E. McKenney	72472a02a9	rcu: Make rcutorture fakewriters invoke rcu_barrier() The current rcutorture rcu_barrier() testing never intentionally runs more than one instance of rcu_barrier() at a given time. This fails to test the the shiny new concurrency features of rcu_barrier(). This commit therefore modifies the rcutorture fakewriter kthread to randomly invoke rcu_barrier() rather than the usual synchronize_rcu(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:04 -07:00
Paul E. McKenney	143aa672f4	rcu: Fix diagnostic-printk typo in rcutorture The rcu_torture_barrier() function has a copy-and-paste typo in the string passed to rcutorture_shutdown_absorb(), which this commit fixes. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:03 -07:00
Paul E. McKenney	c6ebcbb60c	rcu: Fix bug in rcu_barrier() torture test The child threads in the rcu_torture_barrier_cbs() are improperly synchronized, which can cause the rcu_barrier() tests to hang. The failure mode is as follows: 1. CPU 0 running in rcu_torture_barrier() sets barrier_cbs_count to n_barrier_cbs. 2. CPU 1 running in rcu_torture_barrier_cbs() wakes up, posts its RCU callback, and atomically decrements barrier_cbs_count. Because barrier_cbs_count is not zero, it does not do the wake_up(). 3. CPU 2 running in rcu_torture_barrier_cbs() wakes up, but finds that barrier_cbs_count is not equal to n_barrier_cbs, and so returns to sleep. 4. The value of barrier_cbs_count therefore never reaches zero, which causes the test to hang. This commit therefore uses a phase variable to coordinate the test, preventing this scenario from occurring. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:34:03 -07:00
Paul E. McKenney	e3f8d3788e	rcu: Test srcu_barrier() from rcutorture test suite SRCU now has a call_srcu() and an srcu_barrier(), but rcutorture does not test them. This commit adds the machinery to allow rcutorture's existing tests for call_rcu() and rcu_barrier() to apply to the SRCU equivalents. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:03 -07:00
Paul E. McKenney	751a68b2e4	rcu: Rationalize ordering of torture_ops list Move the raw SRCU interfaces out of the middle of the normal SRCU interfaces. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:34:03 -07:00
Paul E. McKenney	ff015030c9	rcu: RCU_SAVE_DYNTICK code no longer ever dead Before RCU had unified idle, the RCU_SAVE_DYNTICK leg of the switch statement in force_quiescent_state() was dead code for CONFIG_NO_HZ=n kernel builds. With unified idle, the code is never dead. This commit therefore removes the "if" statement designed to make gcc aware of when the code was and was not dead. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:24 -07:00
Paul E. McKenney	c0cc962da3	rcu: Use for_each_rcu_flavor() in TREE_RCU tracing This commit applies the new for_each_rcu_flavor() macro to the kernel/rcutree_trace.c file. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:24 -07:00
Paul E. McKenney	6ce75a2326	rcu: Introduce for_each_rcu_flavor() and use it The arrival of TREE_PREEMPT_RCU some years back included some ugly code involving either #ifdef or #ifdef'ed wrapper functions to iterate over all non-SRCU flavors of RCU. This commit therefore introduces a for_each_rcu_flavor() iterator over the rcu_state structures for each flavor of RCU to clean up a bit of the ugliness. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:24 -07:00
Paul E. McKenney	1bca8cf1a2	rcu: Remove unneeded __rcu_process_callbacks() argument With the advent of __this_cpu_ptr(), it is no longer necessary to pass both the rcu_state and rcu_data structures into __rcu_process_callbacks(). This commit therefore computes the rcu_data pointer from the rcu_state pointer within __rcu_process_callbacks() so that callers can pass in only the pointer to the rcu_state structure. This paves the way for linking the rcu_state structures together and iterating over them. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:23 -07:00
Paul E. McKenney	d7e187c8e9	rcu: Add rcu_barrier() statistics to debugfs tracing This commit adds an rcubarrier file to RCU's debugfs statistical tracing directory, providing diagnostic information on rcu_barrier(). Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:23 -07:00
Paul E. McKenney	a83eff0a82	rcu: Add tracing for _rcu_barrier() This commit adds event tracing for _rcu_barrier() execution. This is defined only if RCU_TRACE=y. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:23 -07:00
Paul E. McKenney	cf3a9c4842	rcu: Increase rcu_barrier() concurrency The traditional rcu_barrier() implementation has serialized all requests, regardless of RCU flavor, and also does not coalesce concurrent requests. In the past, this has been good and sufficient. However, systems are getting larger and use of rcu_barrier() has been increasing. This commit therefore introduces a counter-based scheme that allows _rcu_barrier() calls for the same flavor of RCU to take advantage of each others' work. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:23 -07:00
Paul E. McKenney	cfed0a85da	rcu: Remove needless initialization For global variables, C defaults all fields to zero. The initialization of the rcu_state structure's ->n_force_qs and ->n_force_qs_ngp fields is therefore redundant, so this commit removes these initializations. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:22 -07:00
Paul E. McKenney	7be7f0be90	rcu: Move rcu_barrier_mutex to rcu_state structure In order to allow each RCU flavor to concurrently execute its rcu_barrier() function, it is necessary to move the relevant state to the rcu_state structure. This commit therefore moves the rcu_barrier_mutex global variable to a new ->barrier_mutex field in the rcu_state structure. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:22 -07:00
Paul E. McKenney	7db74df88b	rcu: Move rcu_barrier_completion to rcu_state structure In order to allow each RCU flavor to concurrently execute its rcu_barrier() function, it is necessary to move the relevant state to the rcu_state structure. This commit therefore moves the rcu_barrier_completion global variable to a new ->barrier_completion field in the rcu_state structure. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:22 -07:00
Paul E. McKenney	24ebbca8ec	rcu: Move rcu_barrier_cpu_count to rcu_state structure In order to allow each RCU flavor to concurrently execute its rcu_barrier() function, it is necessary to move the relevant state to the rcu_state structure. This commit therefore moves the rcu_barrier_cpu_count global variable to a new ->barrier_cpu_count field in the rcu_state structure. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:22 -07:00
Paul E. McKenney	06668efa91	rcu: Move _rcu_barrier()'s rcu_head structures to rcu_data structures In order for multiple flavors of RCU to each concurrently run one rcu_barrier(), each flavor needs its own per-CPU set of rcu_head structures. This commit therefore moves _rcu_barrier()'s set of per-CPU rcu_head structures from per-CPU variables to the existing per-CPU and per-RCU-flavor rcu_data structures. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:21 -07:00
Paul E. McKenney	037b64ed0b	rcu: Place pointer to call_rcu() in rcu_data structure This is a preparatory commit for increasing rcu_barrier()'s concurrency. It adds a pointer in the rcu_data structure to the corresponding call_rcu() function. This allows a pointer to the rcu_data structure to imply the function pointer, which allows _rcu_barrier() state to be placed in the rcu_state structure. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:21 -07:00
Paul E. McKenney	6c90cc7bf0	rcu: Prevent excessive line length in RCU_STATE_INITIALIZER() Upcoming rcu_barrier() concurrency commits will result in line lengths greater than 80 characters in the RCU_STATE_INITIALIZER(), so this commit shortens the name of the macro's argument to prevent this. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-07-02 12:33:21 -07:00
Paul E. McKenney	cca6f39319	rcu: Size rcu_node tree from nr_cpu_ids rather than NR_CPUS The rcu_node tree array is sized based on compile-time constants, including NR_CPUS. Although this approach has worked well in the past, the recent trend by many distros to define NR_CPUS=4096 results in excessive grace-period-initialization latencies. This commit therefore substitutes the run-time computed nr_cpu_ids for the compile-time NR_CPUS when building the tree. This can result in much of the compile-time-allocated rcu_node array being unused. If this is a major problem, you are in a specialized situation anyway, so you can manually adjust the NR_CPUS, RCU_FANOUT, and RCU_FANOUT_LEAF kernel config parameters. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:21 -07:00
Paul E. McKenney	cc5df65b03	rcu: Four-level hierarchy is no longer experimental Time to make the four-level-hierarchy setting less scary, so this commit removes "Experimental" from the boot-time message. Leave the message in order to get a heads-up on any possible need to expand to a five-level hierarchy. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:20 -07:00
Paul E. McKenney	f885b7f2b2	rcu: Control RCU_FANOUT_LEAF from boot-time parameter Although making RCU_FANOUT_LEAF a kernel configuration parameter rather than a fixed constant makes it easier for people to decrease cache-miss overhead for large systems, it is of little help for people who must run a single pre-built kernel binary. This commit therefore allows the value of RCU_FANOUT_LEAF to be increased (but not decreased!) via a boot-time parameter named rcutree.rcu_fanout_leaf. Reported-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-07-02 12:33:20 -07:00
Paul E. McKenney	cba6d0d64e	Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation" This reverts commit `616c310e83`. (Move PREEMPT_RCU preemption to switch_to() invocation). Testing by Sasha Levin <levinsasha928@gmail.com> showed that this can result in deadlock due to invoking the scheduler when one of the runqueue locks is held. Because this commit was simply a performance optimization, revert it. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Sasha Levin <levinsasha928@gmail.com>	2012-07-02 11:39:19 -07:00
Bojan Smojver	d8150d3504	PM / Hibernate: Print hibernation/thaw progress indicator one line at a time. With the introduction of suspend to both into in-kernel hibernation code, dmesg was getting polluted with backspace characters printed as part of image saving progress indicator. This patch introduces printing of progress indicator on image save/load every 10% and one line at a time. As an additional benefit, all other messages emitted by the kernel during hibernation/thaw should now print cleanly as well. Signed-off-by: Bojan Smojver <bojan@rexursive.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-01 13:31:23 +02:00
Rafael J. Wysocki	b2df1d4f8b	PM / Sleep: Separate printing suspend times from initcall_debug Change the behavior of the newly introduced /sys/power/pm_print_times attribute so that its initial value depends on initcall_debug, but setting it to 0 will cause device suspend/resume times not to be printed, even if initcall_debug has been set. This way, the people who use initcall_debug for reasons other than PM debugging will be able to switch the suspend/resume times printing off, if need be. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-07-01 13:31:23 +02:00
Sameer Nanda	4b7760ba0d	PM / Sleep: add knob for printing device resume times Added a new knob called /sys/power/pm_print_times. Setting it to 1 enables printing of time taken by devices to suspend and resume. Setting it to 0 disables this printing (unless overridden by initcall_debug kernel command line option). Signed-off-by: Sameer Nanda <snanda@chromium.org> Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-01 13:31:22 +02:00
Srivatsa S. Bhat	443772d408	ftrace: Disable function tracing during suspend/resume and hibernation, again If function tracing is enabled for some of the low-level suspend/resume functions, it leads to triple fault during resume from suspend, ultimately ending up in a reboot instead of a resume (or a total refusal to come out of suspended state, on some machines). This issue was explained in more detail in commit `f42ac38c59` (ftrace: disable tracing for suspend to ram). However, the changes made by that commit got reverted by commit `cbe2f5a6e8` (tracing: allow tracing of suspend/resume & hibernation code again). So, unfortunately since things are not yet robust enough to allow tracing of low-level suspend/resume functions, suspend/resume is still broken when ftrace is enabled. So fix this by disabling function tracing during suspend/resume & hibernation. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-01 13:31:22 +02:00
Bojan Smojver	62c552ccc3	PM / Hibernate: Enable suspend to both for in-kernel hibernation. It is often useful to suspend to memory after hibernation image has been written to disk. If the battery runs out or power is otherwise lost, the computer will resume from the hibernated image. If not, it will resume from memory and hibernation image will be discarded. Signed-off-by: Bojan Smojver <bojan@rexursive.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-01 13:31:22 +02:00
Randy Dunlap	4f0f4af59c	printk.c: fix kernel-doc warnings Fix kernel-doc warnings in printk.c: use correct parameter name. Warning(kernel/printk.c:2429): No description found for parameter 'buf' Warning(kernel/printk.c:2429): Excess function parameter 'line' description in 'kmsg_dump_get_buffer' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-30 15:56:40 -07:00
Linus Torvalds	21f27291f5	Driver Core fixes for 3.5-rc5 Here is a number of printk() fixes, specifically a few reported by the crazy blog program that ships in SUSE releases (that's "boot log" and not "web log", it predates the general "blog" terminology by many years), and the restoration of the continuation line functionality reported by Stephen and others. Yes, the changes seem a bit big this late in the cycle, but I've been beating on them for a while now, and Stephen has even optimized it a bit, so all looks good to me. The other change in here is a Documentation update for the stable kernel rules describing how some distro patches should be backported, to hopefully drive a bit more response from the distros to the stable kernel releases. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAk/uhJEACgkQMUfUDdst+ymL0QCfTjWJrdf+ooJ6Bx/NNgOGxYip Ss0AnRrCNkfgmMcdNMn/7CIbHlaTj+S+ =M5Rg -----END PGP SIGNATURE----- Merge tag 'driver-core-3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver Core fixes from Greg Kroah-Hartman: "Here is a number of printk() fixes, specifically a few reported by the crazy blog program that ships in SUSE releases (that's "boot log" and not "web log", it predates the general "blog" terminology by many years), and the restoration of the continuation line functionality reported by Stephen and others. Yes, the changes seem a bit big this late in the cycle, but I've been beating on them for a while now, and Stephen has even optimized it a bit, so all looks good to me. The other change in here is a Documentation update for the stable kernel rules describing how some distro patches should be backported, to hopefully drive a bit more response from the distros to the stable kernel releases. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: printk: Optimize if statement logic where newline exists printk: flush continuation lines immediately to console syslog: fill buffer with more than a single message for SYSLOG_ACTION_READ Revert "printk: return -EINVAL if the message len is bigger than the buf size" printk: fix regression in SYSLOG_ACTION_CLEAR stable: Allow merging of backports for serious user-visible performance issues	2012-06-30 10:11:24 -07:00
Pablo Neira Ayuso	a31f2d17b3	netlink: add netlink_kernel_cfg parameter to netlink_kernel_create This patch adds the following structure: struct netlink_kernel_cfg { unsigned int groups; void (input)(struct sk_buff skb); struct mutex *cb_mutex; }; That can be passed to netlink_kernel_create to set optional configurations for netlink kernel sockets. I've populated this structure by looking for NULL and zero parameters at the existing code. The remaining parameters that always need to be set are still left in the original interface. That includes optional parameters for the netlink socket creation. This allows easy extensibility of this interface in the future. This patch also adapts all callers to use this new interface. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-29 16:46:02 -07:00
Steven Rostedt	d36208227d	printk: Optimize if statement logic where newline exists In reviewing Kay's fix up patch: "printk: Have printk() never buffer its data", I found two if statements that could be combined and optimized. Put together the two 'cont.len && cont.owner == current' if statements into a single one, and check if we need to call cont_add(). This also removes the unneeded double cont_flush() calls. Link: http://lkml.kernel.org/r/1340869133.876.10.camel@mop Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-29 16:55:35 -04:00
Vaibhav Nagarnaik	48fdc72f23	ring-buffer: Fix accounting of entries when removing pages When removing pages from the ring buffer, its state is not reset. This means that the counters need to be correctly updated to account for the pages removed. Update the overrun counter to reflect the removed events from the pages. Link: http://lkml.kernel.org/r/1340998301-1715-1-git-send-email-vnagarnaik@google.com Cc: Justin Teravest <teravest@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-29 16:17:17 -04:00
Vaibhav Nagarnaik	44b99462d9	ring-buffer: Fix crash due to uninitialized new_pages list head The new_pages list head in the cpu_buffer is not initialized. When adding pages to the ring buffer, if the memory allocation fails in ring_buffer_resize, the clean up handler tries to free up the allocated pages from all the cpu buffers. The panic is caused by referencing the uninitialized new_pages list head. Initializing the new_pages list head in rb_allocate_cpu_buffer fixes this. Link: http://lkml.kernel.org/r/1340391005-10880-1-git-send-email-vnagarnaik@google.com Cc: Justin Teravest <teravest@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-29 16:16:35 -04:00
Kay Sievers	084681d14e	printk: flush continuation lines immediately to console Continuation lines are buffered internally, intended to merge the chunked printk()s into a single record, and to isolate potentially racy continuation users from usual terminated line users. This though, has the effect that partial lines are not printed to the console in the moment they are emitted. In case the kernel crashes in the meantime, the potentially interesting printed information would never reach the consoles. Here we share the continuation buffer with the console copy logic, and partial lines are always immediately flushed to the available consoles. They are still buffered internally to improve the readability and integrity of the messages and minimize the amount of needed record headers to store. Signed-off-by: Kay Sievers <kay@vrfy.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-29 11:39:42 -04:00
David S. Miller	b26d344c6b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/caif/caif_hsi.c drivers/net/usb/qmi_wwan.c The qmi_wwan merge was trivial. The caif_hsi.c, on the other hand, was not. It's a conflict between `1c385f1fdf` ("caif-hsi: Replace platform device with ops structure.") in the net-next tree and commit `39abbaef19` ("caif-hsi: Postpone init of HIS until open()") in the net tree. I did my best with that one and will ask Sjur to check it out. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-28 17:37:00 -07:00
Steven Rostedt	a5fb833172	ring-buffer: Fix uninitialized read_stamp The ring buffer reader page is used to swap a page from the writable ring buffer. If the writer happens to be on that page, it ends up on the reader page, but will simply move off of it, back into the writable ring buffer as writes are added. The time stamp passed back to the readers is stored in the cpu_buffer per CPU descriptor. This stamp is updated when a swap of the reader page takes place, and it reads the current stamp from the page taken from the writable ring buffer. Everytime a writer goes to a new page, it updates the time stamp of that page. The problem happens if a reader reads a page from an empty per CPU ring buffer. If the buffer is empty, the swap still takes place, placing the writer at the start of the reader page. If at a later time, a write happens, it updates the page's time stamp and continues. But the problem is that the read_stamp does not get updated, because the page was already swapped. The solution to this was to not swap the page if the ring buffer happens to be empty. This also removes the side effect that the writes on the reader page will not get updated because the writer never gets back on the reader page without a swap. That is, if a read happens on an empty buffer, but then no reads happen for a while. If a swap took place, and the writer were to start writing a lot of data (function tracer), it will start overflowing the ring buffer and overwrite the older data. But because the writer never goes back onto the reader page, the data left on the reader page never gets overwritten. This causes the reader to see really old data, followed by a jump to newer data. Link: http://lkml.kernel.org/r/1340060577-9112-1-git-send-email-dhsharp@google.com Google-Bug-Id: 6410455 Reported-by: David Sharp <dhsharp@google.com> tested-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-28 13:52:15 -04:00
Steven Rostedt	6d158a813e	tracing: Remove NR_CPUS array from trace_iterator Replace the NR_CPUS array of buffer_iter from the trace_iterator with an allocated array. This will just create an array of possible CPUS instead of the max number specified. The use of NR_CPUS in that array caused allocation failures for machines that were tight on memory. This did not cause any failures to the system itself (no crashes), but caused unnecessary failures for reading the trace files. Added a helper function called 'trace_buffer_iter()' that returns the buffer_iter item or NULL if it is not defined or the array was not allocated. Some routines do not require the array (tracing_open_pipe() for one). Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-28 13:52:15 -04:00
Steven Rostedt	0be61ebc18	tracing/selftest: Add a WARN_ON() if a tracer test fails Add a WARN_ON() output on test failures so that they are easier to detect in automated tests. Although, the WARN_ON() will not print if the test causes the system to crash, obviously. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-28 13:52:14 -04:00
David S. Miller	c64e66c67b	audit: netlink: Move away from NLMSG_NEW(). And use nlmsg_data() while we're here too. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-26 21:54:14 -07:00
Jan Beulich	116e90b23f	syslog: fill buffer with more than a single message for SYSLOG_ACTION_READ The recent changes to the printk buffer management resulted in SYSLOG_ACTION_READ to only return a single message, whereas previously the buffer would get filled as much as possible. As, when too small to fit everything, filling it to the last byte would be pretty ugly with the new code, the patch arranges for as many messages as possible to get returned in a single invocation. User space tools in at least all SLES versions depend on the old behavior. This at once addresses the issue attempted to get fixed with commit `b56a39ac26` ("printk: return -EINVAL if the message len is bigger than the buf size"), and since that commit widened the possibility for losing a message altogether, the patch here assumes that this other commit would get reverted first (otherwise the patch here won't apply). Furthermore, this patch also addresses the problem dealt with in commit `4a77a5a06e` ("printk: use mutex lock to stop syslog_seq from going wild"), so I'd recommend reverting that one too (albeit there's no direct collision between the two). Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kay Sievers <kay@vrfy.org> Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-26 12:37:36 -07:00
Greg Kroah-Hartman	6fda135c90	Revert "printk: return -EINVAL if the message len is bigger than the buf size" This reverts commit `b56a39ac26`. A better patch from Jan will follow this to resolve the issue. Acked-by: Kay Sievers <kay@vrfy.org> Cc: Fengguang Wu <wfg@linux.intel.com> Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-26 12:35:24 -07:00
Paul E. McKenney	b41772abeb	rcu: Stop rcu_do_batch() from multiplexing the "count" variable Commit `b1420f1c` (Make rcu_barrier() less disruptive) rearranged the code in rcu_do_batch(), moving the ->qlen manipulation to follow the requeueing of the callbacks. Unfortunately, this rearrangement clobbered the value of the "count" local variable before the value of rdp->qlen was adjusted, resulting in the value of rdp->qlen being inaccurate. This commit therefore introduces an index variable "i", avoiding the inadvertent multiplexing. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2012-06-25 12:35:25 -07:00
Alan Stern	4661e3568a	printk: fix regression in SYSLOG_ACTION_CLEAR Commit `7ff9554bb5` (printk: convert byte-buffer to variable-length record buffer) introduced a regression by accidentally removing a "break" statement from inside the big switch in printk's do_syslog(). The symptom of this bug is that the "dmesg -C" command doesn't only clear the kernel's log buffer; it also disables console logging. This patch (as1561) fixes the regression by adding the missing "break". Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-25 12:11:58 -07:00
David S. Miller	dfbce08c19	ipv4: Don't add deprecated new binary sysctl value. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-22 23:02:22 -07:00
Alexander Duyck	6648bd7e0e	ipv4: Add sysctl knob to control early socket demux This change is meant to add a control for disabling early socket demux. The main motivation behind this patch is to provide an option to disable the feature as it adds an additional cost to routing that reduces overall throughput by up to 5%. For example one of my systems went from 12.1Mpps to 11.6 after the early socket demux was added. It looks like the reason for the regression is that we are now having to perform two lookups, first the one for an established socket, and then the one for the routing table. By adding this patch and toggling the value for ip_early_demux to 0 I am able to get back to the 12.1Mpps I was previously seeing. [ Move local variables in ip_rcv_finish() down into the basic block in which they are actually used. -DaveM ] Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-06-22 17:11:13 -07:00
Linus Torvalds	a11637194a	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ftrace: Make all inline tags also include notrace perf: Use css_tryget() to avoid propping up css refcount perf tools: Fix synthesizing tracepoint names from the perf.data headers perf stat: Fix default output file perf tools: Fix endianity swapping for adds_features bitmask	2012-06-22 10:58:57 -07:00
Linus Torvalds	2ce5682947	Merge branch 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull two cgroup fixes from Tejun Heo: "This containes two patches fixing a refcnt race bug during css_put(). Decrementing and checking the value weren't atomic and two tasks could think that they both pushed the counter to zero." * 'for-3.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroups: Account for CSS_DEACT_BIAS in __css_put cgroup: make sure that decisions in __css_put are atomic	2012-06-20 22:11:04 -07:00
Linus Torvalds	fe80352460	Driver core and printk fixes for 3.5-rc4 Here are some fixes for 3.5-rc4 that resolve the kmsg problems that people have reported showing up after the printk and kmsg changes went into 3.5-rc1. There are also a smattering of other tiny fixes for the extcon and hyper-v drivers that people have reported. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iEYEABECAAYFAk/iNQcACgkQMUfUDdst+yklTQCfZCXFlhA43bZo/8Joqd2pLIIW 2uoAoMze0SlfJeN6Qu7yY0P+qV/f/pc3 =UNFY -----END PGP SIGNATURE----- Merge tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and printk fixes from Greg Kroah-Hartman: "Here are some fixes for 3.5-rc4 that resolve the kmsg problems that people have reported showing up after the printk and kmsg changes went into 3.5-rc1. There are also a smattering of other tiny fixes for the extcon and hyper-v drivers that people have reported. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>" * tag 'driver-core-3.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: extcon: max8997: Add missing kfree for info->edev in max8997_muic_remove() extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak extcon: Fix wrong index in max8997_extcon_cable[] kmsg - kmsg_dump() fix CONFIG_PRINTK=n compilation printk: return -EINVAL if the message len is bigger than the buf size printk: use mutex lock to stop syslog_seq from going wild kmsg - kmsg_dump() use iterator to receive log buffer content vme: change maintainer e-mail address Extcon: Don't try to create duplicate link names driver core: fixup reversed deferred probe order printk: Fix alignment of buf causing crash on ARM EABI Tools: hv: verify origin of netlink connector message	2012-06-20 15:14:28 -07:00
Cyrill Gorcunov	5702c5eeab	c/r: prctl: Move PR_GET_TID_ADDRESS to a proper place During merging of PR_GET_TID_ADDRESS patch the code has been misplaced (it happened to appear under PR_MCE_KILL) in result noone can use this option. Fix it by moving code snippet to a proper place. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
Oleg Nesterov	50d75f8dae	pidns: find_new_reaper() can no longer switch to init_pid_ns.child_reaper find_new_reaper() changes pid_ns->child_reaper, see `add0d4df` ("pid_ns: zap_pid_ns_processes: fix the ->child_reaper changing"). The original reason has gone away after the previous patch, ->children list must be empty after zap_pid_ns_processes(). However now we can not switch to init_pid_ns.child_reaper. __unhash_process() relies on the "->child_reaper == parent" check, but this check does not work if the last exiting task is also the child reaper. As Eric sugested, we can change __unhash_process() to use the parent's pid_ns and remove this code. Also, with this change we can move detach_pid(PIDTYPE_PID) back, where it was before the previous fix. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: Mike Galbraith <efault@gmx.de> Acked-by: Pavel Emelyanov <xemul@parallels.com> Tested-by: Andrew Wagin <avagin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
Eric W. Biederman	6347e90091	pidns: guarantee that the pidns init will be the last pidns process reaped Today we have a twofold bug. Sometimes release_task on pid == 1 in a pid namespace can run before other processes in a pid namespace have had release task called. With the result that pid_ns_release_proc can be called before the last proc_flus_task() is done using upid->ns->proc_mnt, resulting in the use of a stale pointer. This same set of circumstances can lead to waitpid(...) returning for a processes started with clone(CLONE_NEWPID) before the every process in the pid namespace has actually exited. To fix this modify zap_pid_ns_processess wait until all other processes in the pid namespace have exited, even EXIT_DEAD zombies. The delay_group_leader and related tests ensure that the thread gruop leader will be the last thread of a process group to be reaped, or to become EXIT_DEAD and self reap. With the change to zap_pid_ns_processes we get the guarantee that pid == 1 in a pid namespace will be the last task that release_task is called on. With pid == 1 being the last task to pass through release_task pid_ns_release_proc can no longer be called too early nor can wait return before all of the EXIT_DEAD tasks in a pid namespace have exited. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: Mike Galbraith <efault@gmx.de> Acked-by: Pavel Emelyanov <xemul@parallels.com> Tested-by: Andrew Wagin <avagin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
Konstantin Khlebnikov	4fe7efdbdf	mm: correctly synchronize rss-counters at exit/exec do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does put_user(clear_child_tid) which can update task->rss_stat and thus make mm->rss_stat inconsistent. This triggers the "BUG:" printk in check_mm(). Let's fix this bug in the safest way, and optimize/cleanup this later. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
Salman Qazi	8e3bbf42c6	cgroups: Account for CSS_DEACT_BIAS in __css_put When we fixed the race between atomic_dec and css_refcnt, we missed the fact that css_refcnt internally subtracts CSS_DEACT_BIAS to get the actual reference count. This can potentially cause a refcount leak if __css_put races with cgroup_clear_css_refs. Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-06-18 15:38:02 -07:00
Yan, Zheng	0cda4c0231	perf: Introduce perf_pmu_migrate_context() Originally from Peter Zijlstra. The helper migrates perf events from one cpu to another cpu. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1339741902-8449-5-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-18 12:13:21 +02:00
Yan, Zheng	e2d37cd213	perf: Allow the PMU driver to choose the CPU on which to install events Allow the pmu->event_init callback to change event->cpu, so the PMU driver can choose the CPU on which to install events. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1339741902-8449-4-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-18 12:13:21 +02:00
Yan, Zheng	fbfc623f82	perf: Avoid race between cpu hotplug and installing event perf_event_open() requires the cpu on which to install event is online, but the cpu can go offline after perf_event_open checks that. Add a get_online_cpus()/put_online_cpus() pair to avoid the race. Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1339741902-8449-3-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-18 12:13:20 +02:00
Ingo Molnar	d1ece0998e	Merge branch 'perf/urgent' into perf/core Merge in all fixes before applying more changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-18 11:47:58 +02:00
Salman Qazi	9c5da09d26	perf: Use css_tryget() to avoid propping up css refcount An rmdir pushes css's ref count to zero. However, if the associated directory is open at the time, the dentry ref count is non-zero. If the fd for this directory is then passed into perf_event_open, it does a css_get(). This bounces the ref count back up from zero. This is a problem by itself. But what makes it turn into a crash is the fact that we end up doing an extra dput, since we perform a dput when css_put sees the ref count go down to zero. css_tryget() does not fall into that trap. So, we use that instead. Reproduction test-case for the bug: #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <linux/unistd.h> #include <linux/perf_event.h> #include <string.h> #include <errno.h> #include <stdio.h> #define PERF_FLAG_PID_CGROUP (1U << 2) int perf_event_open(struct perf_event_attr hw_event_uptr, pid_t pid, int cpu, int group_fd, unsigned long flags) { return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu, group_fd, flags); } / * Directly poke at the perf_event bug, since it's proving hard to repro * depending on where in the kernel tree. what moved? / int main(int argc, char *argv) { int fd; struct perf_event_attr attr; memset(&attr, 0, sizeof(attr)); attr.exclude_kernel = 1; attr.size = sizeof(attr); mkdir("/dev/cgroup/perf_event/blah", 0777); fd = open("/dev/cgroup/perf_event/blah", O_RDONLY); perror("open"); rmdir("/dev/cgroup/perf_event/blah"); sleep(2); perf_event_open(&attr, fd, 0, -1, PERF_FLAG_PID_CGROUP); perror("perf_event_open"); close(fd); return 0; } Signed-off-by: Salman Qazi <sqazi@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-18 11:45:57 +02:00
Ingo Molnar	4983955c04	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/core Pull ftrace robustization fixes from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-18 10:57:51 +02:00
Yuanhan Liu	b56a39ac26	printk: return -EINVAL if the message len is bigger than the buf size Just like what devkmsg_read() does, return -EINVAL if the message len is bigger than the buf size, or it will trigger a segfault error. Acked-by: Kay Sievers <kay@vrfy.org> Acked-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-16 08:36:03 -07:00
Yuanhan Liu	4a77a5a06e	printk: use mutex lock to stop syslog_seq from going wild Although syslog_seq and log_next_seq stuff are protected by logbuf_lock spin log, it's not enough. Say we have two processes A and B, and let syslog_seq = N, while log_next_seq = N + 1, and the two processes both come to syslog_print at almost the same time. And No matter which process get the spin lock first, it will increase syslog_seq by one, then release spin lock; thus later, another process increase syslog_seq by one again. In this case, syslog_seq is bigger than syslog_next_seq. And latter, it would make: wait_event_interruptiable(log_wait, syslog != log_next_seq) don't wait any more even there is no new write comes. Thus it introduce a infinite loop reading. I can easily see this kind of issue by the following steps: # cat /proc/kmsg # at meantime, I don't kill rsyslog # So they are the two processes. # xinit # I added drm.debug=6 in the kernel parameter line, # so that it will produce lots of message and let that # issue happen It's 100% reproducable on my side. And my disk will be filled up by /var/log/messages in a quite short time. So, introduce a mutex_lock to stop syslog_seq from going wild just like what devkmsg_read() does. It does fix this issue as expected. v2: use mutex_lock_interruptiable() instead (comments from Kay) Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Acked-By: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-16 08:36:02 -07:00
Oleg Nesterov	e227051b13	uprobes: Remove the unnecessary initialization in add_utask() Trivial cleanup. No need to nullify ->active_uprobe after kzalloc(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120615154401.GA9633@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:52 +02:00
Oleg Nesterov	593609a596	uprobes: __copy_insn() needs "loff_t offset" 1. __copy_insn() needs "loff_t offset", not "unsigned long", to read the file. 2. use pgoff_t for "idx" and remove the unnecessary typecast. 3. fix the typo, "&=" is not what we want 4. can't resist, rename off1 to off. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154359.GA9625@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:49 +02:00
Oleg Nesterov	816c03fbab	uprobes: Don't use loff_t for the valid virtual address loff_t looks confusing when it is used for the virtual address. Change map_info and install_breakpoint/remove_breakpoint paths to use "unsigned long". The patch doesn't change vma_address(), it can't return "long" because it is used to verify the mapping. But probably this needs some cleanups too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Anton Arapov <anton@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154355.GA9622@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:48 +02:00
Oleg Nesterov	449d0d7c9f	uprobes: Simplify the usage of uprobe->pending_list uprobe->pending_list is only used to create the temporary list, it has no meaning after we drop uprobes_mmap_hash(inode). No need to initialize this node or remove it from tmp_list, and we can use list_for_each_entry(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120615154353.GA9614@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:48 +02:00
Oleg Nesterov	d9c4a30e82	uprobes: Move BUG_ON(UPROBE_SWBP_INSN_SIZE) from write_opcode() to install_breakpoint() write_opcode() ensures that UPROBE_SWBP_INSN doesn't cross the page boundary. This looks a bit confusing, the check does not depend on vaddr and it is enough to do it only once right after install_breakpoint()->arch_uprobe_analyze_insn(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154350.GA9611@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:47 +02:00
Oleg Nesterov	eb2bf57bee	uprobes: No need to re-check vma_address() in write_opcode() write_opcode() is called by register_for_each_vma() and uprobe_mmap() paths. In both cases the caller has already verified this vaddr under mmap_sem, no need to re-check. Note also that this check is wrong anyway, we should not truncate loff_t returned by vma_address() if we do not trust this mapping. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154347.GA9604@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:47 +02:00
Oleg Nesterov	fc36f59565	uprobes: Copy_insn() should not return -ENOMEM if __copy_insn() fails copy_insn() returns -ENOMEM if the first __copy_insn() fails, it should return the correct error code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154344.GA9601@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:46 +02:00
Oleg Nesterov	d436615e60	uprobes: Copy_insn() shouldn't depend on mm/vma/vaddr 1. copy_insn() doesn't need "addr", it can use uprobe->offset. Remove this argument. 2. Change copy_insn/__copy_insn to accept "struct file*" instead of vma. copy_insn() is called only once and mm/vma/vaddr are random, it shouldn't depend on them. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154342.GA9598@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:45 +02:00
Peter Zijlstra	c5784de2b3	uprobes: Document uprobe_register() vs uprobe_mmap() race Because the mind is treacherous and makes us forget we need to write stuff down. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20120615154339.GA9591@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:45 +02:00
Oleg Nesterov	7a5bfb66b0	uprobes: Change build_map_info() to try kmalloc(GFP_NOWAIT) first build_map_info() doesn't allocate the memory under i_mmap_mutex to avoid the deadlock with page reclaim. But it can try GFP_NOWAIT first, it should work in the likely case and thus we almost never need the pre-alloc-and-retry path. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20120615154336.GA9588@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:44 +02:00
Oleg Nesterov	268720903f	uprobes: Rework register_for_each_vma() to make it O(n) Currently register_for_each_vma() is O(n 2) + O(n 3), every time find_next_vma_info() "restarts" the vma_prio_tree_foreach() loop and each iteration rechecks the whole try_list. This also means that try_list can grow "indefinitely" if register/unregister races with munmap/mmap activity even if the number of mapping is bounded at any time. With this patch register_for_each_vma() builds the list of mm/vaddr structures only once and does install_breakpoint() for each entry. We do not care about the new mappings which can be created after build_map_info() drops mapping->i_mmap_mutex, uprobe_mmap() should do its work. Note that we do not allocate map_info under i_mmap_mutex, this can deadlock with page reclaim (but see the next patch). So we use 2 lists, "curr" which we are going to return, and "prev" which holds the already allocated memory. The main loop deques the entry from "prev" (initially it is empty), and if "prev" becomes empty again it counts the number of entries we need to pre-allocate outside of i_mmap_mutex. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20120615154333.GA9581@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:43 +02:00
Oleg Nesterov	c1914a0936	uprobes: Install_breakpoint() should fail if is_swbp_insn() == T install_breakpoint() returns -EEXIST if is_swbp_insn(orig_insn) == T, the caller treats this code as success. This is doubly wrong. The successful return should set UPROBE_COPY_INSN, but the real problem is that it shouldn't succeed. If the probed insn is int3 the application should get SIGTRAP, this won't happen with uprobe. Probably we can fix this, we can add the UPROBE_SHARED_BP flag and teach handle_swbp/set_orig_insn to handle this case correctly. But this needs some complications and we have other insns which can't be probed, lets make a simple fix for now. I think this needs a cleanup. UPROBE_COPY_INSN should die, copy_insn() should be called by alloc_uprobe(). arch_uprobe_analyze_insn() depends on ->mm (ia32_compat) but it is called only once. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154331.GA9578@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:43 +02:00
Oleg Nesterov	5323ce71e4	uprobes: Write_opcode()->__replace_page() can race with try_to_unmap() write_opcode() gets old_page via get_user_pages() and then calls __replace_page() which assumes that this old_page is still mapped after pte_offset_map_lock(). This is not true if this old_page was already try_to_unmap()'ed, and in this case everything __replace_page() does with old_page is wrong. Just for example, put_page() is not balanced. I think it is possible to teach __replace_page() to handle this unlikely case correctly, but this patch simply changes it to use page_check_address() and return -EAGAIN if it fails. The caller should notice this error code and retry. Note: write_opcode() asks for the cleanups, I'll try to do this in a separate patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154328.GA9571@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:42 +02:00
Oleg Nesterov	cc359d180f	uprobes: __copy_insn() should ensure a_ops->readpage != NULL __copy_insn() blindly calls read_mapping_page(), this will crash the kernel if ->readpage == NULL, add the necessary check. For example, hugetlbfs_aops->readpage is NULL. Perhaps we should change read_mapping_page() instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154325.GA9568@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:42 +02:00
Oleg Nesterov	ea13137714	uprobes: Valid_vma() should reject VM_HUGETLB __replace_page() obviously can't work with the hugetlbfs mappings, uprobe_register() will likely crash the kernel. Change valid_vma() to check VM_HUGETLB as well. As for PageTransHuge() no need to worry, vma->vm_file != NULL. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120615154322.GA9561@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-16 09:10:41 +02:00
Linus Torvalds	ed21a66c18	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar. * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog: Quiet down the boot messages perf/x86: Fix broken LBR fixup code tracing: Have tracing_off() actually turn tracing off	2012-06-15 16:58:10 -07:00
Linus Torvalds	a95f9b6e09	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core updates (RCU and locking) from Ingo Molnar: "Most of the diffstat comes from the RCU slow boot regression fixes, but there's also a debuggability improvements/fixes." * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: memblock: Document memblock_is_region_{memory,reserved}() rcu: Precompute RCU_FAST_NO_HZ timer offsets rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks rcu: RCU_FAST_NO_HZ detection of callback adoption spinlock: Indicate that a lockup is only suspected kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() panic: Make panic_on_oops configurable	2012-06-15 16:52:35 -07:00
Kay Sievers	e2ae715d66	kmsg - kmsg_dump() use iterator to receive log buffer content Provide an iterator to receive the log buffer content, and convert all kmsg_dump() users to it. The structured data in the kmsg buffer now contains binary data, which should no longer be copied verbatim to the kmsg_dump() users. The iterator should provide reliable access to the buffer data, and also supports proper log line-aware chunking of data while iterating. Signed-off-by: Kay Sievers <kay@vrfy.org> Tested-by: Tony Luck <tony.luck@intel.com> Reported-by: Anton Vorontsov <anton.vorontsov@linaro.org> Tested-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-15 14:53:59 -07:00
Steven Rostedt	7374e82771	tracing: Register the ftrace internal events during early boot All trace events including ftrace internel events (like trace_printk and function tracing), register functions that describe how to print their output. The events may be recorded as soon as the ring buffer is allocated, but they are just raw binary in the buffer. The mapping of event ids to how to print them are held within a structure that is registered on system boot. If a crash happens in boot up before these functions are registered then their output (via ftrace_dump_on_oops) will be useless: Dumping ftrace buffer: --------------------------------- <...>-1 0.... 319705us : Unknown type 6 --------------------------------- This can be quite frustrating for a kernel developer trying to see what is going wrong. There's no reason to register them so late in the boot up process. They can be registered by early_initcall(). Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-14 15:22:14 -04:00
Borislav Petkov	8d240dd88c	ftrace: Remove a superfluous check register_ftrace_function() checks ftrace_disabled and calls __register_ftrace_function which does it again. Drop the first check and add the unlikely hint to the second one. Also, drop the label as John correctly notices. No functional change. Link: http://lkml.kernel.org/r/20120329171140.GE6409@aftab Cc: Borislav Petkov <bp@amd64.org> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-14 15:22:12 -04:00
Don Zickus	a702704682	watchdog: Quiet down the boot messages A bunch of bugzillas have complained how noisy the nmi_watchdog is during boot-up especially with its expected failure cases (like virt and bios resource contention). This is my attempt to quiet them down and keep it less confusing for the end user. What I did is print the message for cpu0 and save it for future comparisons. If future cpus have an identical message as cpu0, then don't print the redundant info. However, if a future cpu has a different message, happily print that loudly. Before the change, you would see something like: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver. ... version: 2 ... bit width: 40 ... generic registers: 2 ... value mask: 000000ffffffffff ... max period: 000000007fffffff ... fixed-purpose events: 3 ... event mask: 0000000700000003 NMI watchdog enabled, takes one hw-pmu counter. Booting Node 0, Processors #1 NMI watchdog enabled, takes one hw-pmu counter. #2 NMI watchdog enabled, takes one hw-pmu counter. #3 Ok. NMI watchdog enabled, takes one hw-pmu counter. Brought up 4 CPUs Total of 4 processors activated (22607.24 BogoMIPS). After the change, it is simplified to: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver. ... version: 2 ... bit width: 40 ... generic registers: 2 ... value mask: 000000ffffffffff ... max period: 000000007fffffff ... fixed-purpose events: 3 ... event mask: 0000000700000003 NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. Booting Node 0, Processors #1 #2 #3 Ok. Brought up 4 CPUs V2: little changes based on Joe Perches' feedback V3: printk cleanup based on Ingo's feedback; checkpatch fix V4: keep printk as one long line V5: Ingo fix ups Reported-and-tested-by: Nathan Zimmer <nzimmer@sgi.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: nzimmer@sgi.com Cc: joe@perches.com Link: http://lkml.kernel.org/r/1339594548-17227-1-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-14 12:20:50 +02:00
Yinghai Lu	82ec90eac3	resources: allow adjust_resource() for resources with no parent If a resource has no parent, allow its start/end to be set arbitrarily as long as any children are still contained within the new range. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-06-13 15:42:22 -06:00
Eric Dumazet	047fe36052	splice: fix racy pipe->buffers uses Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit `35f3d14dbb` (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Cc: stable <stable@vger.kernel.org> # 2.6.35 Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-13 21:16:42 +02:00
Andrew Lunn	6ebb017de9	printk: Fix alignment of buf causing crash on ARM EABI Commit `7ff9554bb5`, printk: convert byte-buffer to variable-length record buffer, causes systems using EABI to crash very early in the boot cycle. The first entry in struct log is a u64, which for EABI must be 8 byte aligned. Make use of __alignof__() so the compiler to decide the alignment, but allow it to be overridden using CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, for systems which can perform unaligned access and want to save a few bytes of space. Tested on Orion5x and Kirkwood. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Stephen Warren <swarren@wwwdotorg.org> Acked-by: Stephen Warren <swarren@wwwdotorg.org> Acked-by: Kay Sievers <kay@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-06-12 16:20:17 -07:00
Thomas Gleixner	924412f66f	Merge branch 'nohz-for-tip-2' of git://github.com/fweisbec/linux-dynticks into timers/core	2012-06-11 20:11:29 +02:00
Frederic Weisbecker	84bf1bccc6	nohz: Move next idle expiry time record into idle logic area The next idle expiry time record and idle sleeps tracking are statistics that only concern idle. Since we want the nohz APIs to become usable further idle context, let's pull up the handling of these statistics to the callers in idle. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2012-06-11 20:07:18 +02:00
Frederic Weisbecker	5b39939a40	nohz: Move ts->idle_calls incrementation into strict idle logic Since we want to prepare for making the nohz API to work further the idle case, we need to pull ts->idle_calls incrementation up to the callers in idle. To perform this, we split tick_nohz_stop_sched_tick() in two parts: a first one that checks if we can really stop the tick for idle, and another that actually stops it. Then from the callers in idle, we check if we can stop the tick and only then we increment idle_calls and finally relay to the nohz API that won't care about these details anymore. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2012-06-11 20:07:17 +02:00
Frederic Weisbecker	f5d411c91e	nohz: Rename ts->idle_tick to ts->last_tick Now that idle and nohz logics are going to be independant each others, ts->idle_tick becomes too much a biased name to describe the field that saves the last scheduled tick on top of which we re-calculate the next tick to schedule when the timer is restarted. We want to reuse this even to stop the tick outside idle cases. So let's rename it to some more generic name: ts->last_tick. This changes a bit the timer list stat export so we need to increase its version. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2012-06-11 20:07:17 +02:00
Frederic Weisbecker	2ac0d98fd6	nohz: Make nohz API agnostic against idle ticks cputime accounting When the timer tick fires, it accounts the new jiffy as either part of system, user or idle time. This is how we record the cputime statistics. But when the tick is stopped from the idle task, we still need to record the number of jiffies spent tickless until we restart the tick and fall back to traditional tick-based cputime accounting. To do this, we take a snapshot of jiffies when the tick is stopped and compute the difference against the new value of jiffies when the tick is restarted. Then we account this whole difference to the idle cputime. However we are preparing to be able to stop the tick from other places than idle. So this idle time accounting needs to be performed from the callers of nohz APIs, not from the nohz APIs themselves because we now want them to be agnostic against places that stop/restart tick. Therefore, we pull the tickless idle time accounting out of generic nohz helpers up to idle entry/exit callers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2012-06-11 20:07:16 +02:00
Frederic Weisbecker	19f5f7364a	nohz: Separate idle sleeping time accounting from nohz logic As we plan to be able to stop the tick outside the idle task, we need to prepare for separating nohz logic from idle. As a start, this pulls the idle sleeping time accounting out of the tick stop/restart API to the callers on idle entry/exit. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Alessio Igor Bogani <abogani@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Avi Kivity <avi@redhat.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@ti.com> Cc: Max Krasnyansky <maxk@qualcomm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2012-06-11 20:06:36 +02:00
Thomas Gleixner	b871a42b60	smpboot: Remove leftover declaration Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2012-06-11 15:07:52 +02:00
Ingo Molnar	c3e228d59b	Linux 3.5-rc2 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJP0qm4AAoJEHm+PkMAQRiG62QIAJRNJFyVB0ZrsMPgdwLnlX4O 5I86H7GaYXoOK/KMb2s5h4KiFggIODnyEkZi+/39tJOgGo0KrMcDlsh0owB1Iggw LE6iyze9I1z9wQze0+SXe7VAcvUYvsx2vgpOKvoNi97Qgn3B6onL+SAi5U+NAqJl 0NdKmveEd42UIm7JfChHlxl8bm8YB+WcU38OkMGpRpJ/Moz9EbSjYVQg3oHrzJjy duiX6SD/OV4m5yCcXXmu+f41pN+SG7xENJ5r4enyi2ZF8mAyVz2goIyL2bA0AJX2 +GbpD1sxUHkZ6yPg4tf2bmJOj0PkfZNAi8YpFxZDlP4y1pKuCTEDTBp8O2id43w= =Jyn8 -----END PGP SIGNATURE----- Merge tag 'v3.5-rc2' into perf/core Merge in Linux 3.5-rc2 - to pick up fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-11 10:51:35 +02:00
Ingo Molnar	4a1e001d2b	Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Merge RCU fixes from Paul E. McKenney: " This series has four patches, the major point of which is to eliminate some slowdowns (including boot-time slowdowns) resulting from some RCU_FAST_NO_HZ changes. The issue with the changes is that posting timers from the idle loop has no effect if the CPU has entered dyntick-idle mode because the CPU has already computed its wakeup time, and posting a timer does not cause it to be recomputed. The short-term fix is for RCU to precompute the timeout value so that the CPU's calculation is correct. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-11 10:30:23 +02:00
Linus Torvalds	7249450449	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar. * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix the relax_domain_level boot parameter sched: Validate assumptions in sched_init_numa() sched: Always initialize cpu-power sched: Fix domain iteration sched/rt: Fix lockdep annotation within find_lock_lowest_rq() sched/numa: Load balance between remote nodes sched/x86: Calculate booted cores after construction of sibling_mask	2012-06-08 14:59:29 -07:00
Randy Dunlap	cd96891d48	sched/fair: fix lots of kernel-doc warnings Fix lots of new kernel-doc warnings in kernel/sched/fair.c: Warning(kernel/sched/fair.c:3625): No description found for parameter 'env' Warning(kernel/sched/fair.c:3625): Excess function parameter 'sd' description in 'update_sg_lb_stats' Warning(kernel/sched/fair.c:3735): No description found for parameter 'env' Warning(kernel/sched/fair.c:3735): Excess function parameter 'sd' description in 'update_sd_pick_busiest' Warning(kernel/sched/fair.c:3735): Excess function parameter 'this_cpu' description in 'update_sd_pick_busiest' .. more warnings Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-08 14:59:10 -07:00
Linus Torvalds	106544d81d	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A bit larger than what I'd wish for - half of it is due to hw driver updates to Intel Ivy-Bridge which info got recently released, cycles:pp should work there now too, amongst other things. (but we are generally making exceptions for hardware enablement of this type.) There are also callchain fixes in it - responding to mostly theoretical (but valid) concerns. The tooling side sports perf.data endianness/portability fixes which did not make it for the merge window - and various other fixes as well." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) perf/x86: Check user address explicitly in copy_from_user_nmi() perf/x86: Check if user fp is valid perf: Limit callchains to 127 perf/x86: Allow multiple stacks perf/x86: Update SNB PEBS constraints perf/x86: Enable/Add IvyBridge hardware support perf/x86: Implement cycles:p for SNB/IVB perf/x86: Fix Intel shared extra MSR allocation x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix perf: Remove duplicate invocation on perf_event_for_each perf uprobes: Remove unnecessary check before strlist__delete perf symbols: Check for valid dso before creating map perf evsel: Fix 32 bit values endianity swap for sample_id_all header perf session: Handle endianity swap on sample_id_all header data perf symbols: Handle different endians properly during symbol load perf evlist: Pass third argument to ioctl explicitly perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP perf tools: Make --version show kernel version instead of pull req tag perf tools: Check if callchain is corrupted perf callchain: Make callchain cursors TLS ...	2012-06-08 09:14:46 -07:00
Linus Torvalds	b1e25f4109	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull leap second timer fix from Thomas Gleixner. * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond	2012-06-08 09:11:33 -07:00
Ingo Molnar	9ee6ddc9da	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent Pull brown paper bag fix from Steve Rostedt. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-08 12:23:22 +02:00
Ananth N Mavinakayanahalli	7eb9ba5ed3	uprobes: Pass probed vaddr to arch_uprobe_analyze_insn() On RISC architectures like powerpc, instructions are fixed size. Instruction analysis on such platforms is just a matter of (insn % 4). Pass the vaddr at which the uprobe is to be inserted so that arch_uprobe_analyze_insn() can flag misaligned registration requests. Signed-off-by: Ananth N Mavinakaynahalli <ananth@in.ibm.com> Cc: michael@ellerman.id.au Cc: antonb@thinktux.localdomain Cc: Paul Mackerras <paulus@samba.org> Cc: benh@kernel.crashing.org Cc: peterz@infradead.org Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: oleg@redhat.com Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20120608093257.GG13409@in.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-08 12:22:27 +02:00
Linus Torvalds	48d212a2ee	Revert "mm: correctly synchronize rss-counters at exit/exec" This reverts commit `40af1bbdca`. It's horribly and utterly broken for at least the following reasons: - calling sync_mm_rss() from mmput() is fundamentally wrong, because there's absolutely no reason to believe that the task that does the mmput() always does it on its own VM. Example: fork, ptrace, /proc - you name it. - calling it after having done mmdrop() on it is doubly insane, since the mm struct may well be gone now. - testing mm against NULL before you call it is insane too, since a NULL mm there would have caused oopses long before. .. and those are just the three bugs I found before I decided to give up looking for me and revert it asap. I should have caught it before I even took it, but I trusted Andrew too much. Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-07 17:54:07 -07:00
Konstantin Khlebnikov	40af1bbdca	mm: correctly synchronize rss-counters at exit/exec mm->rss_stat counters have per-task delta: task->rss_stat. Before changing task->mm pointer the kernel must flush this delta with sync_mm_rss(). do_exit() already calls sync_mm_rss() to flush the rss-counters before committing the rss statistics into task->signal->maxrss, taskstats, audit and other stuff. Unfortunately the kernel does this before calling mm_release(), which can call put_user() for processing task->clear_child_tid. So at this point we can trigger page-faults and task->rss_stat becomes non-zero again. As a result mm->rss_stat becomes inconsistent and check_mm() will print something like this: \| BUG: Bad rss-counter state mm:ffff88020813c380 idx:1 val:-1 \| BUG: Bad rss-counter state mm:ffff88020813c380 idx:2 val:1 This patch moves sync_mm_rss() into mm_release(), and moves mm_release() out of do_exit() and calls it earlier. After mm_release() there should be no pagefaults. [akpm@linux-foundation.org: tweak comment] Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> [3.4.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-07 14:43:55 -07:00
Cyrill Gorcunov	736f24d5e5	c/r: prctl: drop VMA flags test on PR_SET_MM_ stack data assignment In commit `b76437579d` ("procfs: mark thread stack correctly in proc/<pid>/maps") the stack allocated via clone() is marked in /proc/<pid>/maps as [stack:%d] thus it might be out of the former mm->start_stack/end_stack values (and even has some custom VMA flags set). So to be able to restore mm->start_stack/end_stack drop vma flags test, but still require the underlying VMA to exist. As always note this feature is under CONFIG_CHECKPOINT_RESTORE and requires CAP_SYS_RESOURCE to be granted. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-07 14:43:55 -07:00
Cyrill Gorcunov	300f786b26	c/r: prctl: add ability to get clear_tid_address Zero is written at clear_tid_address when the process exits. This functionality is used by pthread_join(). We already have sys_set_tid_address() to change this address for the current task but there is no way to obtain it from user space. Without the ability to find this address and dump it we can't restore pthread'ed apps which call pthread_join() once they have been restored. This patch introduces the PR_GET_TID_ADDRESS prctl option which allows the current process to obtain own clear_tid_address. This feature is available iif CONFIG_CHECKPOINT_RESTORE is set. [akpm@linux-foundation.org: fix prctl numbering] Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pedro Alves <palves@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-07 14:43:55 -07:00
Cyrill Gorcunov	1ad75b9e16	c/r: prctl: add minimal address test to PR_SET_MM Make sure the address being set is greater than mmap_min_addr (as suggested by Kees Cook). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-07 14:43:55 -07:00
Konstantin Khlebnikov	bafb282df2	c/r: prctl: update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal A fix for commit `b32dfe3771` ("c/r: prctl: add ability to set new mm_struct::exe_file"). After removing mm->num_exe_file_vmas kernel keeps mm->exe_file until final mmput(), it never becomes NULL while task is alive. We can check for other mapped files in mm instead of checking mm->num_exe_file_vmas, and mark mm with flag MMF_EXE_FILE_CHANGED in order to forbid second changing of mm->exe_file. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Kees Cook <keescook@chromium.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-07 14:43:55 -07:00
Paul E. McKenney	aa9b16306e	rcu: Precompute RCU_FAST_NO_HZ timer offsets When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick() calls rcu_needs_cpu() see if RCU needs that CPU, and, if not, computes the next wakeup time based on the timer wheels. Only later, when actually entering the idle loop, rcu_prepare_for_idle() will be invoked. In some cases, rcu_prepare_for_idle() will post timers to wake the CPU back up. But all for naught: The next wakeup time for the CPU has already been computed, and posting a timer afterwards does not force that wakeup time to be recomputed. This means that rcu_prepare_for_idle()'s have no effect. This is not a problem on a busy system because something else will wake up the CPU soon enough. However, on lightly loaded systems, the CPU might stay asleep for a considerable length of time. If that CPU has a callback that the rest of the system is waiting on, the system might run very slowly or (in theory) even hang. This commit avoids this problem by having rcu_needs_cpu() give tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU to wake back up, which tick_nohz_stop_sched_tick() takes into account when programming the CPU's wakeup time. An alternative approach is for rcu_prepare_for_idle() to use hrtimers instead of normal timers, but timers are much more efficient than are hrtimers for frequently and repeatedly posting and cancelling a given timer, which is exactly what RCU_FAST_NO_HZ does. Reported-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr> Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>	2012-06-06 20:43:28 -07:00
Paul E. McKenney	5955f7eecd	rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure The RCU_FAST_NO_HZ code relies on a number of per-CPU variables. This works, but is hidden from someone scanning the data structures in rcutree.h. This commit therefore converts these per-CPU variables to fields in the per-CPU rcu_dynticks structures. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>	2012-06-06 20:43:28 -07:00
Paul E. McKenney	fd4b352687	rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks In the current code, a short dyntick-idle interval (where there is at least one non-lazy callback on the CPU) and a long dyntick-idle interval (where there are only lazy callbacks on the CPU) are traced identically, which can be less than helpful. This commit therefore emits different event traces in these two cases. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>	2012-06-06 20:43:27 -07:00
Paul E. McKenney	8f5af6f1f2	rcu: RCU_FAST_NO_HZ detection of callback adoption In the present implementations of CPU hotplug, the outgoing CPU is guaranteed to run its stop-machine process on the way out, which will guarantee that RCU_FAST_NO_HZ forces the CPU out of dyntick-idle mode. However, new versions of CPU hotplug might not work this way. This commit therefore removes this design constraint by explicitly notifying CPUs when they adopt non-lazy RCU callbacks. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>	2012-06-06 20:43:27 -07:00
Steven Rostedt	f2bf1f6f5f	tracing: Have tracing_off() actually turn tracing off A recent update to have tracing_on/off() only affect the ftrace ring buffers instead of all ring buffers had a cut and paste error. The tracing_off() did the exact same thing as tracing_on() and would not actually turn off tracing. Unfortunately, tracing_off() is more important to be working than tracing_on() as this is a key development tool, as it lets the developer turn off tracing as soon as a problem is discovered. It is also used by panic and oops code. This bug also breaks the 'echo func:traceoff > set_ftrace_filter' Cc: <stable@vger.kernel.org> # 3.4 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2012-06-06 22:15:14 -04:00
Li Zefan	6be96a5c90	cgroup: remove hierarchy_mutex It was introduced for memcg to iterate cgroup hierarchy without holding cgroup_mutex, but soon after that it was replaced with a lockless way in memcg. No one used hierarchy_mutex since that, so remove it. Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-06-06 19:12:30 -07:00
Salman Qazi	967db0ea65	cgroup: make sure that decisions in __css_put are atomic __css_put is using atomic_dec on the ref count, and then looking at the ref count to make decisions. This is prone to races, as someone else may decrement ref count between our decrement and our decision. Instead, we should base our decisions on the value that we decremented the ref count to. (This results in an actual race on Google's kernel which I haven't been able to reproduce on the upstream kernel. Having said that, it's still incorrect by inspection). Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org	2012-06-06 18:51:35 -07:00

1 2 3 4 5 ...

13790 Commits