The runtime PM of sh-sci devices is enabled when sci_probe() returns,
so the pm_runtime_put_sync() executed by driver_probe_device()
attempts to suspend the device. Then, in some situations, a
diagnostic message is printed to the console by one of the runtime
suspend routines handling the sh-sci device, which causes synchronous
runtime resume to be started from the device's own runtime suspend
callback. This causes rpm_resume() to be run eventually, which sees
the RPM_SUSPENDING status set by rpm_suspend() and waits for it to
change. However, the device's runtime PM status cannot change at
that point, because the routine that has set it waits for the
rpm_suspend() to return. A deadlock occurs as a result.
To avoid that make sci_init_single() increment the device's
runtime PM usage counter, so that it cannot be suspended by
driver_probe_device(). That counter has to be decremented
eventually, so make sci_startup() do that before starting to
actually use the device and make sci_shutdown() increment it
again before returning to balance the incrementation carried out by
sci_startup().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Presently the SH7785 code misdefines the UBC clock connection ID in
relation to the other CPUs. This makes it uniform, so that things like
single-stepping work again.
Signed-off-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
It turns out that test-compiling this file on x86-64 doesn't really
help, because much of it is x86-32-specific. And so I hadn't noticed
the slightly over-eager removal of the 'r' from 'addr' variable despite
thinking I had tested it.
Signed-off-by: Linus "oopsie" Torvalds <torvalds@linux-foundation.org>
Several users of "find_vma_prev()" were not in fact interested in the
previous vma if there was no primary vma to be found either. And in
those cases, we're much better off just using the regular "find_vma()",
and then "prev" can be looked up by just checking vma->vm_prev.
The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
commit 83cd904d27: "mm: fix find_vma_prev"), and the whole "return
prev by reference" means that it generates worse code too.
Thus this "let's avoid using this inconvenient and clearly too subtle
interface when we don't really have to" patch.
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull CIFS fixes from Steve French
* git://git.samba.org/sfrench/cifs-2.6:
cifs: fix dentry refcount leak when opening a FIFO on lookup
CIFS: Fix mkdir/rmdir bug for the non-POSIX case
Commit 6bd4837de9 ("mm: simplify find_vma_prev()") broke memory
management on PA-RISC.
After application of the patch, programs that allocate big arrays on the
stack crash with segfault, for example, this will crash if compiled
without optimization:
int main()
{
char array[200000];
array[199999] = 0;
return 0;
}
The reason is that PA-RISC has up-growing stack and the stack is usually
the last memory area. In the above example, a page fault happens above
the stack.
Previously, if we passed too high address to find_vma_prev, it returned
NULL and stored the last VMA in *pprev. After "simplify find_vma_prev"
change, it stores NULL in *pprev. Consequently, the stack area is not
found and it is not expanded, as it used to be before the change.
This patch restores the old behavior and makes it return the last VMA in
*pprev if the requested address is higher than address of any other VMA.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xommit ac5637611(genirq: Unmask oneshot irqs when thread was not woken)
fails to unmask when a !IRQ_ONESHOT threaded handler is handled by
handle_level_irq.
This happens because thread_mask is or'ed unconditionally in
irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared. So
the check for !desc->thread_active fails and keeps the interrupt
disabled.
Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
Document the thread_mask magic while at it.
Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently error is -ENOMEM when rejecting VM_GROWSDOWN|VM_GROWSUP
from shared anonymous: hoist the file case's -EINVAL up for both.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull s390 regression fix from Martin Schwidefsky:
"It is a fix for a regression that has been introduced with git commit
25f269f173 - "[S390] qdio: EQBS retry after CCQ 96" - and if possible
we would like to have working code for the fcp data router in 3.3."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
[S390] qdio: fix handler function arguments for zfcp data router
of this driver yet (there's some i.MX platforms which will use it).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPVLHaAAoJEBus8iNuMP3d43kP/jnSYADNYcOSGtljew7Vrcck
GC5P6Gch3ZAG/GwhHjfz0rt2tFG81XpzQj3T7bqepxX/GBBB7wt2GOXhDXZoZpot
rsm5+YjeFtAfcXHmYQ22NVQKt4+2Lb80Ltsqu6+uSYtwlAAsFnip9+6AYfXE4Z/W
JbxWZoeROUGutEGBAckOWOgUUhuZauzYynxSTio1bVqF/eDa3qAQbsgDTERkR75T
glX9+sV4zUK20MUeNhlaLVlzhV0jBDcOikmp1kfu0q0s9ossn4KB7s/cACeXc4jw
sV/qFipgS0tHnL14GI2RSIZImCqyqpIafAm4LkK1vmhz93YrTwy45PpnTRG6FLxE
MJa6qK6n+Aw5I5E7pNJlcNoByVuo2Fis1Cq0cOgEaFHHzlIL6XtTqGVbGYa45qHh
uPcTrwCGjdBUhIaR00OC7nK0BiFBtgPf2TNTIh9GorGMzTpV1CGtrThxZq7gaO/3
ncCv/mgdq5NX21fWHl7/FwnNOBpcsia8CeSKnqk2wx6rOwAGAty5zzPpRyRGg4Gm
qJNHNKe6Y4H40fYRY05Kk/kzXXM2jSfBlhCl2zL9XeHfI/dnY+sn5K/ZPIusnpDh
o3cIK4EpAAxa0xYfgM/h7DOlVM3Egn+s5WW6bouNBTU0q2Aj6vV7V7Vt25CBvBha
Io8SAHcXeaH8OEGqQrXO
=ZXaD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator updates from Mark Brown:
"A simple fix that's obvious from inspection. There's no mainline
users of this driver yet (there's some i.MX platforms which will use
it)."
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: Fix mask parameter in da9052_reg_update calls
kasprintf() (and potentially other functions that I didn't run across so
far) want to evaluate argument lists twice. Caring to do so for the
primary list is obviously their job, but they can't reasonably be
expected to check the format string for instances of %pV, which however
need special handling too: On architectures like x86-64 (as opposed to
e.g. ix86), using the same argument list twice doesn't produce the
expected results, as an internally managed cursor gets updated during
the first run.
Fix the problem by always acting on a copy of the original list when
handling %pV.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Why is memcg's swap accounting so broken? Insane counts, wrong
ownership, unfreeable structures, which later get freed and then
accessed after free.
Turns out to be a tiny a little 3.3-rc1 regression in 9fb4b7cc07
"page_cgroup: add helper function to get swap_cgroup": the helper
function (actually named lookup_swap_cgroup()) returns an address using
void* arithmetic, but the structure in question is a short.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Bob Liu <lliubbo@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'fixes' of git://github.com/hzhuang1/linux: (3 commits)
ARM: pxa: fix invalid mfp pin issue
ARM: pxa: remove duplicated registeration on pxa-gpio
ARM: pxa: add dummy clock for pxa25x and pxa27x
Includes an update to v3.3-rc6
As done for the other ep93xx machines in:
commit 9a6879bd90
ARM: ep93xx: convert to MULTI_IRQ_HANDLER
Now that there is a generic IRQ handler for multiple VIC devices use it
for vision_ep9307 to help building multi platform kernels.
Signed-off-by: Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Ryan Mallon <rmallon@gmail.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Failure is reported on hx4700 with kernel v3.3-rc1.
__mfp_validate: GPIO20 is invalid pin
__mfp_validate: GPIO21 is invalid pin
__mfp_validate: GPIO15 is invalid pin
__mfp_validate: GPIO78 is invalid pin
__mfp_validate: GPIO79 is invalid pin
__mfp_validate: GPIO80 is invalid pin
__mfp_validate: GPIO33 is invalid pin
__mfp_validate: GPIO48 is invalid pin
__mfp_validate: GPIO49 is invalid pin
__mfp_validate: GPIO50 is invalid pin
Since pxa_last_gpio is used in mfp-pxa2xx driver. But it's only
updated in pxa-gpio driver that run after mfp-pxa2xx driver.
So update the pxa_last_gpio first in mfp-pxa2xx driver.
Reported-by: Paul Parsons <lost.distance@yahoo.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Both reboot (via reboot(RB_AUTOBOOT)) and suspend freeze on hx4700.
Registration of pxa_gpio_syscore_ops is moved into pxa-gpio driver,
but it still exists in arch-pxa directory. It resulsts failure on
reboot and suspend.
Now remove the registration code in arch-pxa.
Reported-by: Paul Parsons <lost.distance@yahoo.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
gpio-pxa driver is shared among arch-pxa and arch-mmp. Clock is the
essential component on pxa3xx/pxa95x and arch-mmp. So we need to
define dummy clock in pxa25x/pxa27x instead.
This regression was introduced by the commit "ARM: pxa: add dummy
clock for sa1100-rtc", id a55b5adaf4.
Reported-by: Jonathan Cameron <jic23@cam.ac.uk>
Signed-off-by: Paul Parsons <lost.distance@yahoo.com>
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Pull perf fixes from Ingo Molnar:
"It contains three cherry-picked fixes from perf/core, which turned out
to be more urgent than we originally thought."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Handle kernels that don't support attr.exclude_{guest,host}
perf tools: Change perf_guest default back to false
perf record: No build id option fails
There is just one patch in here, a revert of a powerpc EHCI driver
patch that was reported to cause problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk9VTXkACgkQMUfUDdst+ylV+wCg0LCngetBRR4J7Tu+fxfIBS00
z6YAni9fZFigFsapZqiypbSVrZ6FARQs
=g7Br
-----END PGP SIGNATURE-----
Merge tag 'usb-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
USB: revert a powerpc EHCI patch
There is just one patch in here, a revert of a powerpc EHCI driver
patch that was reported to cause problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tag 'usb-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "powerpc/usb: fix issue of CPU halt when missing USB PHY clock"
This contains one build fix for the powerpc udbg driver that was reported.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iEYEABECAAYFAk9VTK0ACgkQMUfUDdst+ykAGACeLDYE9U586NNUAGcHALtb6AtT
R1IAoK4NgsUvzxkp8XOlUYUar1DulcZB
=0xfn
-----END PGP SIGNATURE-----
Merge tag 'tty-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
tty: build fix for 3.3-rc6
This contains one build fix for the powerpc udbg driver that was reported.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tag 'tty-3.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty/powerpc: early udbg consoles can't be modules
2 relate to the recently added drive replacement.
One causes read error in RAID10 to sometimes be retried indefinitely.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIVAwUAT1VI1znsnt1WYoG5AQK47Q//d51y5QCpABFNUcgIM626zJXlBWFUSmzU
wFOGXh5emN6/TWguzkiZwrvcspDmXMzz1zmJtGWixYb2jBpn2MHEN4uNz3Vq68w+
IYk/dJg/CG4+lzX+6IjiHOb3+TASRx94QZHJASx68vypqniAyikshqcbUeZBMTB0
Fu+sKqsOGYmwQfe6/vtRPVXY7DYK2dFDBRMFpmOl+o4Y2XxmmWzMw4Dg1RIEdtFS
Jo9GwLHTnlw2xoc0XooufeT0Q2KOpqi9T8L6Nj0ORwpgsFqgtZ/kIOoGU6qOpSri
ofLTrobVKMpjFtmiYVOp9TaBlPnd/TNX3E4WPLGNsAwYuRUFjq8evmJKjG+pOdeB
3ArxRKRJCaI2jnVhH+NpT7i/tpkEg/8a/BoOAihX+hM/8QkmsWluaRBOGMhpuuuc
1baPVTusi/zijO9cM8RGIXaQj5UG4s3LUpCIOIYdDyxsfmAH5KN1F2EPrU4NMME2
96THSshIZLkgAg5ICwtva0qoHlBlEclAlVAzEomT7R9KwHojEB1xUiyMmaIdMFoy
JjGFAMp2E5+KBKZ1eYEHjthPWCb+nZ3eYHUh0DOnEt4kASCXnn45GJREQkpkNIR/
HhDTS8vI743unKnbCtYFMxiw/9OXZbMkdoZhobg7lxcpoQlWJ+5ziOtACl0h0Kv8
+ET+Kp3W8K4=
=93ms
-----END PGP SIGNATURE-----
Merge tag 'md-3.3-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
"Three fixes for md in 3.3-rc: Two relate to the recently added drive
replacement. One fixes the problem where a read error in RAID10 would
sometimes be retried indefinitely."
* tag 'md-3.3-fixes' of git://neil.brown.name/md:
md/raid10: fix assembling of arrays with replacement devices.
md/raid10: fix handling of error on last working device in array.
md/raid1: fix buglet in md_raid1_contested.
Merge the emailed seties of 19 patches from Andrew Morton
* akpm:
rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
memcg: fix mapcount check in move charge code for anonymous page
mm: thp: fix BUG on mm->nr_ptes
alpha: fix 32/64-bit bug in futex support
memcg: fix GPF when cgroup removal races with last exit
debugobjects: Fix selftest for static warnings
floppy/scsi: fix setting of BIO flags
memcg: fix deadlock by inverting lrucare nesting
drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
c2port: class_create() returns an ERR_PTR
pps: class_create() returns an ERR_PTR, not NULL
hung_task: fix the broken rcu_lock_break() logic
vfork: kill PF_STARTING
coredump_wait: don't call complete_vfork_done()
vfork: make it killable
vfork: introduce complete_vfork_done()
aio: wake up waiters when freeing unused kiocbs
kprobes: return proper error code from register_kprobe()
kmsg_dump: don't run on non-error paths by default
Fix a bug that causes a kernel panic when the number of received doorbells
is larger than number of entries in the inbound doorbell queue (current
default value = 512).
Another possible indication for this bug is large number of spurious
doorbells reported by tsi721 driver after reaching the queue size maximum.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org> [3.2.x+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the charge on shared anonyous pages is supposed not to moved in
task migration. To implement this, we need to check that mapcount > 1,
instread of > 2. So this patch fixes it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
in exit_mmap() recently.
Quoting Hugh's discovery and explanation of the SMP race condition:
"mm->nr_ptes had unusual locking: down_read mmap_sem plus
page_table_lock when incrementing, down_write mmap_sem (or mm_users
0) when decrementing; whereas THP is careful to increment and
decrement it under page_table_lock.
Now most of those paths in THP also hold mmap_sem for read or write
(with appropriate checks on mm_users), but two do not: when
split_huge_page() is called by hwpoison_user_mappings(), and when
called by add_to_swap().
It's conceivable that the latter case is responsible for the
exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora."
The simplest way to fix it without having to alter the locking is to make
split_huge_page() a noop in nr_ptes terms, so by counting the preallocated
pagetables that exists for every mapped hugepage. It was an arbitrary
choice not to count them and either way is not wrong or right, because
they are not used but they're still allocated.
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org> [3.0.x, 3.1.x, 3.2.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Cree said:
: : I have noticed some user space problems (pulseaudio crashes in pthread
: : code, glibc/nptl test suite failures, java compiler freezes on SMP alpha
: : systems) that arise when using a 2.6.39 or later kernel on Alpha.
: : Bisecting between 2.6.38 and 2.6.39 (using glibc/nptl test suite as
: : criterion for good/bad kernel) eventually leads to:
: :
: : 8d7718aa08 is the first bad commit
: : commit 8d7718aa08
: : Author: Michel Lespinasse <walken@google.com>
: : Date: Thu Mar 10 18:50:58 2011 -0800
: :
: : futex: Sanitize futex ops argument types
: :
: : Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic
: : prototypes to use u32 types for the futex as this is the data type the
: : futex core code uses all over the place.
: :
: : Looking at the commit I see there is a change of the uaddr argument in
: : the Alpha architecture specific code for futexes from int to u32, but I
: : don't see why this should cause a problem.
Richard Henderson said:
: futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
: u32 oldval, u32 newval)
: ...
: : "r"(uaddr), "r"((long)oldval), "r"(newval)
:
:
: There is no 32-bit compare instruction. These are implemented by
: consistently extending the values to a 64-bit type. Since the
: load instruction sign-extends, we want to sign-extend the other
: quantity as well (despite the fact it's logically unsigned).
:
: So:
:
: - : "r"(uaddr), "r"((long)oldval), "r"(newval)
: + : "r"(uaddr), "r"((long)(int)oldval), "r"(newval)
:
: should do the trick.
Michael said:
: This fixes the glibc test suite failures and the pulseaudio related
: crashes, but it does not fix the java compiiler lockups that I was (and
: are still) observing. That is some other problem.
Reported-by: Michael Cree <mcree@orcon.net.nz>
Tested-by: Michael Cree <mcree@orcon.net.nz>
Acked-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When moving tasks from old memcg (with move_charge_at_immigrate on new
memcg), followed by removal of old memcg, hit General Protection Fault in
mem_cgroup_lru_del_list() (called from release_pages called from
free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
exit_mmap from mmput from exit_mm from do_exit).
Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
still trying to update its stats, and take page off lru before freeing.
A task, or a charge, or a page on lru: each secures a memcg against
removal. In this case, the last task has been moved out of the old memcg,
and it is exiting: anonymous pages are uncharged one by one from the
memcg, as they are zapped from its pagetables, so the charge gets down to
0; but the pages themselves are queued in an mmu_gather for freeing.
Most of those pages will be on lru (and force_empty is careful to
lru_add_drain_all, to add pages from pagevec to lru first), but not
necessarily all: perhaps some have been isolated for page reclaim, perhaps
some isolated for other reasons. So, force_empty may find no task, no
charge and no page on lru, and let the removal proceed.
There would still be no problem if these pages were immediately freed; but
typically (and the put_page_testzero protocol demands it) they have to be
added back to lru before they are found freeable, then removed from lru
and freed. We don't see the issue when adding, because the
mem_cgroup_iter() loops keep their own reference to the memcg being
scanned; but when it comes to mem_cgroup_lru_del_list().
I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
38c5d72f3e ("memcg: simplify LRU handling by new rule") mercifully
removed those convolutions, but left this General Protection Fault.
But it's surprisingly easy to restore the old behaviour: just check
PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
to add), and reset pc to root_mem_cgroup if page is uncharged. A risky
change? just going back to how it worked before; testing, and an audit of
uses of pc->mem_cgroup, show no problem.
And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
longer has any purpose, and we can safely revert 4e5f01c2b9 ("memcg:
clear pc->mem_cgroup if necessary").
Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
is not strictly necessary: the lru_lock there, with RCU before memcg
structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
without that; but it seems cleaner to rely on one dependency less.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
debugobjects is now printing a warning when a fixup for a NOTAVAILABLE
object is run. This causes the selftest to fail like:
ODEBUG: selftest warnings failed 4 != 5
We could just increase the number of warnings that the selftest is
expecting to see because that is actually what has changed. But, it turns
out that fixup_activate() was written with inverted logic and thus a fixup
for a static object returned 1 indicating the object had been fixed, and 0
otherwise. Fix the logic to be correct and update the counts to reflect
that nothing needed fixing for a static object.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have forgotten the rules of lock nesting: the irq-safe ones must be
taken inside the non-irq-safe ones, otherwise we are open to deadlock:
CPU0 CPU1
---- ----
lock(&(&pc->lock)->rlock);
local_irq_disable();
lock(&(&zone->lru_lock)->rlock);
lock(&(&pc->lock)->rlock);
<Interrupt>
lock(&(&zone->lru_lock)->rlock);
To check a different locking issue, I happened to add a spin_lock to
memcg's bit_spin_lock in lock_page_cgroup(), and lockdep very quickly
complained about __mem_cgroup_commit_charge_lrucare() (on CPU1 above).
So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare to
__mem_cgroup_commit_charge() instead, taking zone->lru_lock under
lock_page_cgroup() in the lrucare case.
The original was using spin_lock_irqsave, but we'd be in more trouble if
it were ever called at interrupt time: unconditional _irq is enough. And
ClearPageLRU before del from lru, SetPageLRU before add to lru: no strong
reason, but that is the ordering used consistently elsewhere.
Fixes 36b62ad539 ("memcg: simplify corner case handling
of LRU").
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If probing the RTC didn't succeed due to failed RTC register access, the
RTC device will be unregistered. Then, when removing the module
r9701_remove() causes a kernel crash while trying to unregister a not
registered RTC device. Fix this by doing RTC register access test before
RTC device registration.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
class_create() doesn't return a NULL, it only returns ERR_PTRs.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by
"softlockup: check all tasks in hung_task" commit ce9dbe24 looks
absolutely wrong.
- rcu_lock_break() does put_task_struct(). If the task has exited
it is not safe to even read its ->state, nothing protects this
task_struct.
- The TASK_DEAD checks are wrong too. Contrary to the comment, we
can't use it to check if the task was unhashed. It can be unhashed
without TASK_DEAD, or it can be valid with TASK_DEAD.
For example, an autoreaping task can do release_task(current)
long before it sets TASK_DEAD in do_exit().
Or, a zombie task can have ->state == TASK_DEAD but release_task()
was not called, and in this case we must not break the loop.
Change this code to check pid_alive() instead, and do this before we drop
the reference to the task_struct.
Note: while_each_thread() under rcu_read_lock() is not really safe, it can
livelock. This will be fixed later, but fortunately in this case the
"max_count" logic saves us anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Mandeep Singh Baines <msb@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously it was (ab)used by utrace. Then it was wrongly used by the
scheduler code.
Currently it is not used, kill it before it finds the new erroneous user.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done(). zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.
mm_release() becomes the only caller, unexport complete_vfork_done().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make vfork() killable.
Change do_fork(CLONE_VFORK) to do wait_for_completion_killable(). If it
fails we do not return to the user-mode and never touch the memory shared
with our child.
However, in this case we should clear child->vfork_done before return, we
use task_lock() in do_fork()->wait_for_vfork_done() and
complete_vfork_done() to serialize with each other.
Note: now that we use task_lock() we don't really need completion, we
could turn task->vfork_done into "task_struct *wake_up_me" but this needs
some complications.
NOTE: this and the next patches do not affect in-kernel users of
CLONE_VFORK, kernel threads run with all signals ignored including
SIGKILL/SIGSTOP.
However this is obviously the user-visible change. Not only a fatal
signal can kill the vforking parent, a sub-thread can do execve or
exit_group() and kill the thread sleeping in vfork().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes.
Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bart Van Assche reported a hung fio process when either hot-removing
storage or when interrupting the fio process itself. The (pruned) call
trace for the latter looks like so:
fio D 0000000000000001 0 6849 6848 0x00000004
ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0
ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8
ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d
Call Trace:
schedule+0x3f/0x60
io_schedule+0x8f/0xd0
wait_for_all_aios+0xc0/0x100
exit_aio+0x55/0xc0
mmput+0x2d/0x110
exit_mm+0x10d/0x130
do_exit+0x671/0x860
do_group_exit+0x44/0xb0
get_signal_to_deliver+0x218/0x5a0
do_signal+0x65/0x700
do_notify_resume+0x65/0x80
int_signal+0x12/0x17
The problem lies with the allocation batching code. It will
opportunistically allocate kiocbs, and then trim back the list of iocbs
when there is not enough room in the completion ring to hold all of the
events.
In the case above, what happens is that the pruning back of events ends
up freeing up the last active request and the context is marked as dead,
so it is thus responsible for waking up waiters. Unfortunately, the
code does not check for this condition, so we end up with a hung task.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org> [3.2.x only]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
register_kprobe() aborts if the address of the new request falls in a
prohibited area (such as ftrace pouch, __kprobes annotated functions,
non-kernel text addresses, jump label text). We however don't return the
right error on this abort, resulting in a silent failure - incorrect
adding/reporting of kprobes ('perf probe do_fork+18' or 'perf probe
mcount' for instance).
In V2 we are incorporating Masami Hiramatsu's feedback.
This patch fixes it by returning -EINVAL upon failure.
While we are here, rename the label used for exit to be more appropriate.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Prashanth K Nageshappa <prashanth@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since commit 04c6862c05 ("kmsg_dump: add kmsg_dump() calls to the
reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
run on normal paths including poweroff and reboot.
This is less than ideal given pstore implementations that can only
represent single backtraces, since a reboot may overwrite a stored oops
before it's been picked up by userspace. In addition, some pstore
backends may have low performance and provide a significant delay in
reboot as a result.
This patch adds a printk.always_kmsg_dump kernel parameter (which can also
be changed from userspace). Without it, the code will only be run on
failure paths rather than on normal paths. The option can be enabled in
environments where there's a desire to attempt to audit whether or not a
reboot was cleanly requested or not.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 56a2559bb6 (md/raid10: recognise replacements ...)
changed 'run' to set ->replacement or ->rdev depending on the
'Replacement' status if the device, but it didn't remove the
old unconditional setting of 'rdev'. So it was largely ineffective.
So remove that now.
Signed-off-by: NeilBrown <neilb@suse.de>
Production GMA3600/3650 hardware turns out to be subtly different to the
development platforms. This combined with a minor driver bug is causing
the kernel to hang on these platforms.
This patch does the following
- turn down a couple of messages that were meant to be debug and are
causing much confusion
- ensure the hotplug interrupt is disabled on Cedartrail systems.
- fix a bug where gtt roll mode called psbfb_sync, which tries to sync
the 2D engine. On other devices it is harmless as the 2D engine is
present but not in use when in gtt roll mode, on Cedartrail it causes
a hang
Without these changes 3.3-rc hangs on boot on Cedartrail based systems.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) TCP SACK processing can calculate an incorrect reordering value in
some cases, fix from Neal Cardwell.
2) tcp_mark_head_lost() can split SKBs in situations where it should
not, violating send queue invariants expected by other pieces of
code and thus resulting (eventually) in corrupted retransmit state
counters. Also from Neal Cardwell.
3) qla3xxx erroneously calls spin_lock_irqrestore() with constant
hw_flags of zero. Fix from Santosh Nayak.
4) Fix NULL deref in rt2x00, from Gabor Juhos.
5) pch_gbe passes address of wrong typed object to pch_gbe_validate_option
thus corrupting part of the value. From Dan Carpenter.
6) We must check the return value of nlmsg_parse() before trying to use
the results. From Eric Dumazet.
7) Bridging code fails to check return value of ipv6_dev_get_saddr()
thus potentially leaving uninitialized garbage in the outgoing ipv6
header. From Ulrich Weber.
8) Due to rounding and a reversed operation on jiffies, bridge message
ages can go backwards instead of forwards, thus breaking STP. Fixes
from Joakim Tjernlund.
9) r8169 modifies Config* registers without properly holding the
Config9346 lock, resulting in corrupted IP fragments on some chips.
Fix from Francois Romieu.
10) NET_PACKET_ENGINE default wan't set properly during the network
driver mega-move. Fix from Stephen Hemminger.
11) vmxnet3 uses TCP header size where it actually should use the UDP
header size, fix from Shreyas Bhatewara.
12) Netfilter bridge module autoload is busted in the compat case, fix
from Florian Westphal.
13) Wireless Key removal was not setting multicast bits correctly thus
accidently killing the unicast key 0 and thus all traffic stops.
Fix from Johannes Berg.
14) Fix endless retries of A-MPDU transmissions in brcm80211 driver.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
qla3xxx: ethernet: Fix bogus interrupt state flag.
bridge: check return value of ipv6_dev_get_saddr()
rtnetlink: fix rtnl_calcit() and rtnl_dump_ifinfo()
bridge: message age needs to increase, not decrease.
bridge: Adjust min age inc for HZ > 256
tcp: don't fragment SACKed skbs in tcp_mark_head_lost()
r8169: corrupted IP fragments fix for large mtu.
packetengines: fix config default
vmxnet3: Fix transport header size
enic: fix an endian bug in enic_probe()
pch_gbe: memory corruption calling pch_gbe_validate_option()
tg3: Fix tg3_get_stats64 for 5700 / 5701 devs
tcp: fix false reordering signal in tcp_shifted_skb
tcp: fix comment for tp->highest_sack
netfilter: bridge: fix module autoload in compat case
brcm80211: smac: only print block-ack timeout message at trace level
brcm80211: smac: fix endless retry of A-MPDU transmissions
mac80211: Fix a warning on changing to monitor mode from STA
mac80211: zero initialize count field in ieee80211_tx_rate
iwlwifi: fix key removal
...
Pull PCI fixes from Jesse Barnes:
"A couple of fixes for booting specific machines, and one for a minor
memory leak on pre-_CRS platforms."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
x86/PCI: do not tie MSI MS-7253 use_crs quirk to BIOS version
x86/PCI: use host bridge _CRS info on MSI MS-7253
PCI: fix memleak when ACPI _CRS is not used.