This reverts commit f83331bab1.
As the tests PPC64 (powernv platform) show, IOMMU pages are leaking
when transferring big amount of small packets (<=64 bytes),
"ping -f" and waiting for 15 seconds is the simplest way to confirm the bug.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Santosh Rastapur <santosh@chelsio.com>
Cc: Jay Fenlason <fenlason@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Divy Le ray <divy@chelsio.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible for some versions of firmware to advertise capabilities that driver
is not ready to handle. This may lead to controller stall. Since the driver is
interested only in subset of flags, clearing the rest.
Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't make sense to output a tunnel packet using the same
parameters that it was received with since that will generally
just result in the packet going back to us. As a result, userspace
assumes that the tunnel key is cleared when transitioning through
the switch. In the majority of cases this doesn't matter since a
packet is either going to a tunnel port (in which the key is
overwritten with new values) or to a non-tunnel port (in which
case the key is ignored). However, it's theoreticaly possible that
userspace could rely on the documented behavior, so this corrects
it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Flex array is used to allocate hash buckets which is type struct
hlist_head, but we use `struct hlist_head *` to calculate
array size. Since hlist_head is of size pointer it works fine.
Following patch use correct type.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
git silently included an extra hunk in vport_cmd_set() during
automatic merging. This code is unreachable so it does not actually
introduce a problem but it is clearly incorrect.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Make sure to fail properly if the device is not accepted during attach
in order to avoid null-pointer derefs (of missing interface private
data) at disconnect or release.
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The parallel-port code of the drivers used a stack allocated
control-request buffer for asynchronous (and possibly deferred) control
requests. This not only violates the no-DMA-from-stack requirement but
could also lead to corrupt control requests being submitted.
Cc: stable@vger.kernel.org
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These devices tend to become unresponsive after S3
Signed-off-by: Oliver Neukum <oneukum@suse.de>
CC: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If we get an error event really early in the driver setup sequence,
which gen3 is especially prone to with various display GTT faults we
Oops. So try to avoid this.
Additionally with Haswell the transcoders are a separate bank of
registers from the pipes (4 transcoders, 3 pipes). In event of an
error, we want to be sure we have a complete and accurate picture of
the machine state, so record all the transcoders in addition to all
the active pipes.
This regression has been introduced in
commit 702e7a56af
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Tue Oct 23 18:29:59 2012 -0200
drm/i915: convert PIPECONF to use transcoder instead of pipe
Based on the patch "drm/i915: Dump all transcoder registers on error"
from Chris Wilson:
v2: Rebase so that we don't try to be clever and try to figure out the
cpu transcoder from hw state. That exercise should be done when we
analyze the error state offline.
The actual bugfix is to not call intel_pipe_to_cpu_transcoder in the
error state capture code in case the pipes aren't fully set up yet.
v3: Simplifiy the err->num_transcoders computation a bit. While at it
make the error capture stuff save on systems without a display block.
v4: Fix fail, spotted by Jani.
v5: Completely new commit message, cc: stable.
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=60021
Cc: stable@vger.kernel.org
Tested-by: Dustin King <daking@rescomp.stanford.edu>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Merge a bunch of fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
x86 get_unmapped_area(): use proper mmap base for bottom-up direction
ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
ocfs2: Revert 40bd62e to avoid regression in extended allocation
drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
hugetlb: fix lockdep splat caused by pmd sharing
aoe: adjust ref of head for compound page tails
microblaze: fix clone syscall
mm: save soft-dirty bits on file pages
mm: save soft-dirty bits on swapped pages
memcg: don't initialize kmem-cache destroying work for root caches
from accessing cpu_data too early, i.e. before smp_store_cpu_info()
has copied the boot_cpu_data ontop and overwritten an already empty
structure (which we shouldn't access that early in the first place
anyway).
The second patch is kinda largish for that late in the game but it
shouldn't be problematic because we're simply switching from using
cpu_data to use the CPU family number directly and thus again, not use
uninitialized cpu_data structure.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSCT0UAAoJEBLB8Bhh3lVKWNsQAJbu1y79mc2vHR15NqMGXVoP
F6sUNC3iUnNjxlUxWn3PPynsJS9zFTBEo1S6ILnsfuMWliDoH+4psHLMmG5F20TV
XJCSS9s3fdsaJZiNa7JXggdI80xU3y8uB/z3nacRDKgcdw3tX+ITP4SdPgNF5F3k
B/XGAFxyXfl9M9SRFMIO4GS9UyFMsDQP0VWMRhcosY1/Fj5bc9Uds1ngP9E4XMe9
jB1pIzbvtDvtolYSdShwFE9M0os90TYPHVSgriYYXHLZPryZC8mWkc7GwSTl4bG9
EwBGXv79SJWics9vWM2n9EQbFt46fUAyhpjyOo/fqVp4L5XjKuV7UyeT7X+bsY0q
b3z1jWzBxGZ5ltE2BmZ8tVnfKgwYJlKgAf8FbSrylhtBInP80Wau1aYTzw50h9D2
lmjO/rSOhm2FszdeFHG/IeVbSZJbSFrUdwm51nx2xF1qG0MXldy9ehJ4pO3Ui2XA
d0z2y83ZxOYV723fTCa1/5C5xAes7LjvxftM32G9QTu7R5XnvVrfehVfPh9K1DJA
nIEH6pbM358/PT+/q7BCPTnH7KVi+mE633YERDOfAgz/NFnVKfKzLBM0+AyFNHjk
a4xFrOqVbTctW1Chv8Nh1457A2+gcT8yeju1XjbLE8IMqKOGyMt0gnRZlgMYj8BH
WYKWi2q0LaDwsOcaQ9bm
=cxbY
-----END PGP SIGNATURE-----
Merge tag 'amd_ucode_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent
Pull AMD microcode fixes from Borislav Petkov:
" Those are basically two fixes which correct the AMD early ucode loader
from accessing cpu_data too early, i.e. before smp_store_cpu_info()
has copied the boot_cpu_data ontop and overwritten an already empty
structure (which we shouldn't access that early in the first place
anyway).
The second patch is kinda largish for that late in the game but it
shouldn't be problematic because we're simply switching from using
cpu_data to use the CPU family number directly and thus again, not use
uninitialized cpu_data structure. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Explicitly truncate the second operand of do_div() to 32 bits to guard
against bogus code calling it with a 64-bit divisor.
[Thorsten]
After upgrading from 3.2 to 3.10, mounting a btrfs volume fails with:
btrfs: setting nodatacow, compression disabled
btrfs: enabling auto recovery
btrfs: disk space caching is enabled
*** ZERO DIVIDE *** FORMAT=2
Current process id is 722
BAD KERNEL TRAP: 00000000
Modules linked in: evdev mac_hid ext4 crc16 jbd2 mbcache btrfs xor lzo_compress zlib_deflate raid6_pq crc32c libcrc32c
PC: [<319535b2>] __btrfs_map_block+0x11c/0x119a [btrfs]
SR: 2000 SP: 30c1fab4 a2: 30f0faf0
d0: 00000000 d1: 00001000 d2: 00000000 d3: 00000000
d4: 00010000 d5: 00000000 a0: 3085c72c a1: 3085c72c
Process mount (pid: 722, task=30f0faf0)
Frame format=2 instr addr=319535ae
Stack from 30c1faec:
00000000 00000020 00000000 00001000 00000000 01401000 30253928 300ffc00
00a843ac 3026f640 00000000 00010000 0009e250 00d106c0 00011220 00000000
00001000 301c6830 0009e32a 000000ff 00000009 3085c72c 00000000 00000000
30c1fd14 00000000 00000020 00000000 30c1fd14 0009e26c 00000020 00000003
00000000 0009dd8a 300b0b6c 30253928 00a843ac 00001000 00000000 00000000
0000a008 3194e76a 30253928 00a843ac 00001000 00000000 00000000 00000002
Call Trace: [<00001000>] kernel_pg_dir+0x0/0x1000
[...]
Code: 222e ff74 2a2e ff5c 2c2e ff60 4c45 1402 <2d40> ff64 2d41 ff68 2205 4c2e 1800 ff68 4c04 0800 2041 d1c0 2206 4c2e 1400 ff68
[Geert]
As diagnosed by Andreas, fs/btrfs/volumes.c:__btrfs_map_block()
calls
do_div(stripe_nr, stripe_len);
with stripe_len u64, while do_div() assumes the divisor is a 32-bit number.
Due to the lack of truncation in the m68k-specific implementation of
do_div(), the division is performed using the upper 32-bit word of
stripe_len, which is zero.
This was introduced by commit 53b381b3ab
("Btrfs: RAID5 and RAID6"), which changed the divisor from
map->stripe_len (struct map_lookup.stripe_len is int) to a 64-bit temporary.
Reported-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
As pointed out by Andreas Schwab, pointers passed to ARAnyM NatFeat calls
should be physical addresses, not virtual addresses.
Fortunately on Atari, physical and virtual kernel addresses are the same,
as long as normal kernel memory is concerned, so this usually worked fine
without conversion.
But for modules, pointers to literal strings are located in vmalloc()ed
memory. Depending on the version of ARAnyM, this causes the nf_get_id()
call to just fail, or worse, crash ARAnyM itself with e.g.
Gotcha! Illegal memory access. Atari PC = $968c
This is a big issue for distro kernels, who want to have all drivers as
loadable modules in an initrd.
Add a wrapper for nf_get_id() that copies the literal to the stack to
work around this issue.
Reported-by: Thorsten Glaser <tg@debian.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Since we set "len = total_len" in the beginning of tun_get_user(),
so we should compare the new len with 0, instead of total_len,
or the if statement always returns false.
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the iproute2 command `bridge vlan show`, after switching from
rtgenmsg to ifinfomsg.
Let's start with a little history:
Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in
the 3.9 merge window.
In the kernel commit 6cbdceeb, he added attribute support to
bridge GETLINK requests sent with rtgenmsg.
Mar 6th: Vlad got this iproute2 reference implementation of the bridge
vlan netlink interface accepted (iproute2 9eff0e5c)
Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
http://patchwork.ozlabs.org/patch/239602/http://marc.info/?t=136680900700007
Apr 28th: Linus released 3.9
Apr 30th: Stephen released iproute2 3.9.0
The `bridge vlan show` command haven't been working since the switch to
ifinfomsg, or in a released version of iproute2. Since the kernel side
only supports rtgenmsg, which iproute2 switched away from just prior to
the iproute2 3.9.0 release.
I haven't been able to find any documentation, about neither rtgenmsg
nor ifinfomsg, and in which situation to use which, but kernel commit
88c5b5ce seams to suggest that ifinfomsg should be used.
Fixing this in kernel will break compatibility, but I doubt that anybody
have been using it due to this bug in the user space reference
implementation, at least not without noticing this bug. That said the
functionality is still fully functional in 3.9, when reversing iproute2
commit 63338dca.
This could also be fixed in iproute2, but thats an ugly patch that would
reintroduce rtgenmsg in iproute2, and from searching in netdev it seams
like rtgenmsg usage is discouraged. I'm assuming that the only reason
that Vlad implemented the kernel side to use rtgenmsg, was because
iproute2 was using it at the time.
Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net>
Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently we met quite a lot of random kernel panic issues after enabling
CONFIG_PROC_PAGE_MONITOR. After debuggind we found this has something
to do with following bug in pagemap:
In struct pagemapread:
struct pagemapread {
int pos, len;
pagemap_entry_t *buffer;
bool v2;
};
pos is number of PM_ENTRY_BYTES in buffer, but len is the size of
buffer, it is a mistake to compare pos and len in add_page_map() for
checking buffer is full or not, and this can lead to buffer overflow and
random kernel panic issue.
Correct len to be total number of PM_ENTRY_BYTES in buffer.
[akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition]
Signed-off-by: Yonghua Zheng <younghua.zheng@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All architectures include "kernel/Kconfig.freezer" except three left, so
let them include it too, or 'allmodconfig' will report error.
The related errors: (with allmodconfig for openrisc):
CC kernel/cgroup_freezer.o
kernel/cgroup_freezer.c: In function 'freezer_css_online':
kernel/cgroup_freezer.c:133:15: error: 'system_freezing_cnt' undeclared (first use in this function)
kernel/cgroup_freezer.c:133:15: note: each undeclared identifier is reported only once for each function it appears in
kernel/cgroup_freezer.c: In function 'freezer_css_offline':
kernel/cgroup_freezer.c:157:15: error: 'system_freezing_cnt' undeclared (first use in this function)
kernel/cgroup_freezer.c: In function 'freezer_attach':
kernel/cgroup_freezer.c:200:4: error: implicit declaration of function 'freeze_task'
kernel/cgroup_freezer.c: In function 'freezer_apply_state':
kernel/cgroup_freezer.c:371:16: error: 'system_freezing_cnt' undeclared (first use in this function)
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the stack is set to unlimited, the bottomup direction is used for
mmap-ings but the mmap_base is not used and thus effectively renders
ASLR for mmapings along with PIE useless.
Cc: Michel Lespinasse <walken@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Sendroiu <molecula2788@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since ocfs2_cow_file_pos will invoke ocfs2_refcount_icow with a NULL as
the struct file pointer, it finally result in a null pointer dereference
in ocfs2_duplicate_clusters_by_page.
This patch replace file pointer with inode pointer in
cow_duplicate_clusters to fix this issue.
[jeff.liu@oracle.com: rebased patch against linux-next tree]
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Acked-by: Tao Ma <tm@tao.ma>
Tested-by: David Weber <wb@munzinger.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 40bd62eb7f ("fs/ocfs2/journal.h: add bits_wanted while
calculating credits in ocfs2_calc_extend_credits").
Unfortunately this change broke fallocate even if there is insufficient
disk space for the preallocation, which is a serious problem.
# df -h
/dev/sda8 22G 1.2G 21G 6% /ocfs2
# fallocate -o 0 -l 200M /ocfs2/testfile
fallocate: /ocfs2/test: fallocate failed: No space left on device
and a kernel warning:
CPU: 3 PID: 3656 Comm: fallocate Tainted: G W O 3.11.0-rc3 #2
Call Trace:
dump_stack+0x77/0x9e
warn_slowpath_common+0xc4/0x110
warn_slowpath_null+0x2a/0x40
start_this_handle+0x6c/0x640 [jbd2]
jbd2__journal_start+0x138/0x300 [jbd2]
jbd2_journal_start+0x23/0x30 [jbd2]
ocfs2_start_trans+0x166/0x300 [ocfs2]
__ocfs2_extend_allocation+0x38f/0xdb0 [ocfs2]
ocfs2_allocate_unwritten_extents+0x3c9/0x520
__ocfs2_change_file_space+0x5e0/0xa60 [ocfs2]
ocfs2_fallocate+0xb1/0xe0 [ocfs2]
do_fallocate+0x1cb/0x220
SyS_fallocate+0x6f/0xb0
system_call_fastpath+0x16/0x1b
JBD2: fallocate wants too many credits (51216 > 4381)
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's always a bad idea to poll on HW bits without a timeout.
The i.MX28 RTC can be easily brought into a state in which the RTC is
not running (until after a power-on-reset) and thus the status bits
which are polled in the driver won't ever change.
This patch prevents the kernel from getting stuck in this case.
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave has reported the following lockdep splat:
=================================
[ INFO: inconsistent lock state ]
3.11.0-rc1+ #9 Not tainted
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/49 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&mapping->i_mmap_mutex){+.+.?.}, at: [<c114971b>] page_referenced+0x87/0x5e3
{RECLAIM_FS-ON-W} state was registered at:
mark_held_locks+0x81/0xe7
lockdep_trace_alloc+0x5e/0xbc
__alloc_pages_nodemask+0x8b/0x9b6
__get_free_pages+0x20/0x31
get_zeroed_page+0x12/0x14
__pmd_alloc+0x1c/0x6b
huge_pmd_share+0x265/0x283
huge_pte_alloc+0x5d/0x71
hugetlb_fault+0x7c/0x64a
handle_mm_fault+0x255/0x299
__do_page_fault+0x142/0x55c
do_page_fault+0xd/0x16
error_code+0x6c/0x74
irq event stamp: 3136917
hardirqs last enabled at (3136917): _raw_spin_unlock_irq+0x27/0x50
hardirqs last disabled at (3136916): _raw_spin_lock_irq+0x15/0x78
softirqs last enabled at (3136180): __do_softirq+0x137/0x30f
softirqs last disabled at (3136175): irq_exit+0xa8/0xaa
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&mapping->i_mmap_mutex);
<Interrupt>
lock(&mapping->i_mmap_mutex);
*** DEADLOCK ***
no locks held by kswapd0/49.
stack backtrace:
CPU: 1 PID: 49 Comm: kswapd0 Not tainted 3.11.0-rc1+ #9
Hardware name: Dell Inc. Precision WorkStation 490 /0DT031, BIOS A08 04/25/2008
Call Trace:
dump_stack+0x4b/0x79
print_usage_bug+0x1d9/0x1e3
mark_lock+0x1e0/0x261
__lock_acquire+0x623/0x17f2
lock_acquire+0x7d/0x195
mutex_lock_nested+0x6c/0x3a7
page_referenced+0x87/0x5e3
shrink_page_list+0x3d9/0x947
shrink_inactive_list+0x155/0x4cb
shrink_lruvec+0x300/0x5ce
shrink_zone+0x53/0x14e
kswapd+0x517/0xa75
kthread+0xa8/0xaa
ret_from_kernel_thread+0x1b/0x28
which is a false positive caused by hugetlb pmd sharing code which
allocates a new pmd from withing mapping->i_mmap_mutex. If this
allocation causes reclaim then the lockdep detector complains that we
might self-deadlock.
This is not correct though, because hugetlb pages are not reclaimable so
their mapping will be never touched from the reclaim path.
The patch tells lockup detector that hugetlb i_mmap_mutex is special by
assigning it a separate lockdep class so it won't report possible
deadlocks on unrelated mappings.
[peterz@infradead.org: comment for annotation]
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a BUG which can trigger when direct-IO is used with AOE.
As discussed previously, the fact that some users of the block layer
provide bios that point to pages with a zero _count means that it is not
OK for the network layer to do a put_page on the skb frags during an
skb_linearize, so the aoe driver gets a reference to pages in bios and
puts the reference before ending the bio. And because it cannot use
get_page on a page with a zero _count, it manipulates the value
directly.
It is not OK to increment the _count of a compound page tail, though,
since the VM layer will VM_BUG_ON a non-zero _count. Block users that
do direct I/O can result in the aoe driver seeing compound page tails in
bios. In that case, the same logic works as long as the head of the
compound page is used instead of the tails. This patch handles compound
pages and does not BUG.
It relies on the block layer user leaving the relationship between the
page tail and its head alone for the duration between the submission of
the bio and its completion, whether successful or not.
Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix inadvertent breakage in the clone syscall ABI for Microblaze that
was introduced in commit f3268edbe6 ("microblaze: switch to generic
fork/vfork/clone").
The Microblaze syscall ABI for clone takes the parent tid address in the
4th argument; the third argument slot is used for the stack size. The
incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
slot.
This commit restores the original ABI so that existing userspace libc
code will work correctly.
All kernel versions from v3.8-rc1 were affected.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy reported that if file page get reclaimed we lose the soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get
encoded into pte entry. Thus when #pf happens on such non-present pte
we can restore it back.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.
To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out. When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.
One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.
Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
struct memcg_cache_params has a union. Different parts of this union
are used for root and non-root caches. A part with destroying work is
used only for non-root caches.
I fixed the same problem in another place v3.9-rc1-16204-gf101a94, but
didn't notice this one.
This patch fixes the kernel panic:
[ 46.848187] BUG: unable to handle kernel paging request at 000000fffffffeb8
[ 46.849026] IP: [<ffffffff811a484c>] kmem_cache_destroy_memcg_children+0x6c/0xc0
[ 46.849092] PGD 0
[ 46.849092] Oops: 0000 [#1] SMP
...
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: <stable@vger.kernel.org> [3.9.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Initially I improperly set a boundary for maximum number of input
packets to process on NAPI poll ("work") so it might be more than
expected amount ("weight").
This was really harmless but seeing WARN_ON_ONCE on every device boot is
not nice. So trivial fix ("<" instead of "<=") is here.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Mischa Jonker <mjonker@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull scheduler fixes from Ingo Molnar:
"Docbook fixes that make 99% of the diffstat, plus a oneliner fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks
sched: Fix some kernel-doc warnings
Pull perf fixes from Ingo Molnar:
"Two small fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
perf/x86: Fix intel QPI uncore event definitions
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.
Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)
CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov says:
====================
Please consider applying the series of bnx2x fixes to net:
* statistics may cause FW assert
* missing fairness configuration in DCB flow
* memory leak in sriov related part
* Illegal PTE access
* Pagefault crash in shutdown flow with cnic
v1->v2
* fixed sparse error pointed by Joe Perches
* added missing signed-off from Sergei Shtylyov
v2->v3
* added missing signed-off from Sergei Shtylyov
* fixed formatting from Sergei Shtylyov
v3->v4
* patch 1/6: fixed declaration order
* patch 2/6 replaced with: protect flows using set_bit constraints
v4->v5
* patch 2/6: replace proprietary locking with semaphore
* droped 1/6: since adds redundant code from Benjamin Poirier
The following patchset contains four netfilter fixes, they are:
* Fix possible invalid access and mangling of the TCPMSS option in
xt_TCPMSS. This was spotted by Julian Anastasov.
* Fix possible off by one access and mangling of the TCP packet in
xt_TCPOPTSTRIP, also spotted by Julian Anastasov.
* Fix possible information leak due to missing initialization of one
padding field of several structures that are included in nfqueue and
nflog netlink messages, from Dan Carpenter.
* Fix TCP window tracking with Fast Open, from Yuchung Cheng.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There might be a crash as during shutdown flow CNIC might try
to access resources already freed by bnx2x.
Change bnx2x_close() into dev_close() in __bnx2x_remove (shutdown flow)
to guarantee CNIC is notified of the device's change of status.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTE write access error might occur in MF_ALLOWED mode when IOMMU
is active. The patch adds rmmod HSI indicating to MFW to stop
running queries which might trigger this failure.
Signed-off-by: Barak Witkowsky <barak@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ETS can be enabled as a result of DCB negotiation, then
fairness must be recalculated after each negotiation.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add locking to protect different statistics flows from
running simultaneously.
This in order to serialize statistics requests sent to FW,
otherwise two outstanding queries may cause FW assert.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pm_qos_update_request_timeout() updates a qos and then schedules
a delayed work item to bring the qos back down to the default
after the timeout. When the work item runs, pm_qos_work_fn() will
call pm_qos_update_request() and deadlock because it tries to
cancel itself via cancel_delayed_work_sync(). Future callers of
that qos will also hang waiting to cancel the work that is
canceling itself. Let's extract the little bit of code that does
the real work of pm_qos_update_request() and call it from the
work function so that we don't deadlock.
Before ed1ac6e (PM: don't use [delayed_]work_pending()) this didn't
happen because the work function wouldn't try to cancel itself.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove Andrew Gallatin, as he is no longer with Myricom. Add
Hyong-Youb Kim as the new maintainer. Update the website URL.
Signed-off-by: Hyong-Youb Kim <hykim@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA sync should sync the whole receive buffer, not just
part of it. Fixes log messages dma_sync_check.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Architectures should fully validate whether kexec is possible as part of
machine_kexec_prepare(), so that user-space's kexec_load() operation can
report any problems. Performing validation in machine_kexec() itself is
too late, since it is not allowed to return.
Prior to this patch, ARM's machine_kexec() was testing after-the-fact
whether machine_kexec_prepare() was able to disable all but one CPU.
Instead, modify machine_kexec_prepare() to validate all conditions
necessary for machine_kexec_prepare()'s to succeed. BUG if the validation
succeeded, yet disabling the CPUs didn't actually work.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 15e7e5c1eb ("ARM: 7749/1: spinlock: retry trylock operation if
strex fails on free lock") modifying our arch_spin_trylock to retry the
acquisition if the lock appeared uncontended, but the strex failed.
This patch does the same for rwlocks, which were missed by the original
patch.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The res variable is written before we've finished with the input
operands (namely the lock address), so ensure that we mark it as `early
clobber' to avoid unintended register sharing.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
It is possible to construct an event group with a software event as a
group leader and then subsequently add a hardware event to the group.
This results in the event group being validated by adding all members
of the group to a fake PMU and attempting to allocate each event on
their respective PMU.
Unfortunately, for software events wthout a corresponding arm_pmu, this
results in a kernel crash attempting to dereference the ->get_event_idx
function pointer.
This patch fixes the problem by checking explicitly for software events
and ignoring those in event validation (since they can always be
scheduled). We will probably want to revisit this for 3.12, since the
validation checks don't appear to work correctly when dealing with
multiple hardware PMUs anyway.
Cc: <stable@vger.kernel.org>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The ISP clock registers belong to the ISP power domain and may change
their values if this power domain is switched off/on. Add
CLK_GET_RATE_NOCACHE flags to ensure we do not rely on invalid cached
data when setting or getting frequency of those clocks.
Without this fix the FIMC-IS Cortex-A5 core and AXI bus clocks have
incorrect frequencies, which breaks the ISP operation and starting the
video pipeline fails with timeouts reported by the FIMC-IS firmware.
See related commit 722a860ecb "[media]
exynos4-is: Fix FIMC-IS clocks initialization" for more details.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>