Currently we clear kasan_zero_page before __flush_tlb_all(). This
works with current implementation of native_flush_tlb[_global]()
because it doesn't cause do any writes to kasan shadow memory.
But any subtle change made in native_flush_tlb*() could break this.
Also current code seems doesn't work for paravirt guests (lguest).
Only after the TLB flush we can be sure that kasan_zero_page is not
used as early shadow anymore (instrumented code will not write to it).
So it should cleared it only after the TLB flush.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.
v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
cause us to check for the flow label.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
snd_timer_user_read() has a potential race among parallel reads, as
qhead and qused are updated outside the critical section due to
copy_to_user() calls. Move them into the critical section, and also
sanitize the relevant code a bit.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
sound/firewire/digi00x/amdtp-dot.c:67: warning: type qualifiers ignored on function return type
Drop the bogus "const" type qualifier on the return type of dot_scrt()
to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The hda_jack_tbl entries are managed by snd_array for allowing
multiple jacks. It's good per se, but the problem is that struct
hda_jack_callback keeps the hda_jack_tbl pointer. Since snd_array
doesn't preserve each pointer at resizing the array, we can't keep the
original pointer but have to deduce the pointer at each time via
snd_array_entry() instead. Actually, this resulted in the deference
to the wrong pointer on codecs that have many pins such as CS4208.
This patch replaces the pointer to the NID value as the search key.
As an unexpected good side effect, this even simplifies the code, as
only NID is needed in most cases.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
A slave timer element also unlinks at snd_timer_stop() but it takes
only slave_active_lock. When a slave is assigned to a master,
however, this may become a race against the master's interrupt
handling, eventually resulting in a list corruption. The actual bug
could be seen with a syzkaller fuzzer test case in BugLink below.
As a fix, we need to take timeri->timer->lock when timer isn't NULL,
i.e. assigned to a master, while the assignment to a master itself is
protected by slave_active_lock.
BugLink: http://lkml.kernel.org/r/CACT4Y+Y_Bm+7epAb=8Wi=AaWd+DYS7qawX52qxdCfOfY49vozQ@mail.gmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Commit ae46113196 ("of: of_mdio: Add a whitelist of PHY
compatibilities.") missed one compatible string used in in-tree DTBs:
in OCTEON, for selected boards, the kernel DTB pruning code will overwrite
the DTB compatible string with "marvell,88e1145", which is missing
from the whitelist. Add it.
The patch fixes broken networking on EdgeRouter Lite.
Fixes: ae46113196 ("of: of_mdio: Add a whitelist of PHY compatibilities.")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
check_prev_add() caches saved stack trace in static trace variable
to avoid duplicate save_trace() calls in dependencies involving trylocks.
But that caching logic contains a bug. We may not save trace on first
iteration due to early return from check_prev_add(). Then on the
second iteration when we actually need the trace we don't save it
because we think that we've already saved it.
Let check_prev_add() itself control when stack is saved.
There is another bug. Trace variable is protected by graph lock.
But we can temporary release graph lock during printing.
Fix this by invalidating cached stack trace when we release graph lock.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: glider@google.com
Cc: kcc@google.com
Cc: peter@hurleysoftware.com
Cc: sasha.levin@oracle.com
Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Without this, using SOCK_DESTROY in enforcing mode results in:
SELinux: unrecognized netlink message type=21 for sclass=32
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit ed5a377d87 ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.
We fix it by changing hmacids to host order when users get them with
getsockopt.
Fixes: Commit ed5a377d87 ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware posts the devcmd result in result ring. In case of timeout, driver
does not increment the current result pointer and firmware could post the
result after timeout has occurred. During next devcmd, driver would be
reading the result of previous devcmd.
Fix this by incrementing result even in case of timeout.
Fixes: 373fb0873d ("enic: add devcmd2")
Signed-off-by: Sandeep Pillai <sanpilla@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3_tso_bug() can hit a condition where the entire tx ring is not big
enough to segment the GSO packet. For example, if MSS is very small,
gso_segs can exceed the tx ring size. When we hit the condition, it
will cause tx timeout.
tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
For DMA bugs, we can still fall back to linearize the SKB and let the
hardware transmit the TSO packet.
This patch adds a function tg3_tso_bug_gso_check() to check if there
are enough tx descriptors for GSO before calling tg3_tso_bug().
The caller will then handle the error appropriately - drop or
lineraize the SKB.
v2: Corrected patch description to avoid confusion.
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.
This is of course incorrect.
A specific list of ICMP messages should be able to drop a SYN_RECV.
For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: Petr Novopashenniy <pety@rusnet.ru>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 29bb45f25f (regmap-mmio: Use native endianness for read/write)
attempted to fix some long standing bugs in the MMIO implementation for
big endian systems caused by duplicate byte swapping in both regmap and
readl()/writel() which affected MIPS systems as when they are in big
endian mode they flip the endianness of all registers in the system, not
just the CPU. MIPS systems had worked around this by declaring regmap
using IPs as little endian which is inaccurate, unfortunately the issue
had not been reported.
Sadly the fix makes things worse rather than better. By changing the
behaviour to match the documentation it caused behaviour changes for
other IPs which broke them and by using the __raw I/O accessors to avoid
the endianness swapping in readl()/writel() it removed some memory
ordering guarantees and could potentially generate unvirtualisable
instructions on some architectures.
Unfortunately sorting out all this mess in any half way sensible fashion
was far too invasive to go in during an -rc cycle so instead let's go
back to the old broken behaviour for v4.5, the better fixes are already
queued for v4.6. This does mean that we keep the broken MIPS DTs for
another release but that seems the least bad way of handling the
situation.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWtIjbAAoJECTWi3JdVIfQs8QH/jNpfio4klDkdlH/KpPZXlrp
FzASbGePNtLqZXFL5WcG//ni3EYdbaiXZIdLBKDx9K4F2ca9FAF8aAnZAZ5uefGx
bnloYpV34DqQwS5f9FrrNsm+YVTTuUIt0dx4ZRGCEdMTzW7i3efs/9eVEITUixK6
U1obTJovAl33bihadsC9hzJVwfOq3H4aFFWc/EFZzbQaU2/so2eiA1dhPr/YErRJ
dMR8drWxpYXuBsrk5T647R0sUw7pA4Zw+WAF032TPQf/1Fy9Vk1/yXbTyccZzFzo
bfupRA/HpeLNZ9cN9z9y3Fa0je4UNbBZh0poB5B773af84NnhX7Ytenjo+peVxI=
=+Q6E
-----END PGP SIGNATURE-----
Merge tag 'regmap-fix-v4.5-big-endian' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"A single revert back to v4.4 endianness handling.
Commit 29bb45f25f ("regmap-mmio: Use native endianness for
read/write") attempted to fix some long standing bugs in the MMIO
implementation for big endian systems caused by duplicate byte
swapping in both regmap and readl()/writel(). Sadly the fix makes
things worse rather than better, so revert it for now"
* tag 'regmap-fix-v4.5-big-endian' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: mmio: Revert to v4.4 endianness handling
Fix the doubled "started" and tidy up the following sentences.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When trying to set the ICST 307 clock to 25174000 Hz I ran into
this arithmetic error: the icst_hz_to_vco() correctly figure out
DIVIDE=2, RDW=100 and VDW=99 yielding a frequency of
25174000 Hz out of the VCO. (I replicated the icst_hz() function
in a spreadsheet to verify this.)
However, when I called icst_hz() on these VCO settings it would
instead return 4122709 Hz. This causes an error in the common
clock driver for ICST as the common clock framework will call
.round_rate() on the clock which will utilize icst_hz_to_vco()
followed by icst_hz() suggesting the erroneous frequency, and
then the clock gets set to this.
The error did not manifest in the old clock framework since
this high frequency was only used by the CLCD, which calls
clk_set_rate() without first calling clk_round_rate() and since
the old clock framework would not call clk_round_rate() before
setting the frequency, the correct values propagated into
the VCO.
After some experimenting I figured out that it was due to a simple
arithmetic overflow: the divisor for 24Mhz reference frequency
as reference becomes 24000000*2*(99+8)=0x132212400 and the "1"
in bit 32 overflows and is lost.
But introducing an explicit 64-by-32 bit do_div() and casting
the divisor into (u64) we get the right frequency back, and the
right frequency gets set.
Tested on the ARM Versatile.
Cc: stable@vger.kernel.org
Cc: linux-clk@vger.kernel.org
Cc: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In snd_timer_notify1(), the wrong timer instance was passed for slave
ccallback function. This leads to the access to the wrong data when
an incompatible master is handled (e.g. the master is the sequencer
timer and the slave is a user timer), as spotted by syzkaller fuzzer.
This patch fixes that wrong assignment.
BugLink: http://lkml.kernel.org/r/CACT4Y+Y_Bm+7epAb=8Wi=AaWd+DYS7qawX52qxdCfOfY49vozQ@mail.gmail.com
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Enable vce and uvd pg based on single set of pg flags.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable vce and uvd pg based on single set of pg flags.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Needed to pass the cg and pg info to powerplay.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't do anything if the uvd cg flags are not set.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It was already disabled elsewhere, make it offical.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't attempt to start/stop the vce block if pg is disabled.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Don't attempt to start/stop the uvd block if pg is disabled.
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We already query this at driver init, so use that info. Also
handles virtualization cases.
Reviewed-by: monk liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pcie registers may not be available in a virtualized
environment.
Reviewed-by: monk liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allows the user to force the supported pcie gen and lane
config on both the asic and the chipset.
Useful for debugging pcie problems and for virtualization
where we may not be able to query the pcie bridge caps.
Default to:
gen: chipset 1/2, asic 1/2/3
lanes: 1/2/4/8/16
v2: fix bare metal case
Reviewed-by: monk liu <monk.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Silence lockdep false positive about rcu_dereference() being
used in the wrong context.
First one should use rcu_dereference_protected() as we own the spinlock.
Second one should be a normal assignation, as no barrier is needed.
Fixes: 18367681a1 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.
To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.
Fixes: 712f4aad40 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
A few random fixes, mostly coming from the PMU work by Shannon:
- fix for injecting faults coming from the guest's userspace
- cleanup for our CPTR_EL2 accessors (reserved bits)
- fix for a bug impacting perf (user/kernel discrimination)
- fix for a 32bit sysreg handling bug
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWqehPAAoJECPQ0LrRPXpDn6gP/2lrJ9lV5I3MxLzUytmRY8EM
Xl8WnNEQJ0e7oEdb1l6k4DR8D/HefzXpp/YWHY1WdDZSej0b2egro1xsFWdgaOr9
NVGJnoQBlCFqSIf2szml4ftpHXZZ/kMF/EvhtzEL6cpUdqeA/tkS6HoCMQknhCbx
3zOYnNKCGQUkFhTKJUSXB6NcZ/950uqkQxAdCPNUTGg1YzkNfbcgTewqKsmb25Cv
/sOUFmrq2AlnWkdH+QWP0BtNFUX9saOSXvxrABT6nfiXSpUeF6Rprcgi9gdoNhAD
mfE5IFw0dOEo2XThZTchKu3FBSMAkDadvC9yWFr88dr62E6EKFPzY3vHLCA8QoT/
zk5beGSjyWGe7FZZJ4CKdO4EWBZr/WSlSVzOfG4ZBVPUoh2AZcUEhzzrzTezzocO
71/5ZVpQ6O8+Pxwyy85Vd2drf7OZLagGNydNx46RHXrRxl+q0c5vFTVh4Txbd4YU
XNsd+kA62/OYyPHbtVzTzAPPKG7aM8hLzdy8dkTgvuDzWHmxFWhD/HgiMHfFrQqs
WCafvBhTc4375dvwYOupxaU2ncHKvt/zQJtBOw6bEwAIUa5c1IkIUr0i8XgRq6lr
x/YvhFIwiVyXVnrDt3ZSIx79Oajf541uJg7vLFyPBQkcnQlJ6T7oy7qJlqhM0567
Sr6G0/YXa1ccIfmKyeh4
=36kx
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM fixes for v4.5-rc2
A few random fixes, mostly coming from the PMU work by Shannon:
- fix for injecting faults coming from the guest's userspace
- cleanup for our CPTR_EL2 accessors (reserved bits)
- fix for a bug impacting perf (user/kernel discrimination)
- fix for a 32bit sysreg handling bug
The match module lacked module license and description, so add it
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
DPCM driver is recommended for BYT, CHT based platforms, so if
CONFIG_SND_SST_IPC_ACPI is selected then don't compile the BYT
Device IDs in common ACPI driver to avoid probe conflicts.
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Acked-by: Jie Yang <yang.jie@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
The ACPI match module is common to all three drivers, HSW, SKL
and Atom-DPCM driver. But Atom-DPCM driver does not use common
sst code so we cannot include the common SST module in Atom-DPCM
driver.
So the solution is to have a independent sst-match-acpi module
which helps in matching for all the three drivers. Now all driver
can be inbuilt in a single image
This patch really fixes the regression introduced by the
commit 95f0980148 ("ASoC: Intel: Move apci find machine routines")
Acked-by: Jie Yang <yang.jie@intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
This reverts commit dc901a3541 ("ASoC: Intel: fix ACPI probe
regression with Atom DPCM driver") as the fix prevented the probe
on HSW/BDW if Atom-DPCM was selected
Acked-by: Jie Yang <yang.jie@intel.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When the gpio driver is probed after the mmc one, the read/write gpio
and card detection one return -EPROBE_DEFER. Unfortunately, the memory
region remains requested, and upon the next probe, the probe will fail
anyway with -EBUSY.
Fix this by releasing the memory resource upon probe failure.
More broadly, this patch uses devm_*() primitives whenever possible in
the probe function.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
There is no checks for dma mapping errors in mmc_spi.
Tha patch fixes that and by the way it adds dma_unmap_single(ones_dma)
that was left on a failure path mmc_spi_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
numa_clear_kernel_node_hotplug() uses memblock_set_node() without
checking for failures.
memblock_set_node() is a complex function that might extend the
memblock array - which extension might fail - so check for this
possibility.
It's not supposed to happen (because realistically if we have so
little memory that this fails then we likely won't be able to
boot anyway), but do the check nevertheless.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Chen Tang <imtangchen@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: y14sg1 <y14sg1@comcast.net>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So we fixed an overflow bug in numa_clear_kernel_node_hotplug():
2b54ab3c66d4 ("x86/mm/numa: Fix memory corruption on 32-bit NUMA kernels")
... and the bug was indirectly caused by poor coding style,
such as using start/end local variables unnecessarily, which
lost the physaddr_t type.
So make the code more readable and try to fully comment all
the thinking behind the logic.
No change in functionality.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Chen Tang <imtangchen@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: y14sg1 <y14sg1@comcast.net>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
a0acda9172 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable")
Introduced numa_clear_kernel_node_hotplug(), which function is executed
during early bootup, and which marks all currently reserved memblock
regions as hot-memory-unswappable as well.
y14sg1 <y14sg1@comcast.net> reported that when running 32-bit NUMA kernels,
the grsecurity/PAX kernel patch flagged a size overflow in this function:
PAX: size overflow detected in function x86_numa_init arch/x86/mm/numa.c:691 [...]
... the reason for the overflow is that memblock_clear_hotplug() takes physical
addresses as arguments, while the start/end variables used by
numa_clear_kernel_node_hotplug() are 'unsigned long', which is 32-bit on PAE
kernels, but which has 64-bit physical addresses.
So on 32-bit PAE kernels that have physical memory above the 4GB boundary,
we truncate a 64-bit physical address range to 32 bits and pass it to
memblock_clear_hotplug(), which at minimum prevents the original memory-hotplug
bugfix from working, but might have other side effects as well.
The fix is to use the proper type to handle physical addresses, phys_addr_t.
Reported-by: y14sg1 <y14sg1@comcast.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Chen Tang <imtangchen@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>