The cached value of the last selected channel prevents retries on the
next call, even on failure to update the selected channel. Fix that.
Signed-off-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
For the DSMs where the kernel knows the format of the output buffer and
originates those DSMs from within the kernel, return -EIO for any
non-zero status. If the BIOS is indicating a status that we do not know
how to handle, fail the DSM.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The internal alloc_nvdimm_map() helper might fail, particularly if the
memory region is already busy. Report request_mem_region() failures and
check for the failure.
Reported-by: Ryan Chen <ryan.chan105@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
the eg20t driver call request_irq() function before the pch_base_address,
base address of i2c controller's register, is assigned an effective value.
there is one possible scenario that an interrupt which isn't inside eg20t
arrives immediately after request_irq() is executed when i2c controller
shares an interrupt number with others. since the interrupt handler
pch_i2c_handler() has already active as shared action, it will be called
and read its own register to determine if this interrupt is from itself.
At that moment, since base address of i2c registers is not remapped
in kernel space yet,so the INT handler will access an illegal address
and then a error occurs.
Signed-off-by: Yadi.hu <yadi.hu@windriver.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org
The page structures associated with the vDSO pages in the kernel image
are calculated using virt_to_page(), which uses __pa() under the hood to
find the pfn associated with the virtual address. The vDSO data pointers
however point to kernel symbols, so __pa_symbol() should really be used
instead.
Since there is no equivalent to virt_to_page() which uses __pa_symbol(),
fix init_vdso_image() to work directly with pfns, calculated with
__phys_to_pfn(__pa_symbol(...)).
This issue broke the Malta Enhanced Virtual Addressing (EVA)
configuration which has a non-default implementation of __pa_symbol().
This is because it uses a physical alias so that the kernel executes
from KSeg0 (VA 0x80000000 -> PA 0x00000000), while RAM is provided to
the kernel in the KUSeg range (VA 0x00000000 -> PA 0x80000000) which
uses the same underlying RAM.
Since there are no page structures associated with the low physical
address region, some arbitrary kernel memory would be interpreted as a
page structure for the vDSO pages and badness ensues.
Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/14229/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Configure the transmitter delay register at +0x1c to correctly handle
the CAN FD bitrate switch (BRS). This moves the SSP (secondary sample
point) to a proper offset, so that the TDC mechanism works and won't
generate error frames on the CAN link.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Since commit 1625f45299, vti6 is broken, all input packets are dropped
(LINUX_MIB_XFRMINNOSTATES is incremented).
XFRM_TUNNEL_SKB_CB(skb)->tunnel.ip6 is set by vti6_rcv() before calling
xfrm6_rcv()/xfrm6_rcv_spi(), thus we cannot set to NULL that value in
xfrm6_rcv_spi().
A new function xfrm6_rcv_tnl() that enables to pass a value to
xfrm6_rcv_spi() is added, so that xfrm6_rcv() is not touched (this function
is used in several handlers).
CC: Alexey Kodanev <alexey.kodanev@oracle.com>
Fixes: 1625f45299 ("net/xfrm_input: fix possible NULL deref of tunnel.ip6->parms.i_key")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When I introduced the lastuse member I made a subtle error because it was
returned as an absolute value but that is meaningless to user-space as it
doesn't allow to see how old exactly an entry is. Let's make it similar to
how the bridge returns such values and make it relative to "now" (jiffies).
This allows us to show the actual age of the entries and is much more
useful (e.g. user-space daemons can age out entries, iproute2 can display
the lastuse properly).
Fixes: 43b9e12740 ("net: ipmr/ip6mr: add support for keeping an entry age")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hayes Wang says:
====================
r8152: correct the flow of PHY
First, to enable the PHY as early as possible. Some settings may fail if the
PHY is power down.
Move the other PHY settings to hw_phy_cfg() to make sure the order is correct.
Finally, disable ALDPS and EEE before updating the PHY for RTL8153.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Disable ALDPS and EEE to avoid the possible failure when setting the PHY.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the PHY relative settings together to hw_phy_cfg().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move enabling PHY to init(), otherwise some other settings may fail.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the following functions forward.
r8152_mmd_indirect()
r8152_mmd_read()
r8152_mmd_write()
r8152_eee_en()
r8152b_enable_eee()
r8153_eee_en()
r8153_enable_eee()
r8152b_enable_fc()
r8153_aldps_en()
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were missing check for 25G and 100G while checking port speed,
which lead to less number of queues getting allocated for 25G & 100G
adapters and leading to low throughput. Adding the missing check for
both NIC and vNIC driver.
Also fixes port advertisement for 25G and 100G in ethtool output.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5958d19a14 checks for prefetchable m64 BARs by comparing the
addresses instead of using resource flags. This broke SR-IOV as the m64
check in pnv_pci_ioda_fixup_iov_resources() fails.
The condition in pnv_pci_window_alignment() also changed to checking
only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
IORESOURCE_PREFETCH.
Revert these cases to the previous behaviour, adding a new helper function
to do so. This is named pnv_pci_is_m64_flags() to make it clear this
function is only looking at resource flags and should not be relied on for
non-SRIOV resources.
Fixes: 5958d19a14 ("Fix incorrect PE reservation attempt on some 64-bit BARs")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-----BEGIN PGP SIGNATURE-----
iQEcBAABCgAGBQJX3/GcAAoJED07qiWsqSVq9PgH/jucrHh2H7IAnJi6ZasZmu1t
+beOkzg954rJfz2cGulhpNBAl9xJYRhPuJ+Jw5JhDEPXVisgeRVm6lT+8YJaCsWL
utTotzyRYI3HUrXIl6V2rjn1ms7TAlZsHrAGPj1ufJKEZg8/SF5C15NuY9YTlh4d
JZT0V13Yxj0CV3d2jqZshFXrF+CiAsGyQF1aGA9Wo2xQeZIt2vY/1DfNtC9eXlXW
+t7tBCgLka7cwTBC513BKjl9fGwzYO0eYU993N9+Ijir6HLM9N3igZwxjH1hEl/S
ow6n5T1O89JEnghSN5Xq+2ThkkXR7zZwi+6l74eFICkIwHK/pekZ8dPr9Ct9ZZ0=
=JcjG
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-4.8-20160919' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2016-09-19
this is a pull request of one patch for the upcoming linux-4.8 release.
The patch by Fabio Estevam fixes the pm handling in the flexcan driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around. Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).
However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().
Since any wraparound means EFAULT there, the fix is trivial - turn
those
while (uaddr <= end)
...
into
if (unlikely(uaddr > end))
return -EFAULT;
do
...
while (uaddr <= end);
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While running a compile on arm64, I hit a memory exposure
usercopy: kernel memory exposure attempt detected from fffffc0000f3b1a8 (buffer_head) (1 bytes)
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:75!
Internal error: Oops - BUG: 0 [#1] SMP
Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
llc ebtable_nat ip6table_security ip6table_raw ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
iptable_security iptable_raw iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
ebtable_filter ebtables ip6table_filter ip6_tables vfat fat xgene_edac
xgene_enet edac_core i2c_xgene_slimpro i2c_core at803x realtek xgene_dma
mdio_xgene gpio_dwapb gpio_xgene_sb xgene_rng mailbox_xgene_slimpro nfsd
auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sdhci_of_arasan
sdhci_pltfm sdhci mmc_core xhci_plat_hcd gpio_keys
CPU: 0 PID: 19744 Comm: updatedb Tainted: G W 4.8.0-rc3-threadinfo+ #1
Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.12 Aug 12 2016
task: fffffe03df944c00 task.stack: fffffe00d128c000
PC is at __check_object_size+0x70/0x3f0
LR is at __check_object_size+0x70/0x3f0
...
[<fffffc00082b4280>] __check_object_size+0x70/0x3f0
[<fffffc00082cdc30>] filldir64+0x158/0x1a0
[<fffffc0000f327e8>] __fat_readdir+0x4a0/0x558 [fat]
[<fffffc0000f328d4>] fat_readdir+0x34/0x40 [fat]
[<fffffc00082cd8f8>] iterate_dir+0x190/0x1e0
[<fffffc00082cde58>] SyS_getdents64+0x88/0x120
[<fffffc0008082c70>] el0_svc_naked+0x24/0x28
fffffc0000f3b1a8 is a module address. Modules may have compiled in
strings which could get copied to userspace. In this instance, it
looks like "." which matches with a size of 1 byte. Extend the
is_vmalloc_addr check to be is_vmalloc_or_module_addr to cover
all possible cases.
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Since the device hierarchy domain was added by commit c98c1822ee
("irqchip/mips-gic: Add device hierarchy domain"), GIC local interrupts
have been broken.
Users attempting to setup a per-cpu local IRQ, for example the GIC timer
clock events code in drivers/clocksource/mips-gic-timer.c, the
setup_percpu_irq function would refuse with -EINVAL because the GIC
irqchip driver never called irq_set_percpu_devid so the
IRQ_PER_CPU_DEVID flag was never set for the IRQ. This happens because
irq_set_percpu_devid was being called from the gic_irq_domain_map
function which is no longer called.
Doing only that runs into further problems because gic_dev_domain_alloc
set the struct irq_chip for all interrupts, local or shared, to
gic_level_irq_controller despite that only being suitable for shared
interrupts. The typical outcome of this is that gic_level_irq_controller
callback functions are called for local interrupts, and then hwirq
number calculations overflow & the driver ends up attempting to access
some invalid register with an address calculated from an invalid hwirq
number. Best case scenario is that this then leads to a bus error. This
is fixed by abstracting the setup of the hwirq & chip to a new function
gic_setup_dev_chip which is used by both the root GIC IRQ domain & the
device domain.
Finally, decoding local interrupts failed because gic_dev_domain_alloc
only called irq_domain_alloc_irqs_parent for shared interrupts. Local
ones were therefore never associated with hwirqs in the root GIC IRQ
domain and the virq in gic_handle_local_int would always be 0. This is
fixed by calling irq_domain_alloc_irqs_parent unconditionally & having
gic_irq_domain_alloc handle both local & shared interrupts, which is
easy due to the aforementioned abstraction of chip setup into
gic_setup_dev_chip.
This fixes use of the MIPS GIC timer for clock events, which has been
broken since c98c1822ee ("irqchip/mips-gic: Add device hierarchy
domain") but hadn't been noticed due to a silent fallback to the MIPS
coprocessor 0 count/compare clock events device.
Fixes: c98c1822ee ("irqchip/mips-gic: Add device hierarchy domain")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Qais Yousef <qsyousef@gmail.com>
Cc: stable@vger.kernel.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/20160913165335.31389-1-paul.burton@imgtec.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We hit hardened usercopy feature check for kernel text access by reading
kcore file:
usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes)
kernel BUG at mm/usercopy.c:75!
Bypassing this check for kcore by adding bounce buffer for ktext data.
Reported-by: Steve Best <sbest@redhat.com>
Fixes: f5509cc18d ("mm: Hardened usercopy")
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Next patch adds bounce buffer for ktext area, so it's
convenient to have single bounce buffer for both
vmalloc/module and ktext cases.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
by type conversion errors in the x86 pat code - Matt Fleming
-----BEGIN PGP SIGNATURE-----
iQI2BAABCAAgBQJX4UkgGRxtYXR0QGNvZGVibHVlcHJpbnQuY28udWsACgkQLzhZ
wI0jPVWaoA/9EqgJqgXP5JSoV5EFzGA0uyE65J46hhmpqry0gnJ0Jv4lf9nWVShI
iYm3VRW2bIjF6sNKw/Z9ZFzhXY96bn3axlPwoMbwsuCJoWHwIG0dYkI+N4oQGcaR
7xX5hrnWeRPdU28wTZNyvqWBojXJcY2Cv3mg+/1/wByL7wABljcAxmr3a0Vk1j/R
UsAdWKz0AFYMnKKhfwcmIwTRDFXlzmCVNP1luycFNjVXALW7vHOIav7Wzn7ZMCfv
Nd6FYlabUm5Nq/o8o7QD/KniBj/0maurBq2jnbtV/Iim6OsMpN62AsXYY06QNtrp
IlrDb4RIYJa+Oi4VDpjoEZicBlVuEBBJaziS3bPMUlswkO2S/v7LvddIqqEm/K1V
rqLTpcx+O66LODs1GHz2hDd/T8I5lHsR8F9ggmboQj8kTze4kowpeN4kuTB91+oK
4gybkXiB7olrjdWCxPe2UE/2c325nHsSmuObDmfJjTxCn1zsbJY233oXLLfaiMgn
y4URazGH7m2kTAfjbwfJQKLMAz9AOItxaY4f2nIO6BM/ZaNM59CyGQUKU9muhecd
p/e8qJtyP4narwtDFWm5XMUybS3W46F5jAAux1AGi83vhpfbVjnvajm5wzeonTq3
pAS3NBWMuJ0xHQsq/zkI+S5bvzErsQKwV0p+tNhpKSb0n72tDaN+CYk=
=qMBL
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into efi/urgent
Pull EFI fixes from Matt Fleming:
* Fix a boot hang on large memory machines (multiple terabyte) caused
by type conversion errors in the x86 PAT code (Matt Fleming)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since commit 4d4c474124 ("perf/x86/intel/bts: Fix BTS PMI detection")
my box goes boom on boot:
| .... node #0, CPUs: #1#2#3#4#5#6#7
| BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| IP: [<ffffffff8100c463>] intel_bts_interrupt+0x43/0x130
| Call Trace:
| <NMI> d [<ffffffff8100b341>] intel_pmu_handle_irq+0x51/0x4b0
| [<ffffffff81004d47>] perf_event_nmi_handler+0x27/0x40
This happens because the code introduced in this commit dereferences the
debug store pointer unconditionally. The debug store is not guaranteed to
be available, so a NULL pointer check as on other places is required.
Fixes: 4d4c474124 ("perf/x86/intel/bts: Fix BTS PMI detection")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: vince@deater.net
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/20160920131220.xg5pbdjtznszuyzb@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Waiman reported that booting with CONFIG_EFI_MIXED enabled on his
multi-terabyte HP machine results in boot crashes, because the EFI
region mapping functions loop forever while trying to map those
regions describing RAM.
While this patch doesn't fix the underlying hang, there's really no
reason to map EFI_CONVENTIONAL_MEMORY regions into the EFI page tables
when mixed-mode is not in use at runtime.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
There's a mixture of signed 32-bit and unsigned 32-bit and 64-bit data
types used for keeping track of how many pages have been mapped.
This leads to hangs during boot when mapping large numbers of pages
(multiple terabytes, as reported by Waiman) because those values are
interpreted as being negative.
commit 742563777e ("x86/mm/pat: Avoid truncation when converting
cpa->numpages to address") fixed one of those bugs, but there is
another lurking in __change_page_attr_set_clr().
Additionally, the return value type for the populate_*() functions can
return negative values when a large number of pages have been mapped,
triggering the error paths even though no error occurred.
Consistently use 64-bit types on 64-bit platforms when counting pages.
Even in the signed case this gives us room for regions 8PiB
(pebibytes) in size whilst still allowing the usual negative value
error checking idiom.
Reported-by: Waiman Long <waiman.long@hpe.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
CC: Theodore Ts'o <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Commit fe56b9e6a8 ("qed: Add module with basic common support")
has introduced a stack corruption during probe, where filling a
local struct with data to be sent to management firmware is incorrectly
filled; The data is written outside of the struct and corrupts
the stack.
Changes from v1:
----------------
- Correct the value written [Caught by David Laight]
Fixes: fe56b9e6a8 ("qed: Add module with basic common support")
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The core distributed switch architecture code currently does not have
a MAINTAINERS entry, which results in some contributions not landing
in the right peoples inbox.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8c14586fc3 ("net: ipv6: Use passed in table for nexthop
lookups") introduced a regression: insertion of an IPv6 route in a table
not containing the appropriate connected route for the gateway but which
contained a non-connected route (like a default gateway) fails while it
was previously working:
$ ip link add eth0 type dummy
$ ip link set up dev eth0
$ ip addr add 2001:db8::1/64 dev eth0
$ ip route add ::/0 via 2001:db8::5 dev eth0 table 20
$ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
RTNETLINK answers: No route to host
$ ip -6 route show table 20
default via 2001:db8::5 dev eth0 metric 1024 pref medium
After this patch, we get:
$ ip route add 2001:db8:cafe::1/128 via 2001:db8::6 dev eth0 table 20
$ ip -6 route show table 20
2001:db8:cafe::1 via 2001:db8::6 dev eth0 metric 1024 pref medium
default via 2001:db8::5 dev eth0 metric 1024 pref medium
Fixes: 8c14586fc3 ("net: ipv6: Use passed in table for nexthop lookups")
Signed-off-by: Vincent Bernat <vincent@bernat.im>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
mlx5 fixes to 4.8-rc6
This series series has a fix from Roi to memory corruption bug in
the bulk flow counters code and two late and hopefully last fixes
from me to the new eswitch offloads code.
Series done over net commit 37dd348 "bna: fix crash in bnad_get_strings()"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
E-switch mode changes involve creating HW tables, potentially allocating
netdevices, etc, and things can fail. Add an attempt to rollback to the
existing mode when changing to the new mode fails. Only if rollback fails,
getting proper SRIOV functionality requires module unload or sriov
disablement/enablement.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When enablement of the SRIOV e-switch in certain mode (switchdev or legacy)
fails, we must set the mode to none. Otherwise, we'll run into double free
based crashes when further attempting to deal with the e-switch (such
as when disabling sriov or unloading the driver).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FW command output length should be only the length of struct
mlx5_cmd_fc_bulk out field. Failing to do so will cause the memcpy
call which is invoked later in the driver to write over wrong memory
address and corrupt kernel memory which results in random crashes.
This bug was found using the kernel address sanitizer (kasan).
Fixes: a351a1b03b ('net/mlx5: Introduce bulk reading of flow counters')
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gic_raise_softirq() walks the list of cpus using for_each_cpu(), it calls
gic_compute_target_list() which advances the iterator by the number of
CPUs in the cluster.
If gic_compute_target_list() reaches the last CPU it leaves the iterator
pointing at the last CPU. This means the next time round the for_each_cpu()
loop cpumask_next() will be called with an invalid CPU.
This triggers a warning when built with CONFIG_DEBUG_PER_CPU_MAPS:
[ 3.077738] GICv3: CPU1: found redistributor 1 region 0:0x000000002f120000
[ 3.077943] CPU1: Booted secondary processor [410fd0f0]
[ 3.078542] ------------[ cut here ]------------
[ 3.078746] WARNING: CPU: 1 PID: 0 at ../include/linux/cpumask.h:121 gic_raise_softirq+0x12c/0x170
[ 3.078812] Modules linked in:
[ 3.078869]
[ 3.078930] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.8.0-rc5+ #5188
[ 3.078994] Hardware name: Foundation-v8A (DT)
[ 3.079059] task: ffff80087a1a0080 task.stack: ffff80087a19c000
[ 3.079145] PC is at gic_raise_softirq+0x12c/0x170
[ 3.079226] LR is at gic_raise_softirq+0xa4/0x170
[ 3.079296] pc : [<ffff0000083ead24>] lr : [<ffff0000083eac9c>] pstate: 200001c9
[ 3.081139] Call trace:
[ 3.081202] Exception stack(0xffff80087a19fbe0 to 0xffff80087a19fd10)
[ 3.082269] [<ffff0000083ead24>] gic_raise_softirq+0x12c/0x170
[ 3.082354] [<ffff00000808e614>] smp_send_reschedule+0x34/0x40
[ 3.082433] [<ffff0000080e80a0>] resched_curr+0x50/0x88
[ 3.082512] [<ffff0000080e89d0>] check_preempt_curr+0x60/0xd0
[ 3.082593] [<ffff0000080e8a60>] ttwu_do_wakeup+0x20/0xe8
[ 3.082672] [<ffff0000080e8bb8>] ttwu_do_activate+0x90/0xc0
[ 3.082753] [<ffff0000080ea9a4>] try_to_wake_up+0x224/0x370
[ 3.082836] [<ffff0000080eabc8>] default_wake_function+0x10/0x18
[ 3.082920] [<ffff000008103134>] __wake_up_common+0x5c/0xa0
[ 3.083003] [<ffff0000081031f4>] __wake_up_locked+0x14/0x20
[ 3.083086] [<ffff000008103f80>] complete+0x40/0x60
[ 3.083168] [<ffff00000808df7c>] secondary_start_kernel+0x15c/0x1d0
[ 3.083240] [<00000000808911a4>] 0x808911a4
[ 3.113401] Detected PIPT I-cache on CPU2
Avoid updating the iterator if the next call to cpumask_next() would
cause the for_each_cpu() loop to exit.
There is no change to gic_raise_softirq()'s behaviour, (cpumask_next()s
eventual call to _find_next_bit() will return early as start >= nbits),
this patch just silences the warning.
Fixes: 021f653791 ("irqchip: gic-v3: Initial support for GICv3")
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1474306155-3303-1-git-send-email-james.morse@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge fixes from Andrew Morton:
"20 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
rapidio/rio_cm: avoid GFP_KERNEL in atomic context
Revert "ocfs2: bump up o2cb network protocol version"
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
cgroup: duplicate cgroup reference when cloning sockets
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
ocfs2: fix double unlock in case retry after free truncate log
fanotify: fix list corruption in fanotify_get_response()
fsnotify: add a way to stop queueing events on group shutdown
ocfs2: fix trans extend while free cached blocks
ocfs2: fix trans extend while flush truncate log
ipc/shm: fix crash if CONFIG_SHMEM is not set
mm: fix the page_swap_info() BUG_ON check
autofs: use dentry flags to block walks during expire
MAINTAINERS: update email for VLYNQ bus entry
mm: avoid endless recursion in dump_page()
mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
khugepaged: fix use-after-free in collapse_huge_page()
MAINTAINERS: Maik has moved
ocfs2/dlm: fix race between convert and migration
mem-hotplug: don't clear the only node in new_node_page()
As reported by Alexey Khoroshilov (https://lkml.org/lkml/2016/9/9/737):
riocm_send_close() is called from rio_cm_shutdown() under
spin_lock_bh(idr_lock), but riocm_send_close() uses a GFP_KERNEL
allocation.
Fix by taking riocm_send_close() outside of spinlock protected code.
[akpm@linux-foundation.org: remove unneeded `if (!list_empty())']
Link: http://lkml.kernel.org/r/20160915175402.10122-1-alexandre.bounine@idt.com
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 38b52efd21 ("ocfs2: bump up o2cb network protocol
version").
This commit made rolling upgrade fail. When one node is upgraded to new
version with this commit, the remaining nodes will fail to establish
connections to it, then the application like VMs on the remaining nodes
can't be live migrated to the upgraded one. This will cause an outage.
Since negotiate hb timeout behavior didn't change without this commit,
so revert it.
Fixes: 38b52efd21 ("ocfs2: bump up o2cb network protocol version")
Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we punch a hole on a reflink such that following conditions are met:
1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
(hole range > MAX_CONTIG_BYTES(1MB)),
we dont COW the first cluster starting at the start offset. But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out. This will modify the
cluster in place and zero it in the source too.
Fix this by skipping this cluster in such a scenario.
To reproduce:
1. Create a random file of say 10 MB
xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out).
dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reported-by: Saar Maoz <saar.maoz@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Eric Ren <zren@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.
Fixes: bd1060a1d6 ("sock, cgroup: add sock->sk_cgroup")
Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.
The problem turns out to be the per-cpu charge cache. Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for all
memory that is being tracked. Those caches are safe against scheduling
preemption, but not against interrupts - such as the newly added packet
receive path. When cache draining is interrupted by network RX taking
pages out of the cache, the resuming drain operation will put references
of in-use pages, thus causing the imbalance.
Disable IRQs during all per-cpu charge cache operations.
Fixes: f7e1cb6ec5 ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
Link: http://lkml.kernel.org/r/20160914194846.11153-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org> [4.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.
Fixes: 2070ad1aeb ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).
However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock. Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.
Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().
Fixes: 5838d4442b ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group. Fanotify will use this.
Fixes: 5838d4442b ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every time, ocfs2_extend_trans() included a credit for truncate log
inode, but as that inode had been managed by jbd2 running transaction
first time, it will not consume that credit until
jbd2_journal_restart().
Since total credits to extend always included the un-consumed ones,
there will be more and more un-consumed credit, at last
jbd2_journal_restart() will fail due to credit number over the half of
max transction credit.
The following error was caught when unlinking a large file with many
extents:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 13626 Comm: unlink Tainted: G W 4.1.12-37.6.3.el6uek.x86_64 #2
Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
Call Trace:
dump_stack+0x48/0x5c
warn_slowpath_common+0x95/0xe0
warn_slowpath_null+0x1a/0x20
start_this_handle+0x4c3/0x510 [jbd2]
jbd2__journal_restart+0x161/0x1b0 [jbd2]
jbd2_journal_restart+0x13/0x20 [jbd2]
ocfs2_extend_trans+0x74/0x220 [ocfs2]
ocfs2_replay_truncate_records+0x93/0x360 [ocfs2]
__ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2]
ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2]
ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2]
ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
ocfs2_evict_inode+0x28/0x60 [ocfs2]
evict+0xab/0x1a0
iput_final+0xf6/0x190
iput+0xc8/0xe0
do_unlinkat+0x1b7/0x310
SyS_unlink+0x16/0x20
system_call_fastpath+0x12/0x71
---[ end trace 28aa7410e69369cf ]---
JBD2: unlink wants too many credits (251 > 128)
Link: http://lkml.kernel.org/r/1473674623-11810-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit c01d5b3007 ("shmem: get_unmapped_area align huge page") makes
use of shm_get_unmapped_area() in shm_file_operations() unconditional to
CONFIG_MMU.
As Tony Battersby pointed this can lead NULL-pointer dereference on
machine with CONFIG_MMU=y and CONFIG_SHMEM=n. In this case ipc/shm is
backed by ramfs which doesn't provide f_op->get_unmapped_area for
configurations with MMU.
The solution is to provide dummy f_op->get_unmapped_area for ramfs when
CONFIG_MMU=y, which just call current->mm->get_unmapped_area().
Fixes: c01d5b3007 ("shmem: get_unmapped_area align huge page")
Link: http://lkml.kernel.org/r/20160912102704.140442-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Tested-by: Tony Battersby <tonyb@cybernetics.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [4.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 62c230bc17 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages") replaced the
swap_aops dirty hook from __set_page_dirty_no_writeback() with
swap_set_page_dirty().
For normal cases without these special SWP flags code path falls back to
__set_page_dirty_no_writeback() so the behaviour is expected to be the
same as before.
But swap_set_page_dirty() makes use of the page_swap_info() helper to
get the swap_info_struct to check for the flags like SWP_FILE,
SWP_BLKDEV etc as desired for those features. This helper has
BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
set_page_dirty_lock() path.
For the set_page_dirty() path which is often needed for cases to be
called from irq context, kswapd() can toggle the flag behind the back
while the call is getting executed when system is low on memory and
heavy swapping is ongoing.
This ends up with undesired kernel panic.
This patch just moves the check outside the helper to its users
appropriately to fix kernel panic for the described path. Couple of
users of helpers already take care of SwapCache condition so I skipped
them.
Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Joe Perches <joe@perches.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jens Axboe <axboe@fb.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> [4.7.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Somewhere along the way the autofs expire operation has changed to hold
a spin lock over expired dentry selection. The autofs indirect mount
expired dentry selection is complicated and quite lengthy so it isn't
appropriate to hold a spin lock over the operation.
Commit 47be61845c ("fs/dcache.c: avoid soft-lockup in dput()") added a
might_sleep() to dput() causing a WARN_ONCE() about this usage to be
issued.
But the spin lock doesn't need to be held over this check, the autofs
dentry info. flags are enough to block walks into dentrys during the
expire.
I've left the direct mount expire as it is (for now) because it is much
simpler and quicker than the indirect mount expire and adding spin lock
release and re-aquires would do nothing more than add overhead.
Fixes: 47be61845c ("fs/dcache.c: avoid soft-lockup in dput()")
Link: http://lkml.kernel.org/r/20160912014017.1773.73060.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Reported-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Takashi Iwai <tiwai@suse.de>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>