Commit Graph

648737 Commits

Author SHA1 Message Date
Mathias Nyman
d6169d0409 xhci: fix deadlock at host remove by running watchdog correctly
If a URB is killed while the host is removed we can end up in a situation
where the hub thread takes the roothub device lock, and waits for
the URB to be given back by xhci-hcd, blocking the host remove code.

xhci-hcd tries to stop the endpoint and give back the urb, but can't
as the host is removed from PCI bus at the same time, preventing the normal
way of giving back urb.

Instead we need to rely on the stop command timeout function to give back
the urb. This xhci_stop_endpoint_command_watchdog() timeout function
used a XHCI_STATE_DYING flag to indicate if the timeout function is already
running, but later this flag has been taking into use in other places to
mark that xhci is dying.

Remove checks for XHCI_STATE_DYING in xhci_urb_dequeue. We are still
checking that reading from pci state does not return 0xffffffff or that
host is not halted before trying to stop the endpoint.

This whole area of stopping endpoints, giving back URBs, and the wathdog
timeout need rework, this fix focuses on solving a specific deadlock
issue that we can then send to stable before any major rework.

Cc: <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 16:52:13 +01:00
Colin King
ad5013d569 perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
When x86_pmu.num_counters is 32 the shift of the integer constant 1 is
exceeding 32bit and therefor undefined behaviour.

Fix this by shifting 1ULL instead of 1.

Reported-by: CoverityScan CID#1192105 ("Bad bit shift operation")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170111114310.17928-1-colin.king@canonical.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-11 16:43:30 +01:00
Johannes Berg
d2941df8fb mac80211: recalculate min channel width on VHT opmode changes
When an associated station changes its VHT operating mode this
can/will affect the bandwidth it's using, and consequently we
must recalculate the minimum bandwidth we need to use. Failure
to do so can lead to one of two scenarios:
 1) we use a too high bandwidth, this is benign
 2) we use a too narrow bandwidth, causing rate control and
    actual PHY configuration to be out of sync, which can in
    turn cause problems/crashes

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:35:05 +01:00
Johannes Berg
96aa2e7cf1 mac80211: calculate min channel width correctly
In the current minimum chandef code there's an issue in that the
recalculation can happen after rate control is initialized for a
station that has a wider bandwidth than the current chanctx, and
then rate control can immediately start using those higher rates
which could cause problems.

Observe that first of all that this problem is because we don't
take non-associated and non-uploaded stations into account. The
restriction to non-associated is quite pointless and is one of
the causes for the problem described above, since the rate init
will happen before the station is set to associated; no frames
could actually be sent until associated, but the rate table can
already contain higher rates and that might cause problems.

Also, rejecting non-uploaded stations is wrong, since the rate
control can select higher rates for those as well.

Secondly, it's then necessary to recalculate the minimal config
before initializing rate control, so that when rate control is
initialized, the higher rates are already available. This can be
done easily by adding the necessary function call in rate init.

Change-Id: Ib9bc02d34797078db55459d196993f39dcd43070
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:34:51 +01:00
Beni Lev
06f7c88c10 cfg80211: consider VHT opmode on station update
Currently, this attribute is only fetched on station addition, but
not on station change. Since this info is only present in the assoc
request, with full station state support in the driver it cannot be
present when the station is added.

Thus, add support for changing the VHT opmode on station update if
done before (or while) the station is marked as associated. After
this, ignore it, since it used to be ignored.

Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:34:25 +01:00
Emmanuel Grumbach
d7f842442f mac80211: fix the TID on NDPs sent as EOSP carrier
In the commit below, I forgot to translate the mac80211's
AC to QoS IE order. Moreover, the condition in the if was
wrong. Fix both issues.
This bug would hit only with clients that didn't set all
the ACs as delivery enabled.

Fixes: f438ceb81d ("mac80211: uapsd_queues is in QoS IE order")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:32:58 +01:00
Cedric Izoard
c38c39bf7c mac80211: Fix headroom allocation when forwarding mesh pkt
This patch fix issue introduced by my previous commit that
tried to ensure enough headroom was present, and instead
broke it.

When forwarding mesh pkt, mac80211 may also add security header,
and it must therefore be taken into account in the needed headroom.

Fixes: d8da0b5d64 ("mac80211: Ensure enough headroom when forwarding mesh pkt")
Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:07:29 +01:00
David Ahern
24c63bbc18 net: vrf: do not allow table id 0
Frank reported that vrf devices can be created with a table id of 0.
This breaks many of the run time table id checks and should not be
allowed. Detect this condition at create time and fail with EINVAL.

Fixes: 193125dbd8 ("net: Introduce VRF device driver")
Reported-by: Frank Kellermann <frank.kellermann@atos.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 10:04:01 -05:00
Russell King
a13c06525a net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
When an Marvell 88E1512 PHY is connected to a nic in SGMII mode, the
fiber page is used for the SGMII host-side connection.  The PHY driver
notices that SUPPORTED_FIBRE is set, so it tries reading the fiber page
for the link status, and ends up reading the MAC-side status instead of
the outgoing (copper) link.  This leads to incorrect results reported
via ethtool.

If the PHY is connected via SGMII to the host, ignore the fiber page.
However, continue to allow the existing power management code to
suspend and resume the fiber page.

Fixes: 6cfb3bcc06 ("Marvell phy: check link status in case of fiber link.")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 10:02:37 -05:00
Colin Ian King
eb004603c8 sctp: Fix spelling mistake: "Atempt" -> "Attempt"
Trivial fix to spelling mistake in WARN_ONCE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 10:01:01 -05:00
David Ahern
7a18c5b9fb net: ipv4: Fix multipath selection with vrf
fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.

Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.

Fixes: 613d09b30f ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:59:55 -05:00
Arnd Bergmann
73b3514735 cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
not right when CONFIG_NET is disabled and there is no socket interface:

warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)

I don't know what the correct solution for this is, but simply removing
the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
'if NET' section avoids the warning and does not produce other build
errors.

Fixes: 483c4933ea ("cgroup: Fix CGROUP_BPF config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:47:10 -05:00
Chris Mason
0bf70aebf1 Merge branch 'tracepoint-updates-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.10 2017-01-11 06:26:12 -08:00
Eric Dumazet
7cfd5fd5a9 gro: use min_t() in skb_gro_reset_offset()
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 08:15:40 -05:00
Prarit Bhargava
6d6daa2094 perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot
cpu. This works as long as the boot CPU is actually on the physical package
0, which is normaly the case after power on / reboot.

But it fails with a NULL pointer dereference when a kdump kernel is started
on a secondary socket which has a different physical package id because the
locigal package translation for physical package 0 does not exist.

Use the logical package id of the boot cpu instead of hard coded 0.

[ tglx: Rewrote changelog once more ]

Fixes: cf6d445f68 ("perf/x86/uncore: Track packages, not per CPU data")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-11 12:13:21 +01:00
Johan Hovold
2d5a9c72d0 USB: serial: ch341: fix control-message error handling
A short control transfer would currently fail to be detected, something
which could lead to stale buffer data being used as valid input.

Check for short transfers, and make sure to log any transfer errors.

Note that this also avoids leaking heap data to user space (TIOCMGET)
and the remote device (break control).

Fixes: 6ce7610478 ("USB: Driver for CH341 USB-serial adaptor")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
2017-01-11 12:08:57 +01:00
Huang Shijie
69d012345a arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
In current code, the @changed always returns the last one's status for
the huge page with the contiguous bit set. This is really not what we
want. Even one of the PTEs is changed, we should tell it to the caller.

This patch fixes this issue.

Fixes: 66b3923a1a ("arm64: hugetlb: add support for PTE contiguous bit")
Cc: <stable@vger.kernel.org> # 4.5.x-
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-01-11 10:26:40 +00:00
Augusto Mecking Caringi
c8a6a09c1c vme: Fix wrong pointer utilization in ca91cx42_slave_get
In ca91cx42_slave_get function, the value pointed by vme_base pointer is
set through:

*vme_base = ioread32(bridge->base + CA91CX42_VSI_BS[i]);

So it must be dereferenced to be used in calculation of pci_base:

*pci_base = (dma_addr_t)*vme_base + pci_offset;

This bug was caught thanks to the following gcc warning:

drivers/vme/bridges/vme_ca91cx42.c: In function ‘ca91cx42_slave_get’:
drivers/vme/bridges/vme_ca91cx42.c:467:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
*pci_base = (dma_addr_t)vme_base + pci_offset;

Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Acked-By: Martyn Welch <martyn@welchs.me.uk>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 10:42:16 +01:00
Frederic Weisbecker
24b91e360e nohz: Fix collision between tick and other hrtimers
When the tick is stopped and an interrupt occurs afterward, we check on
that interrupt exit if the next tick needs to be rescheduled. If it
doesn't need any update, we don't want to do anything.

In order to check if the tick needs an update, we compare it against the
clockevent device deadline. Now that's a problem because the clockevent
device is at a lower level than the tick itself if it is implemented
on top of hrtimer.

Every hrtimer share this clockevent device. So comparing the next tick
deadline against the clockevent device deadline is wrong because the
device may be programmed for another hrtimer whose deadline collides
with the tick. As a result we may end up not reprogramming the tick
accidentally.

In a worst case scenario under full dynticks mode, the tick stops firing
as it is supposed to every 1hz, leaving /proc/stat stalled:

      Task in a full dynticks CPU
      ----------------------------

      * hrtimer A is queued 2 seconds ahead
      * the tick is stopped, scheduled 1 second ahead
      * tick fires 1 second later
      * on tick exit, nohz schedules the tick 1 second ahead but sees
        the clockevent device is already programmed to that deadline,
        fooled by hrtimer A, the tick isn't rescheduled.
      * hrtimer A is cancelled before its deadline
      * tick never fires again until an interrupt happens...

In order to fix this, store the next tick deadline to the tick_sched
local structure and reuse that value later to check whether we need to
reprogram the clock after an interrupt.

On the other hand, ts->sleep_length still wants to know about the next
clock event and not just the tick, so we want to improve the related
comment to avoid confusion.

Reported-by: James Hartsock <hartsjc@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Link: http://lkml.kernel.org/r/1483539124-5693-1-git-send-email-fweisbec@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-11 10:41:33 +01:00
Randy Dunlap
546cf3ef9c auxdisplay: fix new ht16k33 build errors
Fix build errors caused by selecting incorrect kconfig symbols.

drivers/built-in.o:(.data+0x19cec): undefined reference to `sys_fillrect'
drivers/built-in.o:(.data+0x19cf0): undefined reference to `sys_copyarea'
drivers/built-in.o:(.data+0x19cf4): undefined reference to `sys_imageblit'

Fixes: 31114fa95b (auxdisplay: ht16k33: select framebuffer helper modules)

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Robin van der Gracht <robin@protonic.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 09:27:30 +01:00
Akinobu Mita
802c03881f sysrq: attach sysrq handler correctly for 32-bit kernel
The sysrq input handler should be attached to the input device which has
a left alt key.

On 32-bit kernels, some input devices which has a left alt key cannot
attach sysrq handler.  Because the keybit bitmap in struct input_device_id
for sysrq is not correctly initialized.  KEY_LEFTALT is 56 which is
greater than BITS_PER_LONG on 32-bit kernels.

I found this problem when using a matrix keypad device which defines
a KEY_LEFTALT (56) but doesn't have a KEY_O (24 == 56%32).

Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 09:22:54 +01:00
Colin Ian King
0fa2c8eb27 ppdev: don't print a free'd string
A previous fix of a memory leak now prints the string 'name'
that was previously free'd.  Fix this by free'ing the string
at the end of the function and adding an error exit path for
the error conditions.

CoverityScan CID#1384523 ("Use after free")

Fixes: 2bd362d5f4 ("ppdev: fix memory leak")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 09:14:19 +01:00
Pan Bian
5b11ebedd6 extcon: return error code on failure
Function get_zeroed_page() returns a NULL pointer if there is no enough
memory. In function extcon_sync(), it returns 0 if the call to
get_zeroed_page() fails. The return value 0 indicates success in the
context, which is incosistent with the execution status. This patch
fixes the bug by returning -ENOMEM.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188611

Signed-off-by: Pan Bian <bianpan2016@163.com>
Fixes: a580982f08
Cc: stable <stable@vger.kernel.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 09:11:39 +01:00
Herbert Xu
6741f551a0 Revert "tty: serial: 8250: add CON_CONSDEV to flags"
This commit needs to be reverted because it prevents people from
using the serial console as a secondary console with input being
directed to tty0.

IOW, if you boot with console=ttyS0 console=tty0 then all kernels
prior to this commit will produce output on both ttyS0 and tty0
but input will only be taken from tty0.  With this patch the serial
console will always be the primary console instead of tty0,
potentially preventing people from getting into their machines in
emergency situations.

Fixes: d03516df83 ("tty: serial: 8250: add CON_CONSDEV to flags")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 08:35:17 +01:00
Daniel Jedrychowski
2bed8a8e70 Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break
When in RS485 emulation mode, __do_stop_tx_rs485() calls
serial8250_clear_fifos().  This not only clears the FIFOs, but also sets
all bits in their control register (UART_FCR) to 0.

One of the effects of this is the disabling of the FIFOs, which turns
them into single-byte holding registers.  The rest of the driver doesn't
know this, which results in the lions share of characters passed into a
write call to be dropped.

(I can supply logic analyzer screenshots if necessary)

This fix replaces the serial8250_clear_fifos() call to
serial8250_clear_and_reinit_fifos() - this prevents the "dropped
characters" issue from manifesting again while retaining the requirement
of clearing the RX FIFO after transmission if the SER_RS485_RX_DURING_TX
flag is disabled.

Signed-off-by: Daniel Jedrychowski <avistel@gmail.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 08:35:17 +01:00
Gabriel Krisman Bertazi
c130b666a9 8250_pci: Fix potential use-after-free in error path
Commit f209fa03fc ("serial: 8250_pci: Detach low-level driver during
PCI error recovery") introduces a potential use-after-free in case the
pciserial_init_ports call in serial8250_io_resume fails, which may
happen if a memory allocation fails or if the .init quirk failed for
whatever reason).  If this happen, further pci_get_drvdata will return a
pointer to freed memory.

This patch reworks the PCI recovery resume hook to restore the old priv
structure in this case, which should be ok, since the ports were already
detached. Such error during recovery causes us to give up on the
recovery.

Fixes: f209fa03fc ("serial: 8250_pci: Detach low-level driver during
  PCI error recovery")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 08:35:17 +01:00
Richard Genoud
b389f173aa tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
When using RS485 in half duplex, RX should be enabled when TX is
finished, and stopped when TX starts.

Before commit 0058f0871e ("tty/serial: atmel: fix RS485 half
duplex with DMA"), RX was not disabled in atmel_start_tx() if the DMA
was used. So, collisions could happened.

But disabling RX in atmel_start_tx() uncovered another bug:
RX was enabled again in the wrong place (in atmel_tx_dma) instead of
being enabled when TX is finished (in atmel_complete_tx_dma), so the
transmission simply stopped.

This bug was not triggered before commit 0058f0871e
("tty/serial: atmel: fix RS485 half duplex with DMA") because RX was
never disabled before.

Moving atmel_start_rx() in atmel_complete_tx_dma() corrects the problem.

Cc: stable@vger.kernel.org
Reported-by: Gil Weber <webergil@gmail.com>
Fixes: 0058f0871e
Tested-by: Gil Weber <webergil@gmail.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 08:18:45 +01:00
Richard Genoud
89d8232411 tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
If we don't disable the transmitter in atmel_stop_tx, the DMA buffer
continues to send data until it is emptied.
This cause problems with the flow control (CTS is asserted and data are
still sent).

So, disabling the transmitter in atmel_stop_tx is a sane thing to do.

Tested on at91sam9g35-cm(DMA)
Tested for regressions on sama5d2-xplained(Fifo) and at91sam9g20ek(PDC)

Cc: <stable@vger.kernel.org> (beware, this won't apply before 4.3)
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 08:18:45 +01:00
Robin Murphy
488debb997 drivers: char: mem: Fix thinkos in kmem address checks
When borrowing the pfn_valid() check from mmap_kmem(), somebody managed
to get physical and virtual addresses spectacularly muddled up, such
that we've ended up with checks for one being the other. Whilst this
does indeed prevent out-of-bounds accesses crashing, on most systems
it also prevents the more desirable use-case of working at all ever.

Check the *virtual* offset correctly for what it is. Furthermore, do
so in the right place - a read or write may span multiple pages, so a
single up-front check is insufficient. High memory accesses already
have a similar validity check just before the copy_to_user() call, so
just make the low memory path fully consistent with that.

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
CC: stable@vger.kernel.org
Fixes: 148a1bc843 ("drivers: char: mem: Check {read,write}_kmem() addresses")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 08:02:18 +01:00
Alexander Usyskin
7ee7f45a76 mei: bus: enable OS version only for SPT and newer
Sending OS version for support of TPM2_ChangeEPS() is required only
for SPT FW (HMB version 2.0) and newer.
On older platforms the command should be just ignored by the firmware
but some older platforms misbehave so it's safer to send the command
only if required.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=192051
Fixes: 7279b238ba (mei: send OS type to the FW)
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Tested-by: Jan Niehusmann <jan@gondor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-11 07:43:57 +01:00
David S. Miller
6c711c8691 Merge branch 'mlx5-fixes'
Saeed Mahameed says:

====================
Mellanox mlx5 fixes and cleanups 2017-01-10

This series includes some mlx5e general cleanups from Daniel, Gil, Hadar
and myself.
Also it includes some critical mlx5e TC offloads fixes from Or Gerlitz.

For -stable:
 - net/mlx5e: Remove WARN_ONCE from adaptive moderation code

   Although this fix doesn't affect any functionality, I thought it is
   better to clean this -WARN_ONCE- up for -stable in case someone hits
   such corner case.

Please apply and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:03 -05:00
Daniel Jurgens
5e44fca504 net/mlx5: Only cancel recovery work when cleaning up device
Do not attempt to drain the health workqueue when unloading the device in
the recovery flow, this can cause a deadlock when the recovery work
tries to cancel itself with sync.

Because the work is no longer unconditionally canceled when unloading, it
must be explicitly canceled in the AER flow.

fixes: 689a248df8 ("net/mlx5: Cancel recovery work in remove flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Gil Rockah
0bbcc0a8fc net/mlx5e: Remove WARN_ONCE from adaptive moderation code
When trying to do interface down or changing interface configuration
under heavy traffic, some of the adaptive moderation corner cases can
occur and leave a WARN_ONCE call trace in the kernel log.

Those WARN_ONCE are meant for debug only, and should have been inserted
only under debug. We avoid such call traces by removing those WARN_ONCE.

Fixes: cb3c7fd4f8 ("net/mlx5e: Support adaptive RX coalescing")
Signed-off-by: Gil Rockah <gilr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Saeed Mahameed
3deef8cea3 net/mlx5e: Un-register uplink representor on nic_disable
The code before this patch registered uplink e-Switch representor
on nic_enable and unregistered on nic_cleanup, the right place
for this unregister is in nic_disable.

Fixes: 127ea380ac ("net/mlx5: Add Representors registration API")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Or Gerlitz
5e86397abe net/mlx5e: Properly handle FW errors while adding TC rules
When the firmware returns an error (common example is an attempt to
add twice the same rule which is refused by the some FWs), we are not
properly derefing/cleaning few resources allocated on the way.
Examples are vport vlan deref under eswitch vlan offloads, and encap
entry/neighbour deref under eswitch encapsulation offloads, fix that.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Fixes: 8b32580df1 ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Hadar Hen Zion
a757d108dc net/mlx5e: Fix kbuild warnings for uninitialized parameters
kbuild warn about parameters that may be used uninitialized, fix it.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Or Gerlitz
0827444d05 net/mlx5e: Set inline mode requirements for matching on IP fragments
For e-switch level matching on packets being an IP fragment, we
need to make sure the source vport inline mode is L3, fix that.

Fixes: 3f7d0eb42d ('net/mlx5e: Offload TC matching on packets being IP fragments')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Or Gerlitz
2e72eb438c net/mlx5e: Properly get address type of encapsulation IP headers
As done elsewhere in our TC/flower offload code, the address type of
the encapsulation IP headers should be realized accroding to the
addr_type field of the encapsulation control dissector key, do that.

Fixes: bbd00f7e23 ('net/mlx5e: Add TC tunnel release action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Or Gerlitz
a42485eb0e net/mlx5e: TC ipv4 tunnel encap offload error flow fixes
When the route lookup fails we should return the actual error.

When the neigh isn't valid, we should return -EOPNOTSUPP as done
in similar cases along the code.

When the offload can't take place as of invalid neigh etc, we
must release the neigh.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Or Gerlitz
2fcd82e9be net/mlx5e: Warn when rejecting offload attempts of IP tunnels
We silently reject offloading of IPv6 tunnels, non vxlan tunnels,
vxlan tunnels where the dst port to match is not provided, etc.

Be a bit more verbose and print a warning so the user better
realizes what went wrong here and can fix it.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Fixes: bbd00f7e23 ('net/mlx5e: Add TC tunnel release action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Or Gerlitz
cd37766380 net/mlx5e: Properly handle offloading of source udp port for IP tunnels
We can offload the matching on source udp port of ip tunnels for
decapsulation. We can not offload setting source udp port for tunnels
as part of encapsulation. Fix both the code that deals with matching
offload (decap) and the code that deal with encap offload to align with
that.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Fixes: bbd00f7e23 ('net/mlx5e: Add TC tunnel release action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:34:01 -05:00
Mike Frysinger
575b1967e1 timerfd: export defines to userspace
Since userspace is expected to call timerfd syscalls directly with these
flags/ioctls, make sure we export them so they don't have to duplicate
the values themselves.

Link: http://lkml.kernel.org/r/20161219064052.7196-1-vapier@gentoo.org
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Mike Kravetz
e5bbc8a6c9 mm/hugetlb.c: fix reservation race when freeing surplus pages
return_unused_surplus_pages() decrements the global reservation count,
and frees any unused surplus pages that were backing the reservation.

Commit 7848a4bf51 ("mm/hugetlb.c: add cond_resched_lock() in
return_unused_surplus_pages()") added a call to cond_resched_lock in the
loop freeing the pages.

As a result, the hugetlb_lock could be dropped, and someone else could
use the pages that will be freed in subsequent iterations of the loop.
This could result in inconsistent global hugetlb page state, application
api failures (such as mmap) failures or application crashes.

When dropping the lock in return_unused_surplus_pages, make sure that
the global reservation count (resv_huge_pages) remains sufficiently
large to prevent someone else from claiming pages about to be freed.

Analyzed by Paul Cassella.

Fixes: 7848a4bf51 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Paul Cassella <cassella@cray.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>	[3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
John Sperbeck
c4e490cf14 mm/slab.c: fix SLAB freelist randomization duplicate entries
This patch fixes a bug in the freelist randomization code.  When a high
random number is used, the freelist will contain duplicate entries.  It
will result in different allocations sharing the same chunk.

It will result in odd behaviours and crashes.  It should be uncommon but
it depends on the machines.  We saw it happening more often on some
machines (every few hours of running tests).

Fixes: c7ce4f60ac ("mm: SLAB freelist randomization")
Link: http://lkml.kernel.org/r/20170103181908.143178-1-thgarnie@google.com
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Minchan Kim
b09ab054b6 zram: support BDI_CAP_STABLE_WRITES
zram has used per-cpu stream feature from v4.7.  It aims for increasing
cache hit ratio of scratch buffer for compressing.  Downside of that
approach is that zram should ask memory space for compressed page in
per-cpu context which requires stricted gfp flag which could be failed.
If so, it retries to allocate memory space out of per-cpu context so it
could get memory this time and compress the data again, copies it to the
memory space.

In this scenario, zram assumes the data should never be changed but it is
not true without stable page support.  So, If the data is changed under
us, zram can make buffer overrun so that zsmalloc free object chain is
broken so system goes crash like below

   https://bugzilla.suse.com/show_bug.cgi?id=997574

This patch adds BDI_CAP_STABLE_WRITES to zram for declaring "I am block
device needing *stable write*".

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Minchan Kim
e7ccfc4ccb zram: revalidate disk under init_lock
Commit b4c5c60920 ("zram: avoid lockdep splat by revalidate_disk")
moved revalidate_disk call out of init_lock to avoid lockdep
false-positive splat.  However, commit 08eee69fcf ("zram: remove
init_lock in zram_make_request") removed init_lock in IO path so there
is no worry about lockdep splat.  So, let's restore it.

This patch is needed to set BDI_CAP_STABLE_WRITES atomically in next
patch.

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Minchan Kim
f05714293a mm: support anonymous stable page
During developemnt for zram-swap asynchronous writeback, I found strange
corruption of compressed page, resulting in:

  Modules linked in: zram(E)
  CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G            E   4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  task: ffff88007620b840 task.stack: ffff880078090000
  RIP: set_freeobj.part.43+0x1c/0x1f
  RSP: 0018:ffff880078093ca8  EFLAGS: 00010246
  RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
  RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
  RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
  R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
  Call Trace:
    obj_malloc+0x22b/0x260
    zs_malloc+0x1e4/0x580
    zram_bvec_rw+0x4cd/0x830 [zram]
    page_requests_rw+0x9c/0x130 [zram]
    zram_thread+0xe6/0x173 [zram]
    kthread+0xca/0xe0
    ret_from_fork+0x25/0x30

With investigation, it reveals currently stable page doesn't support
anonymous page.  IOW, reuse_swap_page can reuse the page without waiting
writeback completion so it can overwrite page zram is compressing.

Unfortunately, zram has used per-cpu stream feature from v4.7.
It aims for increasing cache hit ratio of scratch buffer for
compressing. Downside of that approach is that zram should ask
memory space for compressed page in per-cpu context which requires
stricted gfp flag which could be failed. If so, it retries to
allocate memory space out of per-cpu context so it could get memory
this time and compress the data again, copies it to the memory space.

In this scenario, zram assumes the data should never be changed
but it is not true unless stable page supports. So, If the data is
changed under us, zram can make buffer overrun because second
compression size could be bigger than one we got in previous trial
and blindly, copy bigger size object to smaller buffer which is
buffer overrun. The overrun breaks zsmalloc free object chaining
so system goes crash like above.

I think below is same problem.
https://bugzilla.suse.com/show_bug.cgi?id=997574

Unfortunately, reuse_swap_page should be atomic so that we cannot wait on
writeback in there so the approach in this patch is simply return false if
we found it needs stable page.  Although it increases memory footprint
temporarily, it happens rarely and it should be reclaimed easily althoug
it happened.  Also, It would be better than waiting of IO completion,
which is critial path for application latency.

Fixes: da9556a236 ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Alexander Duyck
4d09d0f45d mm: add documentation for page fragment APIs
This is a first pass at trying to add documentation for the page_frag
APIs.  They may still change over time but for now I thought I would try
to get these documented so that as more network drivers and stack calls
make use of them we have one central spot to document how they are meant
to be used.

Link: http://lkml.kernel.org/r/20170104024157.13451.6758.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Alexander Duyck
2976db8018 mm: rename __page_frag functions to __page_frag_cache, drop order from drain
This patch does two things.

First it goes through and renames the __page_frag prefixed functions to
__page_frag_cache so that we can be clear that we are draining or
refilling the cache, not the frags themselves.

Second we drop the order parameter from __page_frag_cache_drain since we
don't actually need to pass it since all fragments are either order 0 or
must be a compound page.

Link: http://lkml.kernel.org/r/20170104023954.13451.5678.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Alexander Duyck
8c2dd3e4a4 mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent.  First
we move the page_frag_ portion of the name to the front of the functions
names.  Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this.  I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs.  Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix.  This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00