Commit Graph

618801 Commits

Author SHA1 Message Date
Chunyan Zhang
2b9743441a arm64: use preempt_disable_notrace in _percpu_read/write
When debug preempt or preempt tracer is enabled, preempt_count_add/sub()
can be traced by function and function graph tracing, and
preempt_disable/enable() would call preempt_count_add/sub(), so in Ftrace
subsystem we should use preempt_disable/enable_notrace instead.

In the commit 345ddcc882 ("ftrace: Have set_ftrace_pid use the bitmap
like events do") the function this_cpu_read() was added to
trace_graph_entry(), and if this_cpu_read() calls preempt_disable(), graph
tracer will go into a recursive loop, even if the tracing_on is
disabled.

So this patch change to use preempt_enable/disable_notrace instead in
this_cpu_read().

Since Yonghui Yang helped a lot to find the root cause of this problem,
so also add his SOB.

Signed-off-by: Yonghui Yang <mark.yang@spreadtrum.com>
Signed-off-by: Chunyan Zhang <zhang.chunyan@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-09-09 12:34:47 +01:00
Will Deacon
872c63fbf9 arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
smp_mb__before_spinlock() is intended to upgrade a spin_lock() operation
to a full barrier, such that prior stores are ordered with respect to
loads and stores occuring inside the critical section.

Unfortunately, the core code defines the barrier as smp_wmb(), which
is insufficient to provide the required ordering guarantees when used in
conjunction with our load-acquire-based spinlock implementation.

This patch overrides the arm64 definition of smp_mb__before_spinlock()
to map to a full smp_mb().

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-09-09 12:33:48 +01:00
Amitkumar Karwar
75696fe704 mwifiex: PCIe8997 chip specific handling
The patch corrects the revision id register and uses it along with
magic value and chip version registers to download appropriate firmware
image.

PCIe8997 Z chipset variant code has been removed, as it won't be used in
production.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 14:25:57 +03:00
Christophe Jaillet
b711657616 mwifiex: scan: Simplify code
This patch:
   - improves code layout
   - removes a useless memset(0) for some memory allocated with kzalloc
   - removes a useless if. We know that 'if (chan_band_tlv)' will succeed
     because it has been tested a few lines above

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:21:25 +03:00
Clemens Gruber
6f3c4fb6d0 usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
Problems with the signal integrity of the high speed USB data lines or
noise on reference ground lines can cause the i.MX6 USB controller to
violate USB specs and exhibit unexpected behavior.

It was observed that USBi_UI interrupts were triggered first and when
isr_setup_status_phase was called, ci->status was NULL, which lead to a
NULL pointer dereference kernel panic.

This patch fixes the kernel panic, emits a warning once and returns
-EPIPE to halt the device and let the host get stalled.
It also adds a comment to point people, who are experiencing this issue,
to their USB hardware design.

Cc: <stable@vger.kernel.org> #4.1+
Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Signed-off-by: Peter Chen <peter.chen@nxp.com>
2016-09-09 17:19:57 +08:00
Amitkumar Karwar
4c5dae59d2 mwifiex: add PCIe function level reset support
This patch implements pre and post FLR handlers to support PCIe FLR
functionality. Software cleanup is performed in pre-FLR whereas
firmware is downloaded and software is re-initialised in
post-FLR handler.

Following command triggers FLR.
echo "1" > /sys/bus/pci/devices/$NUMBER/reset

This feature can be used as a recovery mechanism when firmware gets
hang.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:18:32 +03:00
Arend Van Spriel
5251b6be8b brcmfmac: sdio: shorten retry loop in brcmf_sdio_kso_control()
In brcmf_sdio_kso_control() there is a retry loop as hardware may take
time to settle. However, when the call to brcmf_sdiod_regrb() returns
an error it is due to SDIO access failure and it makes no sense to wait
for hardware to settle. This patch aborts the loop after a number of
subsequent access errors.

Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:12:15 +03:00
Arend Van Spriel
634faf3686 brcmfmac: add support for bcm4339 chip with modalias sdio:c00v02D0d4339
The driver already supports the bcm4339 chipset but only for the variant
that shares the same modalias as the bcm4335, ie. sdio:c00v02D0d4335.
It turns out that there are also bcm4339 devices out there that have a
more distiguishable modalias sdio:c00v02D0d4339.

Reported-by: Steve deRosier <derosier@gmail.com>
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:12:14 +03:00
Jakub Kicinski
d43af50566 mt7601u: use linux/bitfield.h
Use the newly added linux/bitfield.h.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:09:25 +03:00
Jakub Kicinski
adcc710d0a mt7601u: remove unnecessary include
There is no need to include linux/version.h in a in-tree
driver.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:09:25 +03:00
Jakub Kicinski
faad5433b7 mt7601u: remove redefinition of GENMASK
Remove redefinition of GENMASK which should not be there
in the upstream version of the code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:09:24 +03:00
Jakub Kicinski
3e9b3112ec add basic register-field manipulation macros
Common approach to accessing register fields is to define
structures or sets of macros containing mask and shift pair.
Operations on the register are then performed as follows:

 field = (reg >> shift) & mask;

 reg &= ~(mask << shift);
 reg |= (field & mask) << shift;

Defining shift and mask separately is tedious.  Ivo van Doorn
came up with an idea of computing them at compilation time
based on a single shifted mask (later refined by Felix) which
can be used like this:

 #define REG_FIELD 0x000ff000

 field = FIELD_GET(REG_FIELD, reg);

 reg &= ~REG_FIELD;
 reg |= FIELD_PREP(REG_FIELD, field);

FIELD_{GET,PREP} macros take care of finding out what the
appropriate shift is based on compilation time ffs operation.

GENMASK can be used to define registers (which is usually
less error-prone and easier to match with datasheets).

This approach is the most convenient I've seen so to limit code
multiplication let's move the macros to a global header file.
Attempts to use static inlines instead of macros failed due
to false positive triggering of BUILD_BUG_ON()s, especially with
GCC < 6.0.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:09:24 +03:00
Xinming Hu
3935ccc14d mwifiex: add cfg80211 testmode support
This patch adds cfg80211 testmode support so that userspace tools can
download necessary commands to firmware during manufacturing mode tests.

Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:05:04 +03:00
Xinming Hu
cf5383b088 mwifiex: add manufacturing mode support
By default normal mode is chosen when driver is loaded. This
patch adds a provision to choose manufacturing mode via module
parameters.

Below command loads driver in manufacturing mode
insmod mwifiex.ko mfg_mode=1.

Tested-by: chunfan chen <jeffc@marvell.com>
Signed-off-by: Xinming Hu <huxm@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:05:03 +03:00
Arnd Bergmann
defb893fff bcma: use of_dma_configure() to set initial dma mask
While fixing another bug, I noticed that bcma manually sets up
a dma_mask pointer for its child devices. We have a generic
helper for that now, which should be able to cope better with
any variations that might be needed to deal with cache coherency,
unusual DMA address offsets, iommus, or limited DMA masks, none
of which are currently handled here.

This changes the core to use the of_dma_configure(), like
we do for platform devices that are probed directly from
DT.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2016-09-09 12:00:37 +03:00
Eric Dumazet
e895cdce68 ipv4: accept u8 in IP_TOS ancillary data
In commit f02db315b8 ("ipv4: IP_TOS and IP_TTL can be specified as
ancillary data") Francesco added IP_TOS values specified as integer.

However, kernel sends to userspace (at recvmsg() time) an IP_TOS value
in a single byte, when IP_RECVTOS is set on the socket.

It can be very useful to reflect all ancillary options as given by the
kernel in a subsequent sendmsg(), instead of aborting the sendmsg() with
EINVAL after Francesco patch.

So this patch extends IP_TOS ancillary to accept an u8, so that an UDP
server can simply reuse same ancillary block without having to mangle
it.

Jesper can then augment
https://github.com/netoptimizer/network-testing/blob/master/src/udp_example02.c
to add TOS reflection ;)

Fixes: f02db315b8 ("ipv4: IP_TOS and IP_TTL can be specified as ancillary data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Francesco Fusco <ffusco@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:45:57 -07:00
Daniel Borkmann
2d2be8cab2 bpf: fix range propagation on direct packet access
LLVM can generate code that tests for direct packet access via
skb->data/data_end in a way that currently gets rejected by the
verifier, example:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)

The reason why this gets rejected despite a proper test is that we
currently call find_good_pkt_pointers() only in case where we detect
tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
derived, for example, from a register of type pkt(id=Y,off=0,r=0)
pointing to skb->data. find_good_pkt_pointers() then fills the range
in the current branch to pkt(id=Y,off=0,r=Z) on success.

For above case, we need to extend that to recognize pkt_end >= rX
pattern and mark the other branch that is taken on success with the
appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
only two practical options to test for from what LLVM could have
generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
that we would need to take into account as well.

After the fix:

  [...]
   7: (61) r3 = *(u32 *)(r6 +80)
   8: (61) r9 = *(u32 *)(r6 +76)
   9: (bf) r2 = r9
  10: (07) r2 += 54
  11: (3d) if r3 >= r2 goto pc+12
   R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
   R9=pkt(id=0,off=0,r=0) R10=fp
  12: (18) r4 = 0xffffff7a
  14: (05) goto pc+430
  [...]

  from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                 R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
  24: (7b) *(u64 *)(r10 -40) = r1
  25: (b7) r1 = 0
  26: (63) *(u32 *)(r6 +56) = r1
  27: (b7) r2 = 40
  28: (71) r8 = *(u8 *)(r9 +20)
  29: (bf) r1 = r8
  30: (25) if r8 > 0x3c goto pc+47
   R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
   R9=pkt(id=0,off=0,r=54) R10=fp
  31: (b7) r1 = 1
  [...]

Verifier test cases are also added in this work, one that demonstrates
the mentioned example here and one that tries a bad packet access for
the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
with both test variants (>, >=).

Fixes: 969bf05eb3 ("bpf: direct packet access")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:28:37 -07:00
Yaogong Wang
9f5afeae51 tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:25:58 -07:00
David S. Miller
2c2c8e33e4 Merge branch 'nfp-fixes'
Jakub Kicinski says:

====================
nfp: fixes and trivial cleanup

First patch drops unnecessary version.h includes.  Second one
drops support for pre-release versions of FW ABI.  Removing
FW ABI 0.0 from supported set is particularly good since 0
could just be uninitialized memory.  Last but not least I drop
unnecessary padding of frames on RX which makes us count bytes
incorrectly for the VF2VF traffic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:18:42 -07:00
Jakub Kicinski
ebecefc820 nfp: don't pad frames on receive
There is no need to pad frames to ETH_ZLEN on RX.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:18:42 -07:00
Jakub Kicinski
313b345cbf nfp: drop support for old firmware ABIs
Be more strict about FW versions.  Drop support for old
transitional revisions which were never used in production.
Dropping support for FW ABI version 0.0.0.0 is particularly
useful because 0 could just be uninitialized memory.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:18:42 -07:00
Jakub Kicinski
312fada1f9 nfp: remove linux/version.h includes
Remove unnecessary version.h includes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:18:42 -07:00
Artem Germanov
db7196a0d0 tcp: cwnd does not increase in TCP YeAH
Commit 76174004a0
(tcp: do not slow start when cwnd equals ssthresh )
introduced regression in TCP YeAH. Using 100ms delay 1% loss virtual
ethernet link kernel 4.2 shows bandwidth ~500KB/s for single TCP
connection and kernel 4.3 and above (including 4.8-rc4) shows bandwidth
~100KB/s.
   That is caused by stalled cwnd when cwnd equals ssthresh. This patch
fixes it by proper increasing cwnd in this case.

Signed-off-by: Artem Germanov <agermanov@anchorfree.com>
Acked-by: Dmitry Adamushko <d.adamushko@anchorfree.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:16:12 -07:00
David S. Miller
3b61075be0 Merge branch 'ovs-802.1ad'
Eric Garver says:

====================
openvswitch: add 802.1ad support

This series adds 802.1ad support to openvswitch. It is a continuation of the
work originally started by Thomas F Herbert - hence the large rev number.

The extra VLAN is implemented by using an additional level of the
OVS_KEY_ATTR_ENCAP netlink attribute.
In OVS flow speak, this looks like

   eth_type(0x88a8),vlan(vid=100),encap(eth_type(0x8100), vlan(vid=200),
                                        encap(eth_type(0x0800), ...))

The userspace counterpart has also seen recent activity on the ovs-dev mailing
lists. There are some new 802.1ad OVS tests being added - also on the ovs-dev
list. This patch series has been tested using the most recent version of
userspace (v3) and tests (v2).

v22 changes:
  - merge patch 4 into patch 3
  - fix checkpatch.pl errors
    - Still some 80 char warnings for long string literals
  - refresh pointer after pskb_may_pull()
  - refactor vlan nlattr parsing to remove some double checks
  - introduce ovs_nla_put_vlan()
  - move triple VLAN check to after ethertype serialization
  - WARN_ON_ONCE() on triple VLAN and unexpected encap values

v21 changes:
  - Fix (and simplify) netlink attribute parsing
  - re-add handling of truncated VLAN tags
  - fix if/else dangling assignment in {push,pop}_vlan()
  - simplify parse_vlan()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:28 -07:00
Eric Garver
018c1dda5f openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes
Add support for 802.1ad including the ability to push and pop double
tagged vlans. Add support for 802.1ad to netlink parsing and flow
conversion. Uses double nested encap attributes to represent double
tagged vlan. Inner TPID encoded along with ctci in nested attributes.

This is based on Thomas F Herbert's original v20 patch. I made some
small clean ups and bug fixes.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:28 -07:00
Eric Garver
fe19c4f971 vlan: Check for vlan ethernet types for 8021.q or 802.1ad
This is to simplify using double tagged vlans. This function allows all
valid vlan ethertypes to be checked in a single function call.
Also replace some instances that check for both ETH_P_8021Q and
ETH_P_8021AD.

Patch based on one originally by Thomas F Herbert.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:28 -07:00
Thomas F Herbert
8c146bb9d5 openvswitch: 802.1ad uapi changes.
openvswitch: Add support for 8021.AD

Change the description of the VLAN tpid field.

Signed-off-by: Thomas F Herbert <thomasfherbert@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 17:10:27 -07:00
David S. Miller
81d1a366ff Merge branch 'mlx5-fixes'
Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes 2016-09-07

The following series contains bug fixes for the mlx5e driver.

from Gal,
	- Static code checker cleanup (casting overflow)
	- Fix global PFC counter statistics reading
	- Fix HW LRO when vlan stripping is off

From Bodong,
	- Deprecate old autoneg capability bit and use new one.

From Tariq,
	- Fix xmit more counter race condition
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Gal Pressman
cd17d230dd net/mlx5e: Fix parsing of vlan packets when updating lro header
Currently vlan tagged packets were not parsed correctly
and assumed to be regular IPv4/IPv6 packets.
We should check for 802.1Q/802.1ad tags and update the lro header
accordingly.
This fixes the use case where LRO is on and rxvlan is off
(vlan stripping is off).

Fixes: e586b3b0ba ('net/mlx5: Ethernet Datapath files')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Gal Pressman
4e39883d9c net/mlx5e: Fix global PFC counters replication
Currently when reading global PFC statistics we left the counter
iterator out of the equation and we ended up reading the same counter
over and over again.

Instead of reading the counter at index 0 on every iteration we now read
the counter at index (i).

Fixes: e989d5a532 ('net/mlx5e: Expose flow control counters to ethtool')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Gal Pressman
7abc211077 net/mlx5e: Prevent casting overflow
On 64 bits architectures unsigned long is longer than u32,
casting to unsigned long will result in overflow.
We need to first allocate an unsigned long variable, then assign the
wanted value.

Fixes: 665bc53969 ('net/mlx5e: Use new ethtool get/set link ksettings API')
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:29 -07:00
Bodong Wang
e7e31ca43d net/mlx5e: Move an_disable_cap bit to a new position
Previous an_disable_cap position bit31 is deprecated to be use in driver
with newer firmware.  New firmware will advertise the same capability
in bit29.

Old capability didn't allow setting more than one protocol for a
specific speed when autoneg is off, while newer firmware will allow
this and it is indicated in the new capability location.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:28 -07:00
Tariq Toukan
0dbf657c39 net/mlx5e: Fix xmit_more counter race issue
Update the xmit_more counter before notifying the HW,
to prevent a possible use-after-free of the skb.

Fixes: c8cf78fe10 ("net/mlx5e: Add ethtool counter for TX xmit_more")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:15:28 -07:00
Lorenzo Colitti
d545caca82 net: inet: diag: expose the socket mark to privileged processes.
This adds the capability for a process that has CAP_NET_ADMIN on
a socket to see the socket mark in socket dumps.

Commit a52e95abf7 ("net: diag: allow socket bytecode filters to
match socket marks") recently gave privileged processes the
ability to filter socket dumps based on mark. This patch is
complementary: it ensures that the mark is also passed to
userspace in the socket's netlink attributes.  It is useful for
tools like ss which display information about sockets.

Tested: https://android-review.googlesource.com/270210
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:13:09 -07:00
Eric Dumazet
76061f631c tcp: fastopen: avoid negative sk_forward_alloc
When DATA and/or FIN are carried in a SYN/ACK message or SYN message,
we append an skb in socket receive queue, but we forget to call
sk_forced_mem_schedule().

Effect is that the socket has a negative sk->sk_forward_alloc as long as
the message is not read by the application.

Josh Hunt fixed a similar issue in commit d22e153718 ("tcp: fix tcp
fin memory accounting")

Fixes: 168a8f5805 ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Josh Hunt <johunt@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:08:10 -07:00
Zubair Lutfullah Kakakhel
74f13c80e2 net: ethernet: xilinx: Enable emaclite for MIPS
The MIPS based xilfpga platform uses this driver.
Enable it for MIPS

Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 16:06:37 -07:00
Jean Delvare
3732b30a7d cpufreq-stats: Minor documentation fix
The cpufreq-stats code can no longer be built as a module, so it now
appears with square brackets in menuconfig.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 1aefc75b24 (cpufreq: stats: Make the stats code non-modular)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-09-08 23:05:07 +02:00
Doug Anderson
cbfff439c5 i2c: rk3x: Restore clock settings at resume time
Depending on a number of factors including:
- Which exact Rockchip SoC we're working with
- How deep we suspend
- Which i2c port we're on

We might lose the state of the i2c registers at suspend time.
Specifically we've found that on rk3399 the i2c ports that are not in
the PMU power domain lose their state with the current suspend depth
configured by ARM Tursted Firmware.

Note that there are very few actual i2c registers that aren't configured
per transfer anyway so all we actually need to re-configure are the
clock config registers.  We'll just add a call to rk3x_i2c_adapt_div()
at resume time and be done with it.

NOTE: On rk3399 on ports whose power was lost, I put printouts in at
resume time.  I saw things like:
  before: con=0x00010300, div=0x00060006
  after:  con=0x00010200, div=0x00180025

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: David Wu <david.wu@rock-chips.com>
Tested-by: David Wu <david.wu@rock-chips.com>
[wsa: removed duplicate const]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-08 22:50:33 +02:00
Geert Uytterhoeven
e0603c8dd2 i2c: Spelling s/acknowedge/acknowledge/
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-08 22:22:20 +02:00
Zhuo-hao Lee
664d58bf4d i2c: designware: save the preset value of DW_IC_SDA_HOLD
There are several ways to set the SDA hold time for i2c controller,
including: Device Tree, built-in device properties and ACPI. However,
if the SDA hold time is not specified by above method, we should
read the value, where it is preset by firmware, and save it to
sda_hold_time. This is needed because when i2c controller enters
runtime suspend, the DW_IC_SDA_HOLD value will be reset to chipset
default value. And during runtime resume, i2c_dw_init will be called
to reconfigure i2c controller. If sda_hold_time is zero, the chipset
default hold time will be used, that will be too short for some
platforms. Therefore, to have a better tolerance, the DW_IC_SDA_HOLD
value should be kept by sda_hold_time.

Signed-off-by: Zhuo-hao Lee <zhuo-hao.lee@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-08 22:14:36 +02:00
David S. Miller
40e3012e6e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
ipsec 2016-09-08

1) Fix a crash when xfrm_dump_sa returns an error.
   From Vegard Nossum.

2) Remove some incorrect WARN() on normal error handling.
   From Vegard Nossum.

3) Ignore socket policies when rebuilding hash tables,
   socket policies are not inserted into the hash tables.
   From Tobias Brunner.

4) Initialize and check tunnel pointers properly before
   we use it. From Alexey Kodanev.

5) Fix l3mdev oif setting on xfrm dst lookups.
   From David Ahern.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 13:12:37 -07:00
David S. Miller
575f9c43e7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
ipsec-next 2016-09-08

1) Constify the xfrm_replay structures. From Julia Lawall

2) Protect xfrm state hash tables with rcu, lookups
   can be done now without acquiring xfrm_state_lock.
   From Florian Westphal.

3) Protect xfrm policy hash tables with rcu, lookups
   can be done now without acquiring xfrm_policy_lock.
   From Florian Westphal.

4) We don't need to have a garbage collector list per
   namespace anymore, so use a global one instead.
   From Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-08 13:09:41 -07:00
Linus Torvalds
711bef65e9 A fix for a 4.7 performance regression, caused by a typo in an if
condition.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJX0WAXAAoJEEp/3jgCEfOL8bEIAIeVmaU+M/1HuPucc+xiUooy
 IP+LNjhSc52neqDsN/eYjTn4zipu0ZAojPW9geGkPHOnKTKdRqkNLpyWaQcSv0AG
 yOUiJxnfwZ2LuOD2L/EIzqSKWo1lrdhIP/tSyRk0cOhlXn8urKatEOQi1+8H3Jlr
 gzD2PZzXL1jnpCiGPnuYLlh6wmk9Su+7+GCkNOwi1z+j+sM5AupZESPsh6Ze+bYk
 xgii4e21cBQv4NXojLECnfNez3HjYJCqYAUZ55lzSneUkbTRdBoY+pye/FeFKIBe
 9Wb5IRYzvih9Bcdpx+o5RZtCoKhq1XRngjjvv7cHYELQ0JVM4kFwEKrUHMoHhWU=
 =VDDm
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.8-rc6' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a 4.7 performance regression, caused by a typo in an if
  condition"

* tag 'ceph-for-4.8-rc6' of git://github.com/ceph/ceph-client:
  ceph: do not modify fi->frag in need_reset_readdir()
2016-09-08 12:23:13 -07:00
Linus Torvalds
acdfffb5e0 Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull dmi fix from Jean Delvare.

* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  dmi-id: don't free dev structure after calling device_register
2016-09-08 12:19:24 -07:00
Linus Torvalds
e8b3b45de8 ARM: SoC fixes
This is a slightly larger batch of fixes that we've been sitting on a few
 -rcs. Most of them are simple oneliners, but there are two sets that are
 slightly larger and worth pointing out:
 
  - A set of patches to OMAP to deal with hwmod for RTC on am33xx (beaglebone
    SoC, among others). It's the only clock that ever has a valid offset of 0,
    so a new flag needed introduction once this problem was discovered.
 
  - A collection of CCI fixes for performance counters discovered once people
    started using it on X-Gene CPUs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJX0OnNAAoJEIwa5zzehBx3C80P/3Adca+87uQPI57AWJL7TIU3
 i9YAf1povBg/u5PHn2XiNj2gBeYtDL6vKTbca9zDX/ezSPaGFAp1nPsUG3m4yKLG
 0FjkNRan+FvFoi195TxANrBeYE39/ZNWJCsPn01+2Q2EsYnlCn9wsFglxTQGObsN
 MI2rgYVxabdJC8TO5WuGUdJEa9TynU65fdqdD+KR+DfA8yOyu9wZEmE1JuHprUCQ
 H2MYYHHFZD5Y41/Y/SvdDVprCOd8lq8Syt7Vk1O6pQKQSGgojgGn36YJeBIapnOy
 4y1/FFqoRe5FwSbt934q/4P/QpHwnGvIFi0edS+Reb6/3gUWRchQ8Dt1I1C/2QoJ
 J+7KQKR4SBSiu6LnsP8gNOAqulpyGzwE2aHBVD1mpbPIynA7G5MfI1pEJw4nVvFQ
 rAlBmuVzjaysuhKAdM32fd34dVwQljDiqh4qWtk0hbgjyPpdxT7QjIwSCR6vfbzo
 toOKQ2eiIqF+syOAnXbpPSmtaURavxvBim4DEXQfVHUbFHVErZ5zHmwGWjDc5vLT
 WixQ62hilHc08pcalY31RF3k21GwBXBVDf6clQ/9TDCErlCAlpjyN0/4ZrRpcYwz
 NHpFdbBNLzW0VJ0gHtwb+Xz4qgcUneTxXDUTaU46MYRquML/uqXMFrkJUQEOq1h5
 nPGe55ItupK8REaMCJv/
 =8pPd
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "This is a slightly larger batch of fixes that we've been sitting on a
  few -rcs.  Most of them are simple oneliners, but there are two sets
  that are slightly larger and worth pointing out:

   - A set of patches to OMAP to deal with hwmod for RTC on am33xx
     (beaglebone SoC, among others).  It's the only clock that ever has
     a valid offset of 0, so a new flag needed introduction once this
     problem was discovered.

   - A collection of CCI fixes for performance counters discovered once
     people started using it on X-Gene CPUs"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (37 commits)
  arm-cci: pmu: Fix typo in event name
  Revert "ARM: tegra: fix erroneous address in dts"
  ARM: dts: imx6qdl: Fix SPDIF regression
  ARM: imx6: add missing BM_CLPCR_BYPASS_PMIC_READY setting for imx6sx
  ARM: dts: imx7d-sdb: fix ti,x-plate-ohms property name
  ARM: dts: kirkwood: Fix PCIe label on OpenRD
  ARM: kirkwood: ib62x0: fix size of u-boot environment partition
  bus: arm-ccn: make event groups reliable
  bus: arm-ccn: fix hrtimer registration
  bus: arm-ccn: fix PMU interrupt flags
  ARM: tegra: Correct polarity for Tegra114 PMIC interrupt
  MAINTAINERS: add tree entry for ARM/UniPhier architecture
  ARM: sun5i: Fix typo in trip point temperature
  MAINTAINERS: Switch to kernel.org account for Krzysztof Kozlowski
  ARM: imx6ul: populates platform device at .init_machine
  bus: arm-ccn: Add missing event attribute exclusions for host/guest
  bus: arm-ccn: Correct required arguments for XP PMU events
  bus: arm-ccn: Fix XP watchpoint settings bitmask
  bus: arm-ccn: Do not attempt to configure XPs for cycle counter
  bus: arm-ccn: Fix PMU handling of MN
  ...
2016-09-08 12:05:15 -07:00
Wolfram Sang
30851a7c21 Documentation: i2c: slave-interface: add note for driver development
Make it clear that adding slave support shall not disable master
functionality. We can have both, so we should.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-08 16:57:14 +02:00
Wolfram Sang
e35478eac0 i2c: mux: demux-pinctrl: run properly with multiple instances
We can't use a static property for all the changesets, so we now create
dynamic ones for each changeset.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Fixes: 50a5ba8769 ("i2c: mux: demux-pinctrl: add driver")
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
2016-09-08 16:56:21 +02:00
David Howells
248f219cb8 rxrpc: Rewrite the data and ack handling code
Rewrite the data and ack handling code such that:

 (1) Parsing of received ACK and ABORT packets and the distribution and the
     filing of DATA packets happens entirely within the data_ready context
     called from the UDP socket.  This allows us to process and discard ACK
     and ABORT packets much more quickly (they're no longer stashed on a
     queue for a background thread to process).

 (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
     keep track of the offset and length of the content of each packet in
     the sk_buff metadata.  This means we don't do any allocation in the
     receive path.

 (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
     than cloning the packet once for each subpacket and pulling/trimming
     it, we file the packet multiple times with an annotation for each
     indicating which subpacket is there.  From that we can directly
     calculate the offset and length.

 (4) A call's receive queue can be accessed without taking locks (memory
     barriers do have to be used, though).

 (5) Incoming calls are set up from preallocated resources and immediately
     made live.  They can than have packets queued upon them and ACKs
     generated.  If insufficient resources exist, DATA packet #1 is given a
     BUSY reply and other DATA packets are discarded).

 (6) sk_buffs no longer take a ref on their parent call.

To make this work, the following changes are made:

 (1) Each call's receive buffer is now a circular buffer of sk_buff
     pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
     between the call and the socket.  This permits each sk_buff to be in
     the buffer multiple times.  The receive buffer is reused for the
     transmit buffer.

 (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
     to the data buffer.  Transmission phase annotations indicate whether a
     buffered packet has been ACK'd or not and whether it needs
     retransmission.

     Receive phase annotations indicate whether a slot holds a whole packet
     or a jumbo subpacket and, if the latter, which subpacket.  They also
     note whether the packet has been decrypted in place.

 (3) DATA packet window tracking is much simplified.  Each phase has just
     two numbers representing the window (rx_hard_ack/rx_top and
     tx_hard_ack/tx_top).

     The hard_ack number is the sequence number before base of the window,
     representing the last packet the other side says it has consumed.
     hard_ack starts from 0 and the first packet is sequence number 1.

     The top number is the sequence number of the highest-numbered packet
     residing in the buffer.  Packets between hard_ack+1 and top are
     soft-ACK'd to indicate they've been received, but not yet consumed.

     Four macros, before(), before_eq(), after() and after_eq() are added
     to compare sequence numbers within the window.  This allows for the
     top of the window to wrap when the hard-ack sequence number gets close
     to the limit.

     Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
     to indicate when rx_top and tx_top point at the packets with the
     LAST_PACKET bit set, indicating the end of the phase.

 (4) Calls are queued on the socket 'receive queue' rather than packets.
     This means that we don't need have to invent dummy packets to queue to
     indicate abnormal/terminal states and we don't have to keep metadata
     packets (such as ABORTs) around

 (5) The offset and length of a (sub)packet's content are now passed to
     the verify_packet security op.  This is currently expected to decrypt
     the packet in place and validate it.

     However, there's now nowhere to store the revised offset and length of
     the actual data within the decrypted blob (there may be a header and
     padding to skip) because an sk_buff may represent multiple packets, so
     a locate_data security op is added to retrieve these details from the
     sk_buff content when needed.

 (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
     individually secured and needs to be individually decrypted.  The code
     to do this is broken out into rxrpc_recvmsg_data() and shared with the
     kernel API.  It now iterates over the call's receive buffer rather
     than walking the socket receive queue.

Additional changes:

 (1) The timers are condensed to a single timer that is set for the soonest
     of three timeouts (delayed ACK generation, DATA retransmission and
     call lifespan).

 (2) Transmission of ACK and ABORT packets is effected immediately from
     process-context socket ops/kernel API calls that cause them instead of
     them being punted off to a background work item.  The data_ready
     handler still has to defer to the background, though.

 (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
     filesystem can shut down the socket and flush its own work items
     before closing the socket to deal with any in-progress service calls.

Future additional changes that will need to be considered:

 (1) Make sure that a call doesn't hog the front of the queue by receiving
     data from the network as fast as userspace is consuming it to the
     exclusion of other calls.

 (2) Transmit delayed ACKs from within recvmsg() when we've consumed
     sufficiently more packets to avoid the background work item needing to
     run.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08 11:10:12 +01:00
David Howells
00e907127e rxrpc: Preallocate peers, conns and calls for incoming service requests
Make it possible for the data_ready handler called from the UDP transport
socket to completely instantiate an rxrpc_call structure and make it
immediately live by preallocating all the memory it might need.  The idea
is to cut out the background thread usage as much as possible.

[Note that the preallocated structs are not actually used in this patch -
 that will be done in a future patch.]

If insufficient resources are available in the preallocation buffers, it
will be possible to discard the DATA packet in the data_ready handler or
schedule a BUSY packet without the need to schedule an attempt at
allocation in a background thread.

To this end:

 (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs to a
     maximum number each of the listen backlog size.  The backlog size is
     limited to a maxmimum of 32.  Only this many of each can be in the
     preallocation buffer.

 (2) For userspace sockets, the preallocation is charged initially by
     listen() and will be recharged by accepting or rejecting pending
     new incoming calls.

 (3) For kernel services {,re,dis}charging of the preallocation buffers is
     handled manually.  Two notifier callbacks have to be provided before
     kernel_listen() is invoked:

     (a) An indication that a new call has been instantiated.  This can be
     	 used to trigger background recharging.

     (b) An indication that a call is being discarded.  This is used when
     	 the socket is being released.

     A function, rxrpc_kernel_charge_accept() is called by the kernel
     service to preallocate a single call.  It should be passed the user ID
     to be used for that call and a callback to associate the rxrpc call
     with the kernel service's side of the ID.

 (4) Discard the preallocation when the socket is closed.

 (5) Temporarily bump the refcount on the call allocated in
     rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
     preallocation ref on service calls unconditionally.  This will no
     longer be necessary once the preallocation is used.

Note that this does not yet control the number of active service calls on a
client - that will come in a later patch.

A future development would be to provide a setsockopt() call that allows a
userspace server to manually charge the preallocation buffer.  This would
allow user call IDs to be provided in advance and the awkward manual accept
stage to be bypassed.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08 11:10:12 +01:00
David Howells
49e19ec7d3 rxrpc: Add tracepoints to record received packets and end of data_ready
Add two tracepoints:

 (1) Record the RxRPC protocol header of packets retrieved from the UDP
     socket by the data_ready handler.

 (2) Record the outcome of the data_ready handler.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-08 11:10:12 +01:00