Unlike most ports, ports 9 and 10 of the 6390X family have configurable
PHY modes. Set the mode as part of adjust_link().
Ordering is important, because the SERDES interfaces connected to
ports 9 and 10 can be split and assigned to other ports. The CMODE has
to be correctly set before the SERDES interface on another port can be
configured. Such configuration is likely to be performed in
port_enable() and port_disabled(), called on slave_open() and
slave_close().
The simple case is port 9 and 10 are used for 'CPU' or 'DSA'. In this
case, the CMODE is set via a phy-mode in dsa_cpu_dsa_setup(), which is
called early in the switch setup.
When ports 9 or 10 are used as user ports, and have a fixed-phy, when
the fixed fixed-phy is attached, dsa_slave_adjust_link() is called,
which results in the adjust_link function being called, setting the
cmode. The port_enable() will for other ports will be called much
later.
When ports 9 or 10 are used as user ports and have a real phy attached
which does not use all the available SERDES interface, e.g. a 1Gbps
SGMII, there is currently no mechanism in place to set the CMODE of
the port from software. It must be hoped the stripping resistors are
correct.
At the same time, add a function to get the cmode. This will be needed
when configuring the SERDES interfaces.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mv88e6390 ports 9 and 10 supports some additional PHY modes. Add
these modes to the PHY core so they can be used in the binding.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
net: Fix on-stack USB buffers
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default). This
series fixes all the instances I could find where USB networking
drivers do that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating USB buffers on the stack is not portable, and no longer
works on x86_64 (with VMAP_STACK enabled as per default).
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
References: https://bugs.debian.org/852556
Reported-by: Lisandro Damián Nicanor Pérez Meyer <lisandro@debian.org>
Tested-by: Lisandro Damián Nicanor Pérez Meyer <lisandro@debian.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend says:
====================
XDP adjust head support for virtio
This series adds adjust head support for virtio. The following is my
test setup. I use qemu + virtio as follows,
./x86_64-softmmu/qemu-system-x86_64 \
-hda /var/lib/libvirt/images/Fedora-test0.img \
-m 4096 -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
-device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
In order to use XDP with virtio until LRO is supported TSO must be
turned off in the host. The important fields in the above command line
are the following,
guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
Also note it is possible to conusme more queues than can be supported
because when XDP is enabled for retransmit XDP attempts to use a queue
per cpu. My standard queue count is 'queues=4'.
After loading the VM I run the relevant XDP test programs in,
./sammples/bpf
For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
with iperf (-d option to get bidirectional traffic), ping, and pktgen.
I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
the normal traffic path to the stack continues to work with XDP loaded.
It would be great to automate this soon. At the moment I do it by hand
which is starting to get tedious.
v2: original series dropped trace points after merge.
====================
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for XDP adjust head by allocating a 256B header region
that XDP programs can grow into. This is only enabled when a XDP
program is loaded.
In order to ensure that we do not have to unwind queue headroom push
queue setup below bpf_prog_add. It reads better to do a prog ref
unwind vs another queue setup call.
At the moment this code must do a full reset to ensure old buffers
without headroom on program add or with headroom on program removal
are not used incorrectly in the datapath. Ideally we would only
have to disable/enable the RX queues being updated but there is no
API to do this at the moment in virtio so use the big hammer. In
practice it is likely not that big of a problem as this will only
happen when XDP is enabled/disabled changing programs does not
require the reset. There is some risk that the driver may either
have an allocation failure or for some reason fail to correctly
negotiate with the underlying backend in this case the driver will
be left uninitialized. I have not seen this ever happen on my test
systems and for what its worth this same failure case can occur
from probe and other contexts in virtio framework.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For XDP we will need to reset the queues to allow for buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately the locking requirements
between the freeze/restore and reset paths are different however so
we can not simply reuse the code.
This patch refactors the code path and adds a reset helper routine.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out qp assignment.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At this point the do_xdp_prog is mostly if/else branches handling
the different modes of virtio_net. So remove it and handle running
the program in the per mode handlers.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For XDP use case and to allow ethtool reset tests it is useful to be
able to use reset paths from contexts where rtnl lock is already
held.
This requries updating virtnet_set_queues and free_receive_bufs the
two places where rtnl_lock is taken in virtio_net. To do this we
use the following pattern,
_foo(...) { do stuff }
foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()};
this allows us to use freeze()/restore() flow from both contexts.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
bridge: improve cache utilization
This is the first set which begins to deal with the bad bridge cache
access patterns. The first patch rearranges the bridge and port structs
a little so the frequently (and closely) accessed members are in the same
cache line. The second patch then moves the garbage collection to a
workqueue trying to improve system responsiveness under load (many fdbs)
and more importantly removes the need to check if the matched entry is
expired in __br_fdb_get which was a major source of false-sharing.
The third patch is a preparation for the final one which
If properly configured, i.e. ports bound to CPUs (thus updating "updated"
locally) then the bridge's HitM goes from 100% to 0%, but even without
binding we get a win because previously every lookup that iterated over
the hash chain caused false-sharing due to the first cache line being
used for both mac/vid and used/updated fields.
Some results from tests I've run:
(note that these were run in good conditions for the baseline, everything
ran on a single NUMA node and there were only 3 fdbs)
1. baseline
100% Load HitM on the fdbs (between everyone who has done lookups and hit
one of the 3 hash chains of the communicating
src/dst fdbs)
Overall 5.06% Load HitM for the bridge, first place in the list
2. patched & ports bound to CPUs
0% Local load HitM, bridge is not even in the c2c report list
Also there's 3% consistent improvement in netperf tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Writing once per jiffy is enough to limit the bridge's false sharing.
After this change the bridge doesn't show up in the local load HitM stats.
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fdb's used and updated fields are written to on every packet forward and
packet receive respectively. Thus if we are receiving packets from a
particular fdb, they'll cause false-sharing with everyone who has looked
it up (even if it didn't match, since mac/vid share cache line!). The
"used" field is even worse since it is updated on every packet forward
to that fdb, thus the standard config where X ports use a single gateway
results in 100% fdb false-sharing. Note that this patch does not prevent
the last scenario, but it makes it better for other bridge participants
which are not using that fdb (and are only doing lookups over it).
The point is with this move we make sure that only communicating parties
get the false-sharing, in a later patch we'll show how to avoid that too.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the fdb garbage collector to a workqueue which fires at least 10
milliseconds apart and cleans chain by chain allowing for other tasks
to run in the meantime. When having thousands of fdbs the system is much
more responsive. Most importantly remove the need to check if the
matched entry has expired in __br_fdb_get that causes false-sharing and
is completely unnecessary if we cleanup entries, at worst we'll get 10ms
of traffic for that entry before it gets deleted.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move around net_bridge so the vlan fields are in the beginning since
they're checked on every packet even if vlan filtering is disabled.
For the port move flags & vlan group to the beginning, so they're in the
same cache line with the port's state (both flags and state are checked
on each packet).
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch fixes the case when adding a zero value to the packet
pointer. The zero value could come from src_reg equals type
BPF_K or CONST_IMM. The patch fixes both, otherwise the verifer
reports the following error:
[...]
R0=imm0,min_value=0,max_value=0
R1=pkt(id=0,off=0,r=4)
R2=pkt_end R3=fp-12
R4=imm4,min_value=4,max_value=4
R5=pkt(id=0,off=4,r=4)
269: (bf) r2 = r0 // r2 becomes imm0
270: (77) r2 >>= 3
271: (bf) r4 = r1 // r4 becomes pkt ptr
272: (0f) r4 += r2 // r4 += 0
addition of negative constant to packet pointer is not allowed
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
read vnet_hdr_sz once
Tuntap devices allow concurrent use and update of field vnet_hdr_sz.
Read the field once to avoid TOCTOU.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Macvtap functions read the value once, but unless READ_ONCE is used,
the compiler may ignore this and read multiple times. Enforce a single
read and locally cached value to avoid updates between test and use.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When IFF_VNET_HDR is enabled, a virtio_net header must precede data.
Data length is verified to be greater than or equal to expected header
length tun->vnet_hdr_sz before copying.
Read this value once and cache locally, as it can be updated between
the test and use (TOCTOU).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.
__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.
This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.
Again, this gem was found by syzkaller tool.
Fixes: 9c55e01c0c ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull libnvdimm fixes from Dan Williams:
"None of these are showstoppers for 4.10 and could wait for 4.11 merge
window, but they are low enough risk for this late in the cycle and
the fixes have waiting users . They have received a build success
notification from the 0day robot, pass the latest ndctl unit tests,
and appeared in next:
- Fix a crash that can result when SIGINT is sent to a process that
is awaiting completion of an address range scrub command. We were
not properly cleaning up the workqueue after
wait_event_interruptible().
- Fix a memory hotplug failure condition that results from not
reserving enough space out of persistent memory for the memmap. By
default we align to 2M allocations that the memory hotplug code
assumes, but if the administrator specifies a non-default
4K-alignment then we can fail to correctly size the reservation.
- A one line fix to improve the predictability of libnvdimm block
device names. A common operation is to reconfigure /dev/pmem0 into
a different mode. For example, a reconfiguration might set a new
mode that reserves some of the capacity for a struct page memmap
array. It surprises users if the device name changes to
"/dev/pmem0.1" after the mode change and then back to /dev/pmem0
after a reboot.
- Add 'const' to some function pointer tables"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, pfn: fix memmap reservation size versus 4K alignment
acpi, nfit: fix acpi_nfit_flush_probe() crash
libnvdimm, namespace: do not delete namespace-id 0
nvdimm: constify device_type structures
These two tests are based on the work done for f23cc643f9. The first test is
just a basic one to make sure we don't allow AND'ing negative values, even if it
would result in a valid index for the array. The second is a cleaned up version
of the original testcase provided by Jann Horn that resulted in the commit.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add an intel_pstate driver quirk to work around a firmware setting
that leads to frequency scaling issues on desktop Intel Kaby Lake
processors in some configurations if the hardware-managed P-states
(HWP) feature is in use (Srinivas Pandruvada).
- Fix up the recently added brcmstb-avs cpufreq driver: fix a bug
related to system suspend and change the sysfs interface to match
the user space expectations (Markus Mayer).
- Modify the runtime PM framework to avoid false-positive warnings
from the might_sleep_if() assertions in it (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYmIOoAAoJEILEb/54YlRx2XgP/AjkxDOBep8mnY3xHNMvx887
BjWWGkuO3qZiX/xFhkhlBuxx8zLfg3ie7NJiSpn1Lj0192M9ggwn4RrM61V1SBX6
QTOeP68uueIIA/SCt2dw9DdEpepzyvwSJpUTqRmZubZAAYjcVHN+FLgNa5QzUdMb
X73zORAXsOu4SAguS4MP3yvPvFfdsOwSexllfsK9IAd+eb5RSZYnNyHuwW+S8EK5
zVCE+cwJOYU4SFh0speGpvUh7wXtWdYM3CV6mzrkiQqfvS6q29xillrxKWaEIk7Z
5Y6sReFl6i/zIuFQnx0nKPNM5DUhiOY80ha5uaZzde+vRRR7EM7n6LaIQVp672A7
mq6d0YxsJD8Z9Gx9r66oID/pfEkviRvRc4ejm0Xb187QN9jZvBhY85WD22X5p3KR
B0B/mhN1zN8ruaLvitqKwXtTntq4ahLtjfxtsyYoH56kh8J6C5KNlJ985/q2hhLD
8W1ZAaXPy+vyfODsnxdzfnYvkOgeY/XIg9NQ++QGdhXNwDBb1QZP34zdNsIF/YXR
8A+usM8BAq4r2dDROFfxtCkIVH6JsMkqWVbrUPkM9hdIL2jF3IpRS9bXKBvDICMc
ZsbDm4ihz+XtjVG4GxmrB5SZ2C3qPYwbmLBiu1CwfY7L6lN8azF0RmRJ0s2fpvAG
d4nxEZa5ysPKEZfU6PyE
=2NdR
-----END PGP SIGNATURE-----
Merge tag 'pm-4.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These add a quirk to intel_pstate to work around a firmware setting
that leads to frequency scaling issues (discovered recently) on some
Intel Kaby Lake processors, fix up the recently added brcmstb-avs
cpufreq driver and avoid false-positive warnings from the runtime PM
framework triggered by recent changes in i915.
Specifics:
- Add an intel_pstate driver quirk to work around a firmware setting
that leads to frequency scaling issues on desktop Intel Kaby Lake
processors in some configurations if the hardware-managed P-states
(HWP) feature is in use (Srinivas Pandruvada)
- Fix up the recently added brcmstb-avs cpufreq driver: fix a bug
related to system suspend and change the sysfs interface to match
the user space expectations (Markus Mayer)
- Modify the runtime PM framework to avoid false-positive warnings
from the might_sleep_if() assertions in it (Rafael Wysocki)"
* tag 'pm-4.10-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Avoid false-positive warnings from might_sleep_if()
cpufreq: intel_pstate: Disable energy efficiency optimization
cpufreq: brcmstb-avs-cpufreq: properly retrieve P-state upon suspend
cpufreq: brcmstb-avs-cpufreq: extend sysfs entry brcm_avs_pmap
DM device destruction
- An RCU fix for dm-crypt's kernel keyring support that was included in
4.10-rc1
- A -Wbool-operation warning fix for DM multipath
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJYlKLnAAoJEMUj8QotnQNaUOcH/3OgBjsRuFpNmrW3bzzcpLUQ
XH6R2YAc1/14BU6rUYY7vLA905n7Jw3VyyJGgl2cJUqaDqA/Qj6uBmwfOve019Fl
xTzi8rMAas05zqs9b5SrRAtSt2z6eZnbpm8df+QjrFQZjsDzsl2+PLxR+emp9YIQ
wXLAU4Re35v/jPUccoNB0Be0LARIh0dSPQYWCYPomYqFHBoioHinGBOJpeq8HXFp
U+JRXPcjrFxs4zjKq8aRE1XhN7lqLt7uZ4gi43sRCgieWXApoOfmo634IINyZtCf
WIjS92qbFeyZfMk65q+GzfUYM9ZdAvzwXMlHnECYSd+jfbz89DhKv7YOuyCf1CU=
=RSQc
-----END PGP SIGNATURE-----
Merge tag 'dm-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a fix for a race in .request_fn request-based DM request handling vs
DM device destruction
- an RCU fix for dm-crypt's kernel keyring support that was included in
4.10-rc1
- a -Wbool-operation warning fix for DM multipath
* tag 'dm-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: replace RCU read-side section with rwsem
dm rq: cope with DM device destruction while in dm_old_request_fn()
dm mpath: cleanup -Wbool-operation warning in choose_pgpath()
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYmJDxAAoJEAhfPr2O5OEVFtkQAJHHkcc4H139Fdp0ThGLLLf/
1VWAwQqWb/WmXMqIrE2fIQdEty8za8GvX6gQMsgeDS06GTlxtprnBeH/ZKlwrNR3
akq/mwQPtJhOUe5uuAmgcGUOrIUIRfO0kmY6hZY+92MyxHbUbmSvpV6YHe81eWV1
Emkwc4cMrir3RaCAsn2FgRnFRRpB58oi9TCe89LtLI4zzCMuTzEvEfG7O9u5WCcO
SrhMv05M5XWIvEywoZsKcUW+vBZxZ6QvbG970MY5czW2WRs5GIoOnwskNYpo/bxy
ttKmcgDXD63Wa2JPuEImLm8imyHhquJMKXCFrCRDvSWVw5p+xKDx5lX56cMhJYmr
ZHG32Z7dsG2bfVLY5+L/B+4QhnBYqEehhgh/8oDhrR8POw+71L40aWCq5gC9cHp4
fjNBRxII11i6AEqyMA2Dv+aSyeW3LSYBxV11h5F/Zef5fpa+WMgtHLjKKxRMOnzF
lt81OC5yg3XWBRb4s/4xXghOOmMPFMPCo8LUbo3wB5wGNBtguBBXssZ2HIolrSCA
/NAD01Eb29dRhtRXEqQYAhEuSiuob6ETdn+FktbtUeZl5ZUg7JfZ+A5MqTmH1l1u
Ezl8I42BnkMuSc2gdkWh1eBJfrzNEytiW7EOZO4TMfi00PzlshwusQkXtRTUC5aC
DG5W41QC2PolZnV8XcXs
=nEM8
-----END PGP SIGNATURE-----
Merge tag 'media/v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A few documentation fixes at CEC (with got promoted from staging for
4.10), and one fix on its core."
* tag 'media/v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] cec: fix wrong last_la determination
[media] cec-intro.rst: mention the v4l-utils package and CEC utilities
[media] cec rst: remove "This API is not yet finalized" notice
Pull crypto fixes from Herbert Xu:
- use-after-free in algif_aead
- modular aesni regression when pcbc is modular but absent
- bug causing IO page faults in ccp
- double list add in ccp
- NULL pointer dereference in qat (two patches)
- panic in chcr
- NULL pointer dereference in chcr
- out-of-bound access in chcr
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: chcr - Fix key length for RFC4106
crypto: algif_aead - Fix kernel panic on list_del
crypto: aesni - Fix failure when pcbc module is absent
crypto: ccp - Fix double add when creating new DMA command
crypto: ccp - Fix DMA operations when IOMMU is enabled
crypto: chcr - Check device is allocated before use
crypto: chcr - Fix panic on dma_unmap_sg
crypto: qat - zero esram only for DH85x devices
crypto: qat - fix bar discovery for c62x
Vivien Didelot says:
====================
net: dsa: add fabric notifier
When a switch fabric is composed of multiple switch chips, these chips
must be programmed accordingly when an event occurred on one of them.
Examples of such event include hardware bridging: when a Linux bridge
spans interconnected chips, they must be programmed to allow external
ports to ingress frames on their internal ports.
Another example is cross-chip hardware VLANs. Switch chips in-between
interconnected bridge ports must also configure a given VLAN to allow
packets to pass through them.
In order to support that, this patchset introduces a non-intrusive
notifier mechanism. It adds a notifier head in every DSA switch tree
(the said fabric), and a notifier block in every DSA switch chip.
When an even occurs, it is chained to all notifiers of the fabric.
Switch chips can react accordingly if they are cross-chip capable.
On a dynamic debug enabled system, bridging a port in a multi-chip
fabric will print something like this (ZII Rev B board):
# brctl addif br0 lan3
mv88e6085 0.1:00: crosschip DSA port 1.0 bridged to br0
mv88e6085 0.4:00: crosschip DSA port 1.0 bridged to br0
# brctl delif br0 lan3
mv88e6085 0.1:00: crosschip DSA port 1.0 unbridged from br0
mv88e6085 0.4:00: crosschip DSA port 1.0 unbridged from br0
Currently only bridging events are added. A patchset introducing support
for cross-chip hardware bridging configuration in mv88e6xxx will follow
right after. Then events for switchdev operations are next on the line.
We should note that non-switchdev events do not support rolling-back
switch-wide operations. We'll have to work on closer integration with
switchdev for that, like introducing new attributes or objects, to
benefit from the prepare and commit phases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A slave device will now notify the switch fabric once its port is
bridged or unbridged, instead of calling directly its switch operations.
This code allows propagating cross-chip bridging events in the fabric.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a notifier block per DSA switch, registered against a notifier head
in the switch fabric they belong to.
This infrastructure will allow to propagate fabric-wide events such as
port bridging, VLAN configuration, etc. If a DSA switch driver cares
about cross-chip configuration, such events can be caught.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scope of the functions inside net/dsa/slave.c must be the slave
net_device pointer. Change to state setter helper accordingly to
simplify callers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an error is returned during the bridging of a port in a
NETDEV_CHANGEUPPER event, net/core/dev.c rolls back the operation.
Be consistent and unassign dp->bridge_dev when this happens.
In the meantime, add comments to document this behavior.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code handling the slave netdevice notifier call by
providing a dsa_slave_changeupper helper for NETDEV_CHANGEUPPER, and so
on (only this event is supported at the moment.)
Return NOTIFY_DONE when we did not care about an event, and NOTIFY_OK
when we were concerned but no error occurred, as the API suggests.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the netdevice notifier block register code in slave.c and provide
helpers for dsa.c to register and unregister it.
At the same time, check for errors since (un)register_netdevice_notifier
may fail.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit abeffce ("net/mlx5e: Fix a -Wmaybe-uninitialized warning"), I fixed a
gcc warning for the ipv4 offload handling. Now we get the same warning for the
added ipv6 support:
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:815:40: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
We can apply the same workaround here as well.
Fixes: ce99f6b97f ("net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes use of is_vlan_dev() function instead of flag
comparison which is exactly done by is_vlan_dev() helper function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jon Maxwell <jmaxwell37@gmail.com>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of ACCESS_ONCE() looks like a micro-optimization to force gcc to use
an indexed load for the register address, but it has an absolutely detrimental
effect on builds with gcc-5 and CONFIG_KASAN=y, leading to a very likely
kernel stack overflow aside from very complex object code:
hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_update_stats':
hisilicon/hns/hns_dsaf_gmac.c:419:1: error: the frame size of 2912 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_reset_common':
hisilicon/hns/hns_dsaf_ppe.c:390:1: error: the frame size of 1184 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_ppe.c: In function 'hns_ppe_get_regs':
hisilicon/hns/hns_dsaf_ppe.c:621:1: error: the frame size of 3632 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_common_regs':
hisilicon/hns/hns_dsaf_rcb.c:970:1: error: the frame size of 2784 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_gmac.c: In function 'hns_gmac_get_regs':
hisilicon/hns/hns_dsaf_gmac.c:641:1: error: the frame size of 5728 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_rcb.c: In function 'hns_rcb_get_ring_regs':
hisilicon/hns/hns_dsaf_rcb.c:1021:1: error: the frame size of 2208 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_comm_init':
hisilicon/hns/hns_dsaf_main.c:1209:1: error: the frame size of 1904 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_xgmac.c: In function 'hns_xgmac_get_regs':
hisilicon/hns/hns_dsaf_xgmac.c:748:1: error: the frame size of 4704 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_update_stats':
hisilicon/hns/hns_dsaf_main.c:2420:1: error: the frame size of 1088 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_get_regs':
hisilicon/hns/hns_dsaf_main.c:2753:1: error: the frame size of 10768 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
This does not seem to happen any more with gcc-7, but removing the ACCESS_ONCE
seems safe anyway and it avoids a serious issue for some people. I have verified
that with gcc-5.3.1, the object code we get is better in the new version
both with and without CONFIG_KASAN, as we no longer allocate a 1344 byte
stack frame for hns_dsaf_get_regs() but otherwise have practically identical
object code.
With gcc-7.0.0, removing ACCESS_ONCE has no effect, the object code is already
good either way.
This patch is probably not urgent to get into 4.11 as only KASAN=y builds
with certain compilers are affected, but I still think it makes sense to
backport into older kernels.
Cc: stable@vger.kernel.org
Fixes: 511e6bc ("net: add Hisilicon Network Subsystem DSAF support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a "||" vs "|" typo here so we test 0x1 instead of 0x6.
Fixes: 1f8176f735 ("net/mlx4_en: Check the enabling pptx/pprx flags in SET_PORT wrapper flow")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 364b605573 ("net: busy-poll: return busypolling status
to drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.
Testing with a 7142 shows a small latency improvement of ~100 ns.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to check if asoc->peer.prsctp_capable is set before
processing fwd tsn chunk, if not, it will return an ERROR to the
peer, just as rfc3758 section 3.3.1 demands.
Reported-by: Julian Cordes <julian.cordes@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When for instance a mobile Linux device roams from one access point to
another with both APs sharing the same broadcast domain and a
multicast snooping switch in between:
1) (c) <~~~> (AP1) <--[SSW]--> (AP2)
2) (AP1) <--[SSW]--> (AP2) <~~~> (c)
Then currently IPv6 multicast packets will get lost for (c) until an
MLD Querier sends its next query message. The packet loss occurs
because upon roaming the Linux host so far stayed silent regarding
MLD and the snooping switch will therefore be unaware of the
multicast topology change for a while.
This patch fixes this by always resending MLD reports when an interface
change happens, for instance from NO-CARRIER to CARRIER state.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
mlxsw: cleanup neigh handling
Ido says:
This series addresses long standing issues in the mlxsw driver
concerning neighbour reflection. It also prepares the code for follow-up
changes dealing with proper resource cleanup and nexthop reflection.
The first two patches convert the neighbour reflection code to use an
ordered workqueue, to prevent re-ordering of NEIGH_UPDATE events that
may happen following subsequent patches.
The third to fifth patches remove the ndo_neigh_{construct,destroy}
entry points from the driver, thereby relying only on NEIGH_UPDATE
events for neighbour reflection. This simplifies the code considerably.
Last patches are fallout and adjust nits in the code I noticed while
going over it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We periodically ask the neighbouring system to try and resolve
neighbours that are used for nexthops, but aren't currently resolved.
However, 'nud_state' is protected by the neighbour lock, so we shouldn't
access it without taking it. Instead, we can simply check the
'connected' field of the neighbour entry, which we update upon
NEIGH_UPDATE events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We only add neighbour entries that are also used for nexthops to
'nexthop_neighs_list', so when iterating over this list there's no need
to check that the entry is indeed used for nexthops.
Remove the redundant check.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 18bfb924f0 ("net: introduce default neigh_construct/destroy
ndo calls for L2 upper devices") we added these ndos to stacked devices
such as team and bond, so that calls will be propagated to mlxsw.
However, previous commit removed the reliance on these ndos and no new
users of these ndos have appeared since above mentioned commit. We can
therefore safely remove this dead code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we had two interfaces for neighbour related configuration:
ndo_neigh_{construct,destroy} and NEIGH_UPDATE netevents. The ndos were
used to add and remove neighbours from the driver's cache, whereas the
netevent was used to reflect the neighbours into the device's tables.
However, if the NUD state of a neighbour isn't NUD_VALID or if the
neighbour is dead, then there's really no reason for us to keep it
inside our cache. The only exception to this rule are neighbours that
are also used for nexthops, which we periodically refresh to get them
resolved.
We can therefore eliminate the ndo entry point into the driver and
simplify the code, making it similar to the FIB reflection, which is
based solely on events. This also helps us avoid a locking issue, in
which the RIF cache was traversed without proper locking during
insertion into the neigh entry cache.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 33b1341cd1 ("mlxsw: spectrum_router: Fix handling of
neighbour structure") we no longer use destination IP for neighbour
lookup, so remove it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>