Order of $(LDLIBS) matters to linker, so put it after all the .o and .a
files.
Fixes: 74b5a5968f ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191023153128.3486140-1-andriin@fb.com
Make test_section_names into test_progs test. Also fix ESRCH expected
results. Add uprobe/uretprobe and tp/raw_tp test cases.
Fixes: dd4436bb83 ("libbpf: Teach bpf_object__open to guess program types")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191023060913.1713817-1-andriin@fb.com
In commit 43e74c0267 ("bpf_xdp_redirect_map: Perform map lookup in
eBPF helper") the bpf_redirect_map() helper learned to do map lookup,
which means that the explicit lookup in the XDP program for AF_XDP is
not needed for post-5.3 kernels.
This commit adds the implicit map lookup with default action, which
improves the performance for the "rx_drop" [1] scenario with ~4%.
For pre-5.3 kernels, the bpf_redirect_map() returns XDP_ABORTED, and a
fallback path for backward compatibility is entered, where explicit
lookup is still performed. This means a slight regression for older
kernels (an additional bpf_redirect_map() call), but I consider that a
fair punishment for users not upgrading their kernels. ;-)
v1->v2: Backward compatibility (Toke) [2]
v2->v3: Avoid masking/zero-extension by using JMP32 [3]
[1] # xdpsock -i eth0 -z -r
[2] https://lore.kernel.org/bpf/87pnirb3dc.fsf@toke.dk/
[3] https://lore.kernel.org/bpf/87v9sip0i8.fsf@toke.dk/
Suggested-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191022072206.6318-1-bjorn.topel@gmail.com
This patch allows you to register one netdev basechain to multiple
devices. This adds a new NFTA_HOOK_DEVS netlink attribute to specify
the list of netdevices. Basechains store a list of hooks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After unbinding the list of flow_block callbacks, iterate over it to
remove the existing rules in the netdevice that has just been
unregistered.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Rise the maximum limit of devices per flowtable up to 256. Rename
NFT_FLOWTABLE_DEVICE_MAX to NFT_NETDEVICE_MAX in preparation to reuse
the netdev hook parser for ingress basechain.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Hardware offload needs access to the priority field, store this field in
the nf_flowtable object.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Since commit 342db22182 ("sched: Call skb_get_hash_perturb
in sch_fq_codel") we no longer need anything from this file.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The table entry in __XDP_ACT_SYM_TAB for the last item is set
to { -1, 0 } where it should be { -1, NULL } as the second item
is a pointer to a string.
Fixes the following sparse warnings:
./include/trace/events/xdp.h:28:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:53:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:82:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:140:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:155:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:190:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:225:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:260:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:318:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:356:1: warning: Using plain integer as NULL pointer
./include/trace/events/xdp.h:390:1: warning: Using plain integer as NULL pointer
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191022125925.10508-1-ben.dooks@codethink.co.uk
Vivien Didelot says:
====================
The dsa_switch structure represents the physical switch device itself,
and is allocated by the driver. The dsa_switch_tree and dsa_port structures
represent the logical switch fabric (eventually composed of multiple switch
devices) and its ports, and are allocated by the DSA core.
This branch lists the logical ports directly in the fabric which simplifies
the iteration over all ports when assigning the default CPU port or configuring
the D in DSA in drivers like mv88e6xxx.
This also removes the unique dst->cpu_dp pointer and is a first step towards
supporting multiple CPU ports and dropping the DSA_MAX_PORTS limitation.
Because the dsa_port structures are not tied to the dsa_switch structure
anymore, we do not need to provide an helper for the drivers to allocate a
switch structure. Like in many other subsystems, drivers can now embed their
dsa_switch structure as they wish into their private structure. This will
be particularly interesting for the Broadcom drivers which were currently
limited by the dynamically allocated array of DSA ports.
The series implements the list of dsa_port structures, makes use of it,
then drops dst->cpu_dp and the dsa_switch_alloc helper.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Now that ports are dynamically listed in the fabric, there is no need
to provide a special helper to allocate the dsa_switch structure. This
will give more flexibility to drivers to embed this structure as they
wish in their private structure.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Allocate the struct dsa_port the first time it is accessed with
dsa_port_touch, and remove the static dsa_port array from the
dsa_switch structure.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Like the dsa_switch_tree structures, the dsa_port structures will be
allocated on switch registration.
The SJA1105 driver is the only one accessing the dsa_port structure
after the switch allocation and before the switch registration.
For that reason, move switch registration prior to assigning the priv
member of the dsa_port structures.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Instead of digging into the other dsa_switch structures of the fabric
and relying too much on the dsa_to_port helper, use the new list
of switch fabric ports to remap the Port VLAN Map of local bridge
group members or remap the Port VLAN Table entry of external bridge
group members.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Instead of digging into the other dsa_switch structures of the fabric
and relying too much on the dsa_to_port helper, use the new list of
switch fabric ports to define the mask of the local ports allowed to
receive frames from another port of the fabric.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Since mv88e6xxx_pvt_map is a static helper, no need to return
-EOPNOTSUPP if the chip has no PVT, simply silently skip the operation.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of iterating over switches and their
ports when setting up the default CPU port. Unassign it on teardown.
Now that we can iterate over multiple CPU ports, remove dst->cpu_dp.
At the same time, provide a better error message for CPU-less tree.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of iterating over switches and their
ports when looking up the first CPU port in the tree.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Now that we have a potential list of CPU ports, make use of it instead
of only configuring the master device of an unique CPU port.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of iterating over switches and their
ports to find a port from a given node.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of accessing the dsa_switch array
of ports when iterating over DSA ports of a switch to set up the
routing table.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of iterating over switches and their
ports when setting up the switches and their ports.
At the same time, provide setup states and messages for ports and
switches as it is done for the trees.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of iterating over switches and their
ports when looking for a slave device from a given master interface.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use the new ports list instead of accessing the dsa_switch array
of ports in the dsa_to_port helper.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Add a list of switch ports within the switch fabric. This will help the
lookup of a port inside the whole fabric, and it is the first step
towards supporting multiple CPU ports, before deprecating the usage of
the unique dst->cpu_dp pointer.
In preparation for a future allocation of the dsa_port structures,
return -ENOMEM in case no structure is returned, even though this
error cannot be reached yet.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Do not let the drivers access the ds->ports static array directly
while there is a dsa_to_port helper for this purpose.
At the same time, un-const this helper since the SJA1105 driver
assigns the priv member of the returned dsa_port structure.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
LIBBPF_OPTS is implemented as a mix of field declaration and memset
+ assignment. This makes it neither variable declaration nor purely
statements, which is a problem, because you can't mix it with either
other variable declarations nor other function statements, because C90
compiler mode emits warning on mixing all that together.
This patch changes LIBBPF_OPTS into a strictly declaration of variable
and solves this problem, as can be seen in case of bpftool, which
previously would emit compiler warning, if done this way (LIBBPF_OPTS as
part of function variables declaration block).
This patch also renames LIBBPF_OPTS into DECLARE_LIBBPF_OPTS to follow
kernel convention for similar macros more closely.
v1->v2:
- rename LIBBPF_OPTS into DECLARE_LIBBPF_OPTS (Jakub Sitnicki).
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20191022172100.3281465-1-andriin@fb.com
LLVM alu32 was introduced in LLVM7:
https://reviews.llvm.org/rL325987https://reviews.llvm.org/rL325989
Experiments showed that in general performance is better with alu32
enabled:
https://lwn.net/Articles/775316/
This patch turns on alu32 with no-flavor test_progs which is tested
most often. The flavor test at no_alu32/test_progs can be used to
test without alu32 enabled. The Makefile check for whether LLVM
supports '-mattr=+alu32 -mcpu=v3' is removed as LLVM7 should be
available for recent distributions and also latest LLVM is preferred
to run BPF selftests.
Note that jmp32 is checked by -mcpu=probe and will be enabled if the
host kernel supports it.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191022043119.2625263-1-yhs@fb.com
Heiner Kallweit says:
====================
The attempt to improve performance by changing the PCIe max read request
size was added in the vendor driver more than 10 years back and copied
to r8169 driver. In the vendor driver this has been removed long ago.
Obviously it had no effect, also in my tests I didn't see any
difference. Typically the max payload size is less than 512 bytes
anyway, and the PCI core takes care that the maximum supported value
is set. So let's remove fiddling with PCIe max read request size from
r8169 too. This change allows to simplify the driver in the subsequent
three patches of this series.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
We can remove rtl_hw_start_8168bef() and use rtl_hw_start_8168b()
instead because setting register Config4 is done in
rtl_jumbo_config(), being called from rtl_hw_start().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
We can remove rtl_hw_start_8168dp() because it's the same as
rtl_hw_start_8168dp() now.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
r8168b_0_hw_jumbo_enable() and r8168b_0_hw_jumbo_disable() both do the
same and just set PCI_EXP_DEVCTL_NOSNOOP_EN. We can simplify the code
by moving this setting for RTL8168B to rtl_hw_start_8168().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The attempt to improve performance by changing the PCIe max read request
size was added in the vendor driver more than 10 years back and copied
to r8169 driver. In the vendor driver this has been removed long ago.
Obviously it had no effect, also in my tests I didn't see any
difference. Typically the max payload size is less than 512 bytes
anyway, and the PCI core takes care that the maximum supported value
is set. So let's remove fiddling with PCIe max read request size from
r8169 too.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Karsten Graul says:
====================
More patches to address abnormal termination processing of
sockets and link groups.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
With the introduction of the link group termination worker there is
no longer a need to postpone smc_close_active_abort() to a worker.
To protect socket destruction due to normal and abnormal socket
closing, the socket refcount is increased.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Use a worker for link group termination to guarantee process context.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
If a link group and its connections must be terminated,
* wake up socket waiters
* do not enable buffer reuse
A linkgroup might be terminated while normal connection closing
is running. Avoid buffer reuse and its related LLC DELETE RKEY
call, if linkgroup termination has started. And use the earliest
indication of linkgroup termination possible, namely the removal
from the linkgroup list.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
There are lots of link group termination scenarios. Most of them
still allow to inform the peer of the terminating sockets about aborting.
This patch tries to call smc_close_abort() for terminating sockets.
And the internal TCP socket is reset with tcp_abort().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Usually link groups are freed delayed to enable quick connection
creation for a follow-on SMC socket. Terminated link groups are
freed faster. This patch makes sure, fast schedule of link group
freeing is not rescheduled by a delayed schedule. And it makes sure
link group freeing is not rescheduled, if the real freeing is already
running.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Locking hierarchy requires that the link group conns_lock can be
taken if the socket lock is held, but not vice versa. Nevertheless
socket termination during abnormal link group termination should
be protected by the socket lock.
This patch reduces the time segments the link group conns_lock is
held to enable usage of lock_sock in smc_lgr_terminate().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
When a link group is to be terminated, it is sufficient to hold
the lgr lock when unlinking the link group from its list.
Move the lock-protected link group unlinking into smc_lgr_terminate().
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The resources for a terminated socket are being cleaned up.
This patch makes sure
* no more data is received for an actively terminated socket
* no more data is sent for an actively or passively terminated socket
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Extend the size of QSFP EEPROM for the cable types SSF8436 and SFF8636
from 256 to 640 bytes in order to expose all the EEPROM pages by
ethtool.
For SFF-8636 and SFF-8436 specifications, the driver exposes 256 bytes
of data for ethtool's get_module_eeprom() callback. This is because the
driver uses the below defines to specify SFF module length in ethtool's
get_module_info() callback:
'ETH_MODULE_SFF_8636_LEN' and 'ETH_MODULE_SFF_8436_LEN' (both are 256).
As a result of exposing 256 bytes only, ethtool shows wrong "zero" info
for pages 1, 2, 3.
The patch changes the length returned by callback for get_module_info()
to the values from the next defines: 'ETH_MODULE_SFF_8636_MAX_LEN' and
'ETH_MODULE_SFF_8436_MAX_LEN' (both are 640) to allow exposing of upper
page 1, 2 and 3.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Provide a macro for getting QSFP module EEPROM page number from the
optional upper page number row offset, specified in request.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>