Most vNIC capabilities are netdev related. It makes no sense
to initialize them and waste FW resources. Some are even
counter-productive, like IRQ moderation, which will slow
down exchange of control messages.
Add to nfp_app a mask of enabled control vNIC capabilities
for apps to use. Make flower and BPF enable all capabilities
for now. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_init() is a little long and we are about to add more
code to reading capabilties. Move the capability reading,
parsing and validating out. Only actual initialization
will stay in nfp_net_init().
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow specifying alternative vNIC mailbox location in TLV caps.
This way we can size the mailbox to the needs and not necessarily
waste 512B of ctrl memory space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCIe island clock frequency is used when converting coalescing
parameters from usecs to NFP timestamps. Most chips don't run
at 1200MHz, allow FW to provide us with the real frequency.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP is entirely programmable, including the PCI data interface.
Using a fixed control BAR layout certainly makes implementations
easier, but require careful considerations when space is allocated.
Once BAR area is allocated to one feature nothing else can use it.
Allocating space statically also requires it to be sized upfront,
which leads to either unnecessary limitation or wastage.
We currently have a 32bit capability word defined which tells drivers
which application FW features are supported. Most of the bits
are exhausted. The same bits are also reused for enabling specific
features. Bulk of capabilities don't have a need for an enable bit,
however, leading to confusion and wastage.
TLVs seems like a better fit for expressing capabilities of applications
running on programmable hardware.
This patch leaves the front of the BAR as is, and declares a TLV
capability start at offset 0x58. Most of the space up to 0x0d90
is already allocated, but the used space can be wrapped with RESERVED
TLVs. E.g.:
Address Type Length
0x0058 RESERVED 0xe00 /* Wrap basic structures */
0x0e5c FEATURE_A 0x004
0x0e64 FEATURE_B 0x004
0x0e6c RESERVED 0x990 /* Wrap qeueue stats */
0x1800 FEATURE_C 0x100
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When driver app matching loaded FW is not found users are faced with:
nfp: failed to find app with ID 0x%02x
This message does not properly explain that matching driver code is
either not built into the driver or the driver is too old.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Representors are grouped in sets by type. Currently the whole
sets are under RCU protection, but individual representor pointers
are not. This causes some inconveniences when representors have
to be destroyed, because we have to allocate new sets to remove
any representors. Protect the individual pointers with RCU.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The write side of repr tables is always done under pf->lock.
Add a helper to dereference repr table pointers under protection
of that lock.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink used to have two global locks: devlink lock and port lock,
our lock ordering looked like this:
devlink lock -> driver's pf->lock -> devlink port lock
After recent changes port lock was replaced with per-instance
lock. Unfortunately, new per-instance lock is taken on most
operations now. This means we can only grab the pf->lock from
the port split/unsplit ops. Lock ordering looks like this:
devlink lock -> driver's pf->lock -> devlink instance lock
Since we can't take pf->lock from most devlink ops, make sure
nfp_apps are prepared to service them as soon as devlink is
registered. Locking the pf must be pushed down after
nfp_app_init() callback.
The init order looks like this:
nfp_app_init
devlink_register
nfp_app_start
netdev/port_register
As soon as app_init is done nfp_apps must be ready to service
devlink-related callbacks. apps can only register their own
devlink objects from nfp_app_start.
Fixes: 2406e7e546 ("devlink: Add per devlink instance lock")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP app is currently shut down as soon as all the vNICs are gone.
This means we can't depend on the app existing throughout the
lifetime of the device. Free the app only from PCI remove path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the helpers for accessing 4 or 8 byte values over
the CPP bus return the length of IO on success. If the IO
was short caller has to deal with error handling. The short
IO for 4/8B values is completely impractical. Make the
helpers return an error if full access was not possible.
Fix the few places which are actually dealing with errors
correctly, most call sites already only deal with negative
return codes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Scheduling out and in for every FW message can slow us down
unnecessarily. Our experiments show that even under heavy load
the FW responds to 99.9% messages within 200 us. Add a short
busy wait before entering the wait queue.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The special handling of different map types is left to the driver.
Allow offload of array maps by simply adding it to accepted types.
For nfp we have to make sure array elements are not deleted.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch set those new jit info fields introduced in this patch set.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use the verifier log to output error messages if map lookup
can't be offloaded.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
sts variable is holding link speed as well as state. We should
be using ls to index into ls_to_ethtool.
Fixes: 265aeb511b ("nfp: add support for .get_link_ksettings()")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Plug in to the stack's map offload callbacks for BPF map offload.
Get next call needs some special handling on the FW side, since
we can't send a NULL pointer to the FW there is a get first entry
FW command.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Map memory needs to use 40 bit addressing. Add handling of such
accesses. Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.
New relocation types have to be added - for helpers and return
addresses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Immediate loads are used to load the return address of a helper.
We need to be able to update those loads for relocations.
Immediate loads can be slightly more complex and spread over
two instructions in general, but here we only care about simple
loads of small (< 65k) constants, so complex cases are not handled.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For map support we will need to send and receive control messages.
Add basic support for sending a message to FW, and waiting for a
reply.
Control messages are tagged with a 16 bit ID. Add a simple ID
allocator and make sure we don't allow too many messages in flight,
to avoid request <> reply mismatches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To be able to split code into reasonable chunks we need to add
the map data structures already. Later patches will add code
piece by piece.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
With map offload coming, we need to call program offload structure
something less ambiguous. Pure rename, no functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF alignment tests got a conflict because the registers
are output as Rn_w instead of just Rn in net-next, and
in net a fixup for a testcase prohibits logical operations
on pointers before using them.
Also, we should attempt to patch BPF call args if JIT always on is
enabled. Instead, if we fail to JIT the subprogs we should pass
an error back up and fail immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
The link state and exception interrupts may be masked when we probe.
The firmware should in theory prevent sending (and automasking) those
interrupts if the device is disabled, but if my reading of the FW code
is correct there are firmwares out there with race conditions in this
area. The interrupt may also be masked if previous driver which used
the device was malfunctioning and we didn't load the FW (there is no
other good way to comprehensively reset the PF).
Note that FW unmasks the data interrupts by itself when vNIC is
enabled, such helpful operation is not performed for LSC/EXN interrupts.
Always unmask the auxiliary interrupts after request_irq(). On the
remove path add missing PCI write flush before free_irq().
Fixes: 4c3523623d ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that `bpf_verifier_log_write()` is exported from the verifier and
makes it possible to reuse the verifier log to print messages to the
standard output, use this instead of the kernel logs in the nfp driver
for printing error messages occurring at verification time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds signed jump instructions (jsgt, jsge, jslt, jsle)
to the nfp jit. As well as adding the additional required raw
assembler branch mask to nfp_asm.h
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Instead of having an app callback per message type hand off
all offload-related handling to apps with one "rest of ndo_bpf"
callback.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To make absolute relocated branches (branches which will be completely
rewritten with br_set_offset()) distinguishable in user space dumps
from normal jumps add a large offset to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The translator pre-allocates a buffer of maximal program size.
Due to HW/FW limitations the program buffer can't currently be
longer than 128Kb, so we used to kmalloc() it, and then map for
DMA directly.
Now that the late branch resolution is copying the program image
anyway, we can just kvmalloc() the buffer. While at it, after
translation reallocate the buffer to save space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Don't translate the program assuming it will be loaded at a given
address. This will be required for sharing programs between ports
of the same NIC, tail calls and subprograms. It will also make the
jump targets easier to understand when dumping the program to user
space.
Translate the program as if it was going to be loaded at address
zero. When load happens add the load offset in and set addresses
of special branches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
In preparation for better handling of relocations move existing
helper for setting branch offset to nfp_asm.c and add two more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Jump target resolution should be in jit.c not offload.c.
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
TC BPF offload was added first, so we used to assume that
the ethtool TC HW offload flag cannot be touched whenever
any BPF program is loaded on the NIC. This unncessarily
limits changes to the TC flag when offloaded program is XDP.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When BPF offload is active we need may need to restrict the MTU
changes more than just to the limitation of the kernel XDP datapath.
Allow the BPF code to veto a MTU change.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Kernel enforces the alignment of the bottom of the stack, NFP
deals with positive offsets better so we should align the top
of the stack. Round the stack size to NFP word size (4B).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
We should use % instead of @ for documenting preprocessor defines.
Add missing documentation of __NFP_REPR_TYPE_MAX. This gets rid
of all remaining kdoc warnings in the driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some RX rings are used for control messages, those will not have
a netdev pointer in dp. Skip XDP rxq handling on those rings.
Fixes: 7f1c684a89 ("nfp: setup xdp_rxq_info")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-01-07
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add a start of a framework for extending struct xdp_buff without
having the overhead of populating every data at runtime. Idea
is to have a new per-queue struct xdp_rxq_info that holds read
mostly data (currently that is, queue number and a pointer to
the corresponding netdev) which is set up during rxqueue config
time. When a XDP program is invoked, struct xdp_buff holds a
pointer to struct xdp_rxq_info that the BPF program can then
walk. The user facing BPF program that uses struct xdp_md for
context can use these members directly, and the verifier rewrites
context access transparently by walking the xdp_rxq_info and
net_device pointers to load the data, from Jesper.
2) Redo the reporting of offload device information to user space
such that it works in combination with network namespaces. The
latter is reported through a device/inode tuple as similarly
done in other subsystems as well (e.g. perf) in order to identify
the namespace. For this to work, ns_get_path() has been generalized
such that the namespace can be retrieved not only from a specific
task (perf case), but also from a callback where we deduce the
netns (ns_common) from a netdevice. bpftool support using the new
uapi info and extensive test cases for test_offload.py in BPF
selftests have been added as well, from Jakub.
3) Add two bpftool improvements: i) properly report the bpftool
version such that it corresponds to the version from the kernel
source tree. So pick the right linux/version.h from the source
tree instead of the installed one. ii) fix bpftool and also
bpf_jit_disasm build with bintutils >= 2.9. The reason for the
build breakage is that binutils library changed the function
signature to select the disassembler. Given this is needed in
multiple tools, add a proper feature detection to the
tools/build/features infrastructure, from Roman.
4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the
stacktrace map. It is currently unimplemented, but there are
use cases where user space needs to walk all stacktrace map
entries e.g. for dumping or deleting map entries w/o having to
close and recreate the map. Add BPF selftests along with it,
from Yonghong.
5) Few follow-up cleanups for the bpftool cgroup code: i) rename
the cgroup 'list' command into 'show' as we have it for other
subcommands as well, ii) then alias the 'show' command such that
'list' is accepted which is also common practice in iproute2,
and iii) remove couple of newlines from error messages using
p_err(), from Jakub.
6) Two follow-up cleanups to sockmap code: i) remove the unused
bpf_compute_data_end_sk_skb() function and ii) only build the
sockmap infrastructure when CONFIG_INET is enabled since it's
only aware of TCP sockets at this time, from John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver hook points for xdp_rxq_info:
* reg : nfp_net_rx_ring_alloc
* unreg: nfp_net_rx_ring_free
In struct nfp_net_rx_ring moved member @size into a hole on 64-bit.
Thus, the size remaines the same after adding member @xdp_rxq.
Cc: oss-drivers@netronome.com
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We currently always pass all multicast traffic through.
Only set L2MC when actually needed. Since the driver
was not making use of the capability to filter out mcast
frames, some FW projects don't implement it any more.
Don't warn users if capability is not present (like we
do for promisc flag). The lack of L2MC capability is
assumed to mean all multicast traffic goes through.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PORT_REIFY message indicates whether reprs have been created or
when they are about to be destroyed. This is necessary so firmware
can know which state the driver is in, e.g. the firmware will not send
any control messages related to ports when the reprs are destroyed.
This prevents nuisance warning messages printed whenever the firmware
sends updates for non-existent reprs.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just before a repr is cleaned up, we give the app a chance to perform
some preclean configuration while the reprs pointer is still configured
for the app.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of starting up reprs assuming that there is link, only respond
to the link state reported by firmware.
Furthermore, ensure link is down after repr netdevs are created.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow verifier instruction callbacks without any extra locking
NETDEV_UNREGISTER notification would wait on a waitqueue for verifier
to finish. This design decision was made when rtnl lock was providing
all the locking. Use the read/write lock instead and remove the
workqueue.
Verifier will now call into the offload code, so dev_ops are moved
to offload structure. Since verifier calls are all under
bpf_prog_is_dev_bound() we no longer need static inline implementations
to please builds with CONFIG_NET=n.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
After TC offloads were converted to callbacks we have no choice
but keep track of the offloaded filter in the driver.
Since this change came a little late in the release cycle
there were a number of conflicts and allocation of vNIC priv
structure seems to have slipped away in linux-next.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of overlapping changes. Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.
Include a necessary change by Jakub Kicinski, with log message:
====================
cls_bpf no longer takes care of offload tracking. Make sure
netdevsim performs necessary checks. This fixes a warning
caused by TC trying to remove a filter it has not added.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After TC offloads were converted to callbacks we have no choice
but keep track of the offloaded filter in the driver.
The check for nn->dp.bpf_offload_xdp was a stop gap solution
to make sure failed TC offload won't disable XDP, it's no longer
necessary. nfp_net_bpf_offload() will return -EBUSY on
TC vs XDP conflicts.
Fixes: 3f7889c4c7 ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cls_bpf used to take care of tracking what offload state a filter
is in, i.e. it would track if offload request succeeded or not.
This information would then be used to issue correct requests to
the driver, e.g. requests for statistics only on offloaded filters,
removing only filters which were offloaded, using add instead of
replace if previous filter was not added etc.
This tracking of offload state no longer functions with the new
callback infrastructure. There could be multiple entities trying
to offload the same filter.
Throw out all the tracking and corresponding commands and simply
pass to the drivers both old and new bpf program. Drivers will
have to deal with offload state tracking by themselves.
Fixes: 3f7889c4c7 ("net: sched: cls_bpf: call block callbacks for offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Generate rules for the NFP to encapsulate packets in Geneve tunnels. Move
the vxlan action code to generic udp tunnel actions and use core code for
both vxlan and Geneve.
Only support outputting to well known port 6081. Setting tunnel options
is not supported yet.
Only attempt to offload if the fw supports Geneve.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Compile Geneve match fields for offloading to the NFP. The addition of
Geneve overflows the 8 bit key_layer field, so apply extended metadata to
the match cmsg allowing up to 32 more key_layer fields.
Rather than adding new Geneve blocks, move the vxlan code to generic ipv4
udp tunnel structs and use these for both vxlan and Geneve.
Matches are only supported when specifically mentioning well known port
6081. Geneve tunnel options are not yet included in the match.
Only offload Geneve if the fw supports it - include check for this.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the _abi_flower_extra_features symbol from the fw which gives a 64
bit bitmap of new features (on top of the flower base support) that the fw
can offload. Store this bitmap in the priv data associated with each app.
If the symbol does not exist, set the bitmap to 0.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tunnel dest IP is required for separate offload to the NFP. It is
already verified that a dest IP must be present and must be an exact
match in the flower rule. Therefore, we can just extract the IP from the
generated offload rule and remove the unused mask variable. The function
is then no longer required to return the IP separately.
Because tun_dst is localised to tunnel matches, move the declaration to
the tunnel if branch.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev_bpf.flags is the input member for installing the program.
netdev_bpf.prog_flags is the output member for querying. Set
the correct one on query.
Fixes: 92f0292b35 ("net: xdp: report flags program was installed with on query")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Build bot reported warning about invalid printk formats on 32bit
architectures. Use %zu for size_t and %zd ptr diff.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For XPB registers reads, some island IDs require special handling (e.g.
ARM island), which is already taken care of in nfp_xpb_readl(), so use
that instead of a straight CPP read.
Without this fix all "xpbm:ArmIsldXpbmMap.*" registers are reported as
0xffffffff. It has also been observed to cause a system reboot.
With this fix correct values are reported, none of which are 0xffffffff.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes: 0e6c4955e1 ("nfp: dump CPP, XPB and direct ME CSRs")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In TLV-based ethtool debug dumps, don't do a CPP read for absolute
rtsyms, use the addr field in the symbol table directly as the value.
Without this fix rtsym gro_release_ring_0 is 4 bytes of zeros.
With this fix the correct value, 0x0000004a 0x00000000 is reported.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes: e1e798e3fd ("nfp: dump rtsyms")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware flashing takes around 60s (specified to not take more than
70s). Prevent hogging the RTNL lock in this time and make use of the
longer timeout for the NSP command. The timeout is set to 2.5 * 70
seconds.
We only allow flashing the firmware from reprs or PF netdevs. VFs do not
have an app reference.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware flashing NSP operation takes longer to execute than the
current default timeout. We need a mechanism to set a longer timeout for
some commands. This patch adds the infrastructure to this.
The default timeout is still 30 seconds.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the program is simple and has only one adjust head call
with constant parameters, we can check that the call will
always succeed at translation time. We need to track the
location of the call and make sure parameters are always
the same. We also have to check the parameters against
datapath constraints and ETH_HLEN.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support bpf_xdp_adjust_head(). We need to check whether the
packet offset after adjustment is within datapath's limits.
We also check if the frame is at least ETH_HLEN long (similar
to the kernel implementation).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add skeleton of verifier checks and translation handler
for call instructions. Make sure jump target resolution
will not treat them as jumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF FW creates a run time symbol called bpf_capabilities which
contains TLV-formatted capability information. Allocate app
private structure to store parsed capabilities and add a skeleton
of parsing logic.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow users outside of core reading area sizes. This was not needed
previously because whatever entity created the area would usually know
what size it asked for. The nfp_rtsym_map() helper, however, will
allocate the area based on the size of an RT-symbol with given name.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Convert the requested dump level parameter to big-endian at the start of
nfp_net_dump_calculate_size() and nfp_net_dump_populate_buffer(), then
compare and assign it directly where needed in the traversal and prolog
code. This decreases the total number of conversions used.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete match field defines that are not supported at this time.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Port matching is selected by default on every rule so remove check for it
and delete 'else' side of the statement. Remove nfp_flower_meta_one as now
it will not feature in the code. Rename nfp_flower_meta_two given that one
has been removed.
'Additional metadata' if statement can never be true so remove it as well.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the matching of mac/mpls as a default selection. These are not
necessarily set by a TC rule (unlike the port). Previously a mac/mpls
field would exist in every match and be masked out if not used. This patch
has no impact on functionality but removes unnessary memory assignment in
the match cmsg.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.
If we would support moving target netdev across netns, using
pointer would be better than ifindex.
This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The spec defines CSR address ranges for indirect ME CSRs. For Each TLV
chunk in the spec, dump a chunk that includes the spec and the data
over the defined address range.
- Each indirect CSR has 8 contexts. To read one context, first write the
context to a specific derived address, read it back, and then read the
register value.
- For each address, read and dump all 8 contexts in this manner.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The spec defines CSR address ranges for these types.
- Dump each TLV chunk in the spec as a chunk that includes the spec and
the data over the defined address range.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dump FW name as TLV, based on dump specification.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add spec TLV for hwinfo field, containing key string as data.
- Add dump TLV for hwinfo field, with data being key and value as packed
zero-terminated strings.
- If specified hwinfo field is not found, dump the spec TLV as -ENOENT
error.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Dump hwinfo as separate TLV chunk, in a packed format containing
zero-separated key and value strings.
- This provides additional debug context, if requested by the dumpspec.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Support rtsym TLVs.
- If specified rtsym is not found, dump the spec TLV as -ENOENT error.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Perform dumpspec traversals for calculating size and populating the
dump.
- Initially, wrap all spec TLVs in dump error TLVs (changed by later
patches in the series).
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Use a TLV structure, with the typed chunks aligned to 8-byte sizes.
- Dump numeric fields as big-endian.
- Prolog contains the dump level.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Load the TLV-based binary specification of what needs to be included in
a dump, from the "_abi_dump_spec" rtsymbol. If the symbol is not defined,
then dumps for levels >= 1 are not supported.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Skeleton code to perform a binary debug dump via ethtoolops
"set_dump", "get_dump_flags" and "get_dump_data", i.e. the ethtool
-W/w mechanism.
- Skeleton functions for debugdump operations provided.
- An integer "dump level" can be specified, this is stored between
ethtool invocations. Dump level 0 is still the "arm.diag" resource for
backward compatibility. Other dump levels each define a set of state
information to include in the dump, driven by a spec from FW.
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously we swapped the tx_packets, tx_bytes and tx_dropped counters
with rx_packets, rx_bytes and rx_dropped counters, respectively. This
behaviour is correct and expected for VF representors but it should not
be swapped for physical port mac representors.
Fixes: eadfa4c3be ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since day one of XDP drivers had to remember to free the program
on the remove path. This leads to code duplication and is error
prone. Make the stack query the installed programs on unregister
and if something is installed, remove the program. Freeing of
program attached to XDP generic is moved from free_netdev() as well.
Because the remove will now be called before notifiers are
invoked, BPF offload state of the program will not get destroyed
before uninstall.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Some drivers enforce that flags on program replacement and
removal must match the flags passed on install. This leaves
the possibility open to enable simultaneous loading
of XDP programs both to HW and DRV.
Allow such drivers to report the flags back to the stack.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch add the optimization frontend, but adding a new eBPF IR scan
pass "nfp_bpf_opt_ldst_gather".
The pass will traverse the IR to recognize the load/store pairs sequences
that come from lowering of memory copy builtins.
The gathered memory copy information will be kept in the meta info
structure of the first load instruction in the sequence and will be
consumed by the optimization backend added in the previous patches.
NOTE: a sequence with cross memory access doesn't qualify this
optimization, i.e. if one load in the sequence will load from place that
has been written by previous store. This is because when we turn the
sequence into single CPP operation, we are reading all contents at once
into NFP transfer registers, then write them out as a whole. This is not
identical with what the original load/store sequence is doing.
Detecting cross memory access for two random pointers will be difficult,
fortunately under XDP/eBPF's restrictied runtime environment, the copy
normally happen among map, packet data and stack, they do not overlap with
each other.
And for cases supported by NFP, cross memory access will only happen on
PTR_TO_PACKET. Fortunately for this, there is ID information that we could
do accurate memory alias check.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When the gathered copy length is bigger than 32-bytes and within 128-bytes
(the maximum length a single CPP Pull/Push request can finish), the
strategy of read/write are changeed into:
* Read.
- use direct reference mode when length is within 32-bytes.
- use indirect mode when length is bigger than 32-bytes.
* Write.
- length <= 8-bytes
use write8 (direct_ref).
- length <= 32-byte and 4-bytes aligned
use write32 (direct_ref).
- length <= 32-bytes but not 4-bytes aligned
use write8 (indirect_ref).
- length > 32-bytes and 4-bytes aligned
use write32 (indirect_ref).
- length > 32-bytes and not 4-bytes aligned and <= 40-bytes
use write32 (direct_ref) to finish the first 32-bytes.
use write8 (direct_ref) to finish all remaining hanging part.
- length > 32-bytes and not 4-bytes aligned
use write32 (indirect_ref) to finish those 4-byte aligned parts.
use write8 (direct_ref) to finish all remaining hanging part.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
For NFP, we want to re-group a sequence of load/store pairs lowered from
memcpy/memmove into single memory bulk operation which then could be
accelerated using NFP CPP bus.
This patch extends the existing load/store auxiliary information by adding
two new fields:
struct bpf_insn *paired_st;
s16 ldst_gather_len;
Both fields are supposed to be carried by the the load instruction at the
head of the sequence. "paired_st" is the corresponding store instruction at
the head and "ldst_gather_len" is the gathered length.
If "ldst_gather_len" is negative, then the sequence is doing memory
load/store in descending order, otherwise it is in ascending order. We need
this information to detect overlapped memory access.
This patch then optimize memory bulk copy when the copy length is within
32-bytes.
The strategy of read/write used is:
* Read.
Use read32 (direct_ref), always.
* Write.
- length <= 8-bytes
write8 (direct_ref).
- length <= 32-bytes and is 4-byte aligned
write32 (direct_ref).
- length <= 32-bytes but is not 4-byte aligned
write8 (indirect_ref).
NOTE: the optimization should not change program semantics. The destination
register of the last load instruction should contain the same value before
and after this optimization.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
It is usual that we need to check if one BPF insn is for loading/storeing
data from/to memory.
Therefore, it makes sense to factor out related code to become common
helper functions.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add support for emitting commands with field overwrites.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When immed is used with No-Dest, the emitter should use reg.dst instead of
reg.areg for the destination, using the latter will actually encode
register zero.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The NFP normally requires the source operands to be difference addressing
modes, but we should rule out the very special NN_REG_NONE type.
There are instruction that ignores both A/B operands, for example:
local_csr_rd
For these instructions, we might pass the same operand type, NN_REG_NONE,
for both A/B operands.
NOTE: in current NFP ISA, it is only possible for instructions with
unrestricted operands to take none operands, but in case there is new and
similar instructoin in restricted form, they would follow similar rules,
so swreg_to_restricted is updated as well.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If any of the shift insns in the ld/shift sequence is jump destination,
don't do combination.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
If the mask insn in the ld/mask pair is jump destination, then don't do
combination.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
NFP eBPF offload JIT engine is doing some instruction combine based
optimizations which however must not be safe if the combined sequences
are across basic block boarders.
Currently, there are post checks during fixing jump destinations. If the
jump destination is found to be eBPF insn that has been combined into
another one, then JIT engine will raise error and abort.
This is not optimal. The JIT engine ought to disable the optimization on
such cross-bb-border sequences instead of abort.
As there is no control flow information in eBPF infrastructure that we
can't do basic block based optimizations, this patch extends the existing
jump destination record pass to also flag the jump destination, then in
instruction combine passes we could skip the optimizations if insns in the
sequence are jump targets.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
eBPF insns are internally organized as dual-list inside NFP offload JIT.
Random access to an insn needs to be done by either forward or backward
traversal along the list.
One place we need to do such traversal is at nfp_fixup_branches where one
traversal is needed for each jump insn to find the destination. Such
traversals could be avoided if jump destinations are collected through a
single travesal in a pre-scan pass, and such information could also be
useful in other places where jump destination info are needed.
This patch adds such jump destination collection in nfp_prog_prepare.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support for backward jump on NFP.
- restrictions on backward jump in various functions have been removed.
- nfp_fixup_branches now supports backward jump.
There is one thing to note, currently an input eBPF JMP insn may generate
several NFP insns, for example,
NFP imm move insn A \
NFP compare insn B --> 3 NFP insn jited from eBPF JMP insn M
NFP branch insn C /
---
NFP insn X --> 1 NFP insn jited from eBPF insn N
---
...
therefore, we are doing sanity check to make sure the last jited insn from
an eBPF JMP is a NFP branch instruction.
Once backward jump is allowed, it is possible an eBPF JMP insn is at the
end of the program. This is however causing trouble for the sanity check.
Because the sanity check requires the end index of the NFP insns jited from
one eBPF insn while only the start index is recorded before this patch that
we can only get the end index by:
start_index_of_the_next_eBPF_insn - 1
or for the above example:
start_index_of_eBPF_insn_N (which is the index of NFP insn X) - 1
nfp_fixup_branches was using nfp_for_each_insn_walk2 to expose *next* insn
to each iteration during the traversal so the last index could be
calculated from which. Now, it needs some extra code to handle the last
insn. Meanwhile, the use of walk2 is actually unnecessary, we could simply
use generic single instruction walk to do this, the next insn could be
easily calculated using list_next_entry.
So, this patch migrates the jump fixup traversal method to
*list_for_each_entry*, this simplifies the code logic a little bit.
The other thing to note is a new state variable "last_bpf_off" is
introduced to track the index of the last jited NFP insn. This is necessary
because NFP is generating special purposes epilogue sequences, so the index
of the last jited NFP insn is *not* always nfp_prog->prog_len - 1.
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>