Commit Graph

839 Commits

Author SHA1 Message Date
David S. Miller
a19c59cc10 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-10-21

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Implement two new kind of BPF maps, that is, queue and stack
   map along with new peek, push and pop operations, from Mauricio.

2) Add support for MSG_PEEK flag when redirecting into an ingress
   psock sk_msg queue, and add a new helper bpf_msg_push_data() for
   insert data into the message, from John.

3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use
   direct packet access for __skb_buff, from Song.

4) Use more lightweight barriers for walking perf ring buffer for
   libbpf and perf tool as well. Also, various fixes and improvements
   from verifier side, from Daniel.

5) Add per-symbol visibility for DSO in libbpf and hide by default
   global symbols such as netlink related functions, from Andrey.

6) Two improvements to nfp's BPF offload to check vNIC capabilities
   in case prog is shared with multiple vNICs and to protect against
   mis-initializing atomic counters, from Jakub.

7) Fix for bpftool to use 4 context mode for the nfp disassembler,
   also from Jakub.

8) Fix a return value comparison in test_libbpf.sh and add several
   bpftool improvements in bash completion, documentation of bpf fs
   restrictions and batch mode summary print, from Quentin.

9) Fix a file resource leak in BPF selftest's load_kallsyms()
   helper, from Peng.

10) Fix an unused variable warning in map_lookup_and_delete_elem(),
    from Alexei.

11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc,
    from Nicolas.

12) Add missing executables to .gitignore in BPF selftests, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-21 21:11:46 -07:00
David S. Miller
2e2d6f0342 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.

net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'.  Thanks to David
Ahern for the heads up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 11:03:06 -07:00
Ido Schimmel
5ff4ff4fe8 net: Add netif_is_vxlan()
Add the ability to determine whether a netdev is a VxLAN netdev by
calling the above mentioned function that checks the netdev's
rtnl_link_ops.

This will allow modules to identify netdev events involving a VxLAN
netdev and act accordingly. For example, drivers capable of VxLAN
offload will need to configure the underlying device when a VxLAN netdev
is being enslaved to an offloaded bridge.

Convert nfp to use the newly introduced helper.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-17 17:45:07 -07:00
Jakub Kicinski
44b6fed0c1 nfp: bpf: double check vNIC capabilities after object sharing
Program translation stage checks that program can be offloaded to
the netdev which was passed during the load (bpf_attr->prog_ifindex).
After program sharing was introduced, however, the netdev on which
program is loaded can theoretically be different, and therefore
we should recheck the program size and max stack size at load time.

This was found by code inspection, AFAIK today all vNICs have
identical caps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16 15:23:58 -07:00
Jakub Kicinski
527db74b71 nfp: bpf: protect against mis-initializing atomic counters
Atomic operations on the NFP are currently always in big endian.
The driver keeps track of regions of memory storing atomic values
and byte swaps them accordingly.  There are corner cases where
the map values may be initialized before the driver knows they
are used as atomic counters.  This can happen either when the
datapath is performing the update and the stack contents are
unknown or when map is updated before the program which will
use it for atomic values is loaded.

To avoid situation where user initializes the value to 0 1 2 3
and then after loading a program which uses the word as an atomic
counter starts reading 3 2 1 0 - only allow atomic counters to be
initialized to endian-neutral values.

For updates from the datapath the stack information may not be
as precise, so just allow initializing such values to 0.

Example code which would break:
struct bpf_map_def SEC("maps") rxcnt = {
       .type = BPF_MAP_TYPE_HASH,
       .key_size = sizeof(__u32),
       .value_size = sizeof(__u64),
       .max_entries = 1,
};

int xdp_prog1()
{
      	__u64 nonzeroval = 3;
	__u32 key = 0;
	__u64 *value;

	value = bpf_map_lookup_elem(&rxcnt, &key);
	if (!value)
		bpf_map_update_elem(&rxcnt, &key, &nonzeroval, BPF_ANY);
	else
		__sync_fetch_and_add(value, 1);

	return XDP_PASS;
}

$ offload bpftool map dump
key: 00 00 00 00 value: 00 00 00 03 00 00 00 00

should be:

$ offload bpftool map dump
key: 00 00 00 00 value: 03 00 00 00 00 00 00 00

Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-16 15:23:58 -07:00
Pieter Jansen van Vuuren
140b6abac2 nfp: flower: use offsets provided by pedit instead of index for ipv6
Previously when populating the set ipv6 address action, we incorrectly
made use of pedit's key index to determine which 32bit word should be
set. We now calculate which word has been selected based on the offset
provided by the pedit action.

Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 23:17:25 -07:00
Pieter Jansen van Vuuren
d08c9e5893 nfp: flower: fix multiple keys per pedit action
Previously we only allowed a single header key per pedit action to
change the header. This used to result in the last header key in the
pedit action to overwrite previous headers. We now keep track of them
and allow multiple header keys per pedit action.

Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Fixes: 354b82bb32 ("nfp: add set ipv6 source and destination address")
Fixes: f8b7b0a6b1 ("nfp: add set tcp and udp header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 23:17:24 -07:00
Pieter Jansen van Vuuren
8913806f16 nfp: flower: fix pedit set actions for multiple partial masks
Previously we did not correctly change headers when using multiple
pedit actions with partial masks. We now take this into account and
no longer just commit the last pedit action.

Fixes: c0b1bd9a8b ("nfp: add set ipv4 header action flower offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 23:17:24 -07:00
Ryan C Goodfellow
5948185b97 nfp: devlink port split support for 1x100G CXP NIC
This commit makes it possible to use devlink to split the 100G CXP
Netronome into two 40G interfaces. Currently when you ask for 2
interfaces, the math in src/nfp_devlink.c:nfp_devlink_port_split
calculates that you want 5 lanes per port because for some reason
eth_port.port_lanes=10 (shouldn't this be 12 for CXP?). What we really
want when asking for 2 breakout interfaces is 4 lanes per port. This
commit makes that happen by calculating based on 8 lanes if 10 are
present.

Signed-off-by: Ryan C Goodfellow <rgoodfel@isi.edu>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Greg Weeks <greg.weeks@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:29:55 -07:00
Jakub Kicinski
96de25060d nfp: replace long license headers with SPDX
Replace the repeated license text with SDPX identifiers.
While at it bump the Copyright dates for files we touched
this year.

Signed-off-by: Edwin Peer <edwin.peer@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-11 12:16:21 -07:00
Pieter Jansen van Vuuren
12ecf61529 nfp: flower: use host context count provided by firmware
Read the host context count symbols provided by firmware and use
it to determine the number of allocated stats ids. Previously it
won't be possible to offload more than 2^17 filter even if FW was
able to do so.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10 22:32:44 -07:00
Pieter Jansen van Vuuren
7fade1077c nfp: flower: use stats array instead of storing stats per flow
Make use of an array stats instead of storing stats per flow which
would require a hash lookup at critical times.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10 22:32:44 -07:00
Pieter Jansen van Vuuren
c01d0efa51 nfp: flower: use rhashtable for flow caching
Make use of relativistic hash tables for tracking flows instead
of fixed sized hash tables.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10 22:32:44 -07:00
David S. Miller
071a234ad7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08 23:42:44 -07:00
Quentin Monnet
7ff0ccde43 nfp: bpf: support pointers to other stack frames for BPF-to-BPF calls
Mark instructions that use pointers to areas in the stack outside of the
current stack frame, and process them accordingly in mem_op_stack().
This way, we also support BPF-to-BPF calls where the caller passes a
pointer to data in its own stack frame to the callee (typically, when
the caller passes an address to one of its local variables located in
the stack, as an argument).

Thanks to Jakub and Jiong for figuring out how to deal with this case,
I just had to turn their email discussion into this patch.

Suggested-by: Jiong Wang <jiong.wang@netronome.com>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
4454962314 nfp: bpf: optimise save/restore for R6~R9 based on register usage
When pre-processing the instructions, it is trivial to detect what
subprograms are using R6, R7, R8 or R9 as destination registers. If a
subprogram uses none of those, then we do not need to jump to the
subroutines dedicated to saving and restoring callee-saved registers in
its prologue and epilogue.

This patch introduces detection of callee-saved registers in subprograms
and prevents the JIT from adding calls to those subroutines whenever we
can: we save some instructions in the translated program, and some time
at runtime on BPF-to-BPF calls and returns.

If no subprogram needs to save those registers, we can avoid appending
the subroutines at the end of the program.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
2178f3f0dc nfp: bpf: fix return address from register-saving subroutine to callee
On performing a BPF-to-BPF call, we first jump to a subroutine that
pushes callee-saved registers (R6~R9) to the stack, and from there we
goes to the start of the callee next. In order to do so, the caller must
pass to the subroutine the address of the NFP instruction to jump to at
the end of that subroutine. This cannot be reliably implemented when
translated the caller, as we do not always know the start offset of the
callee yet.

This patch implement the required fixup step for passing the start
offset in the callee via the register used by the subroutine to hold its
return address.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
bdf4c66faf nfp: bpf: update fixup function for BPF-to-BPF calls support
Relocation for targets of BPF-to-BPF calls are required at the end of
translation. Update the nfp_fixup_branches() function in that regard.

When checking that the last instruction of each bloc is a branch, we
must account for the length of the instructions required to pop the
return address from the stack.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
fb19816541 nfp: bpf: account for additional stack usage when checking stack limit
Offloaded programs using BPF-to-BPF calls use the stack to store the
return address when calling into a subprogram. Callees also need some
space to save eBPF registers R6 to R9. And contrarily to kernel
verifier, we align stack frames on 64 bytes (and not 32). Account for
all this when checking the stack size limit before JIT-ing the program.
This means we have to recompute maximum stack usage for the program, we
cannot get the value from the kernel.

In addition to adapting the checks on stack usage, move them to the
finalize() callback, now that we have it and because such checks are
part of the verification step rather than translation.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
389f263b60 nfp: bpf: add main logics for BPF-to-BPF calls support in nfp driver
This is the main patch for the logics of BPF-to-BPF calls in the nfp
driver.

The functions called on BPF_JUMP | BPF_CALL and BPF_JUMP | BPF_EXIT were
used to call helpers and exit from the program, respectively; make them
usable for calling into, or returning from, a BPF subprogram as well.

For all calls, push the return address as well as the callee-saved
registers (R6 to R9) to the stack, and pop them upon returning from the
calls. In order to limit the overhead in terms of instruction number,
this is done through dedicated subroutines. Jumping to the callee
actually consists in jumping to the subroutine, that "returns" to the
callee: this will require some fixup for passing the address in a later
patch. Similarly, returning consists in jumping to the subroutine, which
pops registers and then return directly to the caller (but no fixup is
needed here).

Return to the caller is performed with the RTN instruction newly added
to the JIT.

For the few steps where we need to know what subprogram an instruction
belongs to, the struct nfp_insn_meta is extended with a new subprog_idx
field.

Note that checks on the available stack size, to take into account the
additional requirements associated to BPF-to-BPF calls (storing R6-R9
and return addresses), are added in a later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
e3b49dc69b nfp: bpf: account for BPF-to-BPF calls when preparing nfp JIT
Similarly to "exit" or "helper call" instructions, BPF-to-BPF calls will
require additional processing before translation starts, in order to
record and mark jump destinations.

We also mark the instructions where each subprogram begins. This will be
used in a following commit to determine where to add prologues for
subprograms.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:13 +02:00
Quentin Monnet
bcfdfb7c96 nfp: bpf: ignore helper-related checks for BPF calls in nfp verifier
The checks related to eBPF helper calls are performed each time the nfp
driver meets a BPF_JUMP | BPF_CALL instruction. However, these checks
are not relevant for BPF-to-BPF call (same instruction code, different
value in source register), so just skip the checks for such calls.

While at it, rename the function that runs those checks to make it clear
they apply to _helper_ calls only.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
Quentin Monnet
c5da54d93e nfp: bpf: copy eBPF subprograms information from kernel verifier
In order to support BPF-to-BPF calls in offloaded programs, the nfp
driver must collect information about the distinct subprograms: namely,
the number of subprograms composing the complete program and the stack
depth of those subprograms. The latter in particular is non-trivial to
collect, so we copy those elements from the kernel verifier via the
newly added post-verification hook. The struct nfp_prog is extended to
store this information. Stack depths are stored in an array of dedicated
structs.

Subprogram start indexes are not collected. Instead, meta instructions
associated to the start of a subprogram will be marked with a flag in a
later patch.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
Quentin Monnet
1a7e62e632 nfp: bpf: rename nfp_prog->stack_depth as nfp_prog->stack_frame_depth
In preparation for support for BPF to BPF calls in offloaded programs,
rename the "stack_depth" field of the struct nfp_prog as
"stack_frame_depth". This is to make it clear that the field refers to
the maximum size of the current stack frame (as opposed to the maximum
size of the whole stack memory).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
Quentin Monnet
c941ce9c28 bpf: add verifier callback to get stack usage info for offloaded progs
In preparation for BPF-to-BPF calls in offloaded programs, add a new
function attribute to the struct bpf_prog_offload_ops so that drivers
supporting eBPF offload can hook at the end of program verification, and
potentially extract information collected by the verifier.

Implement a minimal callback (returning 0) in the drivers providing the
structs, namely netdevsim and nfp.

This will be useful in the nfp driver, in later commits, to extract the
number of subprograms as well as the stack depth for those subprograms.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-08 10:24:12 +02:00
David S. Miller
9e50727f0e mlx5-updates-2018-10-03
mlx5 core driver and ethernet netdev updates, please note there is a small
 devlink releated update to allow extack argument to eswitch operations.
 
 From Eli Britstein,
 1) devlink: Add extack argument to the eswitch related operations
 2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
 3) net/mlx5e: Add extack messages for TC offload failures
 
 From Eran Ben Elisha,
 4) mlx5e: Add counter for aRFS rule insertion failures
 
 From Feras Daoud
 5) Fast teardown support for mlx5 device
 This change introduces the enhanced version of the "Force teardown" that
 allows SW to perform teardown in a faster way without the need to reclaim
 all the FW pages.
 Fast teardown provides the following advantages:
     1- Fix a FW race condition that could cause command timeout
     2- Avoid moving to polling mode
     3- Close the vport to prevent PCI ACK to be sent without been scatter
     to memory
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbtU45AAoJEEg/ir3gV/o+/C4H/RHA4KImrb476EdB3VNYMqAN
 dgXb+bmh6sZP+jHWqQ4c3aVeh6/T8qm4gwiSn2nVTtHEnxtCdIYljzDC1Nswczeg
 pSjD1eOP7M1LpAOmBb8xdnJcX7yM7r1bTklnp2sN853WShbsDRYgZBHsBwTzx25U
 ZdzL4QTLuohlG/aLrbGXMntIy45ya2fVQrnK54s18nFlgsdFjEs0mi0xaUKNBC6+
 P8CTohHAxuuxmL5b+6MIYLZCdgd8cLNQFdtqbckEVw7SvcRTxfraRlyqJ0YOgTGB
 TdSWnqZz2JYH29wSFbpFG8qX6GCv8FoiZ+fKzldbolHk442rrktHv3+Y7qQuZVs=
 =NVks
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-10-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-10-03

mlx5 core driver and ethernet netdev updates, please note there is a small
devlink releated update to allow extack argument to eswitch operations.

From Eli Britstein,
1) devlink: Add extack argument to the eswitch related operations
2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
3) net/mlx5e: Add extack messages for TC offload failures

From Eran Ben Elisha,
4) mlx5e: Add counter for aRFS rule insertion failures

From Feras Daoud
5) Fast teardown support for mlx5 device
This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the FW pages.
Fast teardown provides the following advantages:
    1- Fix a FW race condition that could cause command timeout
    2- Avoid moving to polling mode
    3- Close the vport to prevent PCI ACK to be sent without been scatter
    to memory
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-04 09:48:37 -07:00
David S. Miller
6f41617bf2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflict in net/core/rtnetlink.c, David Ahern's bug fix in 'net'
overlapped the renaming of a netlink attribute in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-03 21:00:17 -07:00
Eli Britstein
db7ff19e7b devlink: Add extack for eswitch operations
Add extack argument to the eswitch related operations.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-10-03 16:17:58 -07:00
Jakub Kicinski
ff58e2df62 nfp: avoid soft lockups under control message storm
When FW floods the driver with control messages try to exit the cmsg
processing loop every now and then to avoid soft lockups.  Cmsg
processing is generally very lightweight so 512 seems like a reasonable
budget, which should not be exceeded under normal conditions.

Fixes: 77ece8d5f1 ("nfp: add control vNIC datapath")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-02 11:47:58 -07:00
Jakub Kicinski
0c9864c05f nfp: bpf: allow control message sizing for map ops
In current ABI the size of the messages carrying map elements was
statically defined to at most 16 words of key and 16 words of value
(NFP word is 4 bytes).  We should not make this assumption and use
the max key and value sizes from the BPF capability instead.

To make sure old kernels don't get surprised with larger (or smaller)
messages bump the FW ABI version to 3 when key/value size is different
than 16 words.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-02 14:39:59 +02:00
Jakub Kicinski
9bbdd41b8a nfp: allow apps to request larger MTU on control vNIC
Some apps may want to have higher MTU on the control vNIC/queue.
Allow them to set the requested MTU at init time.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-02 14:39:59 +02:00
Jakub Kicinski
28264eb227 nfp: bpf: parse global BPF ABI version capability
Up until now we only had per-vNIC BPF ABI version capabilities,
which are slightly awkward to use because bulk of the resources
and configuration does not relate to any particular vNIC.  Add
a new capability for global ABI version and check the per-vNIC
version are equal to it.  Assume the ABI version 2 if no explicit
version capability is present.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-02 14:39:58 +02:00
Jakub Kicinski
97ea8ac360 nfp: warn on experimental TLV types
Reserve two TLV types for feature development, and warn in the driver
if they ever leak into production.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:49:30 -07:00
David S. Miller
a06ee256e5 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Version bump conflict in batman-adv, take what's in net-next.

iavf conflict, adjustment of netdev_ops in net-next conflicting
with poll controller method removal in net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-25 10:35:29 -07:00
Eric Dumazet
0825ce7031 nfp: remove ndo_poll_controller
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

nfp uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-23 21:55:25 -07:00
Jakub Kicinski
23d9f5531c nfp: provide a better warning when ring allocation fails
NFP supports fairly enormous ring sizes (up to 256k descriptors).
In commit 4662717038 ("nfp: use kvcalloc() to allocate SW buffer
descriptor arrays") we have started using kvcalloc() functions to
make sure the allocation of software state arrays doesn't hit
the MAX_ORDER limit.  Unfortunately, we can't use virtual mappings
for the DMA region holding HW descriptors.  In case this allocation
fails instead of the generic (and fairly scary) warning/splat in
the logs print a helpful message explaining what happened and
suggesting how to fix it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-19 23:07:41 -07:00
David S. Miller
aaf9253025 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-12 22:22:42 -07:00
Jakub Kicinski
eca09be82e nfp: report FW vNIC stats in interface stats
Report in standard netdev stats drops and errors as well as
RX multicast from the FW vNIC counters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 13:20:49 -07:00
Louis Peens
224de549f0 nfp: flower: reject tunnel encap with ipv6 outer headers for offloading
This fixes a bug where ipv6 tunnels would report that it is
getting offloaded to hardware but would actually be rejected
by hardware.

Fixes: b27d6a95a7 ("nfp: compile flower vxlan tunnel set actions")
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 13:18:30 -07:00
Pieter Jansen van Vuuren
db191db813 nfp: flower: fix vlan match by checking both vlan id and vlan pcp
Previously we only checked if the vlan id field is present when trying
to match a vlan tag. The vlan id and vlan pcp field should be treated
independently.

Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-12 13:18:30 -07:00
jun qian
6577b0f716 nfp: replace spin_lock_bh with spin_lock in tasklet callback
As you are already in a tasklet, it is unnecessary to call spin_lock_bh.

Signed-off-by: jun qian <hangdianqj@163.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-07 22:56:02 -07:00
Jakub Kicinski
7848418e28 nfp: separate VXLAN and GRE feature handling
VXLAN and GRE FW features have to currently be both advertised
for the driver to enable them.  Separate the handling.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:18:11 -07:00
Jakub Kicinski
e84b2f2db2 nfp: validate rtsym accesses fall within the symbol
With the accesses to rtsyms now all going via special helpers
we can easily make sure the driver is not reading past the
end of the symbol.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:17:07 -07:00
Jakub Kicinski
31e380f38f nfp: prefix rtsym error messages with symbol name
For ease of debug preface all error messages with the name
of the symbol which caused them.  Use the same message format
for existing messages while at it.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:17:07 -07:00
Jakub Kicinski
3c576de30b nfp: fix readq on absolute RTsyms
Return the error and report value through the output param.

Fixes: 640917dd81 ("nfp: support access to absolute RTsyms")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-05 22:17:07 -07:00
David S. Miller
36302685f5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-09-04 21:33:03 -07:00
Jakub Kicinski
9ad716b95f nfp: wait for posted reconfigs when disabling the device
To avoid leaking a running timer we need to wait for the
posted reconfigs after netdev is unregistered.  In common
case the process of deinitializing the device will perform
synchronous reconfigs which wait for posted requests, but
especially with VXLAN ports being actively added and removed
there can be a race condition leaving a timer running after
adapter structure is freed leading to a crash.

Add an explicit flush after deregistering and for a good
measure a warning to check if timer is running just before
structures are freed.

Fixes: 3d780b926a ("nfp: add async reconfiguration mechanism")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-31 23:01:30 -07:00
Jakub Kicinski
4152e58cb8 nfp: make RTsym users handle absolute symbols correctly
Make the RTsym users access the size via the helper, which
takes care of special handling of absolute symbols.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
640917dd81 nfp: support access to absolute RTsyms
Add support in nfpcore for reading the absolute RTsyms.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
1240989ccc nfp: convert all RTsym users to use new read/write helpers
Convert all users of RTsym to the new set of helpers which
handle all targets correctly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
761969992d nfp: convert existing RTsym helpers to full target decoding
Make nfp_rtsym_{read,write}_le() and nfp_rtsym_map() use the new
target resolution helpers to allow accessing in-cache symbols.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
8f6d6052cf nfp: pass cpp_id to nfp_cpp_map_area()
Align nfp_cpp_map_area() with other CPP-level APIs and pass
encoded cpp_id/dest rather than target, action, domain tuple.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
3f0e55a2a6 nfp: add RTsym access helpers
RTsyms may have special encodings for more complex symbol types.
For example symbols which are placed in external memory unit's
cache directly, constants or local memory.  Add set of helpers
which will check for those special encodings and handle them
correctly.

For now only add direct cache accesses, we don't have a need to
access the other ones in foreseeable future.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
c678a9759a nfp: add basic errors messages to target logic
Add error prints to CPP target encoding/decoding logic, otherwise
it's quite hard to pin point the reasons why read or write
operations fail.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
73eaf3b7b8 nfp: save the MU locality field offset
We will soon need the MU locality field offset much more
often than just for decoding MIP address.  Save it in nfp_cpp
for quick access.  Note that we can already reuse the target
config from nfp_cpp, no need to do the XPB read.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Francois H. Theron <francois.theron@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
9bf6cce893 nfp: refactor the per-chip PCIe config
Use a switch statement instead of ifs for code dependent
on chip version.  While at it make sure we fail for unknown
chip revisions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
0377505c54 nfp: add support for NFP5000
Add NFP5000 to supported chips, the chip is backward compatible
with NFP4000 and NFP6000, so core PCIe code needs to handle it
the same way as 4k and 6k.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
f6e71efdf9 nfp: abm: look up MAC addresses via management FW
In multi-host scenarios Management FW may allocate MAC addresses
at runtime, we have to use the indirect lookup to find them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
34243f5909 nfp: add support for indirect HWinfo lookup
Management FW can adjust some of the information in the HWinfo table
at runtime.  In some cases reading the table directly will not yield
correct results.  Add a NSP command for looking up information.
Up until now we weren't making use of any of the values which may
get adjusted.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:47 -07:00
Jakub Kicinski
ac86da0546 nfp: interpret extended FW load result codes
To enable easier FW distribution NFP can now automatically
select between FW stored on the flash and loaded from the
kernel.

If FW loading policy is set to auto it will compare the
versions of FW from the host and from the flash and load
the newer one.  If FW type doesn't match (e.g. one advanced
application vs another) the FW from the host takes precedence,
unless one of them is the basic NIC firmware, in which case
the non-basic-NIC FW is selected.

This automatic selection mechanism requires we inform user
what the verdict was.  Print a message to the logs explaining
the decision and the reason.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:46 -07:00
Jakub Kicinski
2db100002e nfp: attempt FW load from flash
Flash may contain a default NFP application FW.  This application
can either be put there by the user (with ethtool -f) or shipped
with the card.  If file system FW is not found, attempt to load
this flash stored app FW.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:46 -07:00
Jakub Kicinski
1c0372b67c nfp: encapsulate NSP command arguments into structs
There is already a fair number of arguments to nfp_nsp_command()
family of functions.  Encapsulate them into structures to make
adding new ones easier.  No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-28 16:01:46 -07:00
Cong Wang
244cd96adb net_sched: remove list_head from tc_action
After commit 90b73b77d0, list_head is no longer needed.
Now we just need to convert the list iteration to array
iteration for drivers.

Fixes: 90b73b77d0 ("net: sched: change action API to use array of pointers to actions")
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-21 12:45:44 -07:00
Jakub Kicinski
19997ba7cb nfp: clean up return types in kdoc comments
Remove 'Return:' information from functions which no longer
return a value.  Also update name and return types of nfp_nffw_info
access functions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-13 19:36:08 -07:00
Pieter Jansen van Vuuren
0a22b17a6b nfp: flower: add geneve option match offload
Introduce a new layer for matching on geneve options. This allows
offloading filters configured to match geneve with options.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07 12:22:15 -07:00
Pieter Jansen van Vuuren
9e7c32fe44 nfp: flower: add geneve option push action offload
Introduce new push geneve option action. This allows offloading
filters configured to entunnel geneve with options.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07 12:22:15 -07:00
John Hurley
d7ff7ec573 nfp: flower: allow matching on ipv4 UDP tunnel tos and ttl
The addition of FLOW_DISSECTOR_KEY_ENC_IP to TC flower means that the ToS
and TTL of the tunnel header can now be matched on.

Extend the NFP tunnel match function to include these new fields.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07 12:22:14 -07:00
John Hurley
2a43747147 nfp: flower: set ip tunnel ttl from encap action
The TTL for encapsulating headers in IPv4 UDP tunnels is taken from a
route lookup. Modify this to first check if a user has specified a TTL to
be used in the TC action.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07 12:22:14 -07:00
David S. Miller
1ba982806c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-08-07

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add cgroup local storage for BPF programs, which provides a fast
   accessible memory for storing various per-cgroup data like number
   of transmitted packets, etc, from Roman.

2) Support bpf_get_socket_cookie() BPF helper in several more program
   types that have a full socket available, from Andrey.

3) Significantly improve the performance of perf events which are
   reported from BPF offload. Also convert a couple of BPF AF_XDP
   samples overto use libbpf, both from Jakub.

4) seg6local LWT provides the End.DT6 action, which allows to
   decapsulate an outer IPv6 header containing a Segment Routing Header.
   Adds this action now to the seg6local BPF interface, from Mathieu.

5) Do not mark dst register as unbounded in MOV64 instruction when
   both src and dst register are the same, from Arthur.

6) Define u_smp_rmb() and u_smp_wmb() to their respective barrier
   instructions on arm64 for the AF_XDP sample code, from Brian.

7) Convert the tcp_client.py and tcp_server.py BPF selftest scripts
   over from Python 2 to Python 3, from Jeremy.

8) Enable BTF build flags to the BPF sample code Makefile, from Taeung.

9) Remove an unnecessary rcu_read_lock() in run_lwt_bpf(), from Taehee.

10) Several improvements to the README.rst from the BPF documentation
    to make it more consistent with RST format, from Tobin.

11) Replace all occurrences of strerror() by calls to strerror_r()
    in libbpf and fix a FORTIFY_SOURCE build error along with it,
    from Thomas.

12) Fix a bug in bpftool's get_btf() function to correctly propagate
    an error via PTR_ERR(), from Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07 11:02:05 -07:00
Jakub Kicinski
0c26159352 nfp: bpf: xdp_adjust_tail support
Add support for adjust_tail.  There are no FW changes needed but add
a FW capability just in case there would be any issue with previously
released FW, or we will have to change the ABI in the future.

The helper is trivial and shouldn't be used too often so just inline
the body of the function.  We add the delta to locally maintained
packet length register and check for overflow, since add of negative
value must overflow if result is positive.  Note that if delta of 0
would be allowed in the kernel this trick stops working and we need
one more instruction to compare lengths before and after the change.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-08-04 21:58:12 +02:00
David S. Miller
89b1698c93 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
The BTF conflicts were simple overlapping changes.

The virtio_net conflict was an overlap of a fix of statistics counter,
happening alongisde a move over to a bonafide statistics structure
rather than counting value on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-02 10:55:32 -07:00
Jakub Kicinski
240b74fde3 nfp: fix variable dereferenced before check in nfp_app_ctrl_rx_raw()
'app' is dereferenced before used for the devlink trace point.
In case FW is buggy and sends a control message to a VF queue
we should make sure app is not NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-31 09:28:19 +02:00
John Hurley
ee614c8710 nfp: flower: fix port metadata conversion bug
Function nfp_flower_repr_get_type_and_port expects an enum nfp_repr_type
return value but, if the repr type is unknown, returns a value of type
enum nfp_flower_cmsg_port_type.  This means that if FW encodes the port
ID in a way the driver does not understand instead of dropping the frame
driver may attribute it to a physical port (uplink) provided the port
number is less than physical port count.

Fix this and ensure a net_device of NULL is returned if the repr can not
be determined.

Fixes: 1025351a88 ("nfp: add flower app")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-28 14:27:32 -07:00
Jakub Kicinski
17082566a9 nfp: bpf: improve map offload info messages
FW can put constraints on map element size to maximize resource
use and efficiency.  When user attempts offload of a map which
does not fit into those constraints an informational message is
printed to kernel logs to inform user about the reason offload
failed.  Map offload does not have access to any advanced error
reporting like verifier log or extack.  There is also currently
no way for us to nicely expose the FW capabilities to user
space.  Given all those constraints we should make sure log
messages are as informative as possible.  Improve them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski
ab01f4ac5f nfp: bpf: remember maps by ID
Record perf maps by map ID, not raw kernel pointer.  This helps
with debug messages, because printing pointers to logs is frowned
upon, and makes debug easier for the users, as map ID is something
they should be more familiar with.  Note that perf maps are offload
neutral, therefore IDs won't be orphaned.

While at it use a rate limited print helper for the error message.

Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski
0958762748 nfp: bpf: allow receiving perf events on data queues
Control queue is fairly low latency, and requires SKB allocations,
which means we can't even reach 0.5Msps with perf events.  Allow
perf events to be delivered to data queues.  This allows us to not
only use multiple queues, but also receive and deliver to user space
more than 5Msps per queue (Xeon E5-2630 v4 2.20GHz, no retpolines).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski
20c5420421 nfp: bpf: pass raw data buffer to nfp_bpf_event_output()
In preparation for SKB-less perf event handling make
nfp_bpf_event_output() take buffer address and length,
not SKB as parameters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski
79ca38e80c nfp: allow control message reception on data queues
Port id 0xffffffff is reserved for control messages.  Allow reception
of messages with this id on data queues.  Hand off a raw buffer to
the higher layer code, without allocating SKB for max efficiency.
The RX handle can't modify or keep the buffer, after it returns
buffer is handed back over to the NIC RX free buffer list.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski
78a8b760e4 nfp: move repr handling on RX path
Representor packets are received on PF queues with special metadata tag
for demux.  There is no reason to resolve the representor ID -> netdev
after the skb has been allocated.  Move the code, this will allow us to
handle special FW messages without SKB allocation overhead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-27 07:14:35 +02:00
Jakub Kicinski
5ea14712d7 nfp: protect from theoretical size overflows on HW descriptor ring
Use array_size() and store the size as full size_t to protect from
theoretical size overflow when handling HW descriptor rings.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 22:17:44 -07:00
Jakub Kicinski
e76c1d3d2a nfp: restore correct ordering of fields in rx ring structure
Commit 7f1c684a89 ("nfp: setup xdp_rxq_info") mixed the cache
cold and cache hot data in the nfp_net_rx_ring structure (ignoring
the feedback), to try to fit the structure into 2 cache lines
after struct xdp_rxq_info was added.  Now that we are about to add
a new field the structure will grow back to 3 cache lines, so
order the members correctly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 22:17:44 -07:00
Jakub Kicinski
4662717038 nfp: use kvcalloc() to allocate SW buffer descriptor arrays
Use kvcalloc() instead of tmp variable + kzalloc() when allocating
SW buffer information to allow falling back to vmalloc and to protect
from theoretical integer overflow.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 22:17:44 -07:00
Jakub Kicinski
5b0ced17ed nfp: don't fail probe on pci_sriov_set_totalvfs() errors
On machines with buggy ACPI tables or when SR-IOV is already enabled
we may not be able to set the SR-IOV VF limit in sysfs, it's not fatal
because the limit is imposed by the driver anyway.  Only the sysfs
'sriov_totalvfs' attribute will be too high.  Print an error to inform
user about the failure but allow probe to continue.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-25 22:17:44 -07:00
David S. Miller
19725496da Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
Jakub Kicinski
07300f774f nfp: avoid buffer leak when FW communication fails
After device is stopped we reset the rings by moving all free buffers
to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
We then proceed to clear the read/write pointers.  This means that if
we try to reset the ring again the code will assume that the next to
fill buffer is at position 0 and swap it with cnt - 1.  Since we
previously cleared position cnt - 1 it will lead to leaking the first
buffer and leaving ring in a bad state.

This scenario can only happen if FW communication fails, in which case
the ring will never be used again, so the fact it's in a bad state will
not be noticed.  Buffer leak is the only problem.  Don't try to move
buffers in the ring if the read/write pointers indicate the ring was
never used or have already been reset.

nfp_net_clear_config_and_disable() is now fully idempotent.

Found by code inspection, FW communication failures are very rare,
and reconfiguring a live device is not common either, so it's unlikely
anyone has ever noticed the leak.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22 10:58:52 -07:00
Jakub Kicinski
042f882556 nfp: bring back support for offloading shared blocks
Now that we have offload replay infrastructure added by
commit 326367427c ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6de ("nfp: reject binding to shared blocks").

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22 10:58:52 -07:00
John Hurley
b809ec869b nfp: flower: ensure dead neighbour entries are not offloaded
Previously only the neighbour state was checked to decide if an offloaded
entry should be removed. However, there can be situations when the entry
is dead but still marked as valid. This can lead to dead entries not
being removed from fw tables or even incorrect data being added.

Check the entry dead bit before deciding if it should be added to or
removed from fw neighbour tables.

Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-22 10:55:01 -07:00
David S. Miller
eae249b27f Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-07-20

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add sharing of BPF objects within one ASIC: this allows for reuse of
   the same program on multiple ports of a device, and therefore gains
   better code store utilization. On top of that, this now also enables
   sharing of maps between programs attached to different ports of a
   device, from Jakub.

2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
   detections and unused variable exports, also from Jakub.

3) First batch of RCU annotation fixes in prog array handling, i.e.
   there are several __rcu markers which are not correct as well as
   some of the RCU handling, from Roman.

4) Two fixes in BPF sample files related to checking of the prog_cnt
   upper limit from sample loader, from Dan.

5) Minor cleanup in sockmap to remove a set but not used variable,
   from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 23:58:30 -07:00
David S. Miller
c4c5551df1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 21:17:12 -07:00
Jakub Kicinski
b5faa20d6f nfp: bpf: allow program sharing within ASIC
Allow program sharing between netdevs of the same NFP ASIC.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-18 15:10:34 +02:00
Jakub Kicinski
602144c224 bpf: offload: keep the offload state per-ASIC
Create a higher-level entity to represent a device/ASIC to allow
programs and maps to be shared between device ports.  The extra
work is required to make sure we don't destroy BPF objects as
soon as the netdev for which they were loaded gets destroyed,
as other ports may still be using them.  When netdev goes away
all of its BPF objects will be moved to other netdevs of the
device, and only destroyed when last netdev is unregistered.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-18 15:10:34 +02:00
Jakub Kicinski
9fd7c55591 bpf: offload: aggregate offloads per-device
Currently we have two lists of offloaded objects - programs and maps.
Netdevice unregister notifier scans those lists to orphan objects
associated with device being unregistered.  This puts unnecessary
(even if negligible) burden on all netdev unregister calls in BPF-
-enabled kernel.  The lists of objects may potentially get long
making the linear scan even more problematic.  There haven't been
complaints about this mechanisms so far, but it is suboptimal.

Instead of relying on notifiers, make the few BPF-capable drivers
register explicitly for BPF offloads.  The programs and maps will
now be collected per-device not on a global list, and only scanned
for removal when driver unregisters from BPF offloads.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-18 15:10:34 +02:00
Jakub Kicinski
4612bebfa3 nfp: add .ndo_init() and .ndo_uninit() callbacks
BPF code should unregister the offload capabilities from .ndo_uninit(),
to make sure the operation is atomic with unlist_netdevice().  Plumb
the init/uninit NDOs for vNICs and representors.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-18 15:10:34 +02:00
David S. Miller
2aa4a3378a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-07-15

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Various different arm32 JIT improvements in order to optimize code emission
   and make the JIT code itself more robust, from Russell.

2) Support simultaneous driver and offloaded XDP in order to allow for advanced
   use-cases where some work is offloaded to the NIC and some to the host. Also
   add ability for bpftool to load programs and maps beyond just the cgroup case,
   from Jakub.

3) Add BPF JIT support in nfp for multiplication as well as division. For the
   latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.

4) Add BTF pretty print functionality to bpftool in plain and JSON output
   format, from Okash.

5) Add build and installation to the BPF helper man page into bpftool, from Quentin.

6) Add a TCP BPF callback for listening sockets which is triggered right after
   the socket transitions to TCP_LISTEN state, from Andrey.

7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
   tree and prints all attached programs, from Roman.

8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
   packets, from Jesper.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14 18:47:44 -07:00
Jakub Kicinski
5f4284015e nfp: add support for simultaneous driver and hw XDP
Split handling of offloaded and driver programs completely.  Since
offloaded programs always come with XDP_FLAGS_HW_MODE set in reality
there could be no sharing, anyway, programs would only be installed
in driver or in hardware.  Splitting the handling allows us to install
programs in HW and in driver at the same time.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13 20:26:35 +02:00
Jakub Kicinski
a25717d2b6 xdp: support simultaneous driver and hw XDP attachment
Split the query of HW-attached program from the software one.
Introduce new .ndo_bpf command to query HW-attached program.
This will allow drivers to install different programs in HW
and SW at the same time.  Netlink can now also carry multiple
programs on dump (in which case mode will be set to
XDP_ATTACHED_MULTI and user has to check per-attachment point
attributes, IFLA_XDP_PROG_ID will not be present).  We reuse
IFLA_XDP_PROG_ID skb space for second mode, so rtnl_xdp_size()
doesn't need to be updated.

Note that the installation side is still not there, since all
drivers currently reject installing more than one program at
the time.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13 20:26:35 +02:00
Jakub Kicinski
05296620f6 xdp: factor out common program/flags handling from drivers
Basic operations drivers perform during xdp setup and query can
be moved to helpers in the core.  Encapsulate program and flags
into a structure and add helpers.  Note that the structure is
intended as the "main" program information source in the driver.
Most drivers will additionally place the program pointer in their
fast path or ring structures.

The helpers don't have a huge impact now, but they will
decrease the code duplication when programs can be installed
in HW and driver at the same time.  Encapsulating the basic
operations in helpers will hopefully also reduce the number
of changes to drivers which adopt them.

Helpers could really be static inline, but they depend on
definition of struct netdev_bpf which means they'd have
to be placed in netdevice.h, an already 4500 line header.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13 20:26:35 +02:00
Jakub Kicinski
6b86758973 xdp: don't make drivers report attachment mode
prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw).  Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports.  Remove
prog_attached member.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13 20:26:35 +02:00
Arnd Bergmann
51bef926be nfp: avoid using getnstimeofday64()
getnstimeofday64 is deprecated in favor of the ktime_get() family of
functions. The direct replacement would be ktime_get_real_ts64(),
but I'm picking the basic ktime_get() instead:

- using a ktime_t simplifies the code compared to timespec64
- using monotonic time instead of real time avoids issues caused
  by a concurrent settimeofday() or during a leap second adjustment.

Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 14:55:39 -07:00
Linus Torvalds
8979319f2d pci-v4.18-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAltBPacUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vwx+hAAvzTP6o3VOtgNK7lm3nOBuzfykgCv
 TFhXP2yeDItWDBLDpWX7wjs2657W3Sjrpw6FyVIYvsoMKKRuOYeZ6ChDieG5ZgTj
 oxav5U6TlDHoF0fNq0LWfv78lP6+++7/6yaer6j9xDksVqE4/zxlFcExxuszhZlC
 8ptJ54ORn92RdfRHCDptA4PFReNlQWNw3bpKpGxu8xj0TN/sYPN3ggHfnmiEyGMZ
 8/KLNOzGrhoqztaBPyRoG3GhU1hpicT+fWzg11Li8hP1solQDht+S3mnCC1IxqdI
 JSU7/jhti4nUV56qE7QzYzdsVbTFcItGn0WImklIFCt7htp4Rms0wN+QCmwhtjKM
 0WH02gRDip4CcYcPZGeGaeexWJFEScamFK1yxxDV7059KsoQKUN1Sm+R7y9xORYw
 nQAHPTO/nv02Xo+ADAYzV0aBPD7fEvFaWtdXAuLocVWVj3eiEqp8ftoYmuWym6T3
 gHWt9Dod8olj3t9bkR1MWZ121Ar5MsobgJSF30smEx8VMKv+4Xx8eXvq0FCuUQwT
 s/WLTPqK31Sqjii0xe1wO8g0yexCVaBpXJB/e9PpYf/+ICItG1DHLz0Ygw01QWVK
 Tv7HTAofdO42kChHjErB6GbzSuxDxSVCPN26bwkZu0eV91uKiLKA7wLUv8+qSXlZ
 lK98Eypy0Ti7pvA=
 =IOF/
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Fix a use-after-free in the endpoint code (Dan Carpenter)

 - Stop defaulting CONFIG_PCIE_DW_PLAT_HOST to yes (Geert Uytterhoeven)

 - Fix an nfp regression caused by a change in how we limit the number
   of VFs we can enable (Jakub Kicinski)

 - Fix failure path cleanup issues in the new R-Car gen3 PHY support
   (Marek Vasut)

 - Fix leaks of OF nodes in faraday, xilinx-nwl, xilinx (Nicholas Mc
   Guire)

* tag 'pci-v4.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  nfp: stop limiting VFs to 0
  PCI/IOV: Reset total_VFs limit after detaching PF driver
  PCI: faraday: Add missing of_node_put()
  PCI: xilinx-nwl: Add missing of_node_put()
  PCI: xilinx: Add missing of_node_put()
  PCI: endpoint: Use after free in pci_epf_unregister_driver()
  PCI: controller: dwc: Do not let PCIE_DW_PLAT_HOST default to yes
  PCI: rcar: Clean up PHY init on failure
  PCI: rcar: Shut the PHY down in failpath
2018-07-08 10:55:21 -07:00