Commit Graph

878 Commits

Author SHA1 Message Date
David S. Miller
ec7146db15 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-01-29

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Teach verifier dead code removal, this also allows for optimizing /
   removing conditional branches around dead code and to shrink the
   resulting image. Code store constrained architectures like nfp would
   have hard time doing this at JIT level, from Jakub.

2) Add JMP32 instructions to BPF ISA in order to allow for optimizing
   code generation for 32-bit sub-registers. Evaluation shows that this
   can result in code reduction of ~5-20% compared to 64 bit-only code
   generation. Also add implementation for most JITs, from Jiong.

3) Add support for __int128 types in BTF which is also needed for
   vmlinux's BTF conversion to work, from Yonghong.

4) Add a new command to bpftool in order to dump a list of BPF-related
   parameters from the system or for a specific network device e.g. in
   terms of available prog/map types or helper functions, from Quentin.

5) Add AF_XDP sock_diag interface for querying sockets from user
   space which provides information about the RX/TX/fill/completion
   rings, umem, memory usage etc, from Björn.

6) Add skb context access for skb_shared_info->gso_segs field, from Eric.

7) Add support for testing flow dissector BPF programs by extending
   existing BPF_PROG_TEST_RUN infrastructure, from Stanislav.

8) Split BPF kselftest's test_verifier into various subgroups of tests
   in order better deal with merge conflicts in this area, from Jakub.

9) Add support for queue/stack manipulations in bpftool, from Stanislav.

10) Document BTF, from Yonghong.

11) Dump supported ELF section names in libbpf on program load
    failure, from Taeung.

12) Silence a false positive compiler warning in verifier's BTF
    handling, from Peter.

13) Fix help string in bpftool's feature probing, from Prashant.

14) Remove duplicate includes in BPF kselftests, from Yue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 19:38:33 -08:00
Jiong Wang
461448398a nfp: bpf: implement jitting of JMP32
This patch implements code-gen for new JMP32 instructions on NFP.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-26 13:33:02 -08:00
Jakub Kicinski
9a06927e77 nfp: bpf: support removing dead code
Add a verifier callback to the nfp JIT to remove the instructions
the verifier deemed to be dead.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Jakub Kicinski
a32014b351 nfp: bpf: support optimizing dead branches
Verifier will now optimize out branches to dead code, implement
the replace_insn callback to take advantage of that optimization.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Jakub Kicinski
e2fc61146a nfp: bpf: save original program length
Instead of passing env->prog->len around, and trying to adjust
for optimized out instructions just save the initial number
of instructions in struct nfp_prog.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Jakub Kicinski
91a87a5823 nfp: bpf: split up the skip flag
We fail program loading if jump lands on a skipped instruction.
This is for historical reasons, it used to be that we only skipped
instructions optimized out based on prior context, and therefore
the optimization would be buggy if we jumped directly to such
instruction (because the context would be skipped by the jump).

There are cases where instructions can be skipped without any
context, for example there is no point in generating code for:

	 r0 |= 0

We will also soon support dropping dead code, so make the skip
logic differentiate between "optimized with preceding context"
vs other skip types.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
Jakub Kicinski
e90287f3aa nfp: bpf: don't use instruction number for jump target
Instruction number is meaningless at code gen phase.  The target
of the instruction is overwritten by nfp_fixup_branches().  The
convention is to put the raw offset in target address as a place
holder.  See cmp_* functions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-01-23 17:35:32 -08:00
John Hurley
20cce88650 nfp: flower: enable MAC address sharing for offloadable devs
A MAC address is not necessarily a unique identifier for a netdev. Drivers
such as Linux bonds, for example, can apply the same MAC address to the
upper layer device and all lower layer devices.

NFP MAC offload for tunnel decap includes port verification for reprs but
also supports the offload of non-repr MAC addresses by assigning 'global'
indexes to these. This means that the FW will not verify the incoming port
of a packet matching this destination MAC.

Modify the MAC offload logic to assign global indexes based on MAC address
instead of net device (as it currently does). Use this to allow multiple
devices to share the same MAC. In other words, if a repr shares its MAC
address with another device then give the offloaded MAC a global index
rather than associate it with an ingress port. Track this so that changes
can be reverted as MACs stop being shared.

Implement this by removing the current list based assignment of global
indexes and replacing it with an rhashtable that maps an offloaded MAC
address to the number of devices sharing it, distributing global indexes
based on this.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
John Hurley
13cf71031d nfp: flower: ensure MAC cleanup on address change
It is possible to receive a MAC address change notification without the
net device being down (e.g. when an OvS bridge is assigned the same MAC as
a port added to it). This means that an offloaded MAC address may not be
removed if its device gets a new address.

Maintain a record of the offloaded MAC addresses for each repr and netdev
assigned a MAC offload index. Use this to delete the (now expired) MAC if
a change of address event occurs. Only handle change address events if the
device is already up - if not then the netdev up event will handle it.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
John Hurley
05d2bee6bd nfp: flower: add infastructure for non-repr priv data
NFP repr netdevs contain private data that can store per port information.
In certain cases, the NFP driver offloads information from non-repr ports
(e.g. tunnel ports). As the driver does not have control over non-repr
netdevs, it cannot add/track private data directly to the netdev struct.

Add infastructure to store private information on any non-repr netdev that
is offloaded at a given time. This is used in a following patch to track
offloaded MAC addresses for non-reprs and enable correct house keeping on
address changes.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
John Hurley
49402b0b7f nfp: flower: ensure deletion of old offloaded MACs
When a potential tunnel end point goes down then its MAC address should
not be matchable on the NFP.

Implement a delete message for offloaded MACs and call this on net device
down. While at it, remove the actions on register and unregister netdev
events. A MAC should only be offloaded if the device is up. Note that the
netdev notifier will replay any notifications for UP devices on
registration so NFP can still offload ports that exist before the driver
is loaded. Similarly, devices need to go down before they can be
unregistered so removal of offloaded MACs is only required on down events.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
John Hurley
0115dcc314 nfp: flower: remove list infastructure from MAC offload
Potential MAC destination addresses for tunnel end-points are offloaded to
firmware. This was done by building a list of such MACs and writing to
firmware as blocks of addresses.

Simplify this code by removing the list format and sending a new message
for each offloaded MAC.

This is in preparation for delete MAC messages. There will be one delete
flag per message so we cannot assume that this applies to all addresses
in a list.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
John Hurley
41da0b5ef3 nfp: flower: ignore offload of VF and PF repr MAC addresses
Currently MAC addresses of all repr netdevs, along with selected non-NFP
controlled netdevs, are offloaded to FW as potential tunnel end-points.
However, the addresses of VF and PF reprs are meaningless outside of
internal communication and it is only those of physical port reprs
required.

Modify the MAC address offload selection code to ignore VF/PF repr devs.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
John Hurley
f3b975778c nfp: flower: tidy tunnel related private data
Recent additions to the flower app private data have grouped the variables
of a given feature into a struct and added that struct to the main private
data struct.

In keeping with this, move all tunnel related private data to their own
struct. This has no affect on functionality but improves readability and
maintenance of the code.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:15 -08:00
Pieter Jansen van Vuuren
467322e262 nfp: flower: support multiple memory units for filter offloads
Adds support for multiple memory units which are used for filter
offloads. Each filter is assigned a stats id, the MSBs of the id are
used to determine which memory unit the filter should be offloaded
to. The number of available memory units that could be used for filter
offload is obtained from HW. A simple round robin technique is used to
allocate and distribute the ids across memory units.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:14 -08:00
Fred Lotter
96439889b4 nfp: flower: increase cmesg reply timeout
QA tests report occasional timeouts on REIFY message replies. Profiling
of the two cmesg reply types under burst conditions, with a 12-core host
under heavy cpu and io load (stress --cpu 12 --io 12), show both PHY MTU
change and REIFY replies can exceed the 10ms timeout. The maximum MTU
reply wait under burst is 16ms, while the maximum REIFY wait under 40 VF
burst is 12ms. Using a 4 VF REIFY burst results in an 8ms maximum wait.
A larger VF burst does increase the delay, but not in a linear enough
way to justify a scaled REIFY delay. The worse case values between
MTU and REIFY appears close enough to justify a common timeout. Pick a
conservative 40ms to make a safer future proof common reply timeout. The
delay only effects the failure case.

Change the REIFY timeout mechanism to use wait_event_timeout() instead
of wait_event_interruptible_timeout(), to match the MTU code. In the
current implementation, theoretically, a signal could interrupt the
REIFY waiting period, with a return code of ERESTARTSYS. However, this is
caught under the general timeout error code EIO. I cannot see the benefit
of exposing the REIFY waiting period to signals with such a short delay
(40ms), while the MTU mechnism does not use the same logic. In the absence
of any reply (wakeup() call), both reply types will wake up the task after
the timeout period. The REIFY timeout applies to the entire representor
group being instantiated (e.g. VFs), while the MTU timeout apples to a
single PHY MTU change.

Signed-off-by: Fred Lotter <frederik.lotter@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-16 15:23:14 -08:00
Luis Chamberlain
750afb08ca cross-tree: phase out dma_zalloc_coherent()
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.

This change was generated with the following Coccinelle SmPL patch:

@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@

-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08 07:58:37 -05:00
David S. Miller
339bbff2d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-12-21

The following pull-request contains BPF updates for your *net-next* tree.

There is a merge conflict in test_verifier.c. Result looks as follows:

        [...]
        },
        {
                "calls: cross frame pruning",
                .insns = {
                [...]
                .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                .errstr_unpriv = "function calls to other bpf functions are allowed for root only",
                .result_unpriv = REJECT,
                .errstr = "!read_ok",
                .result = REJECT,
	},
        {
                "jset: functional",
                .insns = {
        [...]
        {
                "jset: unknown const compare not taken",
                .insns = {
                        BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
                                     BPF_FUNC_get_prandom_u32),
                        BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
                        BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
                        BPF_EXIT_INSN(),
                },
                .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                .errstr_unpriv = "!read_ok",
                .result_unpriv = REJECT,
                .errstr = "!read_ok",
                .result = REJECT,
        },
        [...]
        {
                "jset: range",
                .insns = {
                [...]
                },
                .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                .result_unpriv = ACCEPT,
                .result = ACCEPT,
        },

The main changes are:

1) Various BTF related improvements in order to get line info
   working. Meaning, verifier will now annotate the corresponding
   BPF C code to the error log, from Martin and Yonghong.

2) Implement support for raw BPF tracepoints in modules, from Matt.

3) Add several improvements to verifier state logic, namely speeding
   up stacksafe check, optimizations for stack state equivalence
   test and safety checks for liveness analysis, from Alexei.

4) Teach verifier to make use of BPF_JSET instruction, add several
   test cases to kselftests and remove nfp specific JSET optimization
   now that verifier has awareness, from Jakub.

5) Improve BPF verifier's slot_type marking logic in order to
   allow more stack slot sharing, from Jiong.

6) Add sk_msg->size member for context access and add set of fixes
   and improvements to make sock_map with kTLS usable with openssl
   based applications, from John.

7) Several cleanups and documentation updates in bpftool as well as
   auto-mount of tracefs for "bpftool prog tracelog" command,
   from Quentin.

8) Include sub-program tags from now on in bpf_prog_info in order to
   have a reliable way for user space to get all tags of the program
   e.g. needed for kallsyms correlation, from Song.

9) Add BTF annotations for cgroup_local_storage BPF maps and
   implement bpf fs pretty print support, from Roman.

10) Fix bpftool in order to allow for cross-compilation, from Ivan.

11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
    to be compatible with libbfd and allow for Debian packaging,
    from Jakub.

12) Remove an obsolete prog->aux sanitation in dump and get rid of
    version check for prog load, from Daniel.

13) Fix a memory leak in libbpf's line info handling, from Prashant.

14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
    does not get unaligned, from Jesper.

15) Fix test_progs kselftest to work with older compilers which are less
    smart in optimizing (and thus throwing build error), from Stanislav.

16) Cleanup and simplify AF_XDP socket teardown, from Björn.

17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
    to netns_id argument, from Andrey.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 17:31:36 -08:00
David S. Miller
2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Jakub Kicinski
4987eaccd2 nfp: bpf: optimize codegen for JSET with a constant
The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register.  We can just OR it into
the result as is.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-20 17:28:29 +01:00
Jakub Kicinski
6e774845b3 nfp: bpf: remove the trivial JSET optimization
The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop.  We won't generate any
code, anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-20 17:28:28 +01:00
John Hurley
b12c97d45c nfp: flower: fix cb_ident duplicate in indirect block register
Previously the identifier used for indirect block callback registry and
for block rule cb registry (when done via indirect blocks) was the pointer
to the netdev we were interested in receiving updates on. This worked fine
if a single app existed that registered one callback per netdev of
interest. However, if multiple cards are in place and, in turn, multiple
apps, then each app may register the same callback with the same
identifier to both the netdev's indirect block cb list and to a block's cb
list. This can lead to EEXIST errors and/or incorrect cb deletions.

Prevent this conflict by using the app pointer as the identifier for
netdev indirect block cb registry, allowing each app to register a unique
callback per netdev. For block cb registry, the same app may register
multiple cbs to the same block if using TC shared blocks. Instead of the
app, use the pointer to the allocated cb_priv data as the identifier here.
This means that there can be a unique block callback for each app/netdev
combo.

Fixes: 3166dd07a9 ("nfp: flower: offload tunnel decap rules via indirect TC blocks")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:34:12 -08:00
Jakub Kicinski
036b9e7cae nfp: abm: allow to opt-out of RED offload
FW team asks to be able to not support RED even if NIC is capable
of buffering for testing and experimentation.  Add an opt-out flag.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:41:42 -08:00
David S. Miller
addb067983 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-12-11

The following pull-request contains BPF updates for your *net-next* tree.

It has three minor merge conflicts, resolutions:

1) tools/testing/selftests/bpf/test_verifier.c

 Take first chunk with alignment_prevented_execution.

2) net/core/filter.c

  [...]
  case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
  case bpf_ctx_range(struct __sk_buff, wire_len):
        return false;
  [...]

3) include/uapi/linux/bpf.h

  Take the second chunk for the two cases each.

The main changes are:

1) Add support for BPF line info via BTF and extend libbpf as well
   as bpftool's program dump to annotate output with BPF C code to
   facilitate debugging and introspection, from Martin.

2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
   and all JIT backends, from Jiong.

3) Improve BPF test coverage on archs with no efficient unaligned
   access by adding an "any alignment" flag to the BPF program load
   to forcefully disable verifier alignment checks, from David.

4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
   proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.

5) Extend tc BPF programs to use a new __sk_buff field called wire_len
   for more accurate accounting of packets going to wire, from Petar.

6) Improve bpftool to allow dumping the trace pipe from it and add
   several improvements in bash completion and map/prog dump,
   from Quentin.

7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
   kernel addresses and add a dedicated BPF JIT backend allocator,
   from Ard.

8) Add a BPF helper function for IR remotes to report mouse movements,
   from Sean.

9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
   member naming consistent with existing conventions, from Yonghong
   and Song.

10) Misc cleanups and improvements in allowing to pass interface name
    via cmdline for xdp1 BPF example, from Matteo.

11) Fix a potential segfault in BPF sample loader's kprobes handling,
    from Daniel T.

12) Fix SPDX license in libbpf's README.rst, from Andrey.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 18:00:43 -08:00
Pieter Jansen van Vuuren
290974d434 nfp: flower: ensure TCP flags can be placed in IPv6 frame
Previously we did not ensure tcp flags have a place to be stored
when using IPv6. We correct this by including IPv6 key layer when
we match tcp flags and the IPv6 key layer has not been included
already.

Fixes: 07e1671cfc ("nfp: flower: refactor shared ip header in match offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 17:45:41 -08:00
David S. Miller
4cc1feeb6f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts, seemingly all over the place.

I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.

The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while chaning the initial
argument in the function call in the moved code.

The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.

cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classifiction.

__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy.  Or at least I think it was :-)

Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.

The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09 21:43:31 -08:00
Jiong Wang
84708c1386 nfp: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*
BPF_X support needs indirect shift mode, please see code comments for
details.

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07 13:30:48 -08:00
Yangtao Li
6f6c74fad8 nfp: convert to DEFINE_SHOW_ATTRIBUTE
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-03 17:33:38 -08:00
Jakub Kicinski
6db3a9dcf0 nfp: report more info when reconfiguration fails
FW reconfiguration timeouts are a common indicator of FW trouble.
To make debugging easier print requested update and control word
when reconfiguration fails.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:45 -08:00
Jakub Kicinski
9571d98775 nfp: add offset to all TLV parsing errors
When troubleshooting incorrect FW capabilities it's useful to know
where the faulty TLV is located.  Add offset to all errors messages.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
51a6588e8c nfp: add offloads on representors
FW/HW can generally support the standard networking offloads
on representors without any trouble.  Add the ability for FW
to advertise which features should be available on representors.

Because representors are muxed on top of the vNIC we need to listen
on feature changes of their lower devices, and update their features
appropriately.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
71844fac1e nfp: add locking around representor changes
Up until now we never needed to keep a networking locks around
representors accesses, we only accessed them when device was
reconfigured (under nfp pf->lock) or on fast path (under RCU).
Now we want to be able to iterate over all representors during
notifications, so make sure representor assignment is done
under RTNL lock.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
fbf60e377d nfp: run don't require Qdiscs on representor netdevs
Our representors are software devices built on top of the PF
vNIC, the queuing should only happen at the vNIC netdevice.
Allow representors to run qdisc-less.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
9db8bbcb9b nfp: run representor TX locklessly
Our representors are software devices built on top of the PF
vNIC, the only state they have are per-cpu stats, so make
the TX run locklessly.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
d7cc825225 nfp: avoid oversized TSO headers with metadata prepend
In preparation for TSO over representors make sure the port id
prepend will always fit in the frame.  The current max header
length is 255, which is ample, so assume worst case scenario
of 8 byte prepend and save ourselves the conditionals.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
b54ad0eaad nfp: correct descriptor offsets in presence of metadata
The TSO-related offsets in the descriptor should not include
the length of the prepended metadata.  Adjust them.  Note that
this could not have caused issues in the past as we don't
support TSO with metadata prepend as of this patch.

Signed-off-by: Michael Rapson <michael.rapson@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
8b5ddf1e51 nfp: move queue variable init
nd_q is only used at the very end of nfp_net_tx(), there is no need
to initialize it early.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
de31049a48 nfp: move temporary variables in nfp_net_tx_complete()
Move temporary variables in scope of the loop in nfp_net_tx_complete(),
and add a temp for txbuf software structure.  This saves us 0.2% of CPU.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
Jakub Kicinski
9586274967 nfp: copy only the relevant part of the TX descriptor for frags
Chained descriptors for fragments need to duplicate all the descriptor
fields of the skb head, so we copy the descriptor and then modify the
relevant fields.  This is wasteful, because the top half of the descriptor
will get overwritten entirely while the bottom half is not modified at all.
Copy only the bottom half.  This saves us 0.3% of CPU in a GSO test.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:30:44 -08:00
John Hurley
b5f0cf0834 nfp: flower: prevent offload if rhashtable insert fails
For flow offload adds, if the rhash insert code fails, the flow will still
have been offloaded but the reference to it in the driver freed.

Re-order the offload setup calls to ensure that a flow will only be written
to FW if a kernel reference is held and stored in the rhashtable. Remove
this hashtable entry if the offload fails.

Fixes: c01d0efa51 ("nfp: flower: use rhashtable for flow caching")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:24:56 -08:00
John Hurley
1166494891 nfp: flower: release metadata on offload failure
Calling nfp_compile_flow_metadata both assigns a stats context and
increments a ref counter on (or allocates) a mask id table entry. These
are released by the nfp_modify_flow_metadata call on flow deletion,
however, if a flow add fails after metadata is set then the flow entry
will be deleted but the metadata assignments leaked.

Add an error path to the flow add offload function to ensure allocated
metadata is released in the event of an offload fail.

Fixes: 81f3ddf254 ("nfp: add control message passing capabilities to flower offloads")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-30 13:24:56 -08:00
David S. Miller
4afe60a97b Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-11-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend BTF to support function call types and improve the BPF
   symbol handling with this info for kallsyms and bpftool program
   dump to make debugging easier, from Martin and Yonghong.

2) Optimize LPM lookups by making longest_prefix_match() handle
   multiple bytes at a time, from Eric.

3) Adds support for loading and attaching flow dissector BPF progs
   from bpftool, from Stanislav.

4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.

5) Enable verifier to support narrow context loads with offset > 0
   to adapt to LLVM code generation (currently only offset of 0 was
   supported). Add test cases as well, from Andrey.

6) Simplify passing device functions for offloaded BPF progs by
   adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
   Also convert nfp and netdevsim to make use of them, from Quentin.

7) Add support for sock_ops based BPF programs to send events to
   the perf ring-buffer through perf_event_output helper, from
   Sowmini and Daniel.

8) Add read / write support for skb->tstamp from tc BPF and cg BPF
   programs to allow for supporting rate-limiting in EDT qdiscs
   like fq from BPF side, from Vlad.

9) Extend libbpf API to support map in map types and add test cases
   for it as well to BPF kselftests, from Nikita.

10) Account the maximum packet offset accessed by a BPF program in
    the verifier and use it for optimizing nfp JIT, from Jiong.

11) Fix error handling regarding kprobe_events in BPF sample loader,
    from Daniel T.

12) Add support for queue and stack map type in bpftool, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-26 13:08:17 -08:00
Jakub Kicinski
340a4864d5 nfp: abm: add support for more threshold actions
Original FW only allowed us to perform ECN marking.  Newer releases
also support plain old drop.  Add the ability to configure drop
policy.  This is particularly useful in combination with GRED,
because different bands can have different ECN marking setting.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski
174ab544e3 nfp: abm: add cls_u32 offload for simple band classification
Use offload of very simple u32 filters to direct packets to GRED
bands based on the DSCP marking.  No u32 hashing is supported,
just plain simple filters matching on ToS or Priority with
appropriate mask device can support.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski
6a80240571 nfp: abm: add functions to update DSCP -> virtual queue map
Learn how to set the DSCP map.  FW uses a packed array which
geometry depends on the number of supported priorities and
virtual queues.  Write code to assemble this map and to communicate
the setting to the FW via mailbox.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski
14780c3429 nfp: abm: calculate PRIO map len and check mailbox size
In preparation for PRIO offload calculate how long the prio map
for FW will be and make sure the configuration can be performed
via the vNIC mailbox.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski
f3d6372064 nfp: abm: add GRED offload
Add support for GRED offload.  It behaves much like RED, but
can apply different parameters to different bands.  GRED operates
pretty much exactly like our HW/FW with a single FIFO and different
RED state instances.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski
990b50a53a nfp: abm: wrap RED parameters in bands
Wrap RED parameters and stats into a structure, and a 1-element
array.  Upcoming GRED offload will add the support for more bands.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:46 -08:00
Jakub Kicinski
184ec856ca nfp: abm: add up bands for sto/non-sto stats
Add up stats for all bands for the extra ethtool statistics.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:45 -08:00
Jakub Kicinski
57f31bbaa9 nfp: abm: switch to extended stats for reading packet/byte counts
In PRIO-enabled FW read the statistics from per-band symbol, rather
than from the standard per-PCIe-queue counters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-19 18:53:45 -08:00