Commit Graph

843987 Commits

Author SHA1 Message Date
Michael Chan
c1ba92a86d bnxt_en: Refactor __bnxt_xmit_xdp().
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.

Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
Andy Gospodarek
52c0609258 bnxt_en: rename some xdp functions
Renaming bnxt_xmit_xdp to __bnxt_xmit_xdp to get ready for XDP_REDIRECT
support and reduce confusion/namespace collision.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 15:15:24 -07:00
David S. Miller
aa6be2b95d Merge branch 'cpsw-Add-XDP-support'
Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw: Add XDP support

This patchset adds XDP support for TI cpsw driver and base it on
page_pool allocator. It was verified on af_xdp socket drop,
af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.

It was verified with following configs enabled:
CONFIG_JIT=y
CONFIG_BPFILTER=y
CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_EVENTS=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_JIT=y
CONFIG_CGROUP_BPF=y

Link on previous v7:
https://lkml.org/lkml/2019/7/4/715

Also regular tests with iperf2 were done in order to verify impact on
regular netstack performance, compared with base commit:
https://pastebin.com/JSMT0iZ4

v8..v9:
- fix warnings on arm64 caused by typos in type casting

v7..v8:
- corrected dma calculation based on headroom instead of hard start
- minor comment changes

v6..v7:
- rolled back to v4 solution but with small modification
- picked up patch:
  https://www.spinics.net/lists/netdev/msg583145.html
- added changes related to netsec fix and cpsw

v5..v6:
- do changes that is rx_dev while redirect/flush cycle is kept the same
- dropped net: ethernet: ti: davinci_cpdma: return handler status
- other changes desc in patches

v4..v5:
- added two plreliminary patches:
  net: ethernet: ti: davinci_cpdma: allow desc split while down
  net: ethernet: ti: cpsw_ethtool: allow res split while down
- added xdp alocator refcnt on xdp level, avoiding page pool refcnt
- moved flush status as separate argument for cpdma_chan_process
- reworked cpsw code according to last changes to allocator
- added missed statistic counter

v3..v4:
- added page pool user counter
- use same pool for ndevs in dual mac
- restructured page pool create/destroy according to the last changes in API

v2..v3:
- each rxq and ndev has its own page pool

v1..v2:
- combined xdp_xmit functions
- used page allocation w/o refcnt juggle
- unmapped page for skb netstack
- moved rxq/page pool allocation to open/close pair
- added several preliminary patches:
  net: page_pool: add helper function to retrieve dma addresses
  net: page_pool: add helper function to unmap dma addresses
  net: ethernet: ti: cpsw: use cpsw as drv data
  net: ethernet: ti: cpsw_ethtool: simplify slave loops
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
9ed4050c0d net: ethernet: ti: cpsw: add XDP support
Add XDP support based on rx page_pool allocator, one frame per page.
Page pool allocator is used with assumption that only one rx_handler
is running simultaneously. DMA map/unmap is reused from page pool
despite there is no need to map whole page.

Due to specific of cpsw, the same TX/RX handler can be used by 2
network devices, so special fields in buffer are added to identify
an interface the frame is destined to. Thus XDP works for both
interfaces, that allows to test xdp redirect between two interfaces
easily. Also, each rx queue have own page pools, but common for both
netdevs.

XDP prog is common for all channels till appropriate changes are added
in XDP infrastructure. Also, once page_pool recycling becomes part of
skb netstack some simplifications can be added, like removing
page_pool_release_page() before skb receive.

In order to keep rx_dev while redirect, that can be somehow used in
future, do flush in rx_handler, that allows to keep rx dev the same
while redirect. It allows to conform with tracing rx_dev pointed
by Jesper.

Also, there is probability, that XDP generic code can be extended to
support multi ndev drivers like this one, using same rx queue for
several ndevs, based on switchdev for instance or else. In this case,
driver can be modified like exposed here:
https://lkml.org/lkml/2019/7/3/243

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
608ef6202f net: ethernet: ti: cpsw_ethtool: allow res split while down
That's possible to set channel num while interfaces are down. When
interface gets up it should resplit budget. This resplit can happen
after phy is up but only if speed is changed, so should be set before
this, for this allow it to happen while changing number of channels,
when interfaces are down.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
962fb61890 net: ethernet: ti: davinci_cpdma: allow desc split while down
That's possible to set ring params while interfaces are down. When
interface gets up it uses number of descs to fill rx queue and on
later on changes to create rx pools. Usually, this resplit can happen
after phy is up, but it can be needed before this, so allow it to
happen while setting number of rx descs, when interfaces are down.
Also, if no dependency on intf state, move it to cpdma layer, where
it should be.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
6670acacd5 net: ethernet: ti: davinci_cpdma: add dma mapped submit
In case if dma mapped packet needs to be sent, like with XDP
page pool, the "mapped" submit can be used. This patch adds dma
mapped submit based on regular one.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Ivan Khoronzhuk
1da4bbeffe net: core: page_pool: add user refcnt and reintroduce page_pool_destroy
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of page_pool into xdp_rxq_info_unreg(),
in-order to handle in-flight packets/pages. This created an asymmetry
in drivers create/destroy pairs.

This patch reintroduce page_pool_destroy and add page_pool user
refcnt. This serves the purpose to simplify drivers error handling as
driver now drivers always calls page_pool_destroy() and don't need to
track if xdp_rxq_info_reg_mem_model() was unsuccessful.

This could be used for a special cases where a single RX-queue (with a
single page_pool) provides packets for two net_device'es, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.

This patch is primarily to ease API usage for drivers. The recently
merged netsec driver, actually have a bug in this area, which is
solved by this API change.

This patch is a modified version of Ivan Khoronzhuk's original patch.

Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 14:58:04 -07:00
Yang Wei
dd006fc434 nfc: fix potential illegal memory access
The frags_q is not properly initialized, it may result in illegal memory
access when conn_info is NULL.
The "goto free_exit" should be replaced by "goto exit".

Signed-off-by: Yang Wei <albin_yang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:46:24 -07:00
Arnd Bergmann
49db9228b8 macb: fix build warning for !CONFIG_OF
When CONFIG_OF is disabled, we get a harmless warning about the
newly added variable:

drivers/net/ethernet/cadence/macb_main.c:48:39: error: 'mgmt' defined but not used [-Werror=unused-variable]
 static struct sifive_fu540_macb_mgmt *mgmt;

Move the variable closer to its use inside of the #ifdef.

Fixes: c218ad5590 ("macb: Add support for SiFive FU540-C000")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:43:07 -07:00
Arnd Bergmann
0287f9ed16 gve: fix unused variable/label warnings
On unusual page sizes, we get harmless warnings:

drivers/net/ethernet/google/gve/gve_rx.c:283:6: error: unused variable 'pagecount' [-Werror,-Wunused-variable]
drivers/net/ethernet/google/gve/gve_rx.c:336:1: error: unused label 'have_skb' [-Werror,-Wunused-label]

Change the preprocessor #if to regular if() to avoid this.

Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:40:58 -07:00
Jose Abreu
4993e5b37e net: stmmac: Re-work the queue selection for TSO packets
Ben Hutchings says:
	"This is the wrong place to change the queue mapping.
	stmmac_xmit() is called with a specific TX queue locked,
	and accessing a different TX queue results in a data race
	for all of that queue's state.

	I think this commit should be reverted upstream and in all
	stable branches.  Instead, the driver should implement the
	ndo_select_queue operation and override the queue mapping there."

Fixes: c5acdbee22 ("net: stmmac: Send TSO packets always from Queue 0")
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:36:06 -07:00
Martin Habets
05cfee98c8 sfc: Remove 'PCIE error reporting unavailable'
This is only at notice level but it was pointed out that no other driver
does this.
Also there is no action the user can take as it is really a property of
the server.

Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:16:48 -07:00
David S. Miller
47cfb90406 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu.

2) Support for bridge pvid matching, from wenxu.

3) Support for bridge vlan protocol matching, also from wenxu.

4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid
   from packet path.

5) Prefer specific family extension in nf_tables.

6) Autoload specific family extension in case it is missing.

7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera.

8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko.

9) ICMP handling for GRE encapsulation, from Julian Anastasov.

10) Remove unused parameter in nf_queue, from Florian Westphal.

11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring.

12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes
    public.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 12:13:38 -07:00
Ilias Apalodimas
bfb204129a net: netsec: Sync dma for device on buffer allocation
cd1973a921 ("net: netsec: Sync dma for device on buffer allocation")
was merged on it's v1 instead of the v3.
Merge the proper patch version

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08 11:48:10 -07:00
Quentin Monnet
8fc9f8bedf tools: bpftool: add completion for bpftool prog "loadall"
Bash completion for proposing the "loadall" subcommand is missing. Let's
add it to the completion script.

Add a specific case to propose "load" and "loadall" for completing:

    $ bpftool prog load
                       ^ cursor is here

Otherwise, completion considers that $command is in load|loadall and
starts making related completions (file or directory names, as the
number of words on the command line is below 6), when the only suggested
keywords should be "load" and "loadall" until one has been picked and a
space entered after that to move to the next word.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 17:20:34 +02:00
Arnd Bergmann
bef8e26392 bpf: avoid unused variable warning in tcp_bpf_rtt()
When CONFIG_BPF is disabled, we get a warning for an unused
variable:

In file included from drivers/target/target_core_device.c:26:
include/net/tcp.h:2226:19: error: unused variable 'tp' [-Werror,-Wunused-variable]
        struct tcp_sock *tp = tcp_sk(sk);

The variable is only used in one place, so it can be
replaced with its value there to avoid the warning.

Fixes: 23729ff231 ("bpf: add BPF_CGROUP_SOCK_OPS callback that is executed on every RTT")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 17:18:03 +02:00
YueHaibing
6705fea0c7 bpf: cgroup: Fix build error without CONFIG_NET
If CONFIG_NET is not set and CONFIG_CGROUP_BPF=y,
gcc building fails:

kernel/bpf/cgroup.o: In function `cg_sockopt_func_proto':
cgroup.c:(.text+0x237e): undefined reference to `bpf_sk_storage_get_proto'
cgroup.c:(.text+0x2394): undefined reference to `bpf_sk_storage_delete_proto'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_getsockopt':
(.text+0x2a1f): undefined reference to `lock_sock_nested'
(.text+0x2ca2): undefined reference to `release_sock'
kernel/bpf/cgroup.o: In function `__cgroup_bpf_run_filter_setsockopt':
(.text+0x3006): undefined reference to `lock_sock_nested'
(.text+0x32bb): undefined reference to `release_sock'

Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Stanislav Fomichev <sdf@fomichev.me>
Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 17:17:00 +02:00
Andrii Nakryiko
06ec0e2c49 selftests/bpf: fix test_attach_probe map definition
ef99b02b23 ("libbpf: capture value in BTF type info for BTF-defined map
defs") changed BTF-defined maps syntax, while independently merged
1e8611bbdf ("selftests/bpf: add kprobe/uprobe selftests") added new
test using outdated syntax of maps. This patch fixes this test after
corresponding patch sets were merged.

Fixes: ef99b02b23 ("libbpf: capture value in BTF type info for BTF-defined map defs")
Fixes: 1e8611bbdf ("selftests/bpf: add kprobe/uprobe selftests")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:25:58 +02:00
Daniel Borkmann
8bfec4f325 Merge branch 'bpf-sockaddr-wide-store'
Stanislav Fomichev says:

====================
Clang can generate 8-byte stores for user_ip6 & msg_src_ip6,
let's support that on the verifier side.

v3:
* fix comments spelling an -> and (Andrii Nakryiko)

v2:
* Add simple cover letter (Yonghong Song)
* Update comments (Yonghong Song)
* Remove [4] selftests (Yonghong Song)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:56 +02:00
Stanislav Fomichev
76d950773c selftests/bpf: add verifier tests for wide stores
Make sure that wide stores are allowed at proper (aligned) addresses.
Note that user_ip6 is naturally aligned on 8-byte boundary, so
correct addresses are user_ip6[0] and user_ip6[2]. msg_src_ip6 is,
however, aligned on a 4-byte bondary, so only msg_src_ip6[1]
can be wide-stored.

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:55 +02:00
Stanislav Fomichev
4cfacbe6df bpf: sync bpf.h to tools/
Sync user_ip6 & msg_src_ip6 comments.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:55 +02:00
Stanislav Fomichev
600c70bad6 bpf: allow wide (u64) aligned stores for some fields of bpf_sock_addr
Since commit cd17d77705 ("bpf/tools: sync bpf.h") clang decided
that it can do a single u64 store into user_ip6[2] instead of two
separate u32 ones:

 #  17: (18) r2 = 0x100000000000000
 #  ; ctx->user_ip6[2] = bpf_htonl(DST_REWRITE_IP6_2);
 #  19: (7b) *(u64 *)(r1 +16) = r2
 #  invalid bpf_context access off=16 size=8

>From the compiler point of view it does look like a correct thing
to do, so let's support it on the kernel side.

Credit to Andrii Nakryiko for a proper implementation of
bpf_ctx_wide_store_ok.

Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: cd17d77705 ("bpf/tools: sync bpf.h")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 16:22:55 +02:00
Daniel Borkmann
d2850ce0bd Merge branch 'bpf-libbpf-perf-rb-api'
Andrii Nakryiko says:

====================
This patchset adds a high-level API for setting up and polling perf buffers
associated with BPF_MAP_TYPE_PERF_EVENT_ARRAY map. Details of APIs are
described in corresponding commit.

Patch #1 adds a set of APIs to set up and work with perf buffer.
Patch #2 enhances libbpf to support auto-setting PERF_EVENT_ARRAY map size.
Patch #3 adds test.
Patch #4 converts bpftool map event_pipe to new API.
Patch #5 updates README to mention perf_buffer_ prefix.

v6->v7:
- __x64_ syscall prefix (Yonghong);
v5->v6:
- fix C99 for loop variable initialization usage (Yonghong);
v4->v5:
- initialize perf_buffer_raw_opts in bpftool map event_pipe (Jakub);
- add perf_buffer_ to README;
v3->v4:
- fixed bpftool event_pipe cmd error handling (Jakub);
v2->v3:
- added perf_buffer__new_raw for more low-level control;
- converted bpftool map event_pipe to new API (Daniel);
- fixed bug with error handling in create_maps (Song);
v1->v2:
- add auto-sizing of PERF_EVENT_ARRAY maps;
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:45 +02:00
Andrii Nakryiko
cd07a95f9b libbpf: add perf_buffer_ prefix to README
perf_buffer "object" is part of libbpf API now, add it to the list of
libbpf function prefixes.

Suggested-by: Daniel Borkman <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:43 +02:00
Andrii Nakryiko
9b190f185d tools/bpftool: switch map event_pipe to libbpf's perf_buffer
Switch event_pipe implementation to rely on new libbpf perf buffer API
(it's raw low-level variant).

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:43 +02:00
Andrii Nakryiko
ee5cf82ce0 selftests/bpf: test perf buffer API
Add test verifying perf buffer API functionality.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:43 +02:00
Andrii Nakryiko
d7ff34d5fb libbpf: auto-set PERF_EVENT_ARRAY size to number of CPUs
For BPF_MAP_TYPE_PERF_EVENT_ARRAY typically correct size is number of
possible CPUs. This is impossible to specify at compilation time. This
change adds automatic setting of PERF_EVENT_ARRAY size to number of
system CPUs, unless non-zero size is specified explicitly. This allows
to adjust size for advanced specific cases, while providing convenient
and logical defaults.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:43 +02:00
Andrii Nakryiko
fb84b82246 libbpf: add perf buffer API
BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
to user space for additional processing. libbpf already has very low-level API
to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
use and requires a lot of code to set everything up. This patch adds
perf_buffer abstraction on top of it, abstracting setting up and polling
per-CPU logic into simple and convenient API, similar to what BCC provides.

perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
map entries. It accepts two user-provided callbacks: one for handling raw
samples and one for get notifications of lost samples due to buffer overflow.

perf_buffer__new_raw() is similar, but provides more control over how
perf events are set up (by accepting user-provided perf_event_attr), how
they are handled (perf_event_header pointer is passed directly to
user-provided callback), and on which CPUs ring buffers are created
(it's possible to provide a list of CPUs and corresponding map keys to
update). This API allows advanced users fuller control.

perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
utilizing epoll instance.

perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.

All APIs are not thread-safe. User should ensure proper locking/coordination if
used in multi-threaded set up.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-08 15:35:43 +02:00
Gary Lin
36c4357c63 net: bpfilter: print umh messages to /dev/kmsg
bpfilter_umh currently printed all messages to /dev/console and this
might interfere the user activity(*).

This commit changes the output device to /dev/kmsg so that the messages
from bpfilter_umh won't show on the console directly.

(*) https://bugzilla.suse.com/show_bug.cgi?id=1140221

Signed-off-by: Gary Lin <glin@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 22:16:11 -07:00
Xue Chaojing
61a582be1a hinic: add fw version query
This patch adds firmware version query in ethtool -i.

Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 22:15:12 -07:00
Wei Yongjun
877cb240f6 gve: Fix error return code in gve_alloc_qpls()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 19:25:36 -07:00
Joe Perches
6b4ddf9950 net: nixge: Fix misuse of strlcpy
Probable cut&paste typo - use the correct field size.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 19:23:05 -07:00
Joe Perches
7391324ba4 net: ethernet: sun4i-emac: Fix misuse of strlcpy
Probable cut&paste typo - use the correct field size.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 19:22:03 -07:00
David S. Miller
0799678349 Merge branch 'gtp-fix-several-bugs'
Taehee Yoo says:

====================
gtp: fix several bugs

This patch series fixes several bugs in the gtp module.

First patch fixes suspicious RCU usage.
The problem is to use rcu_dereference_sk_user_data() outside of
RCU read critical section.

Second patch fixes use-after-free.
gtp_encap_destroy() is called twice.
gtp_encap_destroy() use both gtp->sk0 and gtp->sk1u.
these pointers can be freed in gtp_encap_destroy().
So, gtp_encap_destroy() should avoid using freed sk pointer.

Third patch removes duplicate code in gtp_dellink().
gtp_dellink() calls gtp_encap_disable() twice.
So, remove one of them.

Fourth patch fixes usage of GFP_KERNEL.
GFP_KERNEL can not be used in RCU read critical section.
This patch make ipv4_pdp_add() to use GFP_ATOMIC instead of GFP_KERNEL.

Fifth patch fixes use-after-free in gtp_newlink().
gtp_newlink() uses gtp_net which would be destroyed by the __exit_net
routine.
So, gtp_newlink should not be called after the __exit_net routine.

Sixth patch adds missing error handling routine in gtp_encap_enable().
gtp_encap_enable() will fail, if invalid role value is sent from
user-space. if so, gtp_encap_enable() should execute error handling
routine.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:48 -07:00
Taehee Yoo
e30155fd23 gtp: add missing gtp_encap_disable_sock() in gtp_encap_enable()
If an invalid role is sent from user space, gtp_encap_enable() will fail.
Then, it should call gtp_encap_disable_sock() but current code doesn't.
It makes memory leak.

Fixes: 91ed81f9ab ("gtp: support SGSN-side tunnels")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:48 -07:00
Taehee Yoo
a2bed90704 gtp: fix use-after-free in gtp_newlink()
Current gtp_newlink() could be called after unregister_pernet_subsys().
gtp_newlink() uses gtp_net but it can be destroyed by
unregister_pernet_subsys().
So unregister_pernet_subsys() should be called after
rtnl_link_unregister().

Test commands:
   #SHELL 1
   while :
   do
	   for i in {1..5}
	   do
		./gtp-link add gtp$i &
	   done
	   killall gtp-link
   done

   #SHELL 2
   while :
   do
	modprobe -rv gtp
   done

Splat looks like:
[  753.176631] BUG: KASAN: use-after-free in gtp_newlink+0x9b4/0xa5c [gtp]
[  753.177722] Read of size 8 at addr ffff8880d48f2458 by task gtp-link/7126
[  753.179082] CPU: 0 PID: 7126 Comm: gtp-link Tainted: G        W         5.2.0-rc6+ #50
[  753.185801] Call Trace:
[  753.186264]  dump_stack+0x7c/0xbb
[  753.186863]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.187583]  print_address_description+0xc7/0x240
[  753.188382]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.189097]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.189846]  __kasan_report+0x12a/0x16f
[  753.190542]  ? gtp_newlink+0x9b4/0xa5c [gtp]
[  753.191298]  kasan_report+0xe/0x20
[  753.191893]  gtp_newlink+0x9b4/0xa5c [gtp]
[  753.192580]  ? __netlink_ns_capable+0xc3/0xf0
[  753.193370]  __rtnl_newlink+0xb9f/0x11b0
[ ... ]
[  753.241201] Allocated by task 7186:
[  753.241844]  save_stack+0x19/0x80
[  753.242399]  __kasan_kmalloc.constprop.3+0xa0/0xd0
[  753.243192]  __kmalloc+0x13e/0x300
[  753.243764]  ops_init+0xd6/0x350
[  753.244314]  register_pernet_operations+0x249/0x6f0
[ ... ]
[  753.251770] Freed by task 7178:
[  753.252288]  save_stack+0x19/0x80
[  753.252833]  __kasan_slab_free+0x111/0x150
[  753.253962]  kfree+0xc7/0x280
[  753.254509]  ops_free_list.part.11+0x1c4/0x2d0
[  753.255241]  unregister_pernet_operations+0x262/0x390
[ ... ]
[  753.285883] list_add corruption. next->prev should be prev (ffff8880d48f2458), but was ffff8880d497d878. (next.
[  753.287241] ------------[ cut here ]------------
[  753.287794] kernel BUG at lib/list_debug.c:25!
[  753.288364] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  753.289099] CPU: 0 PID: 7126 Comm: gtp-link Tainted: G    B   W         5.2.0-rc6+ #50
[  753.291036] RIP: 0010:__list_add_valid+0x74/0xd0
[  753.291589] Code: 48 39 da 75 27 48 39 f5 74 36 48 39 dd 74 31 48 83 c4 08 b8 01 00 00 00 5b 5d c3 48 89 d9 48b
[  753.293779] RSP: 0018:ffff8880cae8f398 EFLAGS: 00010286
[  753.294401] RAX: 0000000000000075 RBX: ffff8880d497d878 RCX: 0000000000000000
[  753.296260] RDX: 0000000000000075 RSI: 0000000000000008 RDI: ffffed10195d1e69
[  753.297070] RBP: ffff8880cd250ae0 R08: ffffed101b4bff21 R09: ffffed101b4bff21
[  753.297899] R10: 0000000000000001 R11: ffffed101b4bff20 R12: ffff8880d497d878
[  753.298703] R13: 0000000000000000 R14: ffff8880cd250ae0 R15: ffff8880d48f2458
[  753.299564] FS:  00007f5f79805740(0000) GS:ffff8880da400000(0000) knlGS:0000000000000000
[  753.300533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  753.301231] CR2: 00007fe8c7ef4f10 CR3: 00000000b71a6006 CR4: 00000000000606f0
[  753.302183] Call Trace:
[  753.302530]  gtp_newlink+0x5f6/0xa5c [gtp]
[  753.303037]  ? __netlink_ns_capable+0xc3/0xf0
[  753.303576]  __rtnl_newlink+0xb9f/0x11b0
[  753.304092]  ? rtnl_link_unregister+0x230/0x230

Fixes: 459aa660eb ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:47 -07:00
Taehee Yoo
3f167e1921 gtp: fix Illegal context switch in RCU read-side critical section.
ipv4_pdp_add() is called in RCU read-side critical section.
So GFP_KERNEL should not be used in the function.
This patch make ipv4_pdp_add() to use GFP_ATOMIC instead of GFP_KERNEL.

Test commands:
gtp-link add gtp1 &
gtp-tunnel add gtp1 v1 100 200 1.1.1.1 2.2.2.2

Splat looks like:
[  130.618881] =============================
[  130.626382] WARNING: suspicious RCU usage
[  130.626994] 5.2.0-rc6+ #50 Not tainted
[  130.627622] -----------------------------
[  130.628223] ./include/linux/rcupdate.h:266 Illegal context switch in RCU read-side critical section!
[  130.629684]
[  130.629684] other info that might help us debug this:
[  130.629684]
[  130.631022]
[  130.631022] rcu_scheduler_active = 2, debug_locks = 1
[  130.632136] 4 locks held by gtp-tunnel/1025:
[  130.632925]  #0: 000000002b93c8b7 (cb_lock){++++}, at: genl_rcv+0x15/0x40
[  130.634159]  #1: 00000000f17bc999 (genl_mutex){+.+.}, at: genl_rcv_msg+0xfb/0x130
[  130.635487]  #2: 00000000c644ed8e (rtnl_mutex){+.+.}, at: gtp_genl_new_pdp+0x18c/0x1150 [gtp]
[  130.636936]  #3: 0000000007a1cde7 (rcu_read_lock){....}, at: gtp_genl_new_pdp+0x187/0x1150 [gtp]
[  130.638348]
[  130.638348] stack backtrace:
[  130.639062] CPU: 1 PID: 1025 Comm: gtp-tunnel Not tainted 5.2.0-rc6+ #50
[  130.641318] Call Trace:
[  130.641707]  dump_stack+0x7c/0xbb
[  130.642252]  ___might_sleep+0x2c0/0x3b0
[  130.642862]  kmem_cache_alloc_trace+0x1cd/0x2b0
[  130.643591]  gtp_genl_new_pdp+0x6c5/0x1150 [gtp]
[  130.644371]  genl_family_rcv_msg+0x63a/0x1030
[  130.645074]  ? mutex_lock_io_nested+0x1090/0x1090
[  130.645845]  ? genl_unregister_family+0x630/0x630
[  130.646592]  ? debug_show_all_locks+0x2d0/0x2d0
[  130.647293]  ? check_flags.part.40+0x440/0x440
[  130.648099]  genl_rcv_msg+0xa3/0x130
[ ... ]

Fixes: 459aa660eb ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:47 -07:00
Taehee Yoo
a635037ac1 gtp: remove duplicate code in gtp_dellink()
gtp_encap_disable() in gtp_dellink() is unnecessary because it will be
called by unregister_netdevice().
unregister_netdevice() internally calls gtp_dev_uninit() by ->ndo_uninit().
And gtp_dev_uninit() calls gtp_encap_disable().

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:47 -07:00
Taehee Yoo
1788b8569f gtp: fix use-after-free in gtp_encap_destroy()
gtp_encap_destroy() is called twice.
1. When interface is deleted.
2. When udp socket is destroyed.
either gtp->sk0 or gtp->sk1u could be freed by sock_put() in
gtp_encap_destroy(). so, when gtp_encap_destroy() is called again,
it would uses freed sk pointer.

patch makes gtp_encap_destroy() to set either gtp->sk0 or gtp->sk1u to
null. in addition, both gtp->sk0 and gtp->sk1u pointer are protected
by rtnl_lock. so, rtnl_lock() is added.

Test command:
   gtp-link add gtp1 &
   killall gtp-link
   ip link del gtp1

Splat looks like:
[   83.182767] BUG: KASAN: use-after-free in __lock_acquire+0x3a20/0x46a0
[   83.184128] Read of size 8 at addr ffff8880cc7d5360 by task ip/1008
[   83.185567] CPU: 1 PID: 1008 Comm: ip Not tainted 5.2.0-rc6+ #50
[   83.188469] Call Trace:
[ ... ]
[   83.200126]  lock_acquire+0x141/0x380
[   83.200575]  ? lock_sock_nested+0x3a/0xf0
[   83.201069]  _raw_spin_lock_bh+0x38/0x70
[   83.201551]  ? lock_sock_nested+0x3a/0xf0
[   83.202044]  lock_sock_nested+0x3a/0xf0
[   83.202520]  gtp_encap_destroy+0x18/0xe0 [gtp]
[   83.203065]  gtp_encap_disable.isra.14+0x13/0x50 [gtp]
[   83.203687]  gtp_dellink+0x56/0x170 [gtp]
[   83.204190]  rtnl_delete_link+0xb4/0x100
[ ... ]
[   83.236513] Allocated by task 976:
[   83.236925]  save_stack+0x19/0x80
[   83.237332]  __kasan_kmalloc.constprop.3+0xa0/0xd0
[   83.237894]  kmem_cache_alloc+0xd8/0x280
[   83.238360]  sk_prot_alloc.isra.42+0x50/0x200
[   83.238874]  sk_alloc+0x32/0x940
[   83.239264]  inet_create+0x283/0xc20
[   83.239684]  __sock_create+0x2dd/0x540
[   83.240136]  __sys_socket+0xca/0x1a0
[   83.240550]  __x64_sys_socket+0x6f/0xb0
[   83.240998]  do_syscall_64+0x9c/0x450
[   83.241466]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   83.242061]
[   83.242249] Freed by task 0:
[   83.242616]  save_stack+0x19/0x80
[   83.243013]  __kasan_slab_free+0x111/0x150
[   83.243498]  kmem_cache_free+0x89/0x250
[   83.244444]  __sk_destruct+0x38f/0x5a0
[   83.245366]  rcu_core+0x7e9/0x1c20
[   83.245766]  __do_softirq+0x213/0x8fa

Fixes: 1e3a3abd8b ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:47 -07:00
Taehee Yoo
e198987e7d gtp: fix suspicious RCU usage
gtp_encap_enable_socket() and gtp_encap_destroy() are not protected
by rcu_read_lock(). and it's not safe to write sk->sk_user_data.
This patch make these functions to use lock_sock() instead of
rcu_dereference_sk_user_data().

Test commands:
    gtp-link add gtp1

Splat looks like:
[   83.238315] =============================
[   83.239127] WARNING: suspicious RCU usage
[   83.239702] 5.2.0-rc6+ #49 Not tainted
[   83.240268] -----------------------------
[   83.241205] drivers/net/gtp.c:799 suspicious rcu_dereference_check() usage!
[   83.243828]
[   83.243828] other info that might help us debug this:
[   83.243828]
[   83.246325]
[   83.246325] rcu_scheduler_active = 2, debug_locks = 1
[   83.247314] 1 lock held by gtp-link/1008:
[   83.248523]  #0: 0000000017772c7f (rtnl_mutex){+.+.}, at: __rtnl_newlink+0x5f5/0x11b0
[   83.251503]
[   83.251503] stack backtrace:
[   83.252173] CPU: 0 PID: 1008 Comm: gtp-link Not tainted 5.2.0-rc6+ #49
[   83.253271] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   83.254562] Call Trace:
[   83.254995]  dump_stack+0x7c/0xbb
[   83.255567]  gtp_encap_enable_socket+0x2df/0x360 [gtp]
[   83.256415]  ? gtp_find_dev+0x1a0/0x1a0 [gtp]
[   83.257161]  ? memset+0x1f/0x40
[   83.257843]  gtp_newlink+0x90/0xa21 [gtp]
[   83.258497]  ? __netlink_ns_capable+0xc3/0xf0
[   83.259260]  __rtnl_newlink+0xb9f/0x11b0
[   83.260022]  ? rtnl_link_unregister+0x230/0x230
[ ... ]

Fixes: 1e3a3abd8b ("gtp: make GTP sockets in gtp_newlink optional")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 18:42:47 -07:00
David S. Miller
ad7b134f65 Merge branch 'net-dsa-Add-Vitesse-VSC73xx-parallel-mode'
Pawel Dembicki says:

====================
net: dsa: Add Vitesse VSC73xx parallel mode

Main goal of this patch series is to add support for CPU attached parallel
bus in Vitesse VSC73xx switches. Existing driver supports only SPI mode.

Second change is needed for devices in unmanaged state.

V3:
- fix commit messages and descriptions about memory-mapped I/O mode

V2:
- drop changes in compatible strings
- make changes less invasive
- drop mutex in platform part and move mutex from core to spi part
- fix indentation
- fix devm_ioremap_resource result check
- add cover letter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:16:32 -07:00
Pawel Dembicki
1da39ff09e net: dsa: vsc73xx: Assert reset if iCPU is enabled
Driver allow to use devices with disabled iCPU only.

Some devices have pre-initialised iCPU by bootloader.
That state make switch unmanaged. This patch force reset
if device is in unmanaged state. In the result chip lost
internal firmware from RAM and it can be managed.

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:16:32 -07:00
Pawel Dembicki
631e83bf7c net: dsa: vsc73xx: add support for parallel mode
This patch add platform part of vsc73xx driver.
It allows to use chip connected to a parallel memory bus and work in
memory-mapped I/O mode. (aka PI bus in chip manual)

By default device is working in big endian mode.

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:16:32 -07:00
Pawel Dembicki
95711cd5f0 net: dsa: vsc73xx: Split vsc73xx driver
This driver (currently) only takes control of the switch chip over
SPI and configures it to route packages around when connected to a
CPU port. But Vitesse chip support also parallel interface.

This patch split driver into two parts: core and spi. It is required
for add support to another managing interface.

Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:16:32 -07:00
Pawel Dembicki
86116f4d64 net: dsa: Change DT bindings for Vitesse VSC73xx switches
This commit introduce how to use vsc73xx platform driver.

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:16:32 -07:00
David S. Miller
ccd1479e48 Merge branch 'net-tls-fix-poll-wake-up'
Jakub Kicinski says:

====================
net/tls: fix poll() wake up

This small fix + selftest series is very similar to the previous
commit 04b25a5411 ("net/tls: fix no wakeup on partial reads").
This time instead of recvmsg we're fixing poll wake up.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:11:45 -07:00
Jakub Kicinski
81a89ef6be selftests/tls: add test for poll() with data in TLS ULP
Add a test which checks if leftover record data in TLS
layer correctly wakes up poll().

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:11:45 -07:00
Jakub Kicinski
13aecb17ac net/tls: fix poll ignoring partially copied records
David reports that RPC applications which use epoll() occasionally
get stuck, and that TLS ULP causes the kernel to not wake applications,
even though read() will return data.

This is indeed true. The ctx->rx_list which holds partially copied
records is not consulted when deciding whether socket is readable.

Note that SO_RCVLOWAT with epoll() is and has always been broken for
kernel TLS. We'd need to parse all records from the TCP layer, instead
of just the first one.

Fixes: 692d7b5d1f ("tls: Fix recvmsg() to be able to peek across multiple records")
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 14:11:44 -07:00
Xin Long
30a4616c1b tipc: use rcu dereference functions properly
For these places are protected by rcu_read_lock, we change from
rcu_dereference_rtnl to rcu_dereference, as there is no need to
check if rtnl lock is held.

For these places are protected by rtnl_lock, we change from
rcu_dereference_rtnl to rtnl_dereference/rcu_dereference_protected,
as no extra memory barriers are needed under rtnl_lock() which also
protects tn->bearer_list[] and dev->tipc_ptr/b->media_ptr updating.

rcu_dereference_rtnl will be only used in the places where it could
be under rcu_read_lock or rtnl_lock.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-07 13:18:53 -07:00