linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-18 12:37:04 +07:00

Author	SHA1	Message	Date
Lorenz Bauer	6550f2dddf	bpf: sockmap: Enable map_update_elem from bpf_iter Allow passing a pointer to a BTF struct sock_common* when updating a sockmap or sockhash. Since BTF pointers can fault and therefore be NULL at runtime we need to add an additional !sk check to sock_map_update_elem. Since we may be passed a request or timewait socket we also need to check sk_fullsock. Doing this allows calling map_update_elem on sockmap from bpf_iter context, which uses BTF pointers. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com	2020-09-28 16:40:46 -07:00
Lorenzo Bianconi	efa90b5093	bpf, cpumap: Remove rcpu pointer from cpu_map_build_skb signature Get rid of bpf_cpu_map_entry pointer in cpu_map_build_skb routine signature since it is no longer needed. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/33cb9b7dc447de3ea6fd6ce713ac41bca8794423.1601292015.git.lorenzo@kernel.org	2020-09-28 23:30:42 +02:00
Song Liu	09d8ad1688	selftests/bpf: Add raw_tp_test_run This test runs test_run for raw_tracepoint program. The test covers ctx input, retval output, and running on correct cpu. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200925205432.1777-4-songliubraving@fb.com	2020-09-28 21:52:36 +02:00
Song Liu	88f7fe7233	libbpf: Support test run of raw tracepoint programs Add bpf_prog_test_run_opts() with support of new fields in bpf_attr.test, namely, flags and cpu. Also extend _opts operations to support outputs via opts. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200925205432.1777-3-songliubraving@fb.com	2020-09-28 21:52:36 +02:00
Song Liu	1b4d60ec16	bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint Add .test_run for raw_tracepoint. Also, introduce a new feature that runs the target program on a specific CPU. This is achieved by a new flag in bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for BPF programs that handle perf_event and other percpu resources, as the program can access these resource locally. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com	2020-09-28 21:52:36 +02:00
Magnus Karlsson	1fd17c8cd0	xsk: Fix possible crash in socket_release when out-of-memory Fix possible crash in socket_release when an out-of-memory error has occurred in the bind call. If a socket using the XDP_SHARED_UMEM flag encountered an error in xp_create_and_assign_umem, the bind code jumped to the exit routine but erroneously forgot to set the err value before jumping. This meant that the exit routine thought the setup went well and set the state of the socket to XSK_BOUND. The xsk socket release code will then, at application exit, think that this is a properly setup socket, when it is not, leading to a crash when all fields in the socket have in fact not been initialized properly. Fix this by setting the err variable in xsk_bind so that the socket is not set to XSK_BOUND which leads to the clean-up in xsk_release not being triggered. Fixes: `1c1efc2af1` ("xsk: Create and free buffer pool independently from umem") Reported-by: syzbot+ddc7b4944bc61da19b81@syzkaller.appspotmail.com Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/1601112373-10595-1-git-send-email-magnus.karlsson@gmail.com	2020-09-28 21:18:01 +02:00
John Fastabend	ba5f4cfeac	bpf: Add comment to document BTF type PTR_TO_BTF_ID_OR_NULL The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the type PTR_TO_SOCKET is guaranteed to _not_ have a null value. In contrast PTR_TO_BTF_ID and BTF_TO_BTF_ID_OR_NULL have slightly different meanings. A PTR_TO_BTF_TO_ID may be a pointer to NULL value, but it is safe to read this pointer in the program context because the program context will handle any faults. The fallout is for PTR_TO_BTF_ID the verifier can assume reads are safe, but can not use the type in branch analysis. Additionally, authors need to be extra careful when passing PTR_TO_BTF_ID into helpers. In general helpers consuming type PTR_TO_BTF_ID will need to assume it may be null. Seeing the above is not obvious to readers without the back knowledge lets add a comment in the type definition. Editorial comment, as networking and tracing programs get closer and more tightly merged we may need to consider a new type that we can ensure is non-null for branch analysis and also passing into helpers. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com>	2020-09-25 17:05:14 -07:00
John Fastabend	99d4def4d0	bpf: Add AND verifier test case where 32bit and 64bit bounds differ If we AND two values together that are known in the 32bit subregs, but not known in the 64bit registers we rely on the tnum value to report the 32bit subreg is known. And do not use mark_reg_known() directly from scalar32_min_max_and() Add an AND test to cover the case with known 32bit subreg, but unknown 64bit reg. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-25 16:47:21 -07:00
John Fastabend	4fbb38a3b2	bpf, verifier: Remove redundant var_off.value ops in scalar known reg cases In BPF_AND and BPF_OR alu cases we have this pattern when the src and dst tnum is a constant. 1 dst_reg->var_off = tnum_[op](dst_reg->var_off, src_reg.var_off) 2 scalar32_min_max_[op] 3 if (known) return 4 scalar_min_max_[op] 5 if (known) 6 __mark_reg_known(dst_reg, dst_reg->var_off.value [op] src_reg.var_off.value) The result is in 1 we calculate the var_off value and store it in the dst_reg. Then in 6 we duplicate this logic doing the op again on the value. The duplication comes from the the tnum_[op] handlers because they have already done the value calcuation. For example this is tnum_and(). struct tnum tnum_and(struct tnum a, struct tnum b) { u64 alpha, beta, v; alpha = a.value \| a.mask; beta = b.value \| b.mask; v = a.value & b.value; return TNUM(v, alpha & beta & ~v); } So lets remove the redundant op calculation. Its confusing for readers and unnecessary. Its also not harmful because those ops have the property, r1 & r1 = r1 and r1 \| r1 = r1. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-25 16:47:21 -07:00
Alexei Starovoitov	84085f8772	Merge branch 'enable-bpf_skc-cast-for-networking-progs' Martin KaFai Lau says: ==================== This set allows networking prog type to directly read fields from the in-kernel socket type, e.g. "struct tcp_sock". Patch 2 has the details on the use case. v3: - Pass arg_btf_id instead of fn into check_reg_type() in Patch 1 (Lorenz) - Move arg_btf_id from func_proto to struct bpf_reg_types in Patch 2 (Lorenz) - Remove test_sock_fields from .gitignore in Patch 8 (Andrii) - Add tests to have better coverage on the modified helpers (Alexei) Patch 13 is added. - Use "void *sk" as the helper argument in UAPI bpf.h v3: - ARG_PTR_TO_SOCK_COMMON_OR_NULL was attempted in v2. The _OR_NULL was needed because the PTR_TO_BTF_ID could be NULL but note that a could be NULL PTR_TO_BTF_ID is not a scalar NULL to the verifier. "_OR_NULL" implicitly gives an expectation that the helper can take a scalar NULL which does not make sense in most (except one) helpers. Passing scalar NULL should be rejected at the verification time. Thus, this patch uses ARG_PTR_TO_BTF_ID_SOCK_COMMON to specify that the helper can take both the btf-id ptr or the legacy PTR_TO_SOCK_COMMON but not scalar NULL. It requires the func_proto to explicitly specify the arg_btf_id such that there is a very clear expectation that the helper can handle a NULL PTR_TO_BTF_ID. v2: - Add ARG_PTR_TO_SOCK_COMMON_OR_NULL (Lorenz) ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-25 14:16:35 -07:00
Martin KaFai Lau	9a856cae22	bpf: selftest: Add test_btf_skc_cls_ingress This patch attaches a classifier prog to the ingress filter. It exercises the following helpers with different socket pointer types in different logical branches: 1. bpf_sk_release() 2. bpf_sk_assign() 3. bpf_skc_to_tcp_request_sock(), bpf_skc_to_tcp_sock() 4. bpf_tcp_gen_syncookie, bpf_tcp_check_syncookie Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000458.3859627-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	0c402c6c30	bpf: selftest: Remove enum tcp_ca_state from bpf_tcp_helpers.h The enum tcp_ca_state is available in <linux/tcp.h>. Remove it from the bpf_tcp_helpers.h to avoid conflict when the bpf prog needs to include both both <linux/tcp.h> and bpf_tcp_helpers.h. Modify the bpf_cubic.c and bpf_dctcp.c to use <linux/tcp.h> instead. The <linux/stddef.h> is needed by <linux/tcp.h>. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000452.3859313-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	edc2d66ad1	bpf: selftest: Use bpf_skc_to_tcp_sock() in the sock_fields test This test uses bpf_skc_to_tcp_sock() to get a kernel tcp_sock ptr "ktp". Access the ktp->lsndtime and also pass ktp to bpf_sk_storage_get(). It also exercises the bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id() with the "ktp". To do that, a parent cgroup and a child cgroup are created. The bpf prog is attached to the child cgroup. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000446.3858975-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	c40a565a04	bpf: selftest: Use network_helpers in the sock_fields test This patch uses start_server() and connect_to_fd() from network_helpers.h to remove the network testing boiler plate codes. epoll is no longer needed also since the timeout has already been taken care of also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000440.3858639-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	b18c1f0aa4	bpf: selftest: Adapt sock_fields test to use skel and global variables skel is used. Global variables are used to store the result from bpf prog. addr_map, sock_result_map, and tcp_sock_result_map are gone. Instead, global variables listen_tp, srv_sa6, cli_tp,, srv_tp, listen_sk, srv_sk, and cli_sk are added. Because of that, bpf_addr_array_idx and bpf_result_array_idx are also no longer needed. CHECK() macro from test_progs.h is reused and bail as soon as a CHECK failure. shutdown() is used to ensure the previous data-ack is received. The bytes_acked, bytes_received, and the pkt_out_cnt checks are using "<" to accommodate the final ack may not have been received/sent. It is enough since it is not the focus of this test. The sk local storage is all initialized to 0xeB9F now, so the check_sk_pkt_out_cnt() always checks with the 0xeB9F base. It is to keep things simple. The next patch will reuse helpers from network_helpers.h to simplify things further. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000434.3858204-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	6f521a2bd2	bpf: selftest: Move sock_fields test into test_progs This is a mechanical change to 1. move test_sock_fields.c to prog_tests/sock_fields.c 2. rename progs/test_sock_fields_kern.c to progs/test_sock_fields.c Minimal change is made to the code itself. Next patch will make changes to use new ways of writing test, e.g. use skel and global variables. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000427.3857814-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	5d13746dd8	bpf: selftest: Add ref_tracking verifier test for bpf_skc casting The patch tests for: 1. bpf_sk_release() can be called on a tcp_sock btf_id ptr. 2. Ensure the tcp_sock btf_id pointer cannot be used after bpf_sk_release(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20200925000421.3857616-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	27e5203bd9	bpf: Change bpf_sk_assign to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON This patch changes the bpf_sk_assign() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. The bpf_sk_lookup_assign() is taking ARG_PTR_TO_SOCKET_"OR_NULL". Meaning it specifically takes a literal NULL. ARG_PTR_TO_BTF_ID_SOCK_COMMON does not allow a literal NULL, so another ARG type is required for this purpose and another follow-up patch can be used if there is such need. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000415.3857374-1-kafai@fb.com	2020-09-25 13:58:02 -07:00
Martin KaFai Lau	c0df236e13	bpf: Change bpf_tcp__syncookie to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON This patch changes the bpf_tcp__syncookie() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20200925000409.3856725-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	592a349864	bpf: Change bpf_sk_storage_() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON This patch changes the bpf_sk_storage_() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_*() helpers also. A micro benchmark has been done on a "cgroup_skb/egress" bpf program which does a bpf_sk_storage_get(). It was driven by netperf doing a 4096 connected UDP_STREAM test with 64bytes packet. The stats from "kernel.bpf_stats_enabled" shows no meaningful difference. The sk_storage_get_btf_proto, sk_storage_delete_btf_proto, btf_sk_storage_get_proto, and btf_sk_storage_delete_proto are no longer needed, so they are removed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/20200925000402.3856307-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	a5fa25adf0	bpf: Change bpf_sk_release and bpf_sk_cgroup_id to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON The previous patch allows the networking bpf prog to use the bpf_skc_to_() helpers to get a PTR_TO_BTF_ID socket pointer, e.g. "struct tcp_sock ". It allows the bpf prog to read all the fields of the tcp_sock. This patch changes the bpf_sk_release() and bpf_sk_cgroup_id() to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will work with the pointer returned by the bpf_skc_to_() helpers also. For example, the following will work: sk = bpf_skc_lookup_tcp(skb, tuple, tuplen, BPF_F_CURRENT_NETNS, 0); if (!sk) return; tp = bpf_skc_to_tcp_sock(sk); if (!tp) { bpf_sk_release(sk); return; } lsndtime = tp->lsndtime; / Pass tp to bpf_sk_release() will also work / bpf_sk_release(tp); Since PTR_TO_BTF_ID could be NULL, the helper taking ARG_PTR_TO_BTF_ID_SOCK_COMMON has to check for NULL at runtime. A btf_id of "struct sock" may not always mean a fullsock. Regardless the helper's running context may get a non-fullsock or not, considering fullsock check/handling is pretty cheap, it is better to keep the same verifier expectation on helper that takes ARG_PTR_TO_BTF_ID will be able to handle the minisock situation. In the bpf_sk_cgroup_id() case, it will try to get a fullsock by using sk_to_full_sk() as its skb variant bpf_sk"b"_cgroup_id() has already been doing. bpf_sk_release can already handle minisock, so nothing special has to be done. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200925000356.3856047-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	1df8f55a37	bpf: Enable bpf_skc_to_* sock casting helper to networking prog type There is a constant need to add more fields into the bpf_tcp_sock for the bpf programs running at tc, sock_ops...etc. A current workaround could be to use bpf_probe_read_kernel(). However, other than making another helper call for reading each field and missing CO-RE, it is also not as intuitive to use as directly reading "tp->lsndtime" for example. While already having perfmon cap to do bpf_probe_read_kernel(), it will be much easier if the bpf prog can directly read from the tcp_sock. This patch tries to do that by using the existing casting-helpers bpf_skc_to_() whose func_proto returns a btf_id. For example, the func_proto of bpf_skc_to_tcp_sock returns the btf_id of the kernel "struct tcp_sock". These helpers are also added to is_ptr_cast_function(). It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id. That will keep the ref-tracking works properly. The bpf_skc_to_ helpers are made available to most of the bpf prog types in filter.c. The bpf_skc_to_* helpers will be limited by perfmon cap. This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON. The helper accepting this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON]) or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_() helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will accept pointer obtained from skb->sk. Instead of specifying both arg_type and arg_btf_id in the same func_proto which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is always the same. Discussion in this thread: https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/ The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is expecting a PTR_TO_BTF_ID which could be NULL. This is the same behavior as the existing helper taking ARG_PTR_TO_BTF_ID. The _SOCK_COMMON part means the helper is also expecting the legacy SOCK_COMMON pointer. By excluding the _OR_NULL part, the bpf prog cannot call helper with a literal NULL which doesn't make sense in most cases. e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO__OR_NULL reg has to do a NULL check first before passing into the helper or else the bpf prog will be rejected. This behavior is nothing new and consistent with the current expectation during bpf-prog-load. [ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace ARG_PTR_TO_SOCK* of other existing helpers later such that those existing helpers can take the PTR_TO_BTF_ID returned by the bpf_skc_to_() helpers. The only special case is bpf_sk_lookup_assign() which can accept a literal NULL ptr. It has to be handled specially in another follow up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ] [ When converting the older helpers that take ARG_PTR_TO_SOCK in the later patch, if the kernel does not support BTF, ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON because no reg->type could have PTR_TO_BTF_ID in this case. It is not a concern for the newer-btf-only helper like the bpf_skc_to_*() here though because these helpers must require BTF vmlinux to begin with. ] Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Martin KaFai Lau	a968d5e277	bpf: Move the PTR_TO_BTF_ID check to check_reg_type() check_reg_type() checks whether a reg can be used as an arg of a func_proto. For PTR_TO_BTF_ID, the check is actually not completely done until the reg->btf_id is pointing to a kernel struct that is acceptable by the func_proto. Thus, this patch moves the btf_id check into check_reg_type(). "arg_type" and "arg_btf_id" are passed to check_reg_type() instead of "compatible". The compatible_reg_types[] usage is localized in check_reg_type() now. The "if (!btf_id) verbose(...); " is also removed since it won't happen. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200925000344.3854828-1-kafai@fb.com	2020-09-25 13:58:01 -07:00
Alexei Starovoitov	182bf3f3dd	Merge branch 'rtt-speedup.2020.09.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into bpf-next Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-23 19:32:09 -07:00
Alexei Starovoitov	f00f2f7fe8	Revert "bpf: Fix potential call bpf_link_free() in atomic context" This reverts commit `31f23a6a18`. This change made many selftests/bpf flaky: flow_dissector, sk_lookup, sk_assign and others. There was no issue in the code. Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-09-23 19:14:11 -07:00
David S. Miller	3fc826f121	Merge branch 'net-dsa-bcm_sf2-Additional-DT-changes' Florian Fainelli says: ==================== net: dsa: bcm_sf2: Additional DT changes This patch series includes some additional changes to the bcm_sf2 in order to support the Device Tree firmwares provided on such platforms. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:51:16 -07:00
Florian Fainelli	0fa45ee3c1	net: dsa: bcm_sf2: Include address 0 for MDIO diversion We need to include MDIO address 0, which is how our Device Tree blobs indicate where to find the external BCM53125 switches. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:51:16 -07:00
Florian Fainelli	8c28044097	net: dsa: bcm_sf2: Disallow port 5 to be a DSA CPU port While the switch driver is written such that port 5 or 8 could be CPU ports, the use case on Broadcom STB chips is to use port 8 exclusively. The platform firmware does make port 5 comply to a proper DSA CPU port binding by specifiying an "ethernet" phandle. This is undesirable for now until we have an user-space configuration mechanism (such as devlink) which could support dynamically changing the port flavor at run time. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:51:15 -07:00
David S. Miller	9d33ffaaf3	Merge branch 'octeontx2-Add-support-for-VLAN-based-flow-distribution' George Cherian says: ==================== octeontx2: Add support for VLAN based flow distribution This series add support for VLAN based flow distribution for octeontx2 netdev driver. This adds support for configuring the same via ethtool. Following tests have been done. - Multi VLAN flow with same SD - Multi VLAN flow with same SDFN - Single VLAN flow with multi SD - Single VLAN flow with multi SDFN All tests done for udp/tcp both v4 and v6 ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:45:45 -07:00
George Cherian	a55ff8ef5a	octeontx2-pf: Support to change VLAN based RSS hash options via ethtool Add support to control rx-flow-hash based on VLAN. By default VLAN plus 4-tuple based hashing is enabled. Changes can be done runtime using ethtool To enable 2-tuple plus VLAN based flow distribution # ethtool -N <intf> rx-flow-hash <prot> sdv To enable 4-tuple plus VLAN based flow distribution # ethtool -N <intf> rx-flow-hash <prot> sdfnv Signed-off-by: George Cherian <george.cherian@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:45:23 -07:00
George Cherian	8f900363df	octeontx2-af: Add support for VLAN based RSS hashing Added support for PF/VF drivers to choose RSS flow key algorithm with VLAN tag included in hashing input data. Only CTAG is considered. Signed-off-by: George Cherian <george.cherian@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:45:23 -07:00
Mauro Carvalho Chehab	de2b541b3b	net: fix a new kernel-doc warning at dev.c kernel-doc expects the function prototype to be just after the kernel-doc markup, as otherwise it will get it all wrong: ./net/core/dev.c:10036: warning: Excess function parameter 'dev' description in 'WAIT_REFS_MIN_MSECS' Fixes: `0e4be9e57e` ("net: use exponential backoff in netdev_wait_allrefs") Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Francesco Ruggeri <fruggeri@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:42:54 -07:00
David S. Miller	774e9ea665	Merge branch 'net-mdio-ipq4019-add-Clause-45-support' Robert Marko says: ==================== net: mdio-ipq4019: add Clause 45 support This patch series adds support for Clause 45 to the driver. While at it also change some defines to upper case to match rest of the driver. Changes since v4: * Rebase onto net-next.git Changes since v1: * Drop clock patches, these need further investigation and no user for non default configuration has been found ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:41:15 -07:00
Robert Marko	06fb560602	net: mdio-ipq4019: add Clause 45 support While up-streaming the IPQ4019 driver it was thought that the controller had no Clause 45 support, but it actually does and its activated by writing a bit to the mode register. So lets add it as newer SoC-s use the same controller and Clause 45 compliant PHY-s. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Cc: Luka Perkov <luka.perkov@sartura.hr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:41:15 -07:00
Robert Marko	b840ec1efd	net: mdio-ipq4019: change defines to upper case In the commit adding the IPQ4019 MDIO driver, defines for timeout and sleep partially used lower case. Lets change it to upper case in line with the rest of driver defines. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Cc: Luka Perkov <luka.perkov@sartura.hr> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:41:15 -07:00
David S. Miller	35e3dbfafe	Merge branch 'Introduce-mbox-tracepoints-for-Octeontx2' Subbaraya Sundeep says: ==================== Introduce mbox tracepoints for Octeontx2 This patchset adds tracepoints support for mailbox. In Octeontx2, PFs and VFs need to communicate with AF for allocating and freeing resources. Once all the configuration is done by AF for a PF/VF then packet I/O can happen on PF/VF queues. When an interface is brought up many mailbox messages are sent to AF for initializing queues. Say a VF is brought up then each message is sent to PF and PF forwards to AF and response also traverses from AF to PF and then VF. To aid debugging, tracepoints are added at places where messages are allocated, sent and message interrupts. Below is the trace of one of the messages from VF to AF and AF response back to VF: ~ # echo 1 > /sys/kernel/tracing/events/rvu/enable ~ # ifconfig eth20 up [ 279.379559] eth20 NIC Link is UP 10000 Mbps Full duplex ~ # cat /sys/kernel/tracing/trace ifconfig-171 [000] .... 275.753345: otx2_msg_alloc: [0002:02:00.1] msg:(0x400) size:40 ifconfig-171 [000] ...1 275.753347: otx2_msg_send: [0002:02:00.1] sent 1 msg(s) of size:48 <idle>-0 [001] dNh1 275.753356: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt VF(s) to PF (0x1) kworker/u9:1-90 [001] ...1 275.753364: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48 kworker/u9:1-90 [001] d.h. 275.753367: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2) kworker/u9:2-167 [002] .... 275.753535: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0 kworker/u9:2-167 [002] ...1 275.753537: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32 <idle>-0 [003] d.h1 275.753543: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1) <idle>-0 [001] d.h2 275.754376: otx2_msg_interrupt: [0002:02:00.1] mbox interrupt PF to VF (0x1) v3 changes: Removed EXPORT_TRACEPOINT_SYMBOLS of otx2_msg_send and otx2_msg_check since they are called locally only v2 changes: Removed otx2_msg_err tracepoint since it is similar to devlink_hwerr and it will be used instead when devlink supported is added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:35:26 -07:00
Subbaraya Sundeep	31a9746062	octeontx2-pf: Add tracepoints for PF/VF mailbox With tracepoints support present in the mailbox code this patch adds tracepoints in PF and VF drivers at places where mailbox messages are allocated, sent and at message interrupts. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:35:26 -07:00
Subbaraya Sundeep	49142d1236	octeontx2-af: Introduce tracepoints for mailbox Added tracepoints in mailbox code so that the mailbox operations like message allocation, sending message and message interrupts are traced. Also the mailbox errors occurred like timeout or wrong responses are traced. These will help in debugging mailbox issues. Here's an example output showing one of the mailbox messages sent by PF to AF and AF responding to it: ~# mount -t tracefs none /sys/kernel/tracing/ ~# echo 1 > /sys/kernel/tracing/events/rvu/enable ~# ifconfig eth0 up ~# cat /sys/kernel/tracing/trace ~# cat /sys/kernel/tracing/trace tracer: nop _-----=> irqs-off / _----=> need-resched \| / _---=> hardirq/softirq \|\| / _--=> preempt-depth \|\|\| / delay TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION \| \| \| \|\|\|\| \| \| ifconfig-2382 [002] .... 756.161892: otx2_msg_alloc: [0002:02:00.0] msg:(0x400) size:40 ifconfig-2382 [002] ...1 756.161895: otx2_msg_send: [0002:02:00.0] sent 1 msg(s) of size:48 <idle>-0 [000] d.h1 756.161902: otx2_msg_interrupt: [0002:01:00.0] mbox interrupt PF(s) to AF (0x2) kworker/u49:0-1165 [000] .... 756.162049: otx2_msg_process: [0002:01:00.0] msg:(0x400) error:0 kworker/u49:0-1165 [000] ...1 756.162051: otx2_msg_send: [0002:01:00.0] sent 1 msg(s) of size:32 kworker/u49:0-1165 [000] d.h. 756.162056: otx2_msg_interrupt: [0002:02:00.0] mbox interrupt AF to PF (0x1) Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:35:26 -07:00
Barry Song	3649326912	net: allwinner: remove redundant irqsave and irqrestore in hardIRQ The comment "holders of db->lock must always block IRQs" and related code to do irqsave and irqrestore don't make sense since we are in a IRQ-disabled hardIRQ context. Cc: Maxime Ripard <mripard@kernel.org> Cc: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Barry Song <song.bao.hua@hisilicon.com> Acked-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:33:52 -07:00
Rikard Falkeborn	e4b9146849	net: hns3: Constify static structs A number of static variables were not modified. Make them const to allow the compiler to put them in read-only memory. In order to do so, constify a couple of input pointers as well as some local pointers. This moves about 35Kb to read-only memory as seen by the output of the size command. Before: text data bss dec hex filename 404938 111534 640 517112 7e3f8 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko After: text data bss dec hex filename 439499 76974 640 517113 7e3f9 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge.ko Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 17:31:50 -07:00
David S. Miller	68d4fd30c8	Merge branch 'net-bridge-mcast-IGMPv3-MLDv2-fast-path-part-2' Nikolay Aleksandrov says: ==================== net: bridge: mcast: IGMPv3/MLDv2 fast-path (part 2) This is the second part of the IGMPv3/MLDv2 support which adds support for the fast-path. In order to be able to handle source entries we add mdb support for S,G entries (i.e. we add source address support to br_ip), that requires to extend the current mdb netlink API, fortunately we just add another attribute which will contain nested future mdb attributes, then we use it to add support for S,G user- add, del and dump. The lookup sequence is simple: when IGMPv3/MLDv2 are enabled do the S,G lookup first and if it fails fallback to ,G. The more complex part is when we begin handling source lists and auto-installing S,G entries and ,G filter mode transitions. We have the following cases: 1) ,G INCLUDE -> EXCLUDE transition: we need to install the port in all of ,G's installed S,G entries for proper replication (except the ones explicitly blocked), this is also necessary when adding a new ,G EXCLUDE port group 2) ,G EXCLUDE -> INCLUDE transition: we need to remove the port from all of ,G's installed S,G entries, this is also necessary when removing a ,G port group 3) New S,G port entry: we need to install all current ,G EXCLUDE ports 4) Remove S,G port entry: if all other port groups were auto-installed we can safely remove them and delete the whole S,G entry Currently we compute these operations from the available ports, their source lists and their filter mode. In the future we can extend the port group structure and reduce the running time of these ops. Also one current limitation is that host-joined S,G entries are not supported. I.e. one cannot add "dev bridge port bridge" mdb S,G entries. The host join is currently considered an EXCLUDE {} join, so it's reflected in all of ,G's installed S,G entries. If an S,G,port entry is added as temporary then the kernel can take it over if a source shows up from a report, permanent entries are skipped. In order to properly handle blocked sources we add a new port group blocked flag to avoid forwarding to that port group in the S,G. Finally when forwarding we use the port group filter mode (if it's INCLUDE and the port group is from a ,G then don't replicate to it, respectively if it's EXCLUDE then forward) and the blocked flag (obviously if it's set - skip that port unless it's a router port) to decide if the port should be skipped. Another limitation is that we can't do some of the above transitions without small traffic drop while installing/removing entries. That will be taken care of when we add atomic swap of port replication lists later. Patch break down: patches 1-3: prepare the mdb code for better extack support which is used in future patches to return a more meaningful error patches 4-6: add the source address field to struct br_ip, and do minor cleanups around it patches 7-8: extend the mdb netlink API so we can send new mdb attributes and uses the new API for S,G entry add/del/dump support patch 9: takes care of S,G entries when doing a lookup (first S,G then ,G lookup) patch 10: adds a new port group field and attribute for origin protocol we use the already available RTPROT_ definitions, currently user-space entries are added as RTPROT_STATIC and kernel entries are added as RTPROT_KERNEL, we may allow user-space to set custom values later (e.g. for FRR, clag) patch 11: adds an internal S,G,port rhashtable to speed up filter mode transitions patch 12: initial automatic install of S,G entries based on port groups' source lists patch 13: handles port group modes on transitions or when new port group entries are added patch 14: self-explanatory - adds support for blocked port group entries needed to stop forwarding to particular S,G,port entries patch 15: handles host-join/leave state changes, treats host-joins as EXCLUDE {} groups (reflected in all *,G's S,G entries) patch 16: finally adds the fast-path filter mode and block flag support Here're the sets that will come next (in order): - iproute2 support for IGMPv3/MLDv2 - selftests for all mode transitions and group flags - explicit host tracking for proper fast-leave support - atomic port replication lists (these are also needed for broadcast forwarding optimizations) - mode transition optimization and removal of open-coded sorted lists Not implemented yet: - Host IGMPv3/MLDv2 filter support (currently we handle only join/leave as before) - Proper other querier source timer and value updates - IGMPv3/v2 MLDv2/v1 compat (I have a few rough patches for this one) v2: fix build with CONFIG_BATMAN_ADV_MCAST in patch 6 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:35 -07:00
Nikolay Aleksandrov	36cfec7359	net: bridge: mcast: when forwarding handle filter mode and blocked flag We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the mdst entry is a *,G or when the port has the blocked flag. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:35 -07:00
Nikolay Aleksandrov	094b82fd53	net: bridge: mcast: handle host state Since host joins are considered as EXCLUDE {} joins we need to reflect that in all of *,G ports' S,G entries. Since the S,Gs can have host_joined == true only set automatically we can safely set it to false when removing all automatically added entries upon S,G delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9116ffbf1d	net: bridge: mcast: add support for blocked port groups When excluding S,G entries we need a way to block a particular S,G,port. The new port group flag is managed based on the source's timer as per RFCs 3376 and 3810. When a source expires and its port group is in EXCLUDE mode, it will be blocked. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8266a0491e	net: bridge: mcast: handle port group filter modes We need to handle group filter mode transitions and initial state. To change a port group's INCLUDE -> EXCLUDE mode (or when we have added a new port group in EXCLUDE mode) we need to add that port to all of ,G ports' S,G entries for proper replication. When the EXCLUDE state is changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be called after the source list processing because the assumption is that all of the group's S,G entries will be created before transitioning to EXCLUDE mode, i.e. most importantly its blocked entries will already be added so it will not get automatically added to them. The transition EXCLUDE -> INCLUDE happens only when a port group timer expires, it requires us to remove that port from all of ,G ports' S,G entries where it was automatically added previously. Finally when we are adding a new S,G entry we must add all of ,G's EXCLUDE ports to it. In order to distinguish automatically added ,G EXCLUDE ports we have a new port group flag - MDB_PG_FLAGS_STAR_EXCL. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	b08123684b	net: bridge: mcast: install S,G entries automatically based on reports This patch adds support for automatic install of S,G mdb entries based on the port group's source list and the source entry's timer. Once installed the S,G will be used when forwarding packets if the approprate multicast/mld versions are set. A new source flag called BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry installed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	085b53c8be	net: bridge: mcast: add sg_port rhashtable To speedup S,G forward handling we need to be able to quickly find out if a port is a member of an S,G group. To do that add a global S,G port rhashtable with key: source addr, group addr, protocol, vid (all br_ip fields) and port pointer. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8f8cb77e0b	net: bridge: mcast: add rt_protocol field to the port group struct We need to be able to differentiate between pg entries created by user-space and the kernel when we start generating S,G entries for IGMPv3/MLDv2's fast path. User-space entries are created by default as RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can allow user-space to provide the entry rt_protocol so we can differentiate between who added the entries specifically (e.g. clag, admin, frr etc). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7d07a68c25	net: bridge: mcast: when igmpv3/mldv2 are enabled lookup (S,G) first, then (,G) If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If there isn't a present (S,G) entry then try to find (,G). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	88d4bd1804	net: bridge: mdb: add support for add/del/dump of entries with source Add new mdb attributes (MDBE_ATTR_SOURCE for setting, MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb entries with a source address (S,G). New S,G entries are created with filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and IPv6, they're validated and parsed based on their protocol. S,G host joined entries which are added by user are not allowed yet. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00

1 2 3 4 5 ...

952331 Commits