Use the newly added bpf_num_possible_cpus() in bpftool and selftests
and remove duplicate implementations.
Signed-off-by: Hechao Li <hechaol@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Though currently there is no problem including bpf_util.h in kernel
space BPF C programs, in next patch in this stack, I will reuse
libbpf_num_possible_cpus() in bpf_util.h thus include libbpf.h in it,
which will cause BPF C programs compile error. Therefore I will first
remove bpf_util.h from all test BPF programs.
This can also make it clear that bpf_util.h is a user-space utility
while bpf_helpers.h is a kernel space utility.
Signed-off-by: Hechao Li <hechaol@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Adding a new API libbpf_num_possible_cpus() that helps user with
per-CPU map operations.
Signed-off-by: Hechao Li <hechaol@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
An error "implicit declaration of function 'reallocarray'" can be thrown
with the following steps:
$ cd tools/testing/selftests/bpf
$ make clean && make CC=<Path to GCC 4.8.5>
$ make clean && make CC=<Path to GCC 7.x>
The cause is that the feature folder generated by GCC 4.8.5 is not
removed, leaving feature-reallocarray being 1, which causes reallocarray
not defined when re-compliing with GCC 7.x. This diff adds feature
folder to EXTRA_CLEAN to avoid this problem.
v2: Rephrase the commit message.
Signed-off-by: Hechao Li <hechaol@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix signature of bpf_probe_read and bpf_probe_write_user to mark source
pointer as const. This causes warnings during compilation for
applications relying on those helpers.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Use the recent change to XSKMAP bpf_map_lookup_elem() to test if
there is a xsk present in the map instead of duplicating the work
with qidconf.
Fix things so callers using XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD
bypass any internal bpf maps, so xsk_socket__{create|delete} works
properly.
Clean up error handling path.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Song Liu <songliubraving@fb.com>
Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Check that bpf_map_lookup_elem lookup and structure
access operats correctly.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sync uapi/linux/bpf.h
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, the AF_XDP code uses a separate map in order to
determine if an xsk is bound to a queue. Instead of doing this,
have bpf_map_lookup_elem() return a xdp_sock.
Rearrange some xdp_sock members to eliminate structure holes.
Remove selftest - will be added back in later patch.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-05-31
The following pull-request contains BPF updates for your *net-next* tree.
Lots of exciting new features in the first PR of this developement cycle!
The main changes are:
1) misc verifier improvements, from Alexei.
2) bpftool can now convert btf to valid C, from Andrii.
3) verifier can insert explicit ZEXT insn when requested by 32-bit JITs.
This feature greatly improves BPF speed on 32-bit architectures. From Jiong.
4) cgroups will now auto-detach bpf programs. This fixes issue of thousands
bpf programs got stuck in dying cgroups. From Roman.
5) new bpf_send_signal() helper, from Yonghong.
6) cgroup inet skb programs can signal CN to the stack, from Lawrence.
7) miscellaneous cleanups, from many developers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
xdping allows us to get latency estimates from XDP. Output looks
like this:
./xdping -I eth4 192.168.55.8
Setting up XDP for eth4, please wait...
XDP setup disrupts network connectivity, hit Ctrl+C to quit
Normal ping RTT data
[Ignore final RTT; it is distorted by XDP using the reply]
PING 192.168.55.8 (192.168.55.8) from 192.168.55.7 eth4: 56(84) bytes of data.
64 bytes from 192.168.55.8: icmp_seq=1 ttl=64 time=0.302 ms
64 bytes from 192.168.55.8: icmp_seq=2 ttl=64 time=0.208 ms
64 bytes from 192.168.55.8: icmp_seq=3 ttl=64 time=0.163 ms
64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.275 ms
4 packets transmitted, 4 received, 0% packet loss, time 3079ms
rtt min/avg/max/mdev = 0.163/0.237/0.302/0.054 ms
XDP RTT data:
64 bytes from 192.168.55.8: icmp_seq=5 ttl=64 time=0.02808 ms
64 bytes from 192.168.55.8: icmp_seq=6 ttl=64 time=0.02804 ms
64 bytes from 192.168.55.8: icmp_seq=7 ttl=64 time=0.02815 ms
64 bytes from 192.168.55.8: icmp_seq=8 ttl=64 time=0.02805 ms
The xdping program loads the associated xdping_kern.o BPF program
and attaches it to the specified interface. If run in client
mode (the default), it will add a map entry keyed by the
target IP address; this map will store RTT measurements, current
sequence number etc. Finally in client mode the ping command
is executed, and the xdping BPF program will use the last ICMP
reply, reformulate it as an ICMP request with the next sequence
number and XDP_TX it. After the reply to that request is received
we can measure RTT and repeat until the desired number of
measurements is made. This is why the sequence numbers in the
normal ping are 1, 2, 3 and 8. We XDP_TX a modified version
of ICMP reply 4 and keep doing this until we get the 4 replies
we need; hence the networking stack only sees reply 8, where
we have XDP_PASSed it upstream since we are done.
In server mode (-s), xdping simply takes ICMP requests and replies
to them in XDP rather than passing the request up to the networking
stack. No map entry is required.
xdping can be run in native XDP mode (the default, or specified
via -N) or in skb mode (-S).
A test program test_xdping.sh exercises some of these options.
Note that native XDP does not seem to XDP_TX for veths, hence -N
is not tested. Looking at the code, it looks like XDP_TX is
supported so I'm not sure if that's expected. Running xdping in
native mode for ixgbe as both client and server works fine.
Changes since v4
- close fds on cleanup (Song Liu)
Changes since v3
- fixed seq to be __be16 (Song Liu)
- fixed fd checks in xdping.c (Song Liu)
Changes since v2
- updated commit message to explain why seq number of last
ICMP reply is 8 not 4 (Song Liu)
- updated types of seq number, raddr and eliminated csum variable
in xdpclient/xdpserver functions as it was not needed (Song Liu)
- added XDPING_DEFAULT_COUNT definition and usage specification of
default/max counts (Song Liu)
Changes since v1
- moved from RFC to PATCH
- removed unused variable in ipv4_csum() (Song Liu)
- refactored ICMP checks into icmp_check() function called by client
and server programs and reworked client and server programs due
to lack of shared code (Song Liu)
- added checks to ensure that SKB and native mode are not requested
together (Song Liu)
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The phylink conflict was between a bug fix by Russell King
to make sure we have a consistent PHY interface mode, and
a change in net-next to pull some code in phylink_resolve()
into the helper functions phylink_mac_link_{up,down}()
On the dp83867 side it's mostly overlapping changes, with
the 'net' side removing a condition that was supposed to
trigger for RGMII but because of how it was coded never
actually could trigger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix OOPS during nf_tables rule dump, from Florian Westphal.
2) Use after free in ip_vs_in, from Yue Haibing.
3) Fix various kTLS bugs (NULL deref during device removal resync,
netdev notification ignoring, etc.) From Jakub Kicinski.
4) Fix ipv6 redirects with VRF, from David Ahern.
5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet.
6) Missing memory allocation failure check in ip6_ra_control(), from
Gen Zhang. And likewise fix ip_ra_control().
7) TX clean budget logic error in aquantia, from Igor Russkikh.
8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet.
9) Double frees in mlx5, from Parav Pandit.
10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit.
11) Fix botched register access in mvpp2, from Antoine Tenart.
12) Use after free in napi_gro_frags(), from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits)
net: correct zerocopy refcnt with udp MSG_MORE
ethtool: Check for vlan etype or vlan tci when parsing flow_rule
net: don't clear sock->sk early to avoid trouble in strparser
net-gro: fix use-after-free read in napi_gro_frags()
net: dsa: tag_8021q: Create a stable binary format
net: dsa: tag_8021q: Change order of rx_vid setup
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
mlxsw: spectrum: Prevent force of 56G
mlxsw: spectrum_acl: Avoid warning after identical rules insertion
net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
r8169: fix MAC address being lost in PCI D3
net: core: support XDP generic on stacked devices.
netvsc: unshare skb in VF rx handler
udp: Avoid post-GRO UDP checksum recalculation
net: phy: dp83867: Set up RGMII TX delay
net: phy: dp83867: do not call config_init twice
net: phy: dp83867: increase SGMII autoneg timer duration
net: phy: dp83867: fix speed 10 in sgmii mode
net: phy: marvell10g: report if the PHY fails to boot firmware
...
Demonstrate how the primary and backup TFO keys can be rotated while
minimizing the number of client cookies that are rejected.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ctinfo is a new tc filter action module. It is designed to restore
information contained in firewall conntrack marks to other packet fields
and is typically used on packet ingress paths. At present it has two
independent sub-functions or operating modes, DSCP restoration mode &
skb mark restoration mode.
The DSCP restore mode:
This mode copies DSCP values that have been placed in the firewall
conntrack mark back into the IPv4/v6 diffserv fields of relevant
packets.
The DSCP restoration is intended for use and has been found useful for
restoring ingress classifications based on egress classifications across
links that bleach or otherwise change DSCP, typically home ISP Internet
links. Restoring DSCP on ingress on the WAN link allows qdiscs such as
but by no means limited to CAKE to shape inbound packets according to
policies that are easier to set & mark on egress.
Ingress classification is traditionally a challenging task since
iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
lookups, hence are unable to see internal IPv4 addresses as used on the
typical home masquerading gateway. Thus marking the connection in some
manner on egress for later restoration of classification on ingress is
easier to implement.
Parameters related to DSCP restore mode:
dscpmask - a 32 bit mask of 6 contiguous bits and indicate bits of the
conntrack mark field contain the DSCP value to be restored.
statemask - a 32 bit mask of (usually) 1 bit length, outside the area
specified by dscpmask. This represents a conditional operation flag
whereby the DSCP is only restored if the flag is set. This is useful to
implement a 'one shot' iptables based classification where the
'complicated' iptables rules are only run once to classify the
connection on initial (egress) packet and subsequent packets are all
marked/restored with the same DSCP. A mask of zero disables the
conditional behaviour ie. the conntrack mark DSCP bits are always
restored to the ip diffserv field (assuming the conntrack entry is found
& the skb is an ipv4/ipv6 type)
e.g. dscpmask 0xfc000000 statemask 0x01000000
|----0xFC----conntrack mark----000000---|
| Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
| DSCP | unused | flag |unused |
|-----------------------0x01---000000---|
| |
| |
---| Conditional flag
v only restore if set
|-ip diffserv-|
| 6 bits |
|-------------|
The skb mark restore mode (cpmark):
This mode copies the firewall conntrack mark to the skb's mark field.
It is completely the functional equivalent of the existing act_connmark
action with the additional feature of being able to apply a mask to the
restored value.
Parameters related to skb mark restore mode:
mask - a 32 bit mask applied to the firewall conntrack mark to mask out
bits unwanted for restoration. This can be useful where the conntrack
mark is being used for different purposes by different applications. If
not specified and by default the whole mark field is copied (i.e.
default mask of 0xffffffff)
e.g. mask 0x00ffffff to mask out the top 8 bits being used by the
aforementioned DSCP restore mode.
|----0x00----conntrack mark----ffffff---|
| Bits 31-24 | |
| DSCP & flag| some value here |
|---------------------------------------|
|
|
v
|------------skb mark-------------------|
| | |
| zeroed | |
|---------------------------------------|
Overall parameters:
zone - conntrack zone
control - action related control (reclassify | pipe | drop | continue |
ok | goto chain <CHAIN_INDEX>)
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are a bunch of lines of code or comments that are unnecessary
wrapped into multi-lines. Fix that without violating any code
guidelines.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
A bunch of typo and formatting fixes.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Extra check for type is unnecessary in first case.
Extra zeroing is unnecessary, as snprintf guarantees that it will
zero-terminate string.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
0 is a valid FD, so it's better to initialize it to -1, as is done in
other places. Also, technically, BTF type ID 0 is valid (it's a VOID
type), so it's more reliable to check btf_fd, instead of
btf_key_type_id, to determine if there is any BTF associated with a map.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
All of libbpf errors are negative, except this one. Fix it.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Validate there was no error retrieving symbol name corresponding to
a BPF map.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Rewrite endianness check to use "more canonical" way, using
compiler-defined macros, similar to few other places in libbpf. It also
is more obvious and shorter.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
pr_warning ultimately may call into user-provided callback function,
which can clobber errno value, so we need to save it before that.
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Ensure that size of a section w/ BPF instruction is exactly a multiple
of BPF instruction size.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This Kselftest update for Linux 5.2-rc3 consists of
- Alexandre Belloni's fixes to rtc regressions introduced in kselftest
Makefile test run output refactoring work from Kees Cook.
- ftrace test checkbashisms fixes from Masami Hiramatsu
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAlzumDsACgkQCwJExA0N
QxymNQ/8DwDmpfkLCBpedHIy9kxY7PiJGNwQYkVMPBxAt/qY7c0BS46TnKxc6ndH
Ra/7Shm7nTbit9Upcap5s4Gtbjf62PW0FVVAZgMZe+2Raq80QV8ANDHOFz8KF7R1
iaDWyW8a5cP56fuzSMmJlpHPWw1G3tYAvjJwqKuTPegYrHpkKdeGpEVm4gyIoS3t
k7z4RuGwouRh3e9Bfks+Q1TZQCH8RbA5W1riA4XM9/iXi5wqp0Ib27sMYbH1eA4s
Gqg4lLFHfC2Pdo2ASOlC9aQAHgE23pCQFFFgXaY2wl00CgThXhhZ1JiJmdF4PryW
z59+3+Ncyr9di7ZL6H+Po/vin1b0902FDx0K4pOqSwr1yQ+54tmSfjylbQRpdnA2
GpswgqFuqCF5SpOTikjl3oiNSi2BStHaLmkEV8XU8LkTe4Px4TSTZ/3JVgJ6PBAL
55mMgD41W1OIQtK6LqLfqfRz7Uj5xdfVcobAigE9IEYLBB/PBnwo/xyW8pSxXoDO
80WiVUf9XHxSU1Mdmb516540thxvS4+TK8CPLSCZiqLUDY1facsNH/KYzkBU7Upt
eJdQxpCI55ORQCRQbfcbuIOrddZFWAjZ5PBieQvYv4W7O1wEcK3hTPl75aff5bIS
UlAt7aW8iLwU+xQ4byMtfvUdzVh2WJv7/jNvjcQCipXXD1PoyAg=
=zYD6
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- Alexandre Belloni's fixes to rtc regressions introduced in kselftest
Makefile test run output refactoring work from Kees Cook.
- ftrace test checkbashisms fixes from Masami Hiramatsu
* tag 'linux-kselftest-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: rtc: rtctest: specify timeouts
selftests/harness: Allow test to configure timeout
selftests/ftrace: Add checkbashisms meta-testcase
selftests/ftrace: Make a script checkbashisms clean
There are two functions in libbpf that support passing a log_level
parameter for the verifier for loading programs:
bpf_object__load_xattr() and bpf_prog_load_xattr(). Both accept an
attribute object containing the log_level, and apply it to the programs
to load.
It turns out that to effectively load the programs, the latter function
eventually relies on the former. This was not taken into account when
adding support for log_level in bpf_object__load_xattr(), and the
log_level passed to bpf_prog_load_xattr() later gets overwritten with a
zero value, thus disabling verifier logs for the program in all cases:
bpf_prog_load_xattr() // prog->log_level = attr1->log_level;
-> bpf_object__load() // attr2->log_level = 0;
-> bpf_object__load_xattr() // <pass prog and attr2>
-> bpf_object__load_progs() // prog->log_level = attr2->log_level;
Fix this by OR-ing the log_level in bpf_object__load_progs(), instead of
overwriting it.
v2: Fix commit log description (confusion on function names in v1).
Fixes: 60276f9849 ("libbpf: add bpf_object__load_xattr() API function to pass log_level")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When building the tools/testing/selftest/bpf subdirectory,
(running both a local directory "make" and a
"make -C tools/testing/selftests/bpf") I keep hitting the
following compilation error:
prog_tests/flow_dissector.c: In function ‘create_tap’:
prog_tests/flow_dissector.c:150:38: error: ‘IFF_NAPI’ undeclared (first
use in this function)
.ifr_flags = IFF_TAP | IFF_NO_PI | IFF_NAPI | IFF_NAPI_FRAGS,
^
prog_tests/flow_dissector.c:150:38: note: each undeclared identifier is
reported only once for each function it appears in
prog_tests/flow_dissector.c:150:49: error: ‘IFF_NAPI_FRAGS’ undeclared
Adding include/uapi/linux/if_tun.h to tools/include/uapi/linux
resolves the problem and ensures the compilation of the file
does not depend on having up-to-date kernel headers locally.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Test the IPv6 flowlabel control and datapath interfaces:
Acquire and release the right to use flowlabels with socket option
IPV6_FLOWLABEL_MGR.
Then configure flowlabels on send and read them on recv with cmsg
IPV6_FLOWINFO. Also verify auto-flowlabel if not explicitly set.
This helped identify the issue fixed in commit 95c169251b ("ipv6:
invert flowlabel sharing check in process and user mode")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the pmtu_vti6_link_change_mtu test, both local and remote addresses
for the vti6 tunnel are assigned to the same address given to the dummy
interface that we use as encapsulating device with a known MTU.
This works as long as the dummy interface is actually selected, via
rt6_lookup(), as encapsulating device. But if the remote address of the
tunnel is a local address too, the loopback interface could also be
selected, and there's nothing wrong with it.
This is what some older -stable kernels do (3.18.z, at least), and
nothing prevents us from subtly changing FIB implementation to revert
back to that behaviour in the future.
Define an IPv6 prefix instead, and use two separate addresses as local
and remote for vti6, so that the encapsulating device can't be a
loopback interface.
Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: 1fad59ea1c ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a kselftest to cover bpf auto-detachment functionality.
The test creates a cgroup, associates some resources with it,
attaches a couple of bpf programs and deletes the cgroup.
Then it checks that bpf programs are going away in 5 seconds.
Expected output:
$ ./test_cgroup_attach
#override:PASS
#multi:PASS
#autodetach:PASS
test_cgroup_attach:PASS
On a kernel without auto-detaching:
$ ./test_cgroup_attach
#override:PASS
#multi:PASS
#autodetach:FAIL
test_cgroup_attach:FAIL
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Enable all available cgroup v2 controllers when setting up
the environment for the bpf kselftests. It's required to properly test
the bpf prog auto-detach feature. Also it will generally increase
the code coverage.
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Convert test_cgrp2_attach2 example into a proper test_cgroup_attach
kselftest. It's better because we do run kselftest on a constant
basis, so there are better chances to spot a potential regression.
Also make it slightly less verbose to conform kselftests output style.
Output example:
$ ./test_cgroup_attach
#override:PASS
#multi:PASS
test_cgroup_attach:PASS
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Right now test_tunnel.sh always exits with success even if some
of the subtests fail. Since the output is very verbose, it's
hard to spot the issues with subtests. Let's fail the script
if any subtest fails.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The "-d" option is used to require all logs available for bpftool. So
far it meant telling libbpf to print even debug-level information. But
there is another source of info that can be made more verbose: when we
attemt to load programs with bpftool, we can pass a log_level parameter
to the verifier in order to control the amount of information that is
printed to the console.
Reuse the "-d" option to print all information the verifier can tell. At
this time, this means logs related to BPF_LOG_LEVEL1, BPF_LOG_LEVEL2 and
BPF_LOG_STATS. As mentioned in the discussion on the first version of
this set, these macros are internal to the kernel
(include/linux/bpf_verifier.h) and are not meant to be part of the
stable user API, therefore we simply use the related constants to print
whatever we can at this time, without trying to tell users what is
log_level1 or what is statistics.
Verifier logs are only used when loading programs for now (In the
future: for loading BTF objects with bpftool? Although libbpf does not
currently offer to print verifier info at debug level if no error
occurred when loading BTF objects), so bpftool.rst and bpftool-prog.rst
are the only man pages to get the update.
v3:
- Add details on log level and BTF loading at the end of commit log.
v2:
- Remove the possibility to select the log levels to use (v1 offered a
combination of "log_level1", "log_level2" and "stats").
- The macros from kernel header bpf_verifier.h are not used (and
therefore not moved to UAPI header).
- In v1 this was a distinct option, but is now merged in the only "-d"
switch to activate libbpf and verifier debug-level logs all at the
same time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
libbpf was recently made aware of the log_level attribute for programs,
used to specify the level of information expected to be dumped by the
verifier. Function bpf_prog_load_xattr() got support for this log_level
parameter.
But some applications using libbpf rely on another function to load
programs, bpf_object__load(), which does accept any parameter for log
level. Create an API function based on bpf_object__load(), but accepting
an "attr" object as a parameter. Then add a log_level field to that
object, so that applications calling the new bpf_object__load_xattr()
can pick the desired log level.
v3:
- Rewrite commit log.
v2:
- We are in a new cycle, bump libbpf extraversion number.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
libbpf has three levels of priority for output messages: warn, info,
debug. By default, debug output is not printed to the console.
Add a new "--debug" (short name: "-d") option to bpftool to print libbpf
logs for all three levels.
Internally, we simply use the function provided by libbpf to replace the
default printing function by one that prints logs regardless of their
level.
v2:
- Remove the possibility to select the log-levels to use (v1 offered a
combination of "warn", "info" and "debug").
- Rename option and offer a short name: -d|--debug.
- Add option description to all bpftool manual pages (instead of
bpftool-prog.rst only), as all commands use libbpf.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Fix below warning reported by coccicheck:
/tools/lib/bpf/libbpf.c:3461:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use fgets() as the while loop condition.
Signed-off-by: Chang-Hsien Tsai <luke.tw@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Auto-complete BTF IDs for `btf dump id` sub-command. List of possible BTF
IDs is scavenged from loaded BPF programs that have associated BTFs, as
there is currently no API in libbpf to fetch list of all BTFs in the
system.
Suggested-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
I was really surprised that the IPv6 mtu exception followed by redirect
test was passing as nothing about the code suggests it should. The problem
is actually with the logic in the test script.
Fix the test cases as follows:
1. add debug function to dump the initial and redirect gateway addresses
for ipv6. This is shown only in verbose mode. It helps verify the
output of 'route get'.
2. fix the check_exception logic for the reset case to make sure that
for IPv4 neither mtu nor redirect appears in the 'route get' output.
For IPv6, make sure mtu is not present and the gateway is the initial
R1 lladdr.
3. fix the reset logic by using a function to delete the routes added by
initial_route_*. This format works better for the nexthop version of
the tests.
While improving the test cases, go ahead and ensure that forwarding is
disabled since IPv6 redirect requires it.
Also, runs with kernel debugging enabled sometimes show a failure with
one of the ipv4 tests, so spread the pings over longer time interval.
The end result is that 2 tests now show failures:
TEST: IPv6: mtu exception plus redirect [FAIL]
and the VRF version.
This is a bug in the IPv6 logic that will need to be fixed
separately. Redirect followed by MTU works because __ip6_rt_update_pmtu
hits the 'if (!rt6_cache_allowed_for_pmtu(rt6))' path and updates the
mtu on the exception rt6_info.
MTU followed by redirect does not have this logic. rt6_do_redirect
creates a new exception and then rt6_insert_exception removes the old
one which has the MTU exception.
Fixes: ec81053528 ("selftests: Add redirect tests")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a test which sends 15 bytes of data, and then tries
to read 10 byes twice. Previously the second read would
sleep indifinitely, since the record was already decrypted
and there is only 5 bytes left, not full 10.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set SO_RCVLOWAT and test it gets respected when gathering
data from multiple records.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
merge window, but should not wait four months before they appear in
a release. I also travelled a bit more than usual in the first part
of May, which didn't help with picking up patches and reports promptly.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJc6RkmAAoJEL/70l94x66DhEAH/ijCkibV9vOUu8n/lSxMjAzi
I/Y1VEaVRFuQ6u0QSjWBBg22tVsWuWiVbonJ63w3JMRwi5Q5zW9REE7EaKRAa/eC
FiFE7vTesYh6sGVwdMCwoinjMDyCp7hybvtBc608+MWhVmrdzTYtPm5N85wxIDtW
xH5Kr2mVeLC43X3vfegolmXZ1obAbZEToJvOgJrYFhnzsmVYYl182kfGtrppBoO0
XXDPuDRGpTrm6A2oADMdOv+mT9p51pHsedmHQaDGXwAGEC/BkOGKdIdBfwppEwy7
QP2NGqwkHIyghV1aCPacT6O6G6xL0i2rfvlJ7+e6o7deU4uMXAqIdQ2DbIcHy3g=
=5IW2
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"The usual smattering of fixes and tunings that came in too late for
the merge window, but should not wait four months before they appear
in a release.
I also travelled a bit more than usual in the first part of May, which
didn't help with picking up patches and reports promptly"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (33 commits)
KVM: x86: fix return value for reserved EFER
tools/kvm_stat: fix fields filter for child events
KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard
kvm: selftests: aarch64: compile with warnings on
kvm: selftests: aarch64: fix default vm mode
kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size
KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
KVM: x86/pmu: do not mask the value that is written to fixed PMUs
KVM: x86/pmu: mask the result of rdpmc according to the width of the counters
x86/kvm/pmu: Set AMD's virt PMU version to 1
KVM: x86: do not spam dmesg with VMCS/VMCB dumps
kvm: Check irqchip mode before assign irqfd
kvm: svm/avic: fix off-by-one in checking host APIC ID
KVM: selftests: do not blindly clobber registers in guest asm
KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c
KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace
KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow
kvm: vmx: Fix -Wmissing-prototypes warnings
KVM: nVMX: Fix using __this_cpu_read() in preemptible context
kvm: fix compilation on s390
...
The previous libbpf patch allows user to specify "prog_flags" to bpf
program load APIs. To enable high 32-bit randomization for a test, we need
to set BPF_F_TEST_RND_HI32 in "prog_flags".
To enable such randomization for all tests, we need to make sure all places
are passing BPF_F_TEST_RND_HI32. Changing them one by one is not
convenient, also, it would be better if a test could be switched to
"normal" running mode without code change.
Given the program load APIs used across bpf selftests are mostly:
bpf_prog_load: load from file
bpf_load_program: load from raw insns
A test_stub.c is implemented for bpf seltests, it offers two functions for
testing purpose:
bpf_prog_test_load
bpf_test_load_program
The are the same as "bpf_prog_load" and "bpf_load_program", except they
also set BPF_F_TEST_RND_HI32. Given *_xattr functions are the APIs to
customize any "prog_flags", it makes little sense to put these two
functions into libbpf.
Then, the following CFLAGS are passed to compilations for host programs:
-Dbpf_prog_load=bpf_prog_test_load
-Dbpf_load_program=bpf_test_load_program
They migrate the used load APIs to the test version, hence enable high
32-bit randomization for these tests without changing source code.
Besides all these, there are several testcases are using
"bpf_prog_load_attr" directly, their call sites are updated to pass
BPF_F_TEST_RND_HI32.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- bpf_fill_ld_abs_vlan_push_pop:
Prevent zext happens inside PUSH_CNT loop. This could happen because
of BPF_LD_ABS (32-bit def) + BPF_JMP (64-bit use), or BPF_LD_ABS +
EXIT (64-bit use of R0). So, change BPF_JMP to BPF_JMP32 and redefine
R0 at exit path to cut off the data-flow from inside the loop.
- bpf_fill_jump_around_ld_abs:
Jump range is limited to 16 bit. every ld_abs is replaced by 6 insns,
but on arches like arm, ppc etc, there will be one BPF_ZEXT inserted
to extend the error value of the inlined ld_abs sequence which then
contains 7 insns. so, set the dividend to 7 so the testcase could
work on all arches.
- bpf_fill_scale1/bpf_fill_scale2:
Both contains ~1M BPF_ALU32_IMM which will trigger ~1M insn patcher
call because of hi32 randomization later when BPF_F_TEST_RND_HI32 is
set for bpf selftests. Insn patcher is not efficient that 1M call to
it will hang computer. So , change to BPF_ALU64_IMM to avoid hi32
randomization.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
libbpf doesn't allow passing "prog_flags" during bpf program load in a
couple of load related APIs, "bpf_load_program_xattr", "load_program" and
"bpf_prog_load_xattr".
It makes sense to allow passing "prog_flags" which is useful for
customizing program loading.
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sync new bpf prog load flag "BPF_F_TEST_RND_HI32" to tools/.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlzobRYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptwcD/99hOkZWNqX0FKjkrofywXBjX//UqBb2OQS
/7vBoWgSMN+SXDI08YdePCjreviDs4VjbP1V1EgBTbb0HpEApbAuTqx7fszbsJLi
Ld6pMkDpRp6RKttmaDW6iT39gZC3w9wOYusbC8pfrVbvhXm9CRLum78Q8h2rdl0c
HzIMopvGvvJazTYj/ZD8L/83Z6oqHPWojnXPIK1CNw6PQ4+A1frD85WitW4Fragp
T5lx0ZBPLHe+1VPoIQg3Rq2ZZcQW2Kfm5mytw9sDG6KbG5/Vj7+jtF6X36QvuFhZ
fU2zWAN7zFVE0FvXxS/ze5lFI8/efkwIAa2xYvkkFWJ+FNBkOrNrhN1JgNyMQgTe
2r4dLPp3XGcfvCCndTnQdwNAGuc878X+bGwlxb1wjTRcElJRpflE1wBx2kzzdnjl
zD2dmUgxURJvY8clKbq/bpgoxLKtqGCsJy7mHOyCUTpflP7YrpvJnUcc14PARnDt
V2JlnTVNO2r9oZ7IBHPWtNLmFjZhba5BaQDD1EtUUgO3fId4wL1rJ52j5K9/2eg7
yC4qdKGZLQoHGTnn8qBY+BS8/bMeMxu6Lx4RqtgVa8r+dkKFhblIdOmYZnyevxSf
B5rtt8CJUU7d3edxZHp9jFiYVbmrc6CjIhRLYZyrLfQGCL3F6qFzozYd0Lwiwxhz
gx2TTsDfFg==
=lGyw
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190524' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith, with fixes from a few folks.
- bio and sbitmap before atomic barrier fixes (Andrea)
- Hang fix for blk-mq freeze and unfreeze (Bob)
- Single segment count regression fix (Christoph)
- AoE now has a new maintainer
- tools/io_uring/ Makefile fix, and sync with liburing (me)
* tag 'for-linus-20190524' of git://git.kernel.dk/linux-block: (23 commits)
tools/io_uring: sync with liburing
tools/io_uring: fix Makefile for pthread library link
blk-mq: fix hang caused by freeze/unfreeze sequence
block: remove the bi_seg_{front,back}_size fields in struct bio
block: remove the segment size check in bio_will_gap
block: force an unlimited segment size on queues with a virt boundary
block: don't decrement nr_phys_segments for physically contigous segments
sbitmap: fix improper use of smp_mb__before_atomic()
bio: fix improper use of smp_mb__before_atomic()
aoe: list new maintainer for aoe driver
nvme-pci: use blk-mq mapping for unmanaged irqs
nvme: update MAINTAINERS
nvme: copy MTFA field from identify controller
nvme: fix memory leak for power latency tolerance
nvme: release namespace SRCU protection before performing controller ioctls
nvme: merge nvme_ns_ioctl into nvme_ioctl
nvme: remove the ifdef around nvme_nvm_ioctl
nvme: fix srcu locking on error return in nvme_get_ns_from_disk
nvme: Fix known effects
nvme-pci: Sync queues on reset
...
This Kselftest fixes update for Linux 5.2-rc2 consists of:
- 2 fixes to regressions introduced in kselftest Makefile test run output
refactoring work from Kees Cook.
- Adding Atom support to syscall_arg_fault test from Tong Bo.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAlzoUmUACgkQCwJExA0N
QxzwxhAAnwCnP1Z5WXTet3GO4tUbzqtoXC05j1Z+zRbxnfA72RSoBu/by8uG4gZf
OR9YQ1F7SR+K2V4CkwO8j5BarHun992HuOAQuFXHFIx3oVxWMDprIQr4hVNm7hRF
14mLDsFWe/lHQFOmcZ2vFwCG3uyr1VukWLW0FPknpTaqb5NcaHyYwgheukmUyw5v
CSUZ/OKetxD5DmYoxoIONvM0nIk6iLZtnblSCfiNe9XTzpTD9ngs9Gwn5m1ILN8F
HetnOXcjq52SnN5VR6oJt75CvTlqmj9EmuLhhd8PJ4Jgq2TCmRQFxc6WjnBh03IZ
8AMq5w4JPd+XEIf1QeNmfV2w9T4/ftFvAlMz46aUPDkeBfKLRCcTAlfkhotru6a9
Vxkk1PD/+wxFH22ktA8asnXhyfQAegpUWlR2e8Efr7uE1PCl55KBkU9BZYjla9GP
Av6wo03ENcuy/eJ/i61H8eygOXRw9pGdJut/P/tEPwBlgdfz8V8/oelX/WQvfUIz
fff6lxZgtzoStICOYJFHNCsNZIdIeOtdU+27U2BHRIUQkrkhctcFrn0xXe4ETls8
ZH1SvkIwFnD7Ib+Iztf3ZSyF3Bi4G3tqPfc1WNSbHbJ2LOa5HGZecyd+zJOvnng1
6smviLK3FcwkJn+Pl0/CRkzEzP/dwpqgE1eJPSR7iy++ypv7E7o=
=BYfS
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- Two fixes to regressions introduced in kselftest Makefile test run
output refactoring work (Kees Cook)
- Adding Atom support to syscall_arg_fault test (Tong Bo)
* tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/timers: Add missing fflush(stdout) calls
selftests: Remove forced unbuffering for test running
selftests/x86: Support Atom for syscall_arg_fault test
Here is another set of reviewed patches that adds SPDX tags to different
kernel files, based on a set of rules that are being used to parse the
comments to try to determine that the license of the file is
"GPL-2.0-or-later". Only the "obvious" versions of these matches are
included here, a number of "non-obvious" variants of text have been
found but those have been postponed for later review and analysis.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on the
patches are reviewers.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXOgmlw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk4rACfRqxGOGVLR/t6E9dDzOZRAdEz/mYAoJLZmziY
0YlSSSPtP5HI6JDh65Ng
=HXQb
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pule more SPDX updates from Greg KH:
"Here is another set of reviewed patches that adds SPDX tags to
different kernel files, based on a set of rules that are being used to
parse the comments to try to determine that the license of the file is
"GPL-2.0-or-later".
Only the "obvious" versions of these matches are included here, a
number of "non-obvious" variants of text have been found but those
have been postponed for later review and analysis.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on
the patches are reviewers"
* tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (85 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 125
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 123
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 122
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 121
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 119
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 116
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 113
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 111
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 110
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 106
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 105
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 101
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98
...