Now that we use skb-less flow dissector let's return true nhoff and
thoff. We used to adjust them by ETH_HLEN because that's how it was
done in the skb case. For VLAN tests that looks confusing: nhoff is
pointing to vlan parts :-\
Warning, this is an API change for BPF_PROG_TEST_RUN! Feel free to drop
if you think that it's too late at this point to fix it.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Right now we incorrectly return 'ret' which is always zero at that
point.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Export last_dissection map from flow dissector and use a known place in
tun driver to trigger BPF flow dissection.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When flow dissector is called without skb, we want to make sure
bpf_skb_load_bytes invocations return error. Add small test which tries
to read single byte from a packet.
bpf_skb_load_bytes should always fail under BPF_PROG_TEST_RUN because
it was converted to the skb-less mode.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-04-22
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) allow stack/queue helpers from more bpf program types, from Alban.
2) allow parallel verification of root bpf programs, from Alexei.
3) introduce bpf sysctl hook for trusted root cases, from Andrey.
4) recognize var/datasec in btf deduplication, from Andrii.
5) cpumap performance optimizations, from Jesper.
6) verifier prep for alu32 optimization, from Jiong.
7) libbpf xsk cleanup, from Magnus.
8) other various fixes and cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for your net tree:
1) Add a selftest for icmp packet too big errors with conntrack, from
Florian Westphal.
2) Validate inner header in ICMP error message does not lie to us
in conntrack, also from Florian.
3) Initialize ct->timeout to calm down KASAN, from Alexander Potapenko.
4) Skip ICMP error messages from tunnels in IPVS, from Julian Anastasov.
5) Use a hash to expose conntrack and expectation ID, from Florian Westphal.
6) Prevent shift wrap in nft_chain_parse_hook(), from Dan Carpenter.
7) Fix broken ICMP ID randomization with NAT, also from Florian.
8) Remove WARN_ON in ebtables compat that is reached via syzkaller,
from Florian Westphal.
9) Fix broken timestamps since fb420d5d91 ("tcp/fq: move back to
CLOCK_MONOTONIC"), from Florian.
10) Fix logging of invalid packets in conntrack, from Andrei Vagin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Build and run gpio when output directory is the src dir. gpio has
dependency on tools/gpio and builds tools/gpio objects in the src
directory in all cases making the src repo dirty even when object
relocation is specified.
This fixes the following commands from generating gpio objects in
the source repository:
make O=dir kselftest
export KBUILD_OUTPUT=dir; make kselftest
make O=dir -C tools/testing/selftests
expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests
The following commands still build gpio objects in the source repo
(gpio Makefile needs to fixed):
make O=dir kselftest TARGETS="gpio"
export KBUILD_OUTPUT=dir; make kselftest TARGETS="gpio"
make O=dir -C tools/testing/selftests TARGETS="gpio"
expoert KBUILD_OUTPUT=dir; make -C tools/testing/selftests TARGETS="gpio"
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add nfit_test 'watermarks' for the dax_pmem, dax_pmem_core, and
dax_pmem_compat modules. This causes the nfit_test module to fail
loading in case any of these modules are also not overridden with the
ldconfig wrapped modules. Without this, nfit_test would sometimes fail
creation of device-dax namespaces on the nfit_test_bus with an unhelpful
error log such as:
dax_pmem dax5.0: could not reserve metadata
dax_pmem: probe of dax5.0 failed with error -16
Which was caused due to the unwrapped version of
devm_request_mem_region() being called.
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAly8rGYeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGmZMH/1IRB0E1Qmzz8yzw
wj79UuRGYPqxDDSWW+wNc8sU4Ic7iYirn9APHAztCdQqsjmzU/OVLfSa3JhdBe5w
THo7pbGKBqEDcWnKfNk/21jXFNLZ1vr9BoQv2DGU2MMhHAyo/NZbalo2YVtpQPmM
OCRth5n+LzvH7rGrX7RYgWu24G9l3NMfgtaDAXBNXesCGFAjVRrdkU5CBAaabvtU
4GWh/nnutndOOLdByL3x+VZ3H3fIBnbNjcIGCglvvqzk7h3hrfGEl4UCULldTxcM
IFsfMUhSw1ENy7F6DHGbKIG90cdCJcrQ8J/ziEzjj/KLGALluutfFhVvr6YCM2J6
2RgU8CY=
=CfY1
-----END PGP SIGNATURE-----
Merge tag 'v5.1-rc6' into for-5.2/block
Pull in v5.1-rc6 to resolve two conflicts. One is in BFQ, in just a
comment, and is trivial. The other one is a conflict due to a later fix
in the bio multi-page work, and needs a bit more care.
* tag 'v5.1-rc6': (770 commits)
Linux 5.1-rc6
block: make sure that bvec length can't be overflow
block: kill all_q_node in request_queue
x86/cpu/intel: Lower the "ENERGY_PERF_BIAS: Set to normal" message's log priority
coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
mm/kmemleak.c: fix unused-function warning
init: initialize jump labels before command line option parsing
kernel/watchdog_hld.c: hard lockup message should end with a newline
kcov: improve CONFIG_ARCH_HAS_KCOV help text
mm: fix inactive list balancing between NUMA nodes and cgroups
mm/hotplug: treat CMA pages as unmovable
proc: fixup proc-pid-vm test
proc: fix map_files test on F29
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
mm/memory_hotplug: do not unlock after failing to take the device_hotplug_lock
mm: swapoff: shmem_unuse() stop eviction without igrab()
mm: swapoff: take notice of completion sooner
mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
mm: swapoff: shmem_find_swap_entries() filter out other types
slab: store tagged freelist for off-slab slabmgmt
...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a new test for Media Device Allocator API.
Media Device Allocator API to allows multiple drivers share a media device.
This API solves a very common use-case for media devices where one physical
device (an USB stick) provides both audio and video. When such media device
exposes a standard USB Audio class, a proprietary Video class, two or more
independent drivers will share a single physical USB bridge. In such cases,
it is necessary to coordinate access to the shared resource.
Using this API, drivers can allocate a media device with the shared struct
device as the key. Once the media device is allocated by a driver, other
drivers can get a reference to it. The media device is released when all
the references are released.
This test does a series of unbind/bind tests to make sure media device
is released correctly when it is no longer is use and when the last
driver releases the reference.
Signed-off-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The run_afpackettests will be marked as passed regardless the return
value of those sub-tests in the script:
--------------------
running psock_tpacket test
--------------------
[FAIL]
selftests: run_afpackettests [PASS]
Fix this by changing the return value for each tests.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull perf fixes from Ingo Molnar:
"Misc fixes:
- various tooling fixes
- kretprobe fixes
- kprobes annotation fixes
- kprobes error checking fix
- fix the default events for AMD Family 17h CPUs
- PEBS fix
- AUX record fix
- address filtering fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Avoid kretprobe recursion bug
kprobes: Mark ftrace mcount handler functions nokprobe
x86/kprobes: Verify stack frame on kretprobe
perf/x86/amd: Add event map for AMD Family 17h
perf bpf: Return NULL when RB tree lookup fails in perf_env__find_btf()
perf tools: Fix map reference counting
perf evlist: Fix side band thread draining
perf tools: Check maps for bpf programs
perf bpf: Return NULL when RB tree lookup fails in perf_env__find_bpf_prog_info()
tools include uapi: Sync sound/asound.h copy
perf top: Always sample time to satisfy needs of use of ordered queuing
perf evsel: Use hweight64() instead of hweight_long(attr.sample_regs_user)
tools lib traceevent: Fix missing equality check for strcmp
perf stat: Disable DIR_FORMAT feature for 'perf stat record'
perf scripts python: export-to-sqlite.py: Fix use of parent_id in calls_view
perf header: Fix lock/unlock imbalances when processing BPF/BTF info
perf/x86: Fix incorrect PEBS_REGS
perf/ring_buffer: Fix AUX record suppression
perf/core: Fix the address filtering fix
kprobes: Fix error check when reusing optimized probes
This fixes the various compiler warnings when building the msgque
selftest. The primary change is using sys/msg.h instead of linux/msg.h
directly to gain the API declarations.
Fixes: 3a665531a3 ("selftests: IPC message queue copy feature test")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Test files created by test_create() and test_create_empty() tests will
stay in the $efivarfs_mount directory until the system was rebooted.
When the tester tries to run this efivarfs test again on the same
system, the immutable characteristics in that directory will cause some
"Operation not permitted" noises, and a false-positve test result as the
file was created in previous run.
--------------------
running test_create
--------------------
./efivarfs.sh: line 59: /sys/firmware/efi/efivars/test_create-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted
[PASS]
--------------------
running test_create_empty
--------------------
./efivarfs.sh: line 78: /sys/firmware/efi/efivars/test_create_empty-210be57c-9849-4fc7-a635-e6382d1aec27: Operation not permitted
[PASS]
--------------------
Create a file_cleanup() to remove those test files in the end of each
test to solve this issue.
For the test_create_read, we can move the clean up task to the end of
the test to ensure the system is clean.
Also, use this function to replace the existing file removal code.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
"make kselftest" fails with "Circular Makefile.o <- prepare dependency
dropped." error, when lib.mk invokes "make headers_install".
Make level 0: Main make calls selftests run_tests target
...
Make level n: selftests lib.mk invokes main make's headers_install
The secondary level make inherits builtin-rules which will use the rule
to generate Makefile.o and runs into "Circular Makefile.o <- prepare
dependency dropped." error, and kselftest compile fails.
Invoke headers_install target with --no-builtin-rules to avoid circular
error.
In addition, lib.mk installs headers in the default HDR_PATH, even when
build relocation is requested with O= or export KBUILD_OUTPUT. Fix the
problem by passing in INSTALL_HDR_PATH. The headers are installed under
the specified output "dir/usr".
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements 9 tests for the freezer controller for
cgroup v2:
1) a simple test, which aims to freeze and unfreeze a cgroup with 100
processes
2) a more complicated tree test, which creates a hierarchy of cgroups,
puts some processes in some cgroups, and tries to freeze and unfreeze
different parts of the subtree
3) a forkbomb test: the test aims to freeze a forkbomb running in a
cgroup, kill all tasks in the cgroup and remove the cgroup without
the unfreezing.
4) rmdir test: the test creates two nested cgroups, freezes the parent
one, checks that the child can be successfully removed, and a new
child can be created
5) migration tests: the test checks migration of a task between
frozen cgroups: from a frozen to a running, from a running to a
frozen, and from a frozen to a frozen.
6) ptrace test: the test checks that it's possible to attach to
a process in a frozen cgroup, get some information and detach, and
the cgroup will remain frozen.
7) stopped test: the test checks that it's possible to freeze a cgroup
with a stopped task
8) ptraced test: the test checks that it's possible to freeze a cgroup
with a ptraced task
9) vfork test: the test checks that it's possible to freeze a cgroup
with a parent process waiting for the child process in vfork()
Expected output:
$ ./test_freezer
ok 1 test_cgfreezer_simple
ok 2 test_cgfreezer_tree
ok 3 test_cgfreezer_forkbomb
ok 4 test_cgrreezer_rmdir
ok 5 test_cgfreezer_migrate
ok 6 test_cgfreezer_ptrace
ok 7 test_cgfreezer_stopped
ok 8 test_cgfreezer_ptraced
ok 9 test_cgfreezer_vfork
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: kernel-team@fb.com
Cc: linux-kselftest@vger.kernel.org
If the cgroup destruction races with an exit() of a belonging
process(es), cg_kill_all() may fail. It's not a good reason to make
cg_destroy() fail and leave the cgroup in place, potentially causing
next test runs to fail.
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: kernel-team@fb.com
Cc: linux-kselftest@vger.kernel.org
map_fds[16] is the last one index-ed by fixup_map_array_small.
Hence, the MAX_NR_MAPS should be 17 instead.
Fixes: fb2abb73e5 ("bpf, selftest: test {rd, wr}only flags and direct value access")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
I meet below compile errors:
"
In file included from test_tcpnotify_kern.c:12:
/usr/include/netinet/in.h:101:5: error: expected identifier
IPPROTO_HOPOPTS = 0, /* IPv6 Hop-by-Hop options. */
^
/usr/include/linux/in6.h:131:26: note: expanded from macro 'IPPROTO_HOPOPTS'
^
In file included from test_tcpnotify_kern.c:12:
/usr/include/netinet/in.h:103:5: error: expected identifier
IPPROTO_ROUTING = 43, /* IPv6 routing header. */
^
/usr/include/linux/in6.h:132:26: note: expanded from macro 'IPPROTO_ROUTING'
^
In file included from test_tcpnotify_kern.c:12:
/usr/include/netinet/in.h:105:5: error: expected identifier
IPPROTO_FRAGMENT = 44, /* IPv6 fragmentation header. */
^
/usr/include/linux/in6.h:133:26: note: expanded from macro 'IPPROTO_FRAGMENT'
"
The same compile errors are reported for test_tcpbpf_kern.c too.
My environment:
lsb_release -a:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
dpkg -l | grep libc-dev:
ii libc-dev-bin 2.23-0ubuntu11 amd64 GNU C Library: Development binaries
ii linux-libc-dev:amd64 4.4.0-145.171 amd64 Linux Kernel Headers for development.
The reason is linux/in6.h and netinet/in.h aren't synchronous about how to
handle the same definitions, IPPROTO_HOPOPTS, etc.
This patch fixes the compile errors by moving <netinet/in.h> to before the
<linux/*.h>.
Signed-off-by: Wang YanQing <udknight@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Having a helpful compile time warning in libbpf_util.h is not a good
idea since all warnings are treated as errors. Change this into a
comment in the code instead.
Fixes: b7e3a28019 ("libbpf: remove dependency on barrier.h in xsk.h")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Unexpected power cycle occurs while the installation of the
kernel.
ssh root@Test sync ... [0 seconds] SUCCESS
ssh root@Test reboot ... [1 second] FAILED!
virsh destroy Test; sleep 5; virsh start Test ... [6 seconds] SUCCESS
That is because REBOOT, the default is "ssh $SSH_USER@$MACHINE
reboot", exits as 255 even if the reboot is successfully done,
like as:
]# ssh root@Test reboot
Connection to Test closed by remote host.
]# echo $?
255
]#
To avoid the unexpected power cycle, introduce a new parameter,
REBOOT_RETURN_CODE to judge whether REBOOT is successfully done
or not.
Link: http://lkml.kernel.org/r/20190418135943.12640-1-msys.mizuma@gmail.com
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull RCU and LKMM commits from Paul E. McKenney:
- An LKMM commit adding support for synchronize_srcu_expedited()
- A couple of straggling RCU flavor consolidation updates
- Documentation updates.
- Miscellaneous fixes
- SRCU updates
- RCU CPU stall-warning updates
- Torture-test updates
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This enables stfle.155 and adds the subfunctions for KDSA. Bit 155 is
added to the list of facilities that will be enabled when there is no
cpu model involved as MSA9 requires no additional handling from
userspace, e.g. for migration.
Please note that a cpu model enabled user space can and will have the
final decision on the facility bits for a guests.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
I hit the following compilation error with gcc 4.8.5.
prog_tests/flow_dissector.c: In function ‘test_flow_dissector’:
prog_tests/flow_dissector.c:155:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < ARRAY_SIZE(tests); i++) {
^
prog_tests/flow_dissector.c:155:2: note: use option -std=c99 or -std=gnu99 to compile your code
Let us fix the issue by avoiding this particular c99 feature.
Fixes: a5cb33464e ("selftests/bpf: make flow dissector tests more extensible")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ktest fails if meta characters are in GRUB_MENU, for example
GRUB_MENU = 'Fedora (test)'
The failure happens because the meta characters are not escaped,
so the menu doesn't match in any entries in GRUB_FILE.
Use quotemeta() to escape the meta characters.
Link: http://lkml.kernel.org/r/20190417235823.18176-1-msys.mizuma@gmail.com
Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If a test has an error, display not only the what type of test failed, but
if the test was giving a name, display that too, as well as the current
iteration of the tests. Each test has an iteration number associated to it.
For error messages display that iteration number along with the test type
and test name. This includes the message that gets sent via email.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The get_secureboot_mode() function unnecessarily requires both
CONFIG_EFIVAR_FS and CONFIG_EFI_VARS to be enabled to determine if the
system is booted in secure boot mode. On some systems the old EFI
variable support is not enabled or, possibly, even implemented.
This patch first checks the efivars filesystem for the SecureBoot and
SetupMode flags, but falls back to using the old EFI variable support.
The "secure_boot_file" and "setup_mode_file" couldn't be quoted due to
globbing. This patch also removes the globbing.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Verify IMA is enabled before failing tests or emitting irrelevant
messages.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Skip the kexec_load and kexec_file_load tests, if they aren't configured
in the kernel. This change adds a new requirement that ikconfig is
configured in the kexec_load test.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
so the file can be used as kernel config snippet.
Signed-off-by: Petr Vorel <pvorel@suse.cz>
[zohar@linux.ibm.com: remove CONFIG_KEXEC_VERIFY_SIG from config]
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The kernel can be configured to verify PE signed kernel images, IMA
kernel image signatures, both types of signatures, or none. This test
verifies only properly signed kernel images are loaded into memory,
based on the kernel configuration and runtime policies.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Many tests require root privileges. Define a common function.
Suggested-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Define, update and move get_secureboot_mode() to a common file for use
by other tests.
Updated to check both the efivar SecureBoot-$(UUID) and
SetupMode-$(UUID), based on Dave Young's review.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Remove the few bashisms and use the complete option name for clarity.
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
As requested move the existing kexec_load selftest and subsequent kexec
tests to the selftests/kexec directory.
Suggested-by: Dave Young <dyoung@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We don't return NULL when we don't find the bpf_prog_info_node, fix
that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Song Liu <songliubraving@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3792cb2ff4 ("perf bpf: Save BTF in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190417145539.11669-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By calling maps__insert() we assume to get 2 references on the map,
which we relese within maps__remove call.
However if there's already same map name, we currently don't bump the
reference and can crash, like:
Program received signal SIGABRT, Aborted.
0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff75e60f5 in raise () from /lib64/libc.so.6
#1 0x00007ffff75d0895 in abort () from /lib64/libc.so.6
#2 0x00007ffff75d0769 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff75de596 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000004fc006 in refcount_sub_and_test (i=1, r=0x1224e88) at tools/include/linux/refcount.h:131
#5 refcount_dec_and_test (r=0x1224e88) at tools/include/linux/refcount.h:148
#6 map__put (map=0x1224df0) at util/map.c:299
#7 0x00000000004fdb95 in __maps__remove (map=0x1224df0, maps=0xb17d80) at util/map.c:953
#8 maps__remove (maps=0xb17d80, map=0x1224df0) at util/map.c:959
#9 0x00000000004f7d8a in map_groups__remove (map=<optimized out>, mg=<optimized out>) at util/map_groups.h:65
#10 machine__process_ksymbol_unregister (sample=<optimized out>, event=0x7ffff7279670, machine=<optimized out>) at util/machine.c:728
#11 machine__process_ksymbol (machine=<optimized out>, event=0x7ffff7279670, sample=<optimized out>) at util/machine.c:741
#12 0x00000000004fffbb in perf_session__deliver_event (session=0xb11390, event=0x7ffff7279670, tool=0x7fffffffc7b0, file_offset=13936) at util/session.c:1362
#13 0x00000000005039bb in do_flush (show_progress=false, oe=0xb17e80) at util/ordered-events.c:243
#14 __ordered_events__flush (oe=0xb17e80, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:322
#15 0x00000000005005e4 in perf_session__process_user_event (session=session@entry=0xb11390, event=event@entry=0x7ffff72a4af8,
...
Add the map to the list and getting the reference event if we find the
map with same name.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 1e6285699b ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190416160127.30203-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Current perf_evlist__poll_thread() code could finish without draining
the data. Adding the logic that makes sure we won't finish before the
drain.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Fixes: 657ee55319 ("perf evlist: Introduce side band thread")
Link: http://lkml.kernel.org/r/20190416160127.30203-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As reported by Jiri Olsa in:
"[BUG] perf: intel_pt won't display kernel function"
https://lore.kernel.org/lkml/20190403143738.GB32001@krava
Recent changes to support PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT
broke --kallsyms option. This is because it broke test __map__is_kmodule.
This patch fixes this by adding check for bpf program, so that these maps
are not mistaken as kernel modules.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yonghong Song <yhs@fb.com>
Link: http://lkml.kernel.org/r/20190416160127.30203-8-jolsa@kernel.org
Fixes: 76193a9452 ("perf, bpf: Introduce PERF_RECORD_KSYMBOL")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We currently don't return NULL in case we don't find the
bpf_prog_info_node, fixing that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e4378f0cb9 ("perf bpf: Save bpf_prog_info in a rbtree in perf_env")
Link: http://lkml.kernel.org/r/20190416134151.15282-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking fixes from David Miller:
1) Handle init flow failures properly in iwlwifi driver, from Shahar S
Matityahu.
2) mac80211 TXQs need to be unscheduled on powersave start, from Felix
Fietkau.
3) SKB memory accounting fix in A-MDSU aggregation, from Felix Fietkau.
4) Increase RCU lock hold time in mlx5 FPGA code, from Saeed Mahameed.
5) Avoid checksum complete with XDP in mlx5, also from Saeed.
6) Fix netdev feature clobbering in ibmvnic driver, from Thomas Falcon.
7) Partial sent TLS record leak fix from Jakub Kicinski.
8) Reject zero size iova range in vhost, from Jason Wang.
9) Allow pending work to complete before clcsock release from Karsten
Graul.
10) Fix XDP handling max MTU in thunderx, from Matteo Croce.
11) A lot of protocols look at the sa_family field of a sockaddr before
validating it's length is large enough, from Tetsuo Handa.
12) Don't write to free'd pointer in qede ptp error path, from Colin Ian
King.
13) Have to recompile IP options in ipv4_link_failure because it can be
invoked from ARP, from Stephen Suryaputra.
14) Doorbell handling fixes in qed from Denis Bolotin.
15) Revert net-sysfs kobject register leak fix, it causes new problems.
From Wang Hai.
16) Spectre v1 fix in ATM code, from Gustavo A. R. Silva.
17) Fix put of BROPT_VLAN_STATS_PER_PORT in bridging code, from Nikolay
Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (111 commits)
socket: fix compat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW
tcp: tcp_grow_window() needs to respect tcp_space()
ocelot: Clean up stats update deferred work
ocelot: Don't sleep in atomic context (irqs_disabled())
net: bridge: fix netlink export of vlan_stats_per_port option
qed: fix spelling mistake "faspath" -> "fastpath"
tipc: set sysctl_tipc_rmem and named_timeout right range
tipc: fix link established but not in session
net: Fix missing meta data in skb with vlan packet
net: atm: Fix potential Spectre v1 vulnerabilities
net/core: work around section mismatch warning for ptp_classifier
net: bridge: fix per-port af_packet sockets
bnx2x: fix spelling mistake "dicline" -> "decline"
route: Avoid crash from dereferencing NULL rt->from
MAINTAINERS: normalize Woojung Huh's email address
bonding: fix event handling for stacked bonds
Revert "net-sysfs: Fix memory leak in netdev_register_kobject"
rtnetlink: fix rtnl_valid_stats_req() nlmsg_len check
qed: Fix the DORQ's attentions handling
qed: Fix missing DORQ attentions
...
The full memory barrier in the XDP socket rings on the consumer side
between the load of the data and the store of the consumer ring is
there to protect the store from being executed before the load of the
data. If this was allowed to happen, the producer might overwrite the
data field with a new entry before the consumer got the chance to read
it.
On x86, stores are guaranteed not to be reordered with older loads, so
it does not need a full memory barrier here. A compile time barrier
would be enough. This patch introdcues a new primitive in
libbpf_util.h that implements a new barrier type (libbpf_smp_rwmb)
hindering stores to be reordered with older loads. It is then used in
the XDP socket ring access code in libbpf to improve performance.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The use of smp_rmb() and smp_wmb() creates a Linux header dependency
on barrier.h that is unnecessary in most parts. This patch implements
the two small defines that are needed from barrier.h. As a bonus, the
new implementations are faster than the default ones as they default
to sfence and lfence for x86, while we only need a compiler barrier in
our case. Just as it is when the same ring access code is compiled in
the kernel.
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch removes the use of likely and unlikely in xsk.h since they
create a dependency on Linux headers as reported by several
users. There have also been reports that the use of these decreases
performance as the compiler puts the code on two different cache lines
instead of on a single one. All in all, I think we are better off
without them.
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The ring buffer code of XDP sockets is missing a memory barrier on the
consumer side between the load of the data and the write that signals
that it is ok for the producer to put new data into the buffer. On
architectures that does not guarantee that stores are not reordered
with older loads, the producer might put data into the ring before the
consumer had the chance to read it. As IA does guarantee this
ordering, it would only need a compiler barrier here, but there are no
primitives in barrier.h for this specific case (hinder writes to be ordered
before older reads) so I had to add a smp_mb() here which will
translate into a run-time synch operation on IA.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Let's print btf id of map similar to the way we are printing it
for programs.
Sample output:
user@test# bpftool map -f
61: lpm_trie flags 0x1
key 20B value 8B max_entries 1 memlock 4096B
133: array name test_btf_id flags 0x0
key 4B value 4B max_entries 4 memlock 4096B
pinned /sys/fs/bpf/test100
btf_id 174
170: array name test_btf_id flags 0x0
key 4B value 4B max_entries 4 memlock 4096B
btf_id 240
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Let's move the final newline printing in show_map_close_plain() at
the end of the function because it looks correct and consistent with
prog.c. Also let's do related changes for the line which prints
pinned file name.
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add support for recently added BPF_PROG_TYPE_CGROUP_SYSCTL program type
and BPF_CGROUP_SYSCTL attach type.
Example of bpftool output with sysctl program from selftests:
# bpftool p load ./test_sysctl_prog.o /mnt/bpf/sysctl_prog type cgroup/sysctl
# bpftool p l
9: cgroup_sysctl name sysctl_tcp_mem tag 0dd05f81a8d0d52e gpl
loaded_at 2019-04-16T12:57:27-0700 uid 0
xlated 1008B jited 623B memlock 4096B
# bpftool c a /mnt/cgroup2/bla sysctl id 9
# bpftool c t
CgroupPath
ID AttachType AttachFlags Name
/mnt/cgroup2/bla
9 sysctl sysctl_tcp_mem
# bpftool c d /mnt/cgroup2/bla sysctl id 9
# bpftool c t
CgroupPath
ID AttachType AttachFlags Name
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Using %ld for printing out value of ptrdiff_t type is not portable
between 32-bit and 64-bit archs. This is causing compilation errors for
libbpf on 32-bit platform (discovered as part of an effort to integrate
libbpf into systemd ([0])). Proper formatter is %td, which is used in
this patch.
v2->v1:
- add Reported-by
- provide more context on how this issue was discovered
[0] https://github.com/systemd/systemd/pull/12151
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds tests validating that VRF and BPF-LWT
encap work together well, as requested by David Ahern.
Signed-off-by: Peter Oskolkov <posk@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In order to keep tests from hanging forever, this adds an alarm signal
to each test run. This assumes an individual test doesn't take longer
than 30 seconds.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When running without USERNS or PIDNS the seccomp test would hang since
it was waiting forever for the child to trigger the user notification
since it seems the glibc() abort handler makes a call to getpid(),
which would trap again. This changes the getpid filter to getppid, and
makes sure ASSERTs execute to stop from spawning the listener.
Reported-by: Shuah Khan <shuah@kernel.org>
Fixes: 6a21cc50f0 ("seccomp: add a return code to trap to userspace")
Cc: stable@vger.kernel.org # > 5.0
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tycho Andersen <tycho@tycho.ws>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
* Fixes for nested VMX with ept=0
* Fixes for AMD (APIC virtualization, NMI injection)
* Fixes for Hyper-V under KVM and KVM under Hyper-V
* Fixes for 32-bit SMM and tests for SMM virtualization
* More array_index_nospec peppering
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJctdrUAAoJEL/70l94x66Deq8H/0OEIBBuDt53nPEHXufNSV1S
uzIVvwJoL6786URWZfWZ99Z/NTTA1rn9Vr/leLPkSidpDpw7IuK28KZtEMP2rdRE
Sb8eN2g4SoQ51ZDSIMUzjcx9VGNqkH8CWXc2yhDtTUSD21S3S1kidZ0O0YbmetkJ
OwF1EDx4m7JO6EUHaJhIfdTUb9ItRC1Vfo7hpOuRVxPx2USv5+CLbexpteKogMcI
5WDaXFIRwUWW6Z8Bwyi7yA9gELKcXTTXlz9T/A7iKeqxRMLBazVKnH8h7Lfd0M0A
wR4AI+tE30MuHT7WLh1VOAKZk6TDabq9FJrva3JlDq+T+WOjgUzYALLKEd4Vv4o=
=zsT5
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"5.1 keeps its reputation as a big bugfix release for KVM x86.
- Fix for a memory leak introduced during the merge window
- Fixes for nested VMX with ept=0
- Fixes for AMD (APIC virtualization, NMI injection)
- Fixes for Hyper-V under KVM and KVM under Hyper-V
- Fixes for 32-bit SMM and tests for SMM virtualization
- More array_index_nospec peppering"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
KVM: fix spectrev1 gadgets
KVM: x86: fix warning Using plain integer as NULL pointer
selftests: kvm: add a selftest for SMM
selftests: kvm: fix for compilers that do not support -no-pie
selftests: kvm/evmcs_test: complete I/O before migrating guest state
KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
KVM: x86: clear SMM flags before loading state while leaving SMM
KVM: x86: Open code kvm_set_hflags
KVM: x86: Load SMRAM in a single shot when leaving SMM
KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
KVM: x86: Raise #GP when guest vCPU do not support PMU
x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
KVM: x86: svm: make sure NMI is injected after nmi_singlestep
svm/avic: Fix invalidate logical APIC id entry
Revert "svm: Fix AVIC incomplete IPI emulation"
kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM: nVMX: always use early vmcs check when EPT is disabled
KVM: nVMX: allow tests to use bad virtual-APIC page address
...
Picking the changes from:
Fixes: b5bdbb6ccd ("ALSA: uapi: #include <time.h> in asound.h")
Which entails no changes in the tooling side.
To silence this perf tools build warning:
Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h'
diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takashi Iwai <tiwai@suse.de>
Link: https://lkml.kernel.org/n/tip-15o4twfkbn6nny9aus90dyzx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Bastian reported broken 'perf top -p PID' command, it won't display any
data.
The problem is that for -p option we monitor single thread, so we don't
enable time in samples, because it's not needed.
However since commit 16c66bc167 we use ordered queues to stash data
plus later commits added logic for dropping samples in case there's big
load and we don't keep up. All this needs timestamp for sample. Enabling
it unconditionally for perf top.
Reported-by: Bastian Beischer <bastian.beischer@rwth-aachen.de>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bastian beischer <bastian.beischer@rwth-aachen.de>
Fixes: 16c66bc167 ("perf top: Add processing thread")
Link: http://lkml.kernel.org/r/20190415125333.27160-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On 32-bits platform with more than 32 registers, the 64 bits mask is
truncate to the lower 32 bits and the return value of hweight_long will
always smaller than 32. When kernel outputs more than 32 registers, but
the user perf program only counts 32, there will be a data mismatch
result to overflow check fail.
Signed-off-by: Mao Han <han_mao@c-sky.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Fixes: 6a21c0b5c2 ("perf tools: Add core support for sampling intr machine state regs")
Fixes: d03f217054 ("perf tools: Expand perf_event__synthesize_sample()")
Fixes: 0f6a30150c ("perf tools: Support user regs and stack in sample parsing")
Link: http://lkml.kernel.org/r/29ad7947dc8fd1ff0abd2093a72cc27a2446be9f.1554883878.git.han_mao@c-sky.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There was a missing comparison with 0 when checking if type is "s64" or
"u64". Therefore, the body of the if-statement was entered if "type" was
"u64" or not "s64", which made the first strcmp() redundant since if
type is "u64", it's not "s64".
If type is "s64", the body of the if-statement is not entered but since
the remainder of the function consists of if-statements which will not
be entered if type is "s64", we will just return "val", which is
correct, albeit at the cost of a few more calls to strcmp(), i.e., it
will behave just as if the if-statement was entered.
If type is neither "s64" or "u64", the body of the if-statement will be
entered incorrectly and "val" returned. This means that any type that is
checked after "s64" and "u64" is handled the same way as "s64" and
"u64", i.e., the limiting of "val" to fit in for example "s8" is never
reached.
This was introduced in the kernel tree when the sources were copied from
trace-cmd in commit f7d82350e5 ("tools/events: Add files to create
libtraceevent.a"), and in the trace-cmd repo in 1cdbae6035cei
("Implement typecasting in parser") when the function was introduced,
i.e., it has always behaved the wrong way.
Detected by cppcheck.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Fixes: f7d82350e5 ("tools/events: Add files to create libtraceevent.a")
Link: http://lkml.kernel.org/r/20190409091529.2686-1-rikard.falkeborn@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo reported assertion in perf stat record:
assertion failed at util/header.c:875
There's no support for this in the 'perf state record' command, disable
the feature for that case.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 258031c017 ("perf header: Add DIR_FORMAT feature to describe directory data")
Link: http://lkml.kernel.org/r/20190409100156.20303-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix lock/unlock imbalances by refactoring the code a bit and adding
calls to up_write() before return.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Addresses-Coverity-ID: 1444315 ("Missing unlock")
Addresses-Coverity-ID: 1444316 ("Missing unlock")
Fixes: a70a112317 ("perf bpf: Save BTF information as headers to perf.data")
Fixes: 606f972b13 ("perf bpf: Save bpf_prog_info information as headers to perf.data")
Link: http://lkml.kernel.org/r/20190408173355.GA10501@embeddedor
[ Simplified the exit path to have just one up_write() + return ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a simple test for SMM, based on VMX. The test implements its own
sync between the guest and the host as using our ucall library seems to
be too cumbersome: SMI handler is happening in real-address mode.
This patch also fixes KVM_SET_NESTED_STATE to happen after
KVM_SET_VCPU_EVENTS, in fact it places it last. This is because
KVM needs to know whether the processor is in SMM or not.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-no-pie was added to GCC at the same time as their configuration option
--enable-default-pie. Compilers that were built before do not have
-no-pie, but they also do not need it. Detect the option at build
time.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Starting state migration after an IO exit without first completing IO
may result in test failures. We already have two tests that need this
(this patch in fact fixes evmcs_test, similar to what was fixed for
state_test in commit 0f73bbc851, "KVM: selftests: complete IO before
migrating guest state", 2019-03-13) and a third is coming. So, move the
code to vcpu_save_state, and while at it do not access register state
until after I/O is complete.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Rewrite selftest to iterate over an array with input packet and
expected flow_keys. This should make it easier to extend this test
with additional cases without too much boilerplate.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add two tests to check that sequence of 1024 jumps is verifiable.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
avoids outputting a series of
value:
No space left on device
The value itself is not wrong but bpf_fd_reuseport_array_lookup_elem() can
only return it if the map was created with value_size = 8. There's nothing
bpftool can do about it. Instead of repeating this error for every key in
the map, print an explanatory warning and a specialized error.
example before:
key: 00 00 00 00
value:
No space left on device
key: 01 00 00 00
value:
No space left on device
key: 02 00 00 00
value:
No space left on device
Found 0 elements
example after:
Warning: cannot read values from reuseport_sockarray map with value_size != 8
key: 00 00 00 00 value: <cannot read>
key: 01 00 00 00 value: <cannot read>
key: 02 00 00 00 value: <cannot read>
Found 0 elements
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit bf598a8f0f ("bpftool: Improve handling of ENOENT on map dumps")
used print_entry_plain() in case of ENOENT. However, that commit introduces
dead code. Per-cpu maps are zero-filled. When reading them, it's all or
nothing. There will never be a case where some cpus have an entry and
others don't.
The truth is that ENOENT is an error case. Use print_entry_error() to
output the desired message. That function's "value" parameter is also
renamed to indicate that we never use it for an actual map value.
The output format is unchanged.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Linux kernel now supports statistics for BPF programs, and bpftool is
able to dump them. However, these statistics are not enabled by default,
and administrators may not know how to access them.
Add a paragraph in bpftool documentation, under the description of the
"bpftool prog show" command, to explain that such statistics are
available and that their collection is controlled via a dedicated sysctl
knob.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Manual pages would tell that option "-v" (lower case) would print the
version number for bpftool. This is wrong: the short name of the option
is "-V" (upper case). Fix the documentation accordingly.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The "pinmaps" keyword is present in the man page, in the verbose
description of the "bpftool prog load" command. However, it is missing
from the summary of available commands at the beginning of the file. Add
it there as well.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When trying to dump the tree of all cgroups under a given root node,
bpftool attempts to query programs of all available attach types. Some
of those attach types do not support queries, therefore several of the
calls are actually expected to fail.
Those calls set errno to EINVAL, which has no consequence for dumping
the rest of the tree. It does have consequences however if errno is
inspected at a later time. For example, bpftool batch mode relies on
errno to determine whether a command has succeeded, and whether it
should carry on with the next command. Setting errno to EINVAL when
everything worked as expected would therefore make such command fail:
# echo 'cgroup tree \n net show' | \
bpftool batch file -
To improve this, reset errno when its value is EINVAL after attempting
to show programs for all existing attach types in do_show_tree_fn().
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Commit 569b0c7773 ("tools/bpftool: show btf id in program information")
made bpftool print an empty line after each program entry when listing
the BPF programs loaded on the system (plain output). This is especially
confusing when some programs have an associated BTF id, and others
don't. Let's remove the blank line.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
replace tab after #define with space in line with rest of definitions
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
It was removed in commit 166b5a7f2c ("selftests_bpf: extend
test_tc_tunnel for UDP encap") without any explanation.
Otherwise I see:
progs/test_tc_tunnel.c:160:17: warning: taking address of packed member 'ip' of class or structure
'v4hdr' may result in an unaligned pointer value [-Waddress-of-packed-member]
set_ipv4_csum(&h_outer.ip);
^~~~~~~~~~
1 warning generated.
Cc: Alan Maguire <alan.maguire@oracle.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Fixes: 166b5a7f2c ("selftests_bpf: extend test_tc_tunnel for UDP encap")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add test case verifying that dedup happens (INTs are deduped in this
case) and VAR/DATASEC types are not deduped, but have their referenced
type IDs adjusted correctly.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch adds support for VAR and DATASEC in btf_dedup(). VAR/DATASEC
are never deduplicated, but they need to be processed anyway as types
they refer to might need to be remapped due to deduplication and
compaction.
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Yonghong Song <yhs@fb.com>
Cc: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
- Compatibility fix for nvdimm-security implementations with a default
zero-key.
- Miscellaneous small fixes for out-of-bound accesses, cleanup after
initialization failures, and missing debug messages.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJctQ6UAAoJEB7SkWpmfYgCV4AQAL18Kv0VJjeWzMirH+Q5B9Z2
WdHzKBvOWUWx8HeUhQoTtP+QnWriXI37EmhD7S34mVJZdYXxIQJBESPpFF1IpjUi
jMibrdgrPAzyXq+x6FS4gHwi8uwUwwHOYfBEPV+7UvA8Zi8AU+g1Sgl+FftY34Em
ACWc8/BtMtnwr2xFZT/4brzDCyvVHTK7f9HB280Je7DU6ghjEAaRFqqFXgAAbQ+l
HAOQz4GVweT/JUmu4cvABGwllTN3np4wR/ePKYdlZTVWpN02InECukiSFtgCWN4S
+bKm5EMTGDprLtNDz3m1SDWPrGSkWkoEtmVZljAXepJzAcZ1qbEw4xe++Kqrgr0z
YOawM0lMciTp78uiH797thYnS3fo5+Ccr0WE4lhrSC3kAZE+EfGvbyhv3T+Pz3M+
Z3hEpz+gGNMBwby0AmCLJHfwyujztNBE5hnXcsL5dC6BXKHZGZSgsUllRcZJ+xJ1
H6b5sdxmNvn7Ja0svhKJzfpP4j8v25v+KEns9VlbIejJAp62cQCmA1dHlGaC5pDc
0g9mtPbYsEZjKQ5/5grHgtre+JYmYDAIKwS4UK11ZyChqR+tmZ2Cp7XgmVab9a7T
QpFLczMV/Q8NSWIFYSHvXjj1/PWtUxf81lEtA+Y3+mDznn30QctPwufPcdxeTNJs
KSyFKhhKIOnasEplrLu4
=zISv
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
"I debated holding this back for the v5.2 merge window due to the size
of the "zero-key" changes, but affected users would benefit from
having the fixes sooner. It did not make sense to change the zero-key
semantic in isolation for the "secure-erase" command, but instead
include it for all security commands.
The short background on the need for these changes is that some NVDIMM
platforms enable security with a default zero-key rather than let the
OS specify the initial key. This makes the security enabling that
landed in v5.0 unusable for some users.
Summary:
- Compatibility fix for nvdimm-security implementations with a
default zero-key.
- Miscellaneous small fixes for out-of-bound accesses, cleanup after
initialization failures, and missing debug messages"
* tag 'libnvdimm-fixes-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: Retain security state after overwrite
libnvdimm/pmem: fix a possible OOB access when read and write pmem
libnvdimm/security, acpi/nfit: unify zero-key for all security commands
libnvdimm/security: provide fix for secure-erase to use zero-key
libnvdimm/btt: Fix a kmemdup failure check
libnvdimm/namespace: Fix a potential NULL pointer dereference
acpi/nfit: Always dump _DSM output payload
Test that neighbour entries are marked as offloaded.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Remove the broute pseudo hook, implement this from the bridge
prerouting hook instead. Now broute becomes real table in ebtables,
from Florian Westphal. This also includes a size reduction patch for the
bridge control buffer area via squashing boolean into bitfields and
a selftest.
2) Add OS passive fingerprint version matching, from Fernando Fernandez.
3) Support for gue encapsulation for IPVS, from Jacky Hu.
4) Add support for NAT to the inet family, from Florian Westphal.
This includes support for masquerade, redirect and nat extensions.
5) Skip interface lookup in flowtable, use device in the dst object.
6) Add jiffies64_to_msecs() and use it, from Li RongQing.
7) Remove unused parameter in nf_tables_set_desc_parse(), from Colin Ian King.
8) Statify several functions, patches from YueHaibing and Florian Westphal.
9) Add an optimized version of nf_inet_addr_cmp(), from Li RongQing.
10) Merge route extension to core, also from Florian.
11) Use IS_ENABLED(CONFIG_NF_NAT) instead of NF_NAT_NEEDED, from Florian.
12) Merge ip/ip6 masquerade extensions, from Florian. This includes
netdevice notifier unification.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functions.sh to TEST_PROGS_EXTENDED so that it is installed along
with the rest of the selftests and they can be run.
Originally-by: Shuah Khan <shuah@kernel.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Sven Auhagen reported that a 2nd ping request will fail if 'fully-random'
mode is used.
Reason is that if no proto information is given, min/max are both 0,
so we set the icmp id to 0 instead of chosing a random value between
0 and 65535.
Update test case as well to catch this, without fix this yields:
[..]
ERROR: cannot ping ns1 from ns2 with ip masquerade fully-random (attempt 2)
ERROR: cannot ping ns1 from ns2 with ipv6 masquerade fully-random (attempt 2)
... becaus 2nd ping clashes with existing 'id 0' icmp conntrack and gets
dropped.
Fixes: 203f2e7820 ("netfilter: nat: remove l4proto->unique_tuple")
Reported-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyw354QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpiMyEAC4THUReCrTuv9oFRNg5uILVYIq51nP8dw7
XamC7A92jPXd6vl/QVjmvLwT34/Y2XvX0t62RBsk849CEjgGYTeF1/qI3tMkpN7c
huupab3aYM/Rrv4i1KSPQu6iIto3DYqfmREaGJJ1Ikbu/CKDuUGyEo+Z4wrKUPon
GWnE8QMS2fdc764eVzKKqB+GryaEiHmeD1N4NnPs+nla14ysueUvJUikkTt/Laef
h7nOmz9mrqE6u1xVHNpo0TlW0oJdLfaDIL9ghwHFJXqvriTh8Tg2tEHpXI6vSTTt
StnPbTA1s1uhHs4rWYl8J0UXSZnRRp0Ep8jCvqEb9CJ23uHCNyGEoy/R7q+x2quf
T+ruolMXY7IIJP30ZMHar374YfajJdw7EH/565nlbLnjSBXhqjmc07kQ7mIYvpg6
JgureSdDwOOHpfrJgVq5es48ndt5HBYUBPzkvVGTgkeSJkMydkkM1qZeYEnai105
8EnUFusRUnYZtb73HBPjKS7i0BZZvZlI1oKYHabiMtajqcKyvwDP2tTmhqXYLDLY
9uloW0u2B0lddfzCb9hTYZOroNWfifo4vuSU5DHvnJoKvf4z3auDxaFD9N8fGn6S
aZsRjMCpFqFd0YEnZPbsctgPg2Licrs02uPntlzBTJ0ByH20pX4OepYrvgQk3vao
tOQ1jRYMKw==
=cISy
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Set of fixes that should go into this round. This pull is larger than
I'd like at this time, but there's really no specific reason for that.
Some are fixes for issues that went into this merge window, others are
not. Anyway, this contains:
- Hardware queue limiting for virtio-blk/scsi (Dongli)
- Multi-page bvec fixes for lightnvm pblk
- Multi-bio dio error fix (Jason)
- Remove the cache hint from the io_uring tool side, since we didn't
move forward with that (me)
- Make io_uring SETUP_SQPOLL root restricted (me)
- Fix leak of page in error handling for pc requests (Jérôme)
- Fix BFQ regression introduced in this merge window (Paolo)
- Fix break logic for bio segment iteration (Ming)
- Fix NVMe cancel request error handling (Ming)
- NVMe pull request with two fixes (Christoph):
- fix the initial CSN for nvme-fc (James)
- handle log page offsets properly in the target (Keith)"
* tag 'for-linus-20190412' of git://git.kernel.dk/linux-block:
block: fix the return errno for direct IO
nvmet: fix discover log page when offsets are used
nvme-fc: correct csn initialization and increments on error
block: do not leak memory in bio_copy_user_iov()
lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
nvme: cancel request synchronously
blk-mq: introduce blk_mq_complete_request_sync()
scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
virtio-blk: limit number of hw queues by nr_cpu_ids
block, bfq: fix use after free in bfq_bfqq_expire
io_uring: restrict IORING_SETUP_SQPOLL to root
tools/io_uring: remove IOCQE_FLAG_CACHEHIT
block: don't use for-inside-for in bio_for_each_segment_all
When an icmp error such as pkttoobig is received, conntrack checks
if the "inner" header (header of packet that did not fit link mtu)
is matches an existing connection, and, if so, sets that packet as
being related to the conntrack entry it found.
It was recently reported that this "related" setting also works
if the inner header is from another, different connection (i.e.,
artificial/forged icmp error).
Add a test, followup patch will add additional "inner dst matches
outer dst in reverse direction" check before setting related state.
Link: https://www.synacktiv.com/posts/systems/icmp-reachable.html
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull core fixes from Ingo Molnar:
"Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
bug"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Add rewind_stack_do_exit() to the noreturn list
linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
Some netdevsim bpf debugfs files are per-sdev, yet they are defined per
netdevsim instance. Move them under sdev directory.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add C based test for a few bpf_sysctl_* helpers and bpf_strtoul.
Make sure that sysctl can be identified by name and that multiple
integers can be parsed from sysctl value with bpf_strtoul.
net/ipv4/tcp_mem is chosen as a testing sysctl, it contains 3 unsigned
longs, they all are parsed and compared (val[0] < val[1] < val[2]).
Example of output:
# ./test_sysctl
...
Test case: C prog: deny all writes .. [PASS]
Test case: C prog: deny access by name .. [PASS]
Test case: C prog: read tcp_mem .. [PASS]
Summary: 39 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that bpf_strtol and bpf_strtoul helpers can be used to convert
provided buffer to long or unsigned long correspondingly and return both
correct result and number of consumed bytes, or proper errno.
Example of output:
# ./test_sysctl
..
Test case: bpf_strtoul one number string .. [PASS]
Test case: bpf_strtoul multi number string .. [PASS]
Test case: bpf_strtoul buf_len = 0, reject .. [PASS]
Test case: bpf_strtoul supported base, ok .. [PASS]
Test case: bpf_strtoul unsupported base, EINVAL .. [PASS]
Test case: bpf_strtoul buf with spaces only, EINVAL .. [PASS]
Test case: bpf_strtoul negative number, EINVAL .. [PASS]
Test case: bpf_strtol negative number, ok .. [PASS]
Test case: bpf_strtol hex number, ok .. [PASS]
Test case: bpf_strtol max long .. [PASS]
Test case: bpf_strtol overflow, ERANGE .. [PASS]
Summary: 36 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that verifier handles new argument types properly, including
uninitialized or partially initialized value, misaligned stack access,
etc.
Example of output:
#456/p ARG_PTR_TO_LONG uninitialized OK
#457/p ARG_PTR_TO_LONG half-uninitialized OK
#458/p ARG_PTR_TO_LONG misaligned OK
#459/p ARG_PTR_TO_LONG size < sizeof(long) OK
#460/p ARG_PTR_TO_LONG initialized OK
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test access to file_pos field of bpf_sysctl context, both read (incl.
narrow read) and write.
# ./test_sysctl
...
Test case: ctx:file_pos sysctl:read read ok .. [PASS]
Test case: ctx:file_pos sysctl:read read ok narrow .. [PASS]
Test case: ctx:file_pos sysctl:read write ok .. [PASS]
...
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that new value provided by user space on sysctl write can be read
by bpf_sysctl_get_new_value and overridden by bpf_sysctl_set_new_value.
# ./test_sysctl
...
Test case: sysctl_get_new_value sysctl:read EINVAL .. [PASS]
Test case: sysctl_get_new_value sysctl:write ok .. [PASS]
Test case: sysctl_get_new_value sysctl:write ok long .. [PASS]
Test case: sysctl_get_new_value sysctl:write E2BIG .. [PASS]
Test case: sysctl_set_new_value sysctl:read EINVAL .. [PASS]
Test case: sysctl_set_new_value sysctl:write ok .. [PASS]
Summary: 22 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test sysctl_get_current_value on sysctl read and write, buffers with
enough space and too small buffers to get E2BIG and truncated result,
etc.
# ./test_sysctl
...
Test case: sysctl_get_current_value sysctl:read ok, gt .. [PASS]
Test case: sysctl_get_current_value sysctl:read ok, eq .. [PASS]
Test case: sysctl_get_current_value sysctl:read E2BIG truncated .. [PASS]
Test case: sysctl_get_current_value sysctl:read EINVAL .. [PASS]
Test case: sysctl_get_current_value sysctl:write ok .. [PASS]
Summary: 16 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test w/ and w/o BPF_F_SYSCTL_BASE_NAME, buffers with enough space and
too small buffers to get E2BIG and truncated result, etc.
# ./test_sysctl
...
Test case: sysctl_get_name sysctl_value:base ok .. [PASS]
Test case: sysctl_get_name sysctl_value:base E2BIG truncated .. [PASS]
Test case: sysctl_get_name sysctl:full ok .. [PASS]
Test case: sysctl_get_name sysctl:full E2BIG truncated .. [PASS]
Test case: sysctl_get_name sysctl:full E2BIG truncated small .. [PASS]
Summary: 11 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add unit test for BPF_PROG_TYPE_CGROUP_SYSCTL program type.
Test that program can allow/deny access.
Test both valid and invalid accesses to ctx->write.
Example of output:
# ./test_sysctl
Test case: sysctl wrong attach_type .. [PASS]
Test case: sysctl:read allow all .. [PASS]
Test case: sysctl:read deny all .. [PASS]
Test case: ctx:write sysctl:read read ok .. [PASS]
Test case: ctx:write sysctl:write read ok .. [PASS]
Test case: ctx:write sysctl:read write reject .. [PASS]
Summary: 6 PASSED, 0 FAILED
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add unit test to verify that program and attach types are properly
identified for "cgroup/sysctl" section name.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Support BPF_PROG_TYPE_CGROUP_SYSCTL program in libbpf: identifying
program and attach types by section name, probe.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
pmtu.sh script runs a number of tests and dumps a summary of pass/fail.
If a test fails, it is near impossible to debug why. For example:
TEST: ipv6: PMTU exceptions [FAIL]
There are a lot of commands run behind the scenes for this test. Which
one is failing?
Add a VERBOSE option to show commands that are run and any output from
those commands. Add a PAUSE_ON_FAIL option to halt the script if a test
fails allowing users to poke around with the setup in the failed state.
In the process, rename tracing to TRACING and move declaration to top
with the new variables.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-04-12
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Improve BPF verifier scalability for large programs through two
optimizations: i) remove verifier states that are not useful in pruning,
ii) stop walking parentage chain once first LIVE_READ is seen. Combined
gives approx 20x speedup. Increase limits for accepting large programs
under root, and add various stress tests, from Alexei.
2) Implement global data support in BPF. This enables static global variables
for .data, .rodata and .bss sections to be properly handled which allows
for more natural program development. This also opens up the possibility
to optimize program workflow by compiling ELFs only once and later only
rewriting section data before reload, from Daniel and with test cases and
libbpf refactoring from Joe.
3) Add config option to generate BTF type info for vmlinux as part of the
kernel build process. DWARF debug info is converted via pahole to BTF.
Latter relies on libbpf and makes use of BTF deduplication algorithm which
results in 100x savings compared to DWARF data. Resulting .BTF section is
typically about 2MB in size, from Andrii.
4) Add BPF verifier support for stack access with variable offset from
helpers and add various test cases along with it, from Andrey.
5) Extend bpf_skb_adjust_room() growth BPF helper to mark inner MAC header
so that L2 encapsulation can be used for tc tunnels, from Alan.
6) Add support for input __sk_buff context in BPF_PROG_TEST_RUN so that
users can define a subset of allowed __sk_buff fields that get fed into
the test program, from Stanislav.
7) Add bpf fs multi-dimensional array tests for BTF test suite and fix up
various UBSAN warnings in bpftool, from Yonghong.
8) Generate a pkg-config file for libbpf, from Luca.
9) Dump program's BTF id in bpftool, from Prashant.
10) libbpf fix to use smaller BPF log buffer size for AF_XDP's XDP
program, from Magnus.
11) kallsyms related fixes for the case when symbols are not present in
BPF selftests and samples, from Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ebtables -t broute allows to redirect packets in a way that
they get pushed up the stack, even if the interface is part
of a bridge.
In case of IP packets to non-local address, this means
those IP packets are routed instead of bridged-forwarded, just
as if the bridge would not have existed.
Expected test output is:
PASS: netns connectivity: ns1 and ns2 can reach each other
PASS: ns1/ns2 connectivity with active broute rule
PASS: ns1/ns2 connectivity with active broute rule and bridge forward drop
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Add the definition for smp_rmb(), smp_wmb(), and smp_mb() to the
tools include infrastructure: this patch adds the implementation
for x86-64 and arm64, and have it fall back as currently is for
other archs which do not have it implemented at this point. The
x86-64 one uses lock + add combination for smp_mb() with address
below red zone.
This is on top of 09d62154f6 ("tools, perf: add and use optimized
ring_buffer_{read_head, write_tail} helpers"), which didn't touch
smp_* barrier implementations. Magnus recently rightfully reported
however that the latter on x86-64 still wrongly falls back to sfence,
lfence and mfence respectively, thus fix that for applications under
tools making use of these to avoid such ugly surprises. The main
header under tools (include/asm/barrier.h) will in that case not
select the fallback implementation.
Reported-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A couple of tests are verifying a route has been removed. The helper
expects the prefix as the first part of the expected output. When
checking that a route has been deleted the prefix is empty leading
to an invalid ip command:
$ ip ro ls match
Command line is not complete. Try option "help"
Fix by moving the comparison of expected output and output to a new
function that is used by both check_route and check_route6. Use the
new helper for the 2 checks on route removal.
Also, remove the reset of 'set -x' in route_setup which overrides the
user managed setting.
Fixes: d69faad765 ("selftests: fib_tests: Add prefix route tests with metric")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sync include/uapi/linux/bpf.h with tools/ equivalent to add
BPF_F_ADJ_ROOM_ENCAP_L2(len) macro.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
commit 868d523535 ("bpf: add bpf_skb_adjust_room encap flags")
introduced support to bpf_skb_adjust_room for GSO-friendly GRE
and UDP encapsulation and later introduced associated test_tc_tunnel
tests. Here those tests are extended to cover UDP encapsulation also.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Implementation of function rhashtable_insert_fast() check if its internal
helper function __rhashtable_insert_fast() returns non-NULL pointer and
seemingly return -EEXIST in such case. However, since
__rhashtable_insert_fast() is called with NULL key pointer, it never
actually checks for duplicates, which means that -EEXIST is never returned
to the user. Use rhashtable_lookup_insert_fast() hash table API instead. In
order to verify that it works as expected and prevent the problem from
happening in future, extend tc-tests with new test that verifies that no
new filters with existing key can be inserted to flower classifier.
Fixes: 1f17f7742e ("net: sched: flower: insert filter to ht before offloading it to hw")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handling pcitest.sh along with pcitest (obtained by compiling
pcitest.c) results in pcitest.sh getting removed inadvertently
while "make -C tools/pci clean".
Fix it by handling pcitest.sh independently of pcitest.
Fixes: 1ce78ce094 ("tools: PCI: Change pcitest compiling process")
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
'h' is a valid option character for the pcitest tool used to print
the pcitest usage.
Add 'h' in optstring of getopt() in order to get rid of "pcitest:
invalid option -- 'h'" warning.
While at that remove unncessary case '?'.
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Simple test that sets cb to {1,2,3,4,5} and priority to 6, runs bpf
program that fails if cb is not what we expect and increments cb[i] and
priority. When the test finishes, we check that cb is now {2,3,4,5,6}
and priority is 7.
We also test the sanity checks:
* ctx_in is provided, but ctx_size_in is zero (same for
ctx_out/ctx_size_out)
* unexpected non-zero fields in __sk_buff return EINVAL
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Support recently introduced input/output context for test runs.
We extend only bpf_prog_test_run_xattr. bpf_prog_test_run is
unextendable and left as is.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Let's add a way to know whether a program has btf context.
Patch adds 'btf_id' in the output of program listing.
When btf_id is present, it means program has btf context.
Sample output:
user@test# bpftool prog list
25: xdp name xdp_prog1 tag 539ec6ce11b52f98 gpl
loaded_at 2019-04-10T11:44:20+0900 uid 0
xlated 488B not jited memlock 4096B map_ids 23
btf_id 1
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reported in [1].
With gcc 8.3.0 the following error is issued:
cc -Ibpf@sta -I. -I.. -I.././include -I.././include/uapi
-fdiagnostics-color=always -fsanitize=address,undefined -fno-omit-frame-pointer
-pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fPIC -g -O2
-Werror -Wall -Wno-pointer-arith -Wno-sign-compare -MD -MQ
'bpf@sta/src_libbpf.c.o' -MF 'bpf@sta/src_libbpf.c.o.d' -o
'bpf@sta/src_libbpf.c.o' -c ../src/libbpf.c
../src/libbpf.c: In function 'bpf_object__elf_collect':
../src/libbpf.c:947:18: error: 'map_def_sz' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
if (map_def_sz <= sizeof(struct bpf_map_def)) {
~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/libbpf.c:827:18: note: 'map_def_sz' was declared here
int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
^~~~~~~~~~
According to [2] -Wmaybe-uninitialized is enabled by -Wall.
Same error is generated by clang's -Wconditional-uninitialized.
[1] https://github.com/libbpf/libbpf/pull/29#issuecomment-481902601
[2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
Fixes: d859900c4c ("bpf, libbpf: support global data/bss/rodata sections")
Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Test that it is possible to set an IP address on a VRF and that it is
not vetoed.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit da11b41758 ("libbpf: teach libbpf about log_level bit 2"),
the BPF_LOG_BUF_SIZE was increased to 16M. The XDP socket part of
libbpf allocated the log_buf on the stack, but for the new 16M buffer
size this is not going to work. Change the code so it uses a 16K buffer
instead.
Fixes: da11b41758 ("libbpf: teach libbpf about log_level bit 2")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The issue is reported at https://github.com/libbpf/libbpf/issues/28.
Basically, per C standard, for
void *memcpy(void *dest, const void *src, size_t n)
if "dest" or "src" is NULL, regardless of whether "n" is 0 or not,
the result of memcpy is undefined. clang ubsan reported three such
instances in bpf.c with the following pattern:
memcpy(dest, 0, 0).
Although in practice, no known compiler will cause issues when
copy size is 0. Let us still fix the issue to silence ubsan
warnings.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Extend test_btf with various positive and negative tests around
BTF verification of kind Var and DataSec. All passing as well:
# ./test_btf
[...]
BTF raw test[4] (global data test #1): OK
BTF raw test[5] (global data test #2): OK
BTF raw test[6] (global data test #3): OK
BTF raw test[7] (global data test #4, unsupported linkage): OK
BTF raw test[8] (global data test #5, invalid var type): OK
BTF raw test[9] (global data test #6, invalid var type (fwd type)): OK
BTF raw test[10] (global data test #7, invalid var type (fwd type)): OK
BTF raw test[11] (global data test #8, invalid var size): OK
BTF raw test[12] (global data test #9, invalid var size): OK
BTF raw test[13] (global data test #10, invalid var size): OK
BTF raw test[14] (global data test #11, multiple section members): OK
BTF raw test[15] (global data test #12, invalid offset): OK
BTF raw test[16] (global data test #13, invalid offset): OK
BTF raw test[17] (global data test #14, invalid offset): OK
BTF raw test[18] (global data test #15, not var kind): OK
BTF raw test[19] (global data test #16, invalid var referencing sec): OK
BTF raw test[20] (global data test #17, invalid var referencing var): OK
BTF raw test[21] (global data test #18, invalid var loop): OK
BTF raw test[22] (global data test #19, invalid var referencing var): OK
BTF raw test[23] (global data test #20, invalid ptr referencing var): OK
BTF raw test[24] (global data test #21, var included in struct): OK
BTF raw test[25] (global data test #22, array of var): OK
[...]
PASS:167 SKIP:0 FAIL:0
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend test_verifier with various test cases around the two kernel
extensions, that is, {rd,wr}only map support as well as direct map
value access. All passing, one skipped due to xskmap not present
on test machine:
# ./test_verifier
[...]
#948/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
#949/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
#950/p XDP pkt read, pkt_data <= pkt_meta', good access OK
#951/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
#952/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
Summary: 1410 PASSED, 1 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the ability to bpftool to handle BTF Var and DataSec kinds
in order to dump them out of btf_dumper_type(). The value has a
single object with the section name, which itself holds an array
of variables it dumps. A single variable is an object by itself
printed along with its name. From there further type information
is dumped along with corresponding value information.
Example output from .rodata:
# ./bpftool m d i 150
[{
"value": {
".rodata": [{
"load_static_data.bar": 18446744073709551615
},{
"num2": 24
},{
"num5": 43947
},{
"num6": 171
},{
"str0": [97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,0,0,0,0,0,0
]
},{
"struct0": {
"a": 42,
"b": 4278120431,
"c": 1229782938247303441
}
},{
"struct2": {
"a": 0,
"b": 0,
"c": 0
}
}
]
}
}
]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This adds libbpf support for BTF Var and DataSec kinds. Main point
here is that libbpf needs to do some preparatory work before the
whole BTF object can be loaded into the kernel, that is, fixing up
of DataSec size taken from the ELF section size and non-static
variable offset which needs to be taken from the ELF's string section.
Upstream LLVM doesn't fix these up since at time of BTF emission
it is too early in the compilation process thus this information
isn't available yet, hence loader needs to take care of it.
Note, deduplication handling has not been in the scope of this work
and needs to be addressed in a future commit.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://reviews.llvm.org/D59441
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This work adds BPF loader support for global data sections
to libbpf. This allows to write BPF programs in more natural
C-like way by being able to define global variables and const
data.
Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending BPF
syscall where union bpf_attr would get additional memory/size
pair for each section passed during prog load in order to later
add this base address into the ldimm64 instruction along with
the user provided offset when accessing a variable. Consensus
from LPC was that for proper upstream support, it would be
more desirable to use maps instead of bpf_attr extension as
this would allow for introspection of these sections as well
as potential live updates of their content. This work follows
this path by taking the following steps from loader side:
1) In bpf_object__elf_collect() step we pick up ".data",
".rodata", and ".bss" section information.
2) If present, in bpf_object__init_internal_map() we add
maps to the obj's map array that corresponds to each
of the present sections. Given section size and access
properties can differ, a single entry array map is
created with value size that is corresponding to the
ELF section size of .data, .bss or .rodata. These
internal maps are integrated into the normal map
handling of libbpf such that when user traverses all
obj maps, they can be differentiated from user-created
ones via bpf_map__is_internal(). In later steps when
we actually create these maps in the kernel via
bpf_object__create_maps(), then for .data and .rodata
sections their content is copied into the map through
bpf_map_update_elem(). For .bss this is not necessary
since array map is already zero-initialized by default.
Additionally, for .rodata the map is frozen as read-only
after setup, such that neither from program nor syscall
side writes would be possible.
3) In bpf_program__collect_reloc() step, we record the
corresponding map, insn index, and relocation type for
the global data.
4) And last but not least in the actual relocation step in
bpf_program__relocate(), we mark the ldimm64 instruction
with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
imm field the map's file descriptor is stored as similarly
done as in BPF_PSEUDO_MAP_FD, and in the second imm field
(as ldimm64 is 2-insn wide) we store the access offset
into the section. Given these maps have only single element
ldimm64's off remains zero in both parts.
5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
load will then store the actual target address in order
to have a 'map-lookup'-free access. That is, the actual
map value base address + offset. The destination register
in the verifier will then be marked as PTR_TO_MAP_VALUE,
containing the fixed offset as reg->off and backing BPF
map as reg->map_ptr. Meaning, it's treated as any other
normal map value from verification side, only with
efficient, direct value access instead of actual call to
map lookup helper as in the typical case.
Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This can be lifted until we have proper semantics
for how BPF will treat multi-object BPF loads. From BTF side,
libbpf will set the value type id of the types corresponding
to the ".bss", ".data" and ".rodata" names which LLVM will
emit without the object name prefix. The key type will be
left as zero, thus making use of the key-less BTF option in
array maps.
Simple example dump of program using globals vars in each
section:
# bpftool prog
[...]
6784: sched_cls name load_static_dat tag a7e1291567277844 gpl
loaded_at 2019-03-11T15:39:34+0000 uid 0
xlated 1776B jited 993B memlock 4096B map_ids 2238,2237,2235,2236,2239,2240
# bpftool map show id 2237
2237: array name test_glo.bss flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2235
2235: array name test_glo.data flags 0x0
key 4B value 64B max_entries 1 memlock 4096B
# bpftool map show id 2236
2236: array name test_glo.rodata flags 0x80
key 4B value 96B max_entries 1 memlock 4096B
# bpftool prog dump xlated id 6784
int load_static_data(struct __sk_buff * skb):
; int load_static_data(struct __sk_buff *skb)
0: (b7) r6 = 0
; test_reloc(number, 0, &num0);
1: (63) *(u32 *)(r10 -4) = r6
2: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
3: (07) r2 += -4
; test_reloc(number, 0, &num0);
4: (18) r1 = map[id:2238]
6: (18) r3 = map[id:2237][0]+0 <-- direct addr in .bss area
8: (b7) r4 = 0
9: (85) call array_map_update_elem#100464
10: (b7) r1 = 1
; test_reloc(number, 1, &num1);
[...]
; test_reloc(string, 2, str2);
120: (18) r8 = map[id:2237][0]+16 <-- same here at offset +16
122: (18) r1 = map[id:2239]
124: (18) r3 = map[id:2237][0]+16
126: (b7) r4 = 0
127: (85) call array_map_update_elem#100464
128: (b7) r1 = 120
; str1[5] = 'x';
129: (73) *(u8 *)(r9 +5) = r1
; test_reloc(string, 3, str1);
130: (b7) r1 = 3
131: (63) *(u32 *)(r10 -4) = r1
132: (b7) r9 = 3
133: (bf) r2 = r10
; int load_static_data(struct __sk_buff *skb)
134: (07) r2 += -4
; test_reloc(string, 3, str1);
135: (18) r1 = map[id:2239]
137: (18) r3 = map[id:2235][0]+16 <-- direct addr in .data area
139: (b7) r4 = 0
140: (85) call array_map_update_elem#100464
141: (b7) r1 = 111
; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
142: (73) *(u8 *)(r8 +6) = r1 <-- further access based on .bss data
143: (b7) r1 = 108
144: (73) *(u8 *)(r8 +5) = r1
[...]
For Cilium use-case in particular, this enables migrating configuration
constants from Cilium daemon's generated header defines into global
data sections such that expensive runtime recompilations with LLVM can
be avoided altogether. Instead, the ELF file becomes effectively a
"template", meaning, it is compiled only once (!) and the Cilium daemon
will then rewrite relevant configuration data from the ELF's .data or
.rodata sections directly instead of recompiling the program. The
updated ELF is then loaded into the kernel and atomically replaces
the existing program in the networking datapath. More info in [0].
Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
for static variables").
[0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
http://vger.kernel.org/lpc-bpf2018.html#session-3
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adjust the code for relocations slightly with no functional changes,
so that upcoming patches that will introduce support for relocations
into the .data, .rodata and .bss sections can be added independent
of these changes.
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull in latest changes from both headers, so we can make use of
them in libbpf.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value as a single BPF
ldimm64 instruction!
The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for ldimm64 instruction that indicates
that inside the first part of the double insns's imm field is a
file descriptor which the verifier then replaces as a full 64bit
address of the map into both imm parts. For the newly added
BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the following:
the first part of the double insns's imm field is again a file
descriptor corresponding to the map, and the second part of the
imm field is an offset into the value. The verifier will then
replace both imm parts with an address that points into the BPF
map value at the given value offset for maps that support this
operation. Currently supported is array map with single entry.
It is possible to support more than just single map element by
reusing both 16bit off fields of the insns as a map index, so
full array map lookup could be expressed that way. It hasn't
been implemented here due to lack of concrete use case, but
could easily be done so in future in a compatible way, since
both off fields right now have to be 0 and would correctly
denote a map index 0.
The BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with
BPF_PSEUDO_MAP_FD we could not differ offset 0 between load of
map pointer versus load of map's value at offset 0, and changing
BPF_PSEUDO_MAP_FD's encoding into off by one to differ between
regular map pointer and map value pointer would add unnecessary
complexity and increases barrier for debugability thus less
suitable. Using the second part of the imm field as an offset
into the value does /not/ come with limitations since maximum
possible value size is in u32 universe anyway.
This optimization allows for efficiently retrieving an address
to a map value memory area without having to issue a helper call
which needs to prepare registers according to calling convention,
etc, without needing the extra NULL test, and without having to
add the offset in an additional instruction to the value base
pointer. The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off from the user passed
offset from the second imm field, and guarantees that this is
within bounds of the map value. Any subsequent operations are
normally treated as typical map value handling without anything
extra needed from verification side.
The two map operations for direct value access have been added to
array map for now. In future other types could be supported as
well depending on the use case. The main use case for this commit
is to allow for BPF loader support for global variables that
reside in .data/.rodata/.bss sections such that we can directly
load the address of them with minimal additional infrastructure
required. Loader support has been added in subsequent commits for
libbpf library.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ACPICA commit 24870bd9e73d71e2a1ff0a1e94519f8f8409e57d
ACPI_NAME_SIZE changed to ACPI_NAMESEG_SIZE
This clarifies that this is the length of an individual
nameseg, not the length of a generic namestring/namepath.
Improves understanding of the code.
Link: https://github.com/acpica/acpica/commit/24870bd9
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 92ec0935f27e217dff0b176fca02c2ec3d782bb5
ACPI_COMPARE_NAME changed to ACPI_COMPARE_NAMESEG
This clarifies (1) this is a compare on 4-byte namesegs, not
a generic compare. Improves understanding of the code.
Link: https://github.com/acpica/acpica/commit/92ec0935
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPICA commit 19c18d3157945d1b8b64a826f0a8e848b7dbb127
ACPI_MOVE_NAME changed to ACPI_COPY_NAMESEG
This clarifies (1) this is a copy operation, and
(2) it operates on ACPI name_segs.
Improves understanding of the code.
Link: https://github.com/acpica/acpica/commit/19c18d31
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Erik Schmauss <erik.schmauss@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull networking fixes from David Miller:
1) Off by one and bounds checking fixes in NFC, from Dan Carpenter.
2) There have been many weird regressions in r8169 since we turned ASPM
support on, some are still not understood nor completely resolved.
Let's turn this back off for now. From Heiner Kallweit.
3) Signess fixes for ethtool speed value handling, from Michael
Zhivich.
4) Handle timestamps properly in macb driver, from Paul Thomas.
5) Two erspan fixes, it's the usual "skb ->data potentially reallocated
and we're holding a stale protocol header pointer". From Lorenzo
Bianconi.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bnxt_en: Reset device on RX buffer errors.
bnxt_en: Improve RX consumer index validity check.
net: macb driver, check for SKBTX_HW_TSTAMP
qlogic: qlcnic: fix use of SPEED_UNKNOWN ethtool constant
broadcom: tg3: fix use of SPEED_UNKNOWN ethtool constant
ethtool: avoid signed-unsigned comparison in ethtool_validate_speed()
net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
net: ip_gre: fix possible use-after-free in erspan_rcv
r8169: disable ASPM again
MAINTAINERS: ieee802154: update documentation file pattern
net: vrf: Fix ping failed when vrf mtu is set to 0
selftests: add a tc matchall test case
nfc: nci: Potential off by one in ->pipes[] array
NFC: nci: Add some bounds checking in nci_hci_cmd_received()
In order to have control over how many bytes are read or written
the device needs to be opened in unbuffered mode.
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Three new tests added:
1. Send get random cmd, read header in 1st read, read the rest in second
read - expect success
2. Send get random cmd, read only part of the response, send another
get random command, read the response - expect success
3. Send get random cmd followed by another get random cmd, without
reading the first response - expect the second cmd to fail with -EBUSY
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Dan reported, that cleanup path in test_memcg_subtree_control()
triggers a static checker warning:
./tools/testing/selftests/cgroup/test_memcontrol.c:76 \
test_memcg_subtree_control()
error: uninitialized symbol 'child2'.
Fix this by initializing child2 and parent2 variables and
split the cleanup path into few stages.
Signed-off-by: Roman Gushchin <guro@fb.com>
Fixes: 84092dbcf9 ("selftests: cgroup: add memory controller self-tests")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
After the first run, the test case 'test_create_read' will always
fail because the file is exist and file's attr is 'S_IMMUTABLE',
open with 'O_RDWR' will always return -EPERM.
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
On smaller systems, running a test with 200 threads can take a long
time on machines with smaller number of CPUs.
Detect the number of online cpus at test runtime, and multiply that
by 6 to have 6 rseq threads per cpu preempting each other.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: linux-kselftest@vger.kernel.org
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
Add a test module for the new strscpy_pad() function. Tie it into the
kselftest infrastructure for lib/ tests.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
kselftest runs as a userspace process. Sometimes we need to test things
from kernel space. One way of doing this is by creating a test module.
Currently doing so requires developers to write a bunch of boiler plate
in the module if kselftest is to be used to run the tests. This means
we currently have a load of duplicate code to achieve these ends. If we
have a uniform method for implementing test modules then we can reduce
code duplication, ensure uniformity in the test framework, ease code
maintenance, and reduce the work required to create tests. This all
helps to encourage developers to write and run tests.
Add a C header file that can be included in test modules. This provides
a single point for common test functions/macros. Implement a few macros
that make up the start of the test framework.
Add documentation for new kselftest header to kselftest documentation.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
Currently if we wish to use kselftest to run tests within a kernel
module we write a small script to load/unload and do error reporting.
There are a bunch of these under tools/testing/selftests/lib/ that are
all identical except for the test name. We can reduce code duplication
and improve maintainability if we have one version of this. However
kselftest requires an executable for each test. We can move all the
script logic to a central script then have each individual test script
call the main script.
Oneliner to call kselftest_module.sh courtesy of Kees, thanks!
Add test runner creation script. Convert
tools/testing/selftests/lib/*.sh to use new test creation script.
Testing
-------
Configure kselftests for lib/ then build and boot kernel. Then run
kselftests as follows:
$ cd /path/to/kernel/tree
$ sudo make O=$output_path -C tools/testing/selftests TARGETS="lib" run_tests
and also
$ cd /path/to/kernel/tree
$ cd tools/testing/selftests
$ sudo make O=$output_path TARGETS="lib" run_tests
and also
$ cd /path/to/kernel/tree
$ cd tools/testing/selftests
$ sudo make TARGETS="lib" run_tests
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
Add tests for ipv6 gateway with ipv4 route. Tests include basic
single path with ping to verify connectivity and multipath.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With older nft versions, this will cause:
[..]
PASS: ipv6 ping to ns1 was ip6 NATted to ns2
/dev/stdin:4:30-31: Error: syntax error, unexpected to, expecting newline or semicolon
ip daddr 10.0.1.99 dnat ip to 10.0.2.99
^^
SKIP: inet nat tests
PASS: ip IP masquerade for ns2
[..]
as there is currently no way to detect if nft will be able to parse
the inet format.
redirect and masquerade tests need to be skipped in this case for inet
too because nft userspace has overzealous family check and rejects their
use in the inet family.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This ended up not being included in the mainline version of io_uring,
so drop it from the test app as well.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Overwrite retains the security state after completion of operation. Fix
nfit_test to reflect this so that the kernel can test the behavior it is
more likely to see in practice.
Fixes: 926f74802c ("tools/testing/nvdimm: Add overwrite support for nfit_test")
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The k/uprobe_sytax_errors test case defines a check_error() function
used to run a command and check the position of the caret in the
output.
This would be useful for other ftrace facilities too, so move it to
test.d/functions for use by anyone. In the process, rename it to
ftrace_errlog_check() and parametrize it for general use.
Link: http://lkml.kernel.org/r/9f88080a06f1755811f69081926afe7e5cb53178.1554072478.git.tom.zanussi@linux.intel.com
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add error_log testcase for error logs on probe events.
This tests most of error cases and checks the error position
is correct.
Link: http://lkml.kernel.org/r/63d695b74e0965988fa54ffa12beeb2c3475250d.1554072478.git.tom.zanussi@linux.intel.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[tom.zanussi@linux.intel.com: changed >& redirection to 2>]
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
after previous changes, xfrm_mode contains no function pointers anymore
and all modules defining such struct contain no code except an init/exit
functions to register the xfrm_mode struct with the xfrm core.
Just place the xfrm modes core and remove the modules,
the run-time xfrm_mode register/unregister functionality is removed.
Before:
text data bss dec filename
7523 200 2364 10087 net/xfrm/xfrm_input.o
40003 628 440 41071 net/xfrm/xfrm_state.o
15730338 6937080 4046908 26714326 vmlinux
7389 200 2364 9953 net/xfrm/xfrm_input.o
40574 656 440 41670 net/xfrm/xfrm_state.o
15730084 6937068 4046908 26714060 vmlinux
The xfrm*_mode_{transport,tunnel,beet} modules are gone.
v2: replace CONFIG_INET6_XFRM_MODE_* IS_ENABLED guards with CONFIG_IPV6
ones rather than removing them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
This is a follow up of the commit 0db6f8befc ("net/sched: fix ->get
helper of the matchall cls").
To test it:
$ cd tools/testing/selftests/tc-testing
$ ln -s ../plugin-lib/nsPlugin.py plugins/20-nsPlugin.py
$ ./tdc.py -n -e 2638
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vsprintf() in __base_pr() uses nonliteral format string and it breaks
compilation for those who provide corresponding extra CFLAGS, e.g.:
https://github.com/libbpf/libbpf/issues/27
If libbpf is built with the flags from PR:
libbpf.c:68:26: error: format string is not a string literal
[-Werror,-Wformat-nonliteral]
return vfprintf(stderr, format, args);
^~~~~~
1 error generated.
Ignore this warning since the use case in libbpf.c is legit.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This test is split in two, the first part checks if a report creates a
corresponding mdb entry and if traffic is properly forwarded to it, and
the second part checks if the mdb entry is deleted after a leave and
if traffic is *not* forwarded to it. Since the mcast querier is enabled
we should see standard mcast snooping bridge behaviour.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
architectures. These types are required to support block device and/or
file sizes larger than 2 TiB, and have generally defaulted to on for
a long time. Enabling the option only increases the i386 tinyconfig
size by 145 bytes, and many data structures already always use
64-bit values for their in-core and on-disk data structures anyway,
so there should not be a large change in dynamic memory usage either.
Dropping this option removes a somewhat weird non-default config that
has cause various bugs or compiler warnings when actually used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Test the case when reg->smax_value is too small/big and can overflow,
and separately min and max values outside of stack bounds.
Example of output:
# ./test_verifier
#856/p indirect variable-offset stack access, unbounded OK
#857/p indirect variable-offset stack access, max out of bound OK
#858/p indirect variable-offset stack access, min out of bound OK
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Test that verifier rejects indirect stack access with variable offset in
unprivileged mode and accepts same code in privileged mode.
Since pointer arithmetics is prohibited in unprivileged mode verifier
should reject the program even before it gets to helper call that uses
variable offset, at the time when that variable offset is trying to be
constructed.
Example of output:
# ./test_verifier
...
#859/u indirect variable-offset stack access, priv vs unpriv OK
#859/p indirect variable-offset stack access, priv vs unpriv OK
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Test that verifier rejects indirect access to uninitialized stack with
variable offset.
Example of output:
# ./test_verifier
...
#859/p indirect variable-offset stack access, uninitialized OK
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This fixes the following warning seen on GCC 7.3:
arch/x86/kernel/dumpstack.o: warning: objtool: oops_end() falls through to next function show_regs()
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/3418ebf5a5a9f6ed7e80954c741c0b904b67b5dc.1554398240.git.jpoimboe@redhat.com
perf record:
Alexey Budankov:
- Implement --mmap-flush=<number> option, to control a threshold for draining
the mmap ring buffers and consequently the size of the write calls to the
output, be it perf.data, pipe mode or soon a compressor that with bigger
buffers will do a better job before dumping compressed data into a new
perf.data content mode, which is in the final steps of reviewing and testing.
perf trace:
Arnaldo Carvalho de Melo:
- Add 'string' event alias to select syscalls with string args, i.e. for testing
the BPF program used to copy those strings, allow for:
# perf trace -e string
To select all the syscalls that have things like pathnames.
- Use a PERCPU_ARRAY BPF map to copy more string bytes than what is possible using
the BPF stack, just like pioneered by the sysdig tool.
Feature detection:
Alexey Budankov:
- Implement libzstd feature check, which is a library that provides a uniform
API to various compression formats, will be used in 'perf record', see note
about --mmap-flush feature.
perf stat:
Andi Kleen:
- Implement a tool specific 'duration_time' event to allow showing the "time
elapsed" line in the default 'perf stat' output as one of the events that
can be asked for when using --field-separator and other script consumable
outputs.
Intel vendor events (JSON files):
Andi Kleen:
- Update metrics from TMAM 3.5.
- Update events:
Bonnell to V4
Broadwell-DE to v7
Broadwell to v23
BroadwellX to v14
GoldmontPlus to v1.01
Goldmont to v13
Haswell to v28
HaswellX to v20
IvyBridge to v21
IvyTown to v20
JakeTown to v20
KnightsLanding to v9
SandyBridge to v16
Silvermont to v14
Skylake to v42
SkylakeX to v1.12
IBM S/390 vendor events (JSON):
Thomas Richter:
- Fix s390 counter long description for L1D_RO_EXCL_WRITES.
tools lib traceevent:
Steven Rostedt (Red Hat):
- Add more debugging to see various internal ring buffer entries.
Steven Rostedt (VMWare):
- Handle trace_printk() "%px".
- Add mono clocks to be parsed in seconds.
- Removed unneeded !! and return parenthesis.
Tzvetomir Stoyanov :
- Implement a new API, tep_list_events_copy().
- Implement new traceevent APIs for accessing struct tep_handler fields.
- Remove tep filter trivial APIs, not used anymore.
- Remove call to exit() from tep_filter_add_filter_str(), library routines shouldn't
kill tools using it.
- Make traceevent APIs more consistent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXKNwXgAKCRCyPKLppCJ+
JypjAPsGYcZF1YFO5053kU6yo0fkVJB/3gyGQVjlAGqXcjsRnAD/cTnhni0ocQDW
hu5nRBYCnw0l4r6yfg6Y6+jXlvCZyg8=
=wiIr
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.2-20190402' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo:
perf record:
Alexey Budankov:
- Implement --mmap-flush=<number> option, to control a threshold for draining
the mmap ring buffers and consequently the size of the write calls to the
output, be it perf.data, pipe mode or soon a compressor that with bigger
buffers will do a better job before dumping compressed data into a new
perf.data content mode, which is in the final steps of reviewing and testing.
perf trace:
Arnaldo Carvalho de Melo:
- Add 'string' event alias to select syscalls with string args, i.e. for testing
the BPF program used to copy those strings, allow for:
# perf trace -e string
To select all the syscalls that have things like pathnames.
- Use a PERCPU_ARRAY BPF map to copy more string bytes than what is possible using
the BPF stack, just like pioneered by the sysdig tool.
Feature detection:
Alexey Budankov:
- Implement libzstd feature check, which is a library that provides a uniform
API to various compression formats, will be used in 'perf record', see note
about --mmap-flush feature.
perf stat:
Andi Kleen:
- Implement a tool specific 'duration_time' event to allow showing the "time
elapsed" line in the default 'perf stat' output as one of the events that
can be asked for when using --field-separator and other script consumable
outputs.
Intel vendor events (JSON files):
Andi Kleen:
- Update metrics from TMAM 3.5.
- Update events:
Bonnell to V4
Broadwell-DE to v7
Broadwell to v23
BroadwellX to v14
GoldmontPlus to v1.01
Goldmont to v13
Haswell to v28
HaswellX to v20
IvyBridge to v21
IvyTown to v20
JakeTown to v20
KnightsLanding to v9
SandyBridge to v16
Silvermont to v14
Skylake to v42
SkylakeX to v1.12
IBM S/390 vendor events (JSON):
Thomas Richter:
- Fix s390 counter long description for L1D_RO_EXCL_WRITES.
tools lib traceevent:
Steven Rostedt (Red Hat):
- Add more debugging to see various internal ring buffer entries.
Steven Rostedt (VMWare):
- Handle trace_printk() "%px".
- Add mono clocks to be parsed in seconds.
- Removed unneeded !! and return parenthesis.
Tzvetomir Stoyanov :
- Implement a new API, tep_list_events_copy().
- Implement new traceevent APIs for accessing struct tep_handler fields.
- Remove tep filter trivial APIs, not used anymore.
- Remove call to exit() from tep_filter_add_filter_str(), library routines shouldn't
kill tools using it.
- Make traceevent APIs more consistent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull networking fixes from David Miller:
1) Several hash table refcount fixes in batman-adv, from Sven
Eckelmann.
2) Use after free in bpf_evict_inode(), from Daniel Borkmann.
3) Fix mdio bus registration in ixgbe, from Ivan Vecera.
4) Unbounded loop in __skb_try_recv_datagram(), from Paolo Abeni.
5) ila rhashtable corruption fix from Herbert Xu.
6) Don't allow upper-devices to be added to vrf devices, from Sabrina
Dubroca.
7) Add qmi_wwan device ID for Olicard 600, from Bjørn Mork.
8) Don't leave skb->next poisoned in __netif_receive_skb_list_ptype,
from Alexander Lobakin.
9) Missing IDR checks in mlx5 driver, from Aditya Pakki.
10) Fix false connection termination in ktls, from Jakub Kicinski.
11) Work around some ASPM issues with r8169 by disabling rx interrupt
coalescing on certain chips. From Heiner Kallweit.
12) Properly use per-cpu qstat values on NOLOCK qdiscs, from Paolo
Abeni.
13) Fully initialize sockaddr_in structures in SCTP, from Xin Long.
14) Various BPF flow dissector fixes from Stanislav Fomichev.
15) Divide by zero in act_sample, from Davide Caratti.
16) Fix bridging multicast regression introduced by rhashtable
conversion, from Nikolay Aleksandrov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
ibmvnic: Fix completion structure initialization
ipv6: sit: reset ip header pointer in ipip6_rcv
net: bridge: always clear mcast matching struct on reports and leaves
libcxgb: fix incorrect ppmax calculation
vlan: conditional inclusion of FCoE hooks to match netdevice.h and bnx2x
sch_cake: Make sure we can write the IP header before changing DSCP bits
sch_cake: Use tc_skb_protocol() helper for getting packet protocol
tcp: Ensure DCTCP reacts to losses
net/sched: act_sample: fix divide by zero in the traffic path
net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
net: hns: Fix sparse: some warnings in HNS drivers
net: hns: Fix WARNING when remove HNS driver with SMMU enabled
net: hns: fix ICMP6 neighbor solicitation messages discard problem
net: hns: Fix probabilistic memory overwrite when HNS driver initialized
net: hns: Use NAPI_POLL_WEIGHT for hns driver
net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()
flow_dissector: rst'ify documentation
ipv6: Fix dangling pointer when ipv6 fragment
net-gro: Fix GRO flush when receiving a GSO packet.
flow_dissector: document BPF flow dissector environment
...
Given that synchronize_rcu_expedited() is supported, this commit adds
support for synchronize_srcu_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Daniel Borkmann says:
====================
pull-request: bpf 2019-04-04
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Batch of fixes to the existing BPF flow dissector API to support
calling BPF programs from the eth_get_headlen context (support for
latter is planned to be added in bpf-next), from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* pm-tools:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Since, ksym_search added with verification logic for symbols existence,
it could return NULL when the kernel symbols are not loaded.
This commit will add NULL check logic after ksym_search.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently, ksym_search located at trace_helpers won't check symbols are
existing or not.
In ksym_search, when symbol is not found, it will return &syms[0](_stext).
But when the kernel symbols are not loaded, it will return NULL, which is
not a desired action.
This commit will add verification logic whether symbols are loaded prior
to the symbol search.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a test to generate 1m ld_imm64 insns to stress the verifier.
Bump the size of fill_ld_abs_vlan_push_pop test from 4k to 29k
and jump_around_ld_abs from 4k to 5.5k.
Larger sizes are not possible due to 16-bit offset encoding
in jump instructions.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add 3 basic tests that stress verifier scalability.
test_verif_scale1.c calls non-inlined jhash() function 90 times on
different position in the packet.
This test simulates network packet parsing.
jhash function is ~140 instructions and main program is ~1200 insns.
test_verif_scale2.c force inlines jhash() function 90 times.
This program is ~15k instructions long.
test_verif_scale3.c calls non-inlined jhash() function 90 times on
But this time jhash has to process 32-bytes from the packet
instead of 14-bytes in tests 1 and 2.
jhash function is ~230 insns and main program is ~1200 insns.
$ test_progs -s
can be used to see verifier stats.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow bpf_prog_load_xattr() to specify log_level for program loading.
Teach libbpf to accept log_level with bit 2 set.
Increase default BPF_LOG_BUF_SIZE from 256k to 16M.
There is no downside to increase it to a maximum allowed by old kernels.
Existing 256k limit caused ENOSPC errors and users were not able to see
verifier error which is printed at the end of the verifier log.
If ENOSPC is hit, double the verifier log and try again to capture
the verifier error.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is a preparation for the next commit that would prohibit access to
the most fields of __sk_buff from the BPF programs.
Instead of requiring BPF flow dissector programs to look into skb,
pass all input data in the flow_keys.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.
Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.
Add simple test cases with VLAN tagged frames:
* single vlan for ipv4
* double vlan for ipv6
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Having DF escape is BAD(tm).
Linus; you suggested this one, but since DF really is only used from
ASM and the failure case is fairly obvious, do we really need this?
OTOH the patch is fairly small and simple, so let's just do this
to demonstrate objtool's superior awesomeness.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is important that UACCESS regions are as small as possible;
furthermore the UACCESS state is not scheduled, so doing anything that
might directly call into the scheduler will cause random code to be
ran with UACCESS enabled.
Teach objtool too track UACCESS state and warn about any CALL made
while UACCESS is enabled. This very much includes the __fentry__()
and __preempt_schedule() calls.
Note that exceptions _do_ save/restore the UACCESS state, and therefore
they can drive preemption. This also means that all exception handlers
must have an otherwise redundant UACCESS disable instruction;
therefore ignore this warning for !STT_FUNC code (exception handlers
are not normal functions).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It turned out that we failed to detect some sibling calls;
specifically those without relocation records; like:
$ ./objdump-func.sh defconfig-build/mm/kasan/generic.o __asan_loadN
0000 0000000000000840 <__asan_loadN>:
0000 840: 48 8b 0c 24 mov (%rsp),%rcx
0004 844: 31 d2 xor %edx,%edx
0006 846: e9 45 fe ff ff jmpq 690 <check_memory_region>
So extend the cross-function jump to also consider those that are not
between known (or newly detected) parent/child functions, as
sibling-cals when they jump to the start of the function.
The second part of that condition is to deal with random jumps to the
middle of other function, as can be found in
arch/x86/lib/copy_user_64.S for example.
This then (with later patches applied) makes the above recognise the
sibling call:
mm/kasan/generic.o: warning: objtool: __asan_loadN()+0x6: call to check_memory_region() with UACCESS enabled
Also make sure to set insn->call_dest for sibling calls so we can know
who we're calling. This is useful information when printing validation
warnings later.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Really skip the original instruction flow, instead of letting it
continue with NOPs.
Since the alternative code flow already continues after the original
instructions, only the alt-original is skipped.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Function aliases result in different symbols for the same set of
instructions; track a canonical symbol so there is a unique point of
access.
This again prepares the way for function attributes. And in particular
the need for aliases comes from how KASAN uses them.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation of function attributes, we need each instruction to
have a valid link back to its function.
Therefore make sure we set the function association for alternative
instruction sequences; they are, after all, still part of the function.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use standard C99 %zu for sizeof, not GCC's custom %Zu:
bpf_obj_id.c:76:48: warning: invalid conversion specifier 'Z'
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
flow_dissector_load.c:55:19: warning: format string is not a string literal (potentially insecure)
[-Wformat-security]
error(1, errno, command);
^~~~~~~
flow_dissector_load.c:55:19: note: treat the string as an argument to avoid this
error(1, errno, command);
^
"%s",
1 warning generated.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This makes sure we don't put headers as input files when doing
compilation, because clang complains about the following:
clang-9: error: cannot specify -o when generating multiple output files
../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_verifier' failed
make: *** [xxx/tools/testing/selftests/bpf/test_verifier] Error 1
make: *** Waiting for unfinished jobs....
clang-9: error: cannot specify -o when generating multiple output files
../lib.mk:152: recipe for target 'xxx/tools/testing/selftests/bpf/test_progs' failed
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Update all the Intel JSON metrics from Ahmad Yasin's TMAM 3.5
for Intel big core from Sandy Bridge to Cascade Lake.
This has many improvements and new metircs
- New TopDownL1_SMT group that provides a per SMT thread version
of --topdown that does not require -a anymore. The drawback is
increased multiplexing though since L1 TopDown does not fit into
4 generic counters anymore.
- Added SMT aware versions of other metrics
- Split SMT aware metrics into separate metrics to avoid
unnecessary event collections
- New metrics for better branch analysis:
Estimated Branch_Mispredict_Costs, Instructions per taken Branch,
Branch Instructions per Taken Branch, etc.
- Instruction mix metrics:
Instructions per load, Instructions per store, Instructions per Branch,
Instructions per Call
- New Cache metrics:
Bandwidth to L1/L2/L3 caches. L1/L2/L3 misses per kilo instructions.
memory level parallelism
- New memory controller metrics:
Normalized memory bandwidth in interval mode, Average memory latency,
Average number of parallel read requests,
- 3DXP persistent memory metrics for Cascade Lake:
3dxp read latency, 3dxp read/write bandwidth
- Some other useful metrics like Instruction Level Parallelism,
- Various other improvements.
Not all metrics are available on all CPUs. Skylake has best coverage.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lkml.kernel.org/r/20190315165219.GA21223@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement a --mmap-flush option that specifies minimal number of bytes
that is extracted from mmaped kernel buffer to store into a trace. The
default option value is 1 byte what means every time trace writing
thread finds some new data in the mmaped buffer the data is extracted,
possibly compressed and written to a trace.
$ tools/perf/perf record --mmap-flush 1024 -e cycles -- matrix.gcc
$ tools/perf/perf record --aio --mmap-flush 1K -e cycles -- matrix.gcc
The option is independent from -z setting, doesn't vary with compression
level and can serve two purposes.
The first purpose is to increase the compression ratio of a trace data.
Larger data chunks are compressed more effectively so the implemented
option allows specifying data chunk size to compress. Also at some cases
executing more write syscalls with smaller data size can take longer
than executing less write syscalls with bigger data size due to syscall
overhead so extracting bigger data chunks specified by the option value
could additionally decrease runtime overhead.
The second purpose is to avoid self monitoring live-lock issue in system
wide (-a) profiling mode. Profiling in system wide mode with compression
(-a -z) can additionally induce data into the kernel buffers along with
the data from monitored processes. If performance data rate and volume
from the monitored processes is high then trace streaming and
compression activity in the tool is also high. High tool process
activity can lead to subtle live-lock effect when compression of single
new byte from some of mmaped kernel buffer leads to generation of the
next single byte at some mmaped buffer. So perf tool process ends up in
endless self monitoring.
Implemented synch parameter is the mean to force data move independently
from the specified flush threshold value. Despite the provided flush
value the tool needs capability to unconditionally drain memory buffers,
at least in the end of the collection.
Committer testing:
Running with the default value, i.e. as soon as there is something to
read go on consuming, we first write the synthesized events, small
chunks of about 128 bytes:
# perf trace -m 2048 --call-graph dwarf -e write -- perf record
<SNIP>
101.142 ( 0.004 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x210db60, count: 120) = 120
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
process_synthesized_event (/home/acme/bin/perf)
perf_tool__process_synth_event (inlined)
perf_event__synthesize_mmap_events (/home/acme/bin/perf)
Then we move to reading the mmap buffers consuming the events put there
by the kernel perf infrastructure:
107.561 ( 0.005 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02000, count: 336) = 336
__libc_write (/usr/lib64/libpthread-2.28.so)
ion (/home/acme/bin/perf)
record__write (inlined)
record__pushfn (/home/acme/bin/perf)
perf_mmap__push (/home/acme/bin/perf)
record__mmap_read_evlist (inlined)
record__mmap_read_all (inlined)
__cmd_record (inlined)
cmd_record (/home/acme/bin/perf)
12919.953 ( 0.136 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc83150, count: 184984) = 184984
<SNIP same backtrace as in the 107.561 timestamp>
12920.094 ( 0.155 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befc02150, count: 261816) = 261816
<SNIP same backtrace as in the 107.561 timestamp>
12920.253 ( 0.093 ms): perf/25821 write(fd: 3</root/perf.data>, buf: 0x7f1befb81120, count: 170832) = 170832
<SNIP same backtrace as in the 107.561 timestamp>
If we limit it to write only when more than 16MB are available for
reading, it throttles that to a quarter of the --mmap-pages set for
'perf record', which by default get to 528384 bytes, found out using
'record -v':
mmap flush: 132096
mmap size 528384B
With that in place all the writes coming from
record__mmap_read_evlist(), i.e. from the mmap buffers setup by the
kernel perf infrastructure were at least 132096 bytes long.
Trying with a bigger mmap size:
perf trace -e write perf record -v -m 2048 --mmap-flush 16M
74982.928 ( 2.471 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff94a6cc000, count: 3580888) = 3580888
74985.406 ( 2.353 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff949ecb000, count: 3453256) = 3453256
74987.764 ( 2.629 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9496ca000, count: 3859232) = 3859232
74990.399 ( 2.341 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff948ec9000, count: 3769032) = 3769032
74992.744 ( 2.064 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9486c8000, count: 3310520) = 3310520
74994.814 ( 2.619 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff947ec7000, count: 4194688) = 4194688
74997.439 ( 2.787 ms): perf/26500 write(fd: 3</root/perf.data>, buf: 0x7ff9476c6000, count: 4029760) = 4029760
Was again limited to a quarter of the mmap size:
mmap flush: 2098176
mmap size 8392704B
A warning about that would be good to have but can be added later,
something like:
"max flush is a quarter of the mmap size, if wanting to bump the mmap
flush further, bump the mmap size as well using -m/--mmap-pages"
Also rename the 'sync' parameters to 'synch' to keep tools/perf building
with older glibcs:
cc1: warnings being treated as errors
builtin-record.c: In function 'record__mmap_read_evlist':
builtin-record.c:775: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
builtin-record.c: In function 'record__mmap_read_all':
builtin-record.c:856: warning: declaration of 'sync' shadows a global declaration
/usr/include/unistd.h:933: warning: shadowed declaration is here
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/f6600d72-ecfa-2eb7-7e51-f6954547d500@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement libzstd feature check, NO_LIBZSTD and LIBZSTD_DIR defines to
override Zstd library sources or disable the feature from the command
line:
$ make -C tools/perf LIBZSTD_DIR=/path/to/zstd/sources/ clean all
$ make -C tools/perf NO_LIBZSTD=1 clean all
Auto detection feature status is reported just before compilation
starts. If your system has some version of the zstd library
preinstalled then the build system finds and uses it during the build.
If you still prefer to compile with some other version of zstd library
you have capability to refer the compilation to that version using
LIBZSTD_DIR define.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/9b4cd8b0-10a3-1f1e-8d6b-5922a7ca216b@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
"pevent" to "tep" renaming of:
- all "pevent" input arguments of libtraceevent internal functions.
- all local "pevent" variables of libtraceevent.
This makes the implementation consistent with the chosen naming
convention, tep (trace event parser), and will avoid any confusion with
the old pevent name
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-5-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164344.944953447@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The member "pevent" of the struct tep_event_filter is renamed to "tep".
This makes the struct consistent with the chosen naming convention:
tep (trace event parser), instead of the old pevent.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-4-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164344.785896189@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The member "pevent" of the struct tep_event is renamed to "tep". This
makes the struct consistent with the chosen naming convention:
tep (trace event parser), instead of the old pevent.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-3-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164344.627724996@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Input arguments of libtraceevent APIs are renamed from "struct
tep_handle *pevent" to "struct tep_handle *tep". This makes the API
consistent with the chosen naming convention: tep (trace event parser),
instead of the old pevent.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20190401132111.13727-2-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164344.465573837@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename some traceevent APIs for consistency:
tep_pid_is_registered() to tep_is_pid_registered()
tep_file_bigendian() to tep_is_file_bigendian()
to make the names and return values consistent with other tep_is_... APIs
tep_data_lat_fmt() to tep_data_latency_format()
to make the name more descriptive
tep_host_bigendian() to tep_is_bigendian()
tep_set_host_bigendian() to tep_set_local_bigendian()
tep_is_host_bigendian() to tep_is_local_bigendian()
"host" can be confused with VMs, and "local" is about the local
machine. All tep_is_..._bigendian(struct tep_handle *tep) APIs return
the saved data in the tep handle, while tep_is_bigendian() returns
the running machine's endianness.
All tep_is_... functions are modified to return bool value, instead of int.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20190327141946.4353-2-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164344.288624897@goodmis.org
[ Removed some extra parenthesis around return statements ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch removes call to exit() from tep_filter_add_filter_str(). A
library function should not force the application to exit. In the
current implementation tep_filter_add_filter_str() calls exit() when a
special "test_filters" mode is set, used only for debugging purposes.
When this mode is set and a filter is added - its string is printed to
the console and exit() is called. This patch changes the logic - when in
"test_filters" mode, the filter string is still printed, but the exit()
is not called. It is up to the application to track when "test_filters"
mode is set and to call exit, if needed.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20190326154328.28718-9-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164344.121717482@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch removes trivial filter tep APIs:
enum tep_filter_trivial_type
tep_filter_event_has_trivial()
tep_update_trivial()
tep_filter_clear_trivial()
Trivial filters is an optimization, used only in the first version of
KernelShark. The API is deprecated, the next KernelShark release does
not use it.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20190326154328.28718-4-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164343.968458918@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As return is not a function we do not need parenthesis around the return
value. Also, a function returning bool does not need to add !!.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Link: http://lkml.kernel.org/r/20190401164343.817886725@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As struct tep_handler definition is not exposed as part of libtraceevent
API, its fields cannot be accessed directly by the library users. This
patch implements new APIs, which can be used to access the struct
tep_handler fields:
tep_get_event() - retrieves an event pointer at a specific index
tep_get_first_event() - is modified to use tep_get_event()
tep_clear_flag() - clears a tep handle flag
tep_test_flag() - test if a given flag is set
tep_get_header_timestamp_size() - returns the size of the timestamp stored
in the header.
tep_get_cpus() - returns the number of CPUs
tep_is_old_format() - returns true if data was created by an
older kernel with the old data format
tep_set_print_raw() - have the output print in the raw format
tep_set_test_filters() - debugging utility for testing tep filters
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20190325145017.30246-4-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164343.679629539@goodmis.org
[ Renamed some newly added "pevent" to "tep" ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
APIs descriptions should describe the purpose of the function, its
parameters and return value. While working on man pages implementation,
I noticed mismatches in the descriptions of few APIs. This patch
changes the description of these APIs, making them consistent with the
man pages:
- tep_print_num_field()
- tep_print_func_field()
- tep_get_header_page_size()
- tep_get_long_size()
- tep_set_long_size()
- tep_get_page_size()
- tep_set_page_size()
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20190325145017.30246-2-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164343.396759247@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When trace-cmd report --debug is set, show the internal ring buffer
entries like time-extends and padding. This requires adding new kbuffer
API to retrieve these items.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Link: http://lkml.kernel.org/r/20190401164343.257591565@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Existing API tep_list_events() is not thread safe, it uses the internal
array sort_events to keep cache of the sorted events and reuses it. This
patch implements a new API, tep_list_events_copy(), which allocates new
sorted array each time it is called. It could be used when a sorted
events functionality is needed in thread safe use cases. It is up to the
caller to free the array.
Signed-off-by: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/linux-trace-devel/20181218133013.31094-1-tstoyanov@vmware.com
Link: http://lkml.kernel.org/r/20190401164343.117437443@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With security updates, %p in the kernel is hashed to protect true kernel
locations. But trace_printk() is not allowed in production systems, and
when a pointer is used, there are many times that the actual address is
needed. "%px" produces the real address. But libtraceevent does not know how
to handle that extension. Add it.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov <tstoyanov@vmware.com>
Link: http://lkml.kernel.org/r/20190401164342.837312153@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support in 'perf list' to output tool internal events, currently
only 'duration_time'.
Committer testing:
$ perf list dur*
List of pre-defined events (to be used in -e):
duration_time [Tool event]
Metric Groups:
$ perf list sw
List of pre-defined events (to be used in -e):
alignment-faults [Software event]
bpf-output [Software event]
context-switches OR cs [Software event]
cpu-clock [Software event]
cpu-migrations OR migrations [Software event]
dummy [Software event]
emulation-faults [Software event]
major-faults [Software event]
minor-faults [Software event]
page-faults OR faults [Software event]
task-clock [Software event]
duration_time [Tool event]
$ perf list | grep duration
duration_time [Tool event]
[L1D miss outstandings duration in cycles]
page walk duration are excluded in Skylake]
load. EPT page walk duration are excluded in Skylake]
page walk duration are excluded in Skylake]
store. EPT page walk duration are excluded in Skylake]
(instruction fetch) request. EPT page walk duration are excluded in
instruction fetch request. EPT page walk duration are excluded in
$
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190326221823.11518-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement printing the correct name for duration_time
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190326221823.11518-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf metric expression use 'duration_time' internally to normalize
events. Normal 'perf stat' without -x also prints the duration time.
But when using -x, the interval is not output anywhere, which is
inconvenient for any post processing which often wants to normalize
values to time.
So implement 'duration_time' as a proper perf event that can be
specified explicitely with -e.
The previous implementation of 'duration_time' only worked for metric
processing. This adds the concept of a tool event that is handled by the
tool. On the kernel level it is still mapped to the dummy software
event, but the values are not read anymore, but instead computed by the
tool.
Add proper plumbing to handle this in the event parser, and display it
in 'perf stat'. We don't want 'duration_time' to be added up, so it's
only printed for the first CPU.
% perf stat -e duration_time,cycles true
Performance counter stats for 'true':
555,476 ns duration_time
771,958 cycles
0.000555476 seconds time elapsed
0.000644000 seconds user
0.000000000 seconds sys
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190326221823.11518-3-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This reverts e864c5ca14 ("perf stat: Hide internal duration_time
counter") but doing it manually since the code has now moved to a
different file.
The next patch will properly implement duration_time as a full event, so
no need to hide it anymore.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190326221823.11518-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Command
# perf list --long-desc pmu
lists the long description of the available counters. For counter
named L1D_RO_EXCL_WRITES on machine types 3906 and 3907 the long
description contains the counter number 'Counter:128 Name:'
prefix. This is wrong.
The fix changes the description text and removes this prefix.
Output before:
[root@m35lp76 perf]# ./perf list --long-desc pmu
...
L1D_ONDRAWER_L4_SOURCED_WRITES
[A directory write to the Level-1 Data cache directory where the
returned cache line was sourced from On-Drawer Level-4 cache]
L1D_RO_EXCL_WRITES
[Counter:128 Name:L1D_RO_EXCL_WRITES A directory write to the Level-1
Data cache where the line was originally in a Read-Only state in the
cache but has been updated to be in the Exclusive state that allows
stores to the cache line]
...
Output after:
[root@m35lp76 perf]# ./perf list --long-desc pmu
...
L1D_ONDRAWER_L4_SOURCED_WRITES
[A directory write to the Level-1 Data cache directory where the
returned cache line was sourced from On-Drawer Level-4 cache]
L1D_RO_EXCL_WRITES
[L1D_RO_EXCL_WRITES A directory write to the Level-1
Data cache where the line was originally in a Read-Only state in the
cache but has been updated to be in the Exclusive state that allows
stores to the cache line]
...
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes: 109d59b900 ("perf vendor events s390: Add JSON files for IBM z14")
Link: http://lkml.kernel.org/r/20190329133337.60255-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When adding the 'struct namespaces_event' to event.h, referencing the
'struct perf_ns_link_info' type, we forgot to add the header where it is
defined, getting that definition only by sheer luck.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: f3b3614a28 ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info")
Link: https://lkml.kernel.org/n/tip-qkrld0v7boc9uabjbd8csxux@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is no use for what is in that file, as everything is
built by the tools/perf/trace/beauty/rename_flags.sh script from
the copied kernel headers, the end result being:
$ cat /tmp/build/perf/trace/beauty/generated/rename_flags_array.c
static const char *rename_flags[] = {
[0 + 1] = "NOREPLACE",
[1 + 1] = "EXCHANGE",
[2 + 1] = "WHITEOUT",
};
$
I.e. no use of any defines from uapi/linux/fs.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lgugmfa8z4bpw5zsbuoitllb@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The previous method, copying to the BPF stack limited us in how many
bytes we could copy from strings, use a PERCPU_ARRAY map like devised by
the sysdig guys[1] to copy more bytes:
Before:
# trace --no-inherit -e openat touch `python -c "print "$s" 'a' * 2000"`
touch: cannot touch 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa': File name too long
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", O_CREAT|O_NOCTTY|O_NONBLOCK|O_WRONLY, S_IRUGO|S_IWUGO) = -1 ENAMETOOLONG (File name too long)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
<SNIP some openat calls>
#
After:
[root@quaco acme]# trace --no-inherit -e openat touch `python -c "print "$s" 'a' * 2000"`
<STRIP what is the same as in the 'before' part>
openat(AT_FDCWD, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", O_CREAT|O_NOCTTY|O_NONBLOC) = -1 ENAMETOOLONG (File name too long)
<STRIP what is the same as in the 'before' part>
If we leave something like 'perf trace -e string' to trace all syscalls
with a string, and then do some 'perf top', to get some annotation for
the augmented_raw_syscalls.o BPF program we get:
│ → callq *ffffffffc45576d1 ▒
│ augmented_args->filename.size = probe_read_str(&augmented_args->filename.value, ▒
0.05 │ mov %eax,0x40(%r13)
Looking with pahole, expanding types, asking for hex offsets and sizes,
and use of BTF type information to see what is at that 0x40 offset from
%r13:
# pahole -F btf -C augmented_args_filename --expand_types --hex /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
struct augmented_args_filename {
struct syscall_enter_args {
long long unsigned int common_tp_fields; /* 0 0x8 */
long int syscall_nr; /* 0x8 0x8 */
long unsigned int args[6]; /* 0x10 0x30 */
} args; /* 0 0x40 */
/* --- cacheline 1 boundary (64 bytes) --- */
struct augmented_filename {
unsigned int size; /* 0x40 0x4 */
int reserved; /* 0x44 0x4 */
char value[4096]; /* 0x48 0x1000 */
} filename; /* 0x40 0x1008 */
/* size: 4168, cachelines: 66, members: 2 */
/* last cacheline: 8 bytes */
};
#
Then looking if PATH_MAX leaves some signature in the tests:
│ if (augmented_args->filename.size < sizeof(augmented_args->filename.value)) { ▒
│ cmp $0xfff,%rdi
0xfff == 4095
sizeof(augmented_args->filename.value) == PATH_MAX == 4096
[1] https://sysdig.com/blog/the-art-of-writing-ebpf-programs-a-primer/
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Gianluca Borello <g.borello@gmail.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
cc: Martin Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-76gce2d2ghzq537ubwhjkone@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Gets the augmented_raw_syscalls a bit more useful as-is, add a comment
stating that the intent is to have all this in a map populated by
userspace via the 'syscalls' BPF map, that right now has only a flag
stating if the syscall is filtered or not.
With it:
# grep -B1 augmented_raw ~/.perfconfig
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.o
#
# perf trace -e string
weechat/6001 stat("/etc/localtime", 0x7ffe22c23d10) = 0
gnome-shell/1943 openat(AT_FDCWD, "/proc/self/stat", O_RDONLY) = 81
weechat/6001 stat("/etc/localtime", 0x7ffe22c23d10) = 0
gmain/2475 inotify_add_watch(20<anon_inode:inotify>, "/home/acme/.config/firewall", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/var/cache/app-info/yaml", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/var/lib/app-info/xmls", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/var/lib/app-info/yaml", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/usr/share/app-info/yaml", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/usr/local/share/app-info/xmls", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/usr/local/share/app-info/yaml", 16789454) = -1 ENOENT (No such file or directory)
gmain/2391 inotify_add_watch(3<anon_inode:inotify>, "/home/acme/.local/share/app-info/yaml", 16789454) = -1 ENOENT (No such file or directory)
gmain/1121 inotify_add_watch(12<anon_inode:inotify>, "/etc/NetworkManager/VPN", 16789454) = -1 ENOENT (No such file or directory)
weechat/6001 stat("/etc/localtime", 0x7ffe22c23d10) = 0
gmain/2050 inotify_add_watch(8<anon_inode:inotify>, "/home/acme/~", 16789454) = -1 ENOENT (No such file or directory)
gmain/2521 inotify_add_watch(6<anon_inode:inotify>, "/var/lib/fwupd/remotes.d/lvfs-testing", 16789454) = -1 ENOENT (No such file or directory)
weechat/6001 stat("/etc/localtime", 0x7ffe22c23d10) = 0
DOM Worker/22714 ... [continued]: openat()) = 257
FS Broker 3982/3990 openat(AT_FDCWD, "/dev/urandom", O_RDONLY|O_CLOEXEC|O_NOCTTY) = 187
DOMCacheThread/16652 mkdir("/home/acme/.mozilla/firefox/ina67tev.default/storage/default/https+++web.whatsapp.com/cache/morgue/192", S_IRUGO|S_IXUGO|S_IWUSR) = -1 EEXIST (File exists)
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Luis Cláudio Gonçalves <lclaudio@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-a1hxffoy8t43e0wq6bzhp23u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For multiple dimensional arrays like below,
int a[2][3]
both llvm and pahole generated one BTF_KIND_ARRAY type like
. element_type: int
. index_type: unsigned int
. number of elements: 6
Such a collapsed BTF_KIND_ARRAY type will cause the divergence
in BTF vs. the user code. In the compile-once-run-everywhere
project, the header file is generated from BTF and used for bpf
program, and the definition in the header file will be different
from what user expects.
But the kernel actually supports chained multi-dimensional array
types properly. The above "int a[2][3]" can be represented as
Type #n:
. element_type: int
. index_type: unsigned int
. number of elements: 3
Type #(n+1):
. element_type: type #n
. index_type: unsigned int
. number of elements: 2
The following llvm commit
https://reviews.llvm.org/rL357215
also enables llvm to generated proper chained multi-dimensional arrays.
The test_btf already has a raw test ("struct test #1") for chained
multi-dimensional arrays. This patch added amended bpffs test for
chained multi-dimensional arrays.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
On top of this, a cleanup of kvm_para.h headers, which were exported by
some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJcoM5VAAoJEL/70l94x66DU3EH/A8sYdsfeqALWElm2Sy9TYas
mntz+oTWsl3vDy8s8zp1ET2NpF7oBlBEMmCWhVEJaD+1qW3VpTRAseR3Zr9ML9xD
k+BQM8SKv47o86ZN+y4XALl30Ckb3DXh/X1xsrV5hF6J3ofC+Ce2tF560l8C9ygC
WyHDxwNHMWVA/6TyW3mhunzuVKgZ/JND9+0zlyY1LKmUQ0BQLle23gseIhhI0YDm
B4VGIYU2Mf8jCH5Ir3N/rQ8pLdo8U7f5P/MMfgXQafksvUHJBg6B6vOhLJh94dLh
J2wixYp1zlT0drBBkvJ0jPZ75skooWWj0o3otEA7GNk/hRj6MTllgfL5SajTHZg=
=/A7u
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"A collection of x86 and ARM bugfixes, and some improvements to
documentation.
On top of this, a cleanup of kvm_para.h headers, which were exported
by some architectures even though they not support KVM at all. This is
responsible for all the Kbuild changes in the diffstat"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION
KVM: doc: Document the life cycle of a VM and its resources
KVM: selftests: complete IO before migrating guest state
KVM: selftests: disable stack protector for all KVM tests
KVM: selftests: explicitly disable PIE for tests
KVM: selftests: assert on exit reason in CR4/cpuid sync test
KVM: x86: update %rip after emulating IO
x86/kvm/hyper-v: avoid spurious pending stimer on vCPU init
kvm/x86: Move MSR_IA32_ARCH_CAPABILITIES to array emulated_msrs
KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts
kvm: don't redefine flags as something else
kvm: mmu: Used range based flushing in slot_handle_level_range
KVM: export <linux/kvm_para.h> and <asm/kvm_para.h> iif KVM is supported
KVM: x86: remove check on nr_mmu_pages in kvm_arch_commit_memory_region()
kvm: nVMX: Add a vmentry check for HOST_SYSENTER_ESP and HOST_SYSENTER_EIP fields
KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)
KVM: Reject device ioctls from processes other than the VM's creator
KVM: doc: Fix incorrect word ordering regarding supported use of APIs
KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'
KVM: nVMX: Do not inherit quadrant and invalid for the root shadow EPT
...
Pull perf tooling fixes from Thomas Gleixner:
"Core libraries:
- Fix max perf_event_attr.precise_ip detection.
- Fix parser error for uncore event alias
- Fixup ordering of kernel maps after obtaining the main kernel map
address.
Intel PT:
- Fix TSC slip where A TSC packet can slip past MTC packets so that
the timestamp appears to go backwards.
- Fixes for exported-sql-viewer GUI conversion to python3.
ARM coresight:
- Fix the build by adding a missing case value for enumeration value
introduced in newer library, that now is the required one.
tool headers:
- Syncronize kernel headers with the kernel, getting new io_uring and
pidfd_send_signal syscalls so that 'perf trace' can handle them"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf pmu: Fix parser error for uncore event alias
perf scripts python: exported-sql-viewer.py: Fix python3 support
perf scripts python: exported-sql-viewer.py: Fix never-ending loop
perf machine: Update kernel map address and re-order properly
tools headers uapi: Sync powerpc's asm/kvm.h copy with the kernel sources
tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd
tools headers uapi: Update drm/i915_drm.h
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
tools headers uapi: Sync linux/fcntl.h to get the F_SEAL_FUTURE_WRITE addition
tools headers uapi: Sync asm-generic/mman-common.h and linux/mman.h
perf evsel: Fix max perf_event_attr.precise_ip detection
perf intel-pt: Fix TSC slip
perf cs-etm: Add missing case value
Pull core fixes from Thomas Gleixner:
"A small set of core updates:
- Make the watchdog respect the selected CPU mask again. That was
broken by the rework of the watchdog thread management and caused
inconsistent state and NMI watchdog being unstoppable.
- Ensure that the objtool build can find the libelf location.
- Remove dead kcore stub code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog: Respect watchdog cpumask on CPU hotplug
objtool: Query pkg-config for libelf location
proc/kcore: Remove unused kclist_add_remap()
Add a zero key in order to standardize hardware that want a key of 0's to
be passed. Some platforms defaults to a zero-key with security enabled
rather than allow the OS to enable the security. The zero key would allow
us to manage those platform as well. This also adds a fix to secure erase
so it can use the zero key to do crypto erase. Some other security commands
already use zero keys. This introduces a standard zero-key to allow
unification of semantics cross nvdimm security commands.
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-29
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Bug fix in BTF deduplication that was mishandling an equivalence
comparison, from Andrii.
2) libbpf Makefile fixes to properly link against libelf for the shared
object and to actually export AF_XDP's xsk.h header, from Björn.
3) Fix use after free in bpf inode eviction, from Daniel.
4) Fix a bug in skb creation out of cpumap redirect, from Jesper.
5) Remove an unnecessary and triggerable WARN_ONCE() in max number
of call stack frames checking in verifier, from Paul.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull turbostat utility updates for 5.1 from Len Brown:
"Misc fixes and updates."
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Test different scenarios of indirect variable-offset stack access: out of
bound access (>0), min_off below initialized part of the stack,
max_off+size above initialized part of the stack, initialized stack.
Example of output:
...
#856/p indirect variable-offset stack access, out of bound OK
#857/p indirect variable-offset stack access, max_off+size > max_initialized OK
#858/p indirect variable-offset stack access, min_off < min_initialized OK
#859/p indirect variable-offset stack access, ok OK
...
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add 36 pedit action tests to check pedit options described in tc-pedit(8)
man page. Test cases can be specified by categories: actions, pedit,
raw_op, layered_op. RAW_OP cases check offset option for u8, u16 and u32
offset size. LAYERED_OP cases check fields option for eth, ip, ip6,
tcp and udp headers.
Include following tests:
377e - Add pedit action with RAW_OP offset u32
a0ca - Add pedit action with RAW_OP offset u32 (INVALID)
dd8a - Add pedit action with RAW_OP offset u16 u16
53db - Add pedit action with RAW_OP offset u16 (INVALID)
5c7e - Add pedit action with RAW_OP offset u8 add value
2893 - Add pedit action with RAW_OP offset u8 quad
3a07 - Add pedit action with RAW_OP offset u8-u16-u8
ab0f - Add pedit action with RAW_OP offset u16-u8-u8
9d12 - Add pedit action with RAW_OP offset u32 set u16 clear u8 invert
ebfa - Add pedit action with RAW_OP offset overflow u32 (INVALID)
f512 - Add pedit action with RAW_OP offset u16 at offmask shift set
c2cb - Add pedit action with RAW_OP offset u32 retain value
86d4 - Add pedit action with LAYERED_OP eth set src & dst
c715 - Add pedit action with LAYERED_OP eth set src (INVALID)
ba22 - Add pedit action with LAYERED_OP eth type set/clear sequence
5810 - Add pedit action with LAYERED_OP ip set src & dst
1092 - Add pedit action with LAYERED_OP ip set ihl & dsfield
02d8 - Add pedit action with LAYERED_OP ip set ttl & protocol
3e2d - Add pedit action with LAYERED_OP ip set ttl (INVALID)
31ae - Add pedit action with LAYERED_OP ip ttl clear/set
486f - Add pedit action with LAYERED_OP ip set duplicate fields
e790 - Add pedit action with LAYERED_OP ip set ce, df, mf, firstfrag,
nofrag fields
6829 - Add pedit action with LAYERED_OP beyond ip set dport & sport
afd8 - Add pedit action with LAYERED_OP beyond ip set icmp_type &
icmp_code
3143 - Add pedit action with LAYERED_OP beyond ip set dport (INVALID)
fc1f - Add pedit action with LAYERED_OP ip6 set src & dst
6d34 - Add pedit action with LAYERED_OP ip6 dst retain value (INVALID)
6f5e - Add pedit action with LAYERED_OP ip6 flow_lbl
6795 - Add pedit action with LAYERED_OP ip6 set payload_len, nexthdr,
hoplimit
1442 - Add pedit action with LAYERED_OP tcp set dport & sport
b7ac - Add pedit action with LAYERED_OP tcp sport set (INVALID)
cfcc - Add pedit action with LAYERED_OP tcp flags set
3bc4 - Add pedit action with LAYERED_OP tcp set dport, sport & flags
fields
f1c8 - Add pedit action with LAYERED_OP udp set dport & sport
d784 - Add pedit action with mixed RAW/LAYERED_OP #1
70ca - Add pedit action with mixed RAW/LAYERED_OP #2
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that when strict priority is configured on a system, the
higher-priority traffic does actually win all the available bandwidth.
The test uses a similar approach to qos_mc_aware.sh to run and account
the traffic.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract reusable code from qos_mc_aware.sh and put into a new library.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This test runs two streams of traffic from two independent ports to
create congestion on one egress port. It is necessary to configure the
shared buffer thresholds correctly, to make sure that there is traffic
from both streams in the shared buffer. Only then can the test actually
test prioritization among these streams.
Without this configuration, it is possible, that one of the streams
takes all of port-pool quota, and the other stream is not even admitted,
thus invalidating the result.
On Spectrum-1, this is not a problem, because MC traffic uses a separate
pool. But for Spectrum-2, MC and UC share the same pool, and the correct
configuration is important.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add helpers to obtain, set, and restore a pool size, and a port-pool and
tc-pool threshold.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use devlink -j and jq for more accurate querying. Use cut -f-2 instead
of rev-cut-rev combo.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't source lib.sh twice and make the script work with ifnames passed
on the command line.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Construct a basic topology consisting of two hosts connected using a
VLAN-aware bridge. Put each port in a different VLAN and test that ping
fails.
Add ingress and egress filters with a VLAN modify action and test that
ping passes.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send packets with VLAN and PCP set and check that TC flower filters can
match on these keys.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case a packet is routed using a multicast route whose specified
ingress interface does not match the interface from which the packet was
received, the packet is dropped.
Add IPv4 and IPv6 test cases for above mentioned scenario.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perf fails to parse uncore event alias, for example:
# perf stat -e unc_m_clockticks -a --no-merge sleep 1
event syntax error: 'unc_m_clockticks'
\___ parser error
Current code assumes that the event alias is from one specific PMU.
To find the PMU, perf strcmps the PMU name of event alias with the real
PMU name on the system.
However, the uncore event alias may be from multiple PMUs with common
prefix. The PMU name of uncore event alias is the common prefix.
For example, UNC_M_CLOCKTICKS is clock event for iMC, which include 6
PMUs with the same prefix "uncore_imc" on a skylake server.
The real PMU names on the system for iMC are uncore_imc_0 ...
uncore_imc_5.
The strncmp is used to only check the common prefix for uncore event
alias.
With the patch:
# perf stat -e unc_m_clockticks -a --no-merge sleep 1
Performance counter stats for 'system wide':
723,594,722 unc_m_clockticks [uncore_imc_5]
724,001,954 unc_m_clockticks [uncore_imc_3]
724,042,655 unc_m_clockticks [uncore_imc_1]
724,161,001 unc_m_clockticks [uncore_imc_4]
724,293,713 unc_m_clockticks [uncore_imc_2]
724,340,901 unc_m_clockticks [uncore_imc_0]
1.002090060 seconds time elapsed
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: ea1fa48c05 ("perf stat: Handle different PMU names with common prefix")
Link: http://lkml.kernel.org/r/1552672814-156173-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Unlike python2, python3 strings are not compatible with byte strings.
That results in disassembly not working for the branches reports. Fixup
those places overlooked in the port to python3.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: beda0e725e ("perf script python: Add Python3 support to exported-sql-viewer.py")
Link: http://lkml.kernel.org/r/20190327072826.19168-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
pyside version 1 fails to handle python3 large integers in some cases,
resulting in Qt getting into a never-ending loop. This affects:
samples Table
samples_view Table
All branches Report
Selected branches Report
Add workarounds for those cases.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: beda0e725e ("perf script python: Add Python3 support to exported-sql-viewer.py")
Link: http://lkml.kernel.org/r/20190327072826.19168-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit 1fb87b8e95 ("perf machine: Don't search for active kernel
start in __machine__create_kernel_maps"), the __machine__create_kernel_maps()
just create a map what start and end are both zero. Though the address will be
updated later, the order of map in the rbtree may be incorrect.
The commit ee05d21791 ("perf machine: Set main kernel end address properly")
fixed the logic in machine__create_kernel_maps(), but it's still wrong in
function machine__process_kernel_mmap_event().
To reproduce this issue, we need an environment which the module address
is before the kernel text segment. I tested it on an aarch64 machine with
kernel 4.19.25:
[root@localhost hulk]# grep _stext /proc/kallsyms
ffff000008081000 T _stext
[root@localhost hulk]# grep _etext /proc/kallsyms
ffff000009780000 R _etext
[root@localhost hulk]# tail /proc/modules
hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000
nvme_core 126976 7 nvme, Live 0xffff0000018b6000
mdio 20480 1 ixgbe, Live 0xffff0000018ab000
hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000
hns_mdio 20480 2 - Live 0xffff000001822000
hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000
dm_mirror 40960 0 - Live 0xffff000001804000
dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000
dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000
dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000
[root@localhost hulk]#
Before fix:
[root@localhost bin]# perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost bin]# perf buildid-list -i perf.data
4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso]
[root@localhost bin]# perf buildid-list -i perf.data -H
0000000000000000000000000000000000000000 /proc/kcore
[root@localhost bin]#
After fix:
[root@localhost tools]# ./perf/perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data
28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms]
106c14ce6e4acea3453e484dc604d66666f08a2f [vdso]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data -H
28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore
Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190228092003.34071-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
2b57ecd020 ("KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()")
That don't cause any changes in the tools.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/kvm.h' differs from latest version at 'arch/powerpc/include/uapi/asm/kvm.h'
diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Link: https://lkml.kernel.org/n/tip-4pb7ywp9536hub2pnj4hu6i4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes introduced in the following csets:
2b188cc1bb ("Add io_uring IO interface")
edafccee56 ("io_uring: add support for pre-mapped user IO buffers")
3eb39f4793 ("signal: add pidfd_send_signal() syscall")
This makes 'perf trace' to become aware of these new syscalls, so that
one can use them like 'perf trace -e ui_uring*,*signal' to do a system
wide strace-like session looking at those syscalls, for instance.
For example:
# perf trace -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla
Summary of events:
io_uring-cp (383), 1208866 events, 100.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
-------------- ------ -------- ------ ------- ------- ------
io_uring_enter 605780 2955.615 0.000 0.005 33.804 1.94%
openat 4 459.446 0.004 114.861 459.435 100.00%
munmap 4 0.073 0.009 0.018 0.042 44.03%
mmap 10 0.054 0.002 0.005 0.026 43.24%
brk 28 0.038 0.001 0.001 0.003 7.51%
io_uring_setup 1 0.030 0.030 0.030 0.030 0.00%
mprotect 4 0.014 0.002 0.004 0.005 14.32%
close 5 0.012 0.001 0.002 0.004 28.87%
fstat 3 0.006 0.001 0.002 0.003 35.83%
read 4 0.004 0.001 0.001 0.002 13.58%
access 1 0.003 0.003 0.003 0.003 0.00%
lseek 3 0.002 0.001 0.001 0.001 9.00%
arch_prctl 2 0.002 0.001 0.001 0.001 0.69%
execve 1 0.000 0.000 0.000 0.000 0.00%
#
# perf trace -e io_uring* -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla
Summary of events:
io_uring-cp (390), 1191250 events, 100.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
-------------- ------ -------- ------ ------ ------ ------
io_uring_enter 597093 2706.060 0.001 0.005 14.761 1.10%
io_uring_setup 1 0.038 0.038 0.038 0.038 0.00%
#
More work needed to make the tools/perf/examples/bpf/augmented_raw_syscalls.c
BPF program to copy the 'struct io_uring_params' arguments to perf's ring
buffer so that 'perf trace' can use the BTF info put in place by pahole's
conversion of the kernel DWARF and then auto-beautify those arguments.
This patch produces the expected change in the generated syscalls table
for x86_64:
--- /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.before 2019-03-26 13:37:46.679057774 -0300
+++ /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c 2019-03-26 13:38:12.755990383 -0300
@@ -334,5 +334,9 @@ static const char *syscalltbl_x86_64[] =
[332] = "statx",
[333] = "io_pgetevents",
[334] = "rseq",
+ [424] = "pidfd_send_signal",
+ [425] = "io_uring_setup",
+ [426] = "io_uring_enter",
+ [427] = "io_uring_register",
};
-#define SYSCALLTBL_x86_64_MAX_ID 334
+#define SYSCALLTBL_x86_64_MAX_ID 427
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-p0ars3otuc52x5iznf21shhw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
e46c2e99f6 ("drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)")
That don't cause changes in the generated perf binaries.
To silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://lkml.kernel.org/n/tip-h6bspm1nomjnpr90333rrx7q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes from:
52f6490940 ("x86: Add TSX Force Abort CPUID/MSR")
That don't cause any changes in the generated perf binaries.
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/n/tip-zv8kw8vnb1zppflncpwfsv2w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
ab3948f58f ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h'
diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lvfx5cgf0xzmdi9mcjva1ttl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To deal with the move of some defines from asm-generic/mmap-common.h to
linux/mman.h done in:
746c9398f5 ("arch: move common mmap flags to linux/mman.h")
The generated mmap_flags array stays the same:
$ tools/perf/trace/beauty/mmap_flags.sh
static const char *mmap_flags[] = {
[ilog2(0x40) + 1] = "32BIT",
[ilog2(0x01) + 1] = "SHARED",
[ilog2(0x02) + 1] = "PRIVATE",
[ilog2(0x10) + 1] = "FIXED",
[ilog2(0x20) + 1] = "ANONYMOUS",
[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
[ilog2(0x0100) + 1] = "GROWSDOWN",
[ilog2(0x0800) + 1] = "DENYWRITE",
[ilog2(0x1000) + 1] = "EXECUTABLE",
[ilog2(0x2000) + 1] = "LOCKED",
[ilog2(0x4000) + 1] = "NORESERVE",
[ilog2(0x8000) + 1] = "POPULATE",
[ilog2(0x10000) + 1] = "NONBLOCK",
[ilog2(0x20000) + 1] = "STACK",
[ilog2(0x40000) + 1] = "HUGETLB",
[ilog2(0x80000) + 1] = "SYNC",
};
$
And to have the system's sys/mman.h find the definition of MAP_SHARED
and MAP_PRIVATE, make sure they are defined in the tools/ mman-common.h
in a way that keeps it the same as the kernel's, need for keeping the
Android's NDK cross build working.
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-h80ycpc6pedg9s5z2rwpy6ws@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After a discussion with Andi, move the perf_event_attr.precise_ip
detection for maximum precise config (via :P modifier or for default
cycles event) to perf_evsel__open().
The current detection in perf_event_attr__set_max_precise_ip() is
tricky, because precise_ip config is specific for given event and it
currently checks only hw cycles.
We now check for valid precise_ip value right after failing
sys_perf_event_open() for specific event, before any of the
perf_event_attr fallback code gets executed.
This way we get the proper config in perf_event_attr together with
allowed precise_ip settings.
We can see that code activity with -vv, like:
$ perf record -vv ls
...
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
...
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
------------------------------------------------------------
sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -95
decreasing precise_ip by one (2)
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
...
precise_ip 2
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
------------------------------------------------------------
sys_perf_event_open: pid 9926 cpu 0 group_fd -1 flags 0x8 = 4
...
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-dkvxxbeg7lu74155d4jhlmc9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A TSC packet can slip past MTC packets so that the timestamp appears to
go backwards. One estimate is that can be up to about 40 CPU cycles,
which is certainly less than 0x1000 TSC ticks, but accept slippage an
order of magnitude more to be on the safe side.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 79b58424b8 ("perf tools: Add Intel PT support for decoding MTC packets")
Link: http://lkml.kernel.org/r/20190325135135.18348-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following error was thrown when compiling `tools/perf` using OpenCSD
v0.11.1. This patch fixes said error.
CC util/intel-pt-decoder/intel-pt-log.o
CC util/cs-etm-decoder/cs-etm-decoder.o
util/cs-etm-decoder/cs-etm-decoder.c: In function
‘cs_etm_decoder__buffer_range’:
util/cs-etm-decoder/cs-etm-decoder.c:370:2: error: enumeration value
‘OCSD_INSTR_WFI_WFE’ not handled in switch [-Werror=switch-enum]
switch (elem->last_i_type) {
^~~~~~
CC util/intel-pt-decoder/intel-pt-decoder.o
cc1: all warnings being treated as errors
Because `OCSD_INSTR_WFI_WFE` case was added only in v0.11.0, the minimum
required OpenCSD library version for this patch is no longer v0.10.0.
Signed-off-by: Solomon Tan <solomonbobstoner@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Suzuki K Poulouse <suzuki.poulose@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20190322052255.GA4809@w-OptiPlex-7050
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Documentation/virtual/kvm/api.txt states:
NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and
KVM_EXIT_EPR the corresponding operations are complete (and guest
state is consistent) only after userspace has re-entered the
kernel with KVM_RUN. The kernel side will first finish incomplete
operations and then check for pending signals. Userspace can
re-enter the guest with an unmasked signal pending to complete
pending operations.
Because guest state may be inconsistent, starting state migration after
an IO exit without first completing IO may result in test failures, e.g.
a proposed change to KVM's handling of %rip in its fast PIO handling[1]
will cause the new VM, i.e. the post-migration VM, to have its %rip set
to the IN instruction that triggered KVM_EXIT_IO, leading to a test
assertion due to a stage mismatch.
For simplicitly, require KVM_CAP_IMMEDIATE_EXIT to complete IO and skip
the test if it's not available. The addition of KVM_CAP_IMMEDIATE_EXIT
predates the state selftest by more than a year.
[1] https://patchwork.kernel.org/patch/10848545/
Fixes: fa3899add1 ("kvm: selftests: add basic test for state save and restore")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since 4.8.3, gcc has enabled -fstack-protector by default. This is
problematic for the KVM selftests as they do not configure fs or gs
segments (the stack canary is pulled from fs:0x28). With the default
behavior, gcc will insert a stack canary on any function that creates
buffers of 8 bytes or more. As a result, ucall() will hit a triple
fault shutdown due to reading a bad fs segment when inserting its
stack canary, i.e. every test fails with an unexpected SHUTDOWN.
Fixes: 14c47b7530 ("kvm: selftests: introduce ucall")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM selftests embed the guest "image" as a function in the test itself
and extract the guest code at runtime by manually parsing the elf
headers. The parsing is very simple and doesn't supporting fancy things
like position independent executables. Recent versions of gcc enable
pie by default, which results in triple fault shutdowns in the guest due
to the virtual address in the headers not matching up with the virtual
address retrieved from the function pointer.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
...so that the test doesn't end up in an infinite loop if it fails for
whatever reason, e.g. SHUTDOWN due to gcc inserting stack canary code
into ucall() and attempting to derefence a null segment.
Fixes: ca35906688 ("kvm: selftests: add cr4_cpuid_sync_test")
Cc: Wei Huang <wei@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Generate a libbpf.pc file at build time so that users can rely
on pkg-config to find the library, its CFLAGS and LDFLAGS.
Signed-off-by: Luca Boccassi <bluca@debian.org>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The DPDK project is moving forward with its AF_XDP PMD, and during
that process some libbpf issues surfaced [1]: When libbpf was built
as a shared library, libelf was not included in the linking phase.
Since libelf is an internal depedency to libbpf, libelf should be
included. This patch adds '-lelf' to resolve that.
[1] https://patches.dpdk.org/patch/50704/#93571
Fixes: 1b76c13e4b ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Suggested-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The xsk.h header file was missing from the install_headers target in
the Makefile. This patch simply adds xsk.h to the set of installed
headers.
Fixes: 1cad078842 ("libbpf: add support for using AF_XDP sockets")
Reported-by: Bruce Richardson <bruce.richardson@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pull networking fixes from David Miller:
"Fixes here and there, a couple new device IDs, as usual:
1) Fix BQL race in dpaa2-eth driver, from Ioana Ciornei.
2) Fix 64-bit division in iwlwifi, from Arnd Bergmann.
3) Fix documentation for some eBPF helpers, from Quentin Monnet.
4) Some UAPI bpf header sync with tools, also from Quentin Monnet.
5) Set descriptor ownership bit at the right time for jumbo frames in
stmmac driver, from Aaro Koskinen.
6) Set IFF_UP properly in tun driver, from Eric Dumazet.
7) Fix load/store doubleword instruction generation in powerpc eBPF
JIT, from Naveen N. Rao.
8) nla_nest_start() return value checks all over, from Kangjie Lu.
9) Fix asoc_id handling in SCTP after the SCTP_*_ASSOC changes this
merge window. From Marcelo Ricardo Leitner and Xin Long.
10) Fix memory corruption with large MTUs in stmmac, from Aaro
Koskinen.
11) Do not use ipv4 header for ipv6 flows in TCP and DCCP, from Eric
Dumazet.
12) Fix topology subscription cancellation in tipc, from Erik Hugne.
13) Memory leak in genetlink error path, from Yue Haibing.
14) Valid control actions properly in packet scheduler, from Davide
Caratti.
15) Even if we get EEXIST, we still need to rehash if a shrink was
delayed. From Herbert Xu.
16) Fix interrupt mask handling in interrupt handler of r8169, from
Heiner Kallweit.
17) Fix leak in ehea driver, from Wen Yang"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (168 commits)
dpaa2-eth: fix race condition with bql frame accounting
chelsio: use BUG() instead of BUG_ON(1)
net: devlink: skip info_get op call if it is not defined in dumpit
net: phy: bcm54xx: Encode link speed and activity into LEDs
tipc: change to check tipc_own_id to return in tipc_net_stop
net: usb: aqc111: Extend HWID table by QNAP device
net: sched: Kconfig: update reference link for PIE
net: dsa: qca8k: extend slave-bus implementations
net: dsa: qca8k: remove leftover phy accessors
dt-bindings: net: dsa: qca8k: support internal mdio-bus
dt-bindings: net: dsa: qca8k: fix example
net: phy: don't clear BMCR in genphy_soft_reset
bpf, libbpf: clarify bump in libbpf version info
bpf, libbpf: fix version info and add it to shared object
rxrpc: avoid clang -Wuninitialized warning
tipc: tipc clang warning
net: sched: fix cleanup NULL pointer exception in act_mirr
r8169: fix cable re-plugging issue
net: ethernet: ti: fix possible object reference leak
net: ibm: fix possible object reference leak
...
This patch adds specific test exposing bug in btf_dedup_is_equiv() when
comparing candidate VOID type to a non-VOID canonical type. It's
important for canonical type to be anonymous, otherwise name equality
check will do the right thing and will exit early.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
btf_dedup_is_equiv() used to compare btf_type->info fields, before doing
kind-specific equivalence check. This comparsion implicitly verified
that candidate and canonical types are of the same kind. With enum fwd
resolution logic this check couldn't be done generically anymore, as for
enums info contains vlen, which differs between enum fwd and
fully-defined enum, so this check was subsumed by kind-specific
equivalence checks.
This change caused btf_dedup_is_equiv() to let through VOID vs other
types check to reach switch, which was never meant to be handing VOID
kind, as VOID kind is always pre-resolved to itself and is only
equivalent to itself, which is checked early in btf_dedup_is_equiv().
This change adds back BTF kind equality check in place of more generic
btf_type->info check, still defering further kind-specific checks to
a per-kind switch.
Fixes: 9768095ba9 ("btf: resolve enum fwds in btf_dedup")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adrian reports that 'make -C tools clean' results in removal of the
livepatch selftest shell scripts.
As per the selftest lib.mk file, TEST_PROGS are for test shell scripts,
not TEST_GEN_PROGS. Adjust the livepatch selftest Makefile accordingly.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
The scripting must supply the CONFIG_INITRAMFS_SOURCE Kconfig option
so that kbuild can find the desired initrd, but the configcheck.sh
script gets confused by this option because it takes a string instead
of the expected y/n/m. This causes checkconfig.sh to complain about
CONFIG_INITRAMFS_SOURCE in the torture-test output (though not in the
summary). As more people use rcutorture, the resulting confusion is
an increasing concern.
This commit therefore suppresses this false-positive warning by filtering
CONFIG_INITRAMFS_SOURCE from within the checkconfig.sh script.
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address and add copyright notices.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This patch adds a test case with an excessive number of call stack frames
in dead code.
Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
Tested-by: Xiao Han <xiao.han@orange.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When running stacktrace_build_id_nmi, try to query
kernel.perf_event_max_sample_rate sysctl and use it as a sample_freq.
If there was an error reading sysctl, fallback to 5000.
kernel.perf_event_max_sample_rate sysctl can drift and/or can be
adjusted by the perf tool, so assuming a fixed number might be
problematic on a long running machine.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_tc_tunnel.sh sets up a pair of namespaces connected by a
veth pair to verify encap/decap using bpf_skb_adjust_room. In
testing this, it uses tunnel links as the peer of the bpf-based
encap/decap. However because the same IP header is used for inner
and outer IP, when packets arrive at the tunnel interface they will
be dropped by reverse path filtering as those packets are expected
on the veth interface (where the destination IP of the decapped
packet is configured).
To avoid this, ensure reverse path filtering is disabled for the
namespace using tunneling.
Fixes: 98cdabcd07 ("selftests/bpf: bpf tunnel encap test")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Alexei Starovoitov says:
====================
pull-request: bpf 2019-03-24
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) libbpf verision fix up from Daniel.
2) fix liveness propagation from Jakub.
3) fix verbose print of refcounted regs from Martin.
4) fix for large map allocations from Martynas.
5) fix use after free in sanitize_ptr_alu from Xu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current documentation suggests that we would need to bump the
libbpf version on every change. Lets clarify this a bit more and
reflect what we do today in practice, that is, bumping it once per
development cycle.
Fixes: 76d1b894c5 ("libbpf: Document API and ABI conventions")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Even though libbpf's versioning script for the linker (libbpf.map)
is pointing to 0.0.2, the BPF_EXTRAVERSION in the Makefile has
not been updated along with it and is therefore still on 0.0.1.
While fixing up, I also noticed that the generated shared object
versioning information is missing, typical convention is to have
a linker name (libbpf.so), soname (libbpf.so.0) and real name
(libbpf.so.0.0.2) for library management. This is based upon the
LIBBPF_VERSION as well.
The build will then produce the following bpf libraries:
# ll libbpf*
libbpf.a
libbpf.so -> libbpf.so.0.0.2
libbpf.so.0 -> libbpf.so.0.0.2
libbpf.so.0.0.2
# readelf -d libbpf.so.0.0.2 | grep SONAME
0x000000000000000e (SONAME) Library soname: [libbpf.so.0]
And install them accordingly:
# rm -rf /tmp/bld; mkdir /tmp/bld; make -j$(nproc) O=/tmp/bld install
Auto-detecting system features:
... libelf: [ on ]
... bpf: [ on ]
CC /tmp/bld/libbpf.o
CC /tmp/bld/bpf.o
CC /tmp/bld/nlattr.o
CC /tmp/bld/btf.o
CC /tmp/bld/libbpf_errno.o
CC /tmp/bld/str_error.o
CC /tmp/bld/netlink.o
CC /tmp/bld/bpf_prog_linfo.o
CC /tmp/bld/libbpf_probes.o
CC /tmp/bld/xsk.o
LD /tmp/bld/libbpf-in.o
LINK /tmp/bld/libbpf.a
LINK /tmp/bld/libbpf.so.0.0.2
LINK /tmp/bld/test_libbpf
INSTALL /tmp/bld/libbpf.a
INSTALL /tmp/bld/libbpf.so.0.0.2
# ll /usr/local/lib64/libbpf.*
/usr/local/lib64/libbpf.a
/usr/local/lib64/libbpf.so -> libbpf.so.0.0.2
/usr/local/lib64/libbpf.so.0 -> libbpf.so.0.0.2
/usr/local/lib64/libbpf.so.0.0.2
Fixes: 1bf4b05810 ("tools: bpftool: add probes for eBPF program types")
Fixes: 1b76c13e4b ("bpf tools: Introduce 'bpf' library and add bpf feature check")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Pull perf updates from Thomas Gleixner:
"A larger set of perf updates.
Not all of them are strictly fixes, but that's solely the tip
maintainers fault as they let the timely -rc1 pull request fall
through the cracks for various reasons including travel. So I'm
sending this nevertheless because rebasing and distangling fixes and
updates would be a mess and risky as well. As of tomorrow, a strict
fixes separation is happening again. Sorry for the slip-up.
Kernel:
- Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different
consumers of the mmap event get what they requested.
Tools:
- A larger set of updates to perf record/report/scripts vs. time
stamp handling
- More Python3 fixups
- A pile of memory leak plumbing
- perf BPF improvements and fixes
- Finalize the perf.data directory storage"
[ Note: the kernel part is strictly a fix, the updates are purely to
tooling - Linus ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf bpf: Show more BPF program info in print_bpf_prog_info()
perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog()
perf tools: Save bpf_prog_info and BTF of new BPF programs
perf evlist: Introduce side band thread
perf annotate: Enable annotation of BPF programs
perf build: Check what binutils's 'disassembler()' signature to use
perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation
perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO
perf feature detection: Add -lopcodes to feature-libbfd
perf top: Add option --no-bpf-event
perf bpf: Save BTF information as headers to perf.data
perf bpf: Save BTF in a rbtree in perf_env
perf bpf: Save bpf_prog_info information as headers to perf.data
perf bpf: Save bpf_prog_info in a rbtree in perf_env
perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool
perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()
bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()
tools lib bpf: Introduce bpf_program__get_prog_info_linear()
perf record: Replace option --bpf-event with --no-bpf-event
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
...
Pull core fixes from Thomas Gleixner:
"Two small fixes:
- Move the large objtool_file struct off the stack so objtool works
in setups with a tight stack limit.
- Make a few variables static in the watchdog core code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/core: Make variables static
objtool: Move objtool_file struct off the stack
Add a small test that shows how to shape a TCP flow in tc-bpf
with EDT and ECN.
Signed-off-by: Peter Oskolkov <posk@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
BPF:
Song Liu:
- Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT
and PERF_RECORD_KSYMBOL recently added to the kernel and plugging
binutils's libopcodes disassembly of BPF programs with the existing
annotation interfaces in 'perf annotate', 'perf report' and 'perf top'
various output formats (--stdio, --stdio2, --tui).
perf list:
Andi Kleen:
- Filter metrics when using substring search.
perf record:
Andi Kleen:
- Allow to limit number of reported perf.data files
- Clarify help for --switch-output.
perf report:
Andi Kleen
- Indicate JITed code better.
- Show all sort keys in help output.
perf script:
Andi Kleen:
- Support relative time.
perf stat:
Andi Kleen:
- Improve scaling.
General:
Changbin Du:
- Fix some mostly error path memory and reference count leaks found
using gcc's ASan and UBSan.
Vendor events:
Mamatha Inamdar:
- Remove P8 HW events which are not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXJOmigAKCRCyPKLppCJ+
J+EPAQDNzH1M3uJ6cOhyzAMowpsl0Dgs0Q+5iNlOnDYVr2RfhgEA2Sr2fQyl/qiG
h6jRbzvdE+PTXbcMNO79ajmufAHdLgQ=
=DuTU
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.1-20190321' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo:
BPF:
Song Liu:
- Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT
and PERF_RECORD_KSYMBOL recently added to the kernel and plugging
binutils's libopcodes disassembly of BPF programs with the existing
annotation interfaces in 'perf annotate', 'perf report' and 'perf top'
various output formats (--stdio, --stdio2, --tui).
perf list:
Andi Kleen:
- Filter metrics when using substring search.
perf record:
Andi Kleen:
- Allow to limit number of reported perf.data files
- Clarify help for --switch-output.
perf report:
Andi Kleen
- Indicate JITed code better.
- Show all sort keys in help output.
perf script:
Andi Kleen:
- Support relative time.
perf stat:
Andi Kleen:
- Improve scaling.
General:
Changbin Du:
- Fix some mostly error path memory and reference count leaks found
using gcc's ASan and UBSan.
Vendor events:
Mamatha Inamdar:
- Remove P8 HW events which are not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
kernel:
Stephane Eranian :
- Restore mmap record type correctly when handling PERF_RECORD_MMAP2
events, as the same template is used for all the threads interested
in mmap events, some may want just PERF_RECORD_MMAP, while some
may want the extra info in MMAP2 records.
perf probe:
Adrian Hunter:
- Fix getting the kernel map, because since changes related to x86 PTI
entry trampolines handling, there are more than one kernel map.
perf script:
Andi Kleen:
- Support insn output for normal samples, i.e.:
perf script -F ip,sym,insn --xed
Will fetch the sample IP from the thread address space and feed it
to Intel's XED disassembler, producing lines such as:
ffffffffa4068804 native_write_msr wrmsr
ffffffffa415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx
That match 'perf annotate's output.
- Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
addition to PERF_RECORD_SAMPLE.
perf report:
- Add a new --samples option to save a small random number of samples
per hist entry, using a reservoir technique to select a representative
number of samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or CPU of the sample is displayed. Then we use less' search
functionality to directly jump to the time stamp of the selected sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
- Fix the UI browser scripts pop up menu when there are many scripts
available.
perf report:
Andi Kleen:
- Add 'time' sort option. E.g.:
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
0.67% 277061.87300 [.] _dl_start
0.50% 277061.87300 [.] f1
0.50% 277061.87300 [.] f2
0.33% 277061.87300 [.] main
0.29% 277061.87300 [.] _dl_lookup_symbol_x
0.29% 277061.87300 [.] dl_main
0.29% 277061.87300 [.] do_lookup_x
0.17% 277061.87300 [.] _dl_debug_initialize
0.17% 277061.87300 [.] _dl_init_paths
0.08% 277061.87300 [.] check_match
0.04% 277061.87300 [.] _dl_count_modids
1.33% 277061.87400 [.] f1
1.33% 277061.87400 [.] f2
1.33% 277061.87400 [.] main
1.17% 277061.87500 [.] main
1.08% 277061.87500 [.] f1
1.08% 277061.87500 [.] f2
1.00% 277061.87600 [.] main
0.83% 277061.87600 [.] f1
0.83% 277061.87600 [.] f2
1.00% 277061.87700 [.] main
tools headers:
Arnaldo Carvalho de Melo:
- Update x86's syscall_64.tbl, no change in tools/perf behaviour.
- Sync copies asm-generic/unistd.h and linux/in with the kernel sources.
perf data:
Jiri Olsa:
- Prep work to support having perf.data stored as a directory, with one
file per CPU, that ultimately will allow having one ring buffer reading
thread per CPU.
Vendor events:
Martin Liška:
- perf PMU events for AMD Family 17h.
perf script python:
Tony Jones:
- Add python3 support for the remaining Intel PT related scripts, with
these we should have a clean build of perf with python3 while still
supporting the build with python2.
libbpf:
Arnaldo Carvalho de Melo:
- Fix the build on uCLibc, adding the missing stdarg.h since we use
va_list in one typedef.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXIbMlgAKCRCyPKLppCJ+
J/fzAQDNlP1cEuryAfWCDZ/sf5N/76srvkt/kIyYO0CliCjiBAEAiHRWrhsNs1Gd
Z8626lCTYt7BTdz5yfTb7gbt/n7xNAY=
=Ycye
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.1-20190311' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/core improvements and fixes from Arnaldo:
kernel:
Stephane Eranian :
- Restore mmap record type correctly when handling PERF_RECORD_MMAP2
events, as the same template is used for all the threads interested
in mmap events, some may want just PERF_RECORD_MMAP, while some
may want the extra info in MMAP2 records.
perf probe:
Adrian Hunter:
- Fix getting the kernel map, because since changes related to x86 PTI
entry trampolines handling, there are more than one kernel map.
perf script:
Andi Kleen:
- Support insn output for normal samples, i.e.:
perf script -F ip,sym,insn --xed
Will fetch the sample IP from the thread address space and feed it
to Intel's XED disassembler, producing lines such as:
ffffffffa4068804 native_write_msr wrmsr
ffffffffa415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx
That match 'perf annotate's output.
- Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
addition to PERF_RECORD_SAMPLE.
perf report:
- Add a new --samples option to save a small random number of samples
per hist entry, using a reservoir technique to select a representative
number of samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or CPU of the sample is displayed. Then we use less' search
functionality to directly jump to the time stamp of the selected sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
- Fix the UI browser scripts pop up menu when there are many scripts
available.
perf report:
Andi Kleen:
- Add 'time' sort option. E.g.:
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
0.67% 277061.87300 [.] _dl_start
0.50% 277061.87300 [.] f1
0.50% 277061.87300 [.] f2
0.33% 277061.87300 [.] main
0.29% 277061.87300 [.] _dl_lookup_symbol_x
0.29% 277061.87300 [.] dl_main
0.29% 277061.87300 [.] do_lookup_x
0.17% 277061.87300 [.] _dl_debug_initialize
0.17% 277061.87300 [.] _dl_init_paths
0.08% 277061.87300 [.] check_match
0.04% 277061.87300 [.] _dl_count_modids
1.33% 277061.87400 [.] f1
1.33% 277061.87400 [.] f2
1.33% 277061.87400 [.] main
1.17% 277061.87500 [.] main
1.08% 277061.87500 [.] f1
1.08% 277061.87500 [.] f2
1.00% 277061.87600 [.] main
0.83% 277061.87600 [.] f1
0.83% 277061.87600 [.] f2
1.00% 277061.87700 [.] main
tools headers:
Arnaldo Carvalho de Melo:
- Update x86's syscall_64.tbl, no change in tools/perf behaviour.
- Sync copies asm-generic/unistd.h and linux/in with the kernel sources.
perf data:
Jiri Olsa:
- Prep work to support having perf.data stored as a directory, with one
file per CPU, that ultimately will allow having one ring buffer reading
thread per CPU.
Vendor events:
Martin Liška:
- perf PMU events for AMD Family 17h.
perf script python:
Tony Jones:
- Add python3 support for the remaining Intel PT related scripts, with
these we should have a clean build of perf with python3 while still
supporting the build with python2.
libbpf:
Arnaldo Carvalho de Melo:
- Fix the build on uCLibc, adding the missing stdarg.h since we use
va_list in one typedef.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make the tests correctly annotate skbs with tunnel metadata.
This makes the gso tests succeed. Enable them.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Lower route MTU to ensure packets fit in device MTU after encap, then
skip the gso_size changes.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Avoid moving the network layer header when prefixing tunnel headers.
This avoids an explicit call to bpf_skb_store_bytes and an implicit
move of the network header bytes in bpf_skb_adjust_room.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sync include/uapi/linux/bpf.h with tools/
Changes
v1->v2:
- BPF_F_ADJ_ROOM_MASK moved, no longer in this commit
v2->v3:
- BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved, no longer in this commit
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Segmentation offload takes a longer path. Verify that the feature
works with large packets.
The test succeeds if not setting dodgy in bpf_skb_adjust_room, as veth
TSO is permissive.
If not setting SKB_GSO_DODGY, this enables tunneled TSO offload on
supporting NICs.
The feature sets SKB_GSO_DODGY because the caller is untrusted. As a
result the packets traverse through the gso stack at least up to TCP.
And fail the gso_type validation, such as the skb->encapsulation check
in gre_gso_segment and the gso_type checks introduced in commit
418e897e07 ("gso: validate gso_type on ipip style tunnel").
This will be addressed in a follow-on feature patch. In the meantime,
disable the new gso tests.
Changes v1->v2:
- not all netcat versions support flag '-q', use timeout instead
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
GRE is a commonly used protocol. Add GRE cases for both IPv4 and IPv6.
It also inserts different sized headers, which can expose some
unexpected edge cases.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test only uses ipv4 so far, expand to ipv6.
This is mostly a boilerplate near copy of the ipv4 path.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The bpf tunnel test encapsulates using bpf, then decapsulates using
a standard tunnel device to verify correctness.
Once encap is verified, also test decap, by replacing the tunnel
device on decap with another bpf program.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Validate basic tunnel encapsulation using ipip.
Set up two namespaces connected by veth. Connect a client and server.
Do this with and without bpf encap.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This makes it easier to use pcitest in automated setups.
Signed-off-by: Jean-Jacques Hiblot <jjhiblot@ti.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
Commit 7640ead939 ("bpf: verifier: make sure callees don't prune
with caller differences") connected up parentage chains of all
frames of the stack. It didn't, however, ensure propagate_liveness()
propagates all liveness information along those chains.
This means pruning happening in the callee may generate explored
states with incomplete liveness for the chains in lower frames
of the stack.
The included selftest is similar to the prior one from commit
7640ead939 ("bpf: verifier: make sure callees don't prune with
caller differences"), where callee would prune regardless of the
difference in r8 state.
Now we also initialize r9 to 0 or 1 based on a result from get_random().
r9 is never read so the walk with r9 = 0 gets pruned (correctly) after
the walk with r9 = 1 completes.
The selftest is so arranged that the pruning will happen in the
callee. Since callee does not propagate read marks of r8, the
explored state at the pruning point prior to the callee will
now ignore r8.
Propagate liveness on all frames of the stack when pruning.
Fixes: f4d7e40a5b ("bpf: introduce function calls (verification)")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
After some experiences I found that urandom_read does not need to be
linked statically. When the 'read' syscall call is moved to separate
non-inlined function then bpf_get_stackid() is able to find
the executable in stack trace and extract its build_id from it.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add tests which verify that the new helpers work for both IPv4 and
IPv6, by forcing SYN cookies to always on. Use a new network namespace
to avoid clobbering the global SYN cookie settings.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make sure that returning a struct sock_common * reference invokes
the reference tracking machinery in the verifier.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make the BPF_SK_LOOKUP macro take a helper function, to ease
writing tests for new helpers.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch enables showing bpf program name, address, and size in the
header.
Before the patch:
perf report --header-only
...
# bpf_prog_info of id 9
# bpf_prog_info of id 10
# bpf_prog_info of id 13
After the patch:
# bpf_prog_info 9: bpf_prog_7be49e3934a125ba addr 0xffffffffa0024947 size 229
# bpf_prog_info 10: bpf_prog_2a142ef67aaad174 addr 0xffffffffa007c94d size 229
# bpf_prog_info 13: bpf_prog_47368425825d7384_task__task_newt addr 0xffffffffa0251137 size 369
Committer notes:
Fix the fallback definition when HAVE_LIBBPF_SUPPORT is not defined,
i.e. add the missing 'static inline' and add the __maybe_unused to the
args. Also add stdio.h since we now use FILE * in bpf-event.h.
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190319165454.1298742-3-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extract logic to create program names to synthesize_bpf_prog_name(), so
that it can be reused in header.c:print_bpf_prog_info().
This commit doesn't change the behavior.
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190319165454.1298742-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To fully annotate BPF programs with source code mapping, 4 different
information are needed:
1) PERF_RECORD_KSYMBOL
2) PERF_RECORD_BPF_EVENT
3) bpf_prog_info
4) btf
This patch handles 3) and 4) for BPF programs loaded after 'perf
record|top'.
For timely process of these information, a dedicated event is added to
the side band evlist.
When PERF_RECORD_BPF_EVENT is received via the side band event, the
polling thread gathers 3) and 4) vis sys_bpf and store them in perf_env.
This information is saved to perf.data at the end of 'perf record'.
Committer testing:
The 'wakeup_watermark' member in 'struct perf_event_attr' is inside a
unnamed union, so can't be used in a struct designated initialization
with older gccs, get it out of that, isolating as 'attr.wakeup_watermark
= 1;' to work with all gcc versions.
We also need to add '--no-bpf-event' to the 'perf record'
perf_event_attr tests in 'perf test', as the way that that test goes is
to intercept the events being setup and looking if they match the fields
described in the control files, since now it finds first the side band
event used to catch the PERF_RECORD_BPF_EVENT, they all fail.
With these issues fixed:
Same scenario as for testing BPF programs loaded before 'perf record' or
'perf top' starts, only start the BPF programs after 'perf record|top',
so that its information get collected by the sideband threads, the rest
works as for the programs loaded before start monitoring.
Add missing 'inline' to the bpf_event__add_sb_event() when
HAVE_LIBBPF_SUPPORT is not defined, fixing the build in systems without
binutils devel files installed.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-16-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces side band thread that captures extended
information for events like PERF_RECORD_BPF_EVENT.
This new thread uses its own evlist that uses ring buffer with very low
watermark for lower latency.
To use side band thread, we need to:
1. add side band event(s) by calling perf_evlist__add_sb_event();
2. calls perf_evlist__start_sb_thread();
3. at the end of perf run, perf_evlist__stop_sb_thread().
In the next patch, we use this thread to handle PERF_RECORD_BPF_EVENT.
Committer notes:
Add fix by Jiri Olsa for when te sb_tread can't get started and then at
the end the stop_sb_thread() segfaults when joining the (non-existing)
thread.
That can happen when running 'perf top' or 'perf record' as a normal
user, for instance.
Further checks need to be done on top of this to more graciously handle
these possible failure scenarios.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-15-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Objtool uses over 512k of stack, thanks to the hash table embedded in
the objtool_file struct. This causes an unnecessarily large stack
allocation and breaks users with low stack limits.
Move the struct off the stack.
Fixes: 042ba73fe7 ("objtool: Add several performance improvements")
Reported-by: Vassili Karpov <moosotc@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/df92dcbc4b84b02ffa252f46876df125fb56e2d7.1552954176.git.jpoimboe@redhat.com
eBPF "restricted C" code can be compiled with LLVM/clang using target
triplets like armv7l-unknown-linux-gnueabihf and loaded/run with small
cross-compiled gobpf/elf [1] programs without requiring a full BCC
port which is also undesirable on small embedded systems due to its
size footprint. The only missing pieces are these helper macros which
otherwise have to be redefined by each eBPF arm program.
[1] https://github.com/iovisor/gobpf/tree/master/elf
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On some systems /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
or /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
return a file error because of bad ACPI LPIT data from a misconfigured BIOS.
turbostat interprets this failure as a fatal error and outputs
turbostat: CPU LPI: No data available
If the ACPI LPIT sysfs files return an error output a warning instead of
a fatal error, disable the ACPI LPIT evaluation code, and continue.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Most calls to fgets() and fscanf() are followed by error checks.
Add an exit-on-error in the remaining cases.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Len Brown <len.brown@intel.com>
The package power can also be read from an MSR. It's not clear exactly
what is included, and whether it's aggregated over all nodes or
reported separately.
It does look like this is reported separately per CCX (I get a single
value on the Ryzen R7 1700), but it might be reported separately per-
die (node?) on larger processors. If that's the case, it would have to
be recorded per node and aggregated for the socket.
Note that although Zen has these MSRs reporting power, it looks like
the actual RAPL configuration (power limits, configured TDP) is done
through PCI configuration space. I have not yet found any public
documentation for this.
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Based on the Open-Source Register Reference for AMD Family 17h
Processors Models 00h-2Fh:
https://support.amd.com/TechDocs/56255_OSRR.pdf
These processors report RAPL support in bit 14 of CPUID 0x80000007 EDX,
and the following MSRs are present:
0xc0010299 (RAPL_PWR_UNIT), like Intel's RAPL_POWER_UNIT
0xc001029a (CORE_ENERGY_STAT), kind of like Intel's PP0_ENERGY_STATUS
0xc001029b (PKG_ENERGY_STAT), like Intel's PKG_ENERGY_STATUS
A notable difference from the Intel implementation is that AMD reports
the "Cores" energy usage separately for each core, rather than a
per-package total. The code has been adjusted to handle either case in a
generic way.
I haven't yet enabled collection of package power, due to being unable
to test it on multi-node systems (TR, EPYC).
Signed-off-by: Calvin Walton <calvin.walton@kepstin.ca>
Signed-off-by: Len Brown <len.brown@intel.com>
Running without a cpufreq driver is a valid case so warnings output in
this case should not be to stderr.
Use outf instead of stderr for these warnings.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat executes on CPUs in "topology order".
This is an optimization for measuring profoundly idle systems --
as the closest hardware is woken next...
Fix a typo that was added with the sub-die-node support,
that broke topology ordering on multi-node systems.
Signed-off-by: Len Brown <len.brown@intel.com>
Commit 003ca0fd2286 ("Refactor disassembler selection") in the binutils
repo, which changed the disassembler() function signature, so we must
use the feature test introduced in fb982666e3 ("tools/bpftool: fix
bpftool build with bintutils >= 2.9") to deal with that.
Committer testing:
After adding the missing function call to test-all.c, and:
FEATURE_CHECK_LDFLAGS-disassembler-four-args = -bfd -lopcodes
And the fallbacks for cases where we need -liberty and sometimes -lz to
tools/perf/Makefile.config, we get:
$ make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory '/home/acme/git/perf/tools/perf'
BUILD: Doing 'make -j8' parallel build
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ]
... libbfd: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libslang: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... disassembler-four-args: [ on ]
CC /tmp/build/perf/jvmti/libjvmti.o
CC /tmp/build/perf/builtin-bench.o
<SNIP>
$
$
The feature detection test-all.bin gets successfully built and linked:
$ ls -la /tmp/build/perf/feature/test-all.bin
-rwxrwxr-x. 1 acme acme 2680352 Mar 19 11:07 /tmp/build/perf/feature/test-all.bin
$ nm /tmp/build/perf/feature/test-all.bin | grep -w disassembler
0000000000061f90 T disassembler
$
Time to move on to the patches that make use of this disassembler()
routine in binutils's libopcodes.
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
[ split from a larger patch, added missing FEATURE_CHECK_LDFLAGS-disassembler-four-args ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds processing of PERF_BPF_EVENT_PROG_LOAD, which sets
proper DSO type/id/etc of memory regions mapped to BPF programs to
DSO_BINARY_TYPE__BPF_PROG_INFO.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-14-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce a new dso type DSO_BINARY_TYPE__BPF_PROG_INFO for BPF programs. In
symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso will call into a new
function symbol__disassemble_bpf() in an upcoming patch, where annotation line
information is filled based bpf_prog_info and btf saved in given perf_env.
Committer notes:
Removed the unnamed union with 'bpf_prog' and 'cache' in 'struct dso',
to fix this bug when exiting 'perf top':
# perf top
perf: Segmentation fault
-------- backtrace --------
perf[0x5a785a]
/lib64/libc.so.6(+0x385bf)[0x7fd68443c5bf]
perf(rb_first+0x2b)[0x4d6eeb]
perf(dso__delete+0xb7)[0x4dffb7]
perf[0x4f9e37]
perf(perf_session__delete+0x64)[0x504df4]
perf(cmd_top+0x1957)[0x454467]
perf[0x4aad18]
perf(main+0x61c)[0x42ec7c]
/lib64/libc.so.6(__libc_start_main+0xf2)[0x7fd684428412]
perf(_start+0x2d)[0x42eead]
#
# addr2line -fe ~/bin/perf 0x4dffb7
dso_cache__free
/home/acme/git/perf/tools/perf/util/dso.c:713
That is trying to access the dso->data.cache, and that is not used with
BPF programs, so we end up accessing what is in bpf_prog.first_member,
b00m.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Both libbfd and libopcodes are distributed with binutil-dev/devel. When
libbfd is present, it is OK to assume that libopcodes also present. This
has been a safe assumption for bpftool.
This patch adds -lopcodes to perf/Makefile.config. libopcodes will be
used in the next commit for BPF annotation.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-12-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch enables 'perf record' to save BTF information as headers to
perf.data.
A new header type HEADER_BPF_BTF is introduced for this data.
Committer testing:
As root, being on the kernel sources top level directory, run:
# perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c -e *msg
Just to compile and load a BPF program that attaches to the
raw_syscalls:sys_{enter,exit} tracepoints to trace the syscalls ending
in "msg" (recvmsg, sendmsg, recvmmsg, sendmmsg, etc).
Make sure you have a recent enough clang, say version 9, to get the
BTF ELF sections needed for this testing:
# clang --version | head -1
clang version 9.0.0 (https://git.llvm.org/git/clang.git/ 7906282d3afec5dfdc2b27943fd6c0309086c507) (https://git.llvm.org/git/llvm.git/ a1b5de1ff8ae8bc79dc8e86e1f82565229bd0500)
# readelf -SW tools/perf/examples/bpf/augmented_raw_syscalls.o | grep BTF
[22] .BTF PROGBITS 0000000000000000 000ede 000b0e 00 0 0 1
[23] .BTF.ext PROGBITS 0000000000000000 0019ec 0002a0 00 0 0 1
[24] .rel.BTF.ext REL 0000000000000000 002fa8 000270 10 30 23 8
Then do a systemwide perf record session for a few seconds:
# perf record -a sleep 2s
Then look at:
# perf report --header-only | grep b[pt]f
# event : name = cycles:ppp, , id = { 1116204, 1116205, 1116206, 1116207, 1116208, 1116209, 1116210, 1116211 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# bpf_prog_info of id 13
# bpf_prog_info of id 14
# bpf_prog_info of id 15
# bpf_prog_info of id 16
# bpf_prog_info of id 17
# bpf_prog_info of id 18
# bpf_prog_info of id 21
# bpf_prog_info of id 22
# bpf_prog_info of id 51
# bpf_prog_info of id 52
# btf info of id 8
#
We need to show more info about these BPF and BTF entries , but that can
be done later.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-10-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
BTF contains information necessary to annotate BPF programs. This patch
saves BTF for BPF programs loaded in the system.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-9-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch enables perf-record to save bpf_prog_info information as
headers to perf.data. A new header type HEADER_BPF_PROG_INFO is
introduced for this data.
Committer testing:
As root, being on the kernel sources top level directory, run:
# perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c -e *msg
Just to compile and load a BPF program that attaches to the
raw_syscalls:sys_{enter,exit} tracepoints to trace the syscalls ending
in "msg" (recvmsg, sendmsg, recvmmsg, sendmmsg, etc).
Then do a systemwide perf record session for a few seconds:
# perf record -a sleep 2s
Then look at:
# perf report --header-only | grep -i bpf
# bpf_prog_info of id 13
# bpf_prog_info of id 14
# bpf_prog_info of id 15
# bpf_prog_info of id 16
# bpf_prog_info of id 17
# bpf_prog_info of id 18
# bpf_prog_info of id 21
# bpf_prog_info of id 22
# bpf_prog_info of id 208
# bpf_prog_info of id 209
#
We need to show more info about these programs, like bpftool does for
the ones running on the system, i.e. 'perf record/perf report' become a
way of saving the BPF state in a machine to then analyse on another,
together with all the other information that is already saved in the
perf.data header:
# perf report --header-only
# ========
# captured on : Tue Mar 12 11:42:13 2019
# header version : 1
# data offset : 296
# data size : 16294184
# feat offset : 16294480
# hostname : quaco
# os release : 5.0.0+
# perf version : 5.0.gd783c8
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
# cpuid : GenuineIntel,6,142,10
# total memory : 24555720 kB
# cmdline : /home/acme/bin/perf (deleted) record -a
# event : name = cycles:ppp, , id = { 3190123, 3190124, 3190125, 3190126, 3190127, 3190128, 3190129, 3190130 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: intel_pt = 8, software = 1, power = 11, uprobe = 7, uncore_imc = 12, cpu = 4, cstate_core = 18, uncore_cbox_2 = 15, breakpoint = 5, uncore_cbox_0 = 13, tracepoint = 2, cstate_pkg = 19, uncore_arb = 17, kprobe = 6, i915 = 10, msr = 9, uncore_cbox_3 = 16, uncore_cbox_1 = 14
# CACHE info available, use -I to display
# time of first sample : 116392.441701
# time of last sample : 116400.932584
# sample duration : 8490.883 ms
# MEM_TOPOLOGY info available, use -I to display
# bpf_prog_info of id 13
# bpf_prog_info of id 14
# bpf_prog_info of id 15
# bpf_prog_info of id 16
# bpf_prog_info of id 17
# bpf_prog_info of id 18
# bpf_prog_info of id 21
# bpf_prog_info of id 22
# bpf_prog_info of id 208
# bpf_prog_info of id 209
# missing features: TRACING_DATA BRANCH_STACK GROUP_DESC AUXTRACE STAT CLOCKID DIR_FORMAT
# ========
#
Committer notes:
We can't use the libbpf unconditionally, as the build may have been with
NO_LIBBPF, when we end up with linking errors, so provide dummy
{process,write}_bpf_prog_info() wrapped by HAVE_LIBBPF_SUPPORT for that
case.
Printing are not affected by this, so can continue as is.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-8-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
bpf_prog_info contains information necessary to annotate bpf programs.
This patch saves bpf_prog_info for bpf programs loaded in the system.
Some big picture of the next few patches:
To fully annotate BPF programs with source code mapping, 4 different
informations are needed:
1) PERF_RECORD_KSYMBOL
2) PERF_RECORD_BPF_EVENT
3) bpf_prog_info
4) btf
Before this set, 1) and 2) in the list are already saved to perf.data
file. For BPF programs that are already loaded before perf run, 1) and 2)
are synthesized by perf_event__synthesize_bpf_events(). For short living
BPF programs, 1) and 2) are generated by kernel.
This set handles 3) and 4) from the list. Again, it is necessary to handle
existing BPF program and short living program separately.
This patch handles 3) for exising BPF programs while synthesizing 1) and
2) in perf_event__synthesize_bpf_events(). These data are stored in
perf_env. The next patch saves these data from perf_env to perf.data as
headers.
Similarly, the two patches after the next saves 4) of existing BPF
programs to perf_env and perf.data.
Another patch later will handle 3) and 4) for short living BPF programs
by monitoring 1) and 2) in a dedicate thread.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-7-songliubraving@fb.com
[ set env->bpf_progs.infos_cnt to zero in perf_env__purge_bpf() as noted by jolsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch changes the arguments of perf_event__synthesize_bpf_events()
to include perf_session* instead of perf_tool*. perf_session will be
used in the next patch.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/20190312053051.2690567-6-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With bpf_program__get_prog_info_linear, we can simplify the logic that
synthesizes bpf events.
This patch doesn't change the behavior of the code.
Commiter notes:
Needed this (for all four variables), suggested by Song, to overcome
build failure on debian experimental cross building to MIPS 32-bit:
- u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(info->prog_tags);
+ u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(uintptr_t)(info->prog_tags);
util/bpf-event.c: In function 'perf_event__synthesize_one_bpf_prog':
util/bpf-event.c:143:35: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
u8 (*prog_tags)[BPF_TAG_SIZE] = (void *)(info->prog_tags);
^
util/bpf-event.c:144:22: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
__u32 *prog_lens = (__u32 *)(info->jited_func_lens);
^
util/bpf-event.c:145:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
__u64 *prog_addrs = (__u64 *)(info->jited_ksyms);
^
util/bpf-event.c:146:22: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
void *func_infos = (void *)(info->func_info);
^
cc1: all warnings being treated as errors
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: kernel-team@fb.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-5-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, bpf_prog_info includes 9 arrays. The user has the option to
fetch any combination of these arrays. However, this requires a lot of
handling.
This work becomes more tricky when we need to store bpf_prog_info to a
file, because these arrays are allocated independently.
This patch introduces 'struct bpf_prog_info_linear', which stores arrays
of bpf_prog_info in continuous memory.
Helper functions are introduced to unify the work to get different sets
of bpf_prog_info. Specifically, bpf_program__get_prog_info_linear()
allows the user to select which arrays to fetch, and handles details for
the user.
Please see the comments right before 'enum bpf_prog_info_array' for more
details and examples.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lkml.kernel.org/r/ce92c091-e80d-a0c1-4aa0-987706c42b20@iogearbox.net
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: kernel-team@fb.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-3-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, monitoring of BPF programs through bpf_event is off by
default for 'perf record'.
To turn it on, the user need to use option "--bpf-event". As BPF gets
wider adoption in different subsystems, this option becomes
inconvenient.
This patch makes bpf_event on by default, and adds option "--no-bpf-event"
to turn it off. Since option --bpf-event is not released yet, it is safe
to remove it.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: kernel-team@fb.com
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
=================================================================
==20875==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 1160 byte(s) in 1 object(s) allocated from:
#0 0x7f1b6fc84138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
#1 0x55bd50005599 in zalloc util/util.h:23
#2 0x55bd500068f5 in perf_evsel__newtp_idx util/evsel.c:327
#3 0x55bd4ff810fc in perf_evsel__newtp /home/work/linux/tools/perf/util/evsel.h:216
#4 0x55bd4ff81608 in test__perf_evsel__tp_sched_test tests/evsel-tp-sched.c:69
#5 0x55bd4ff528e6 in run_test tests/builtin-test.c:358
#6 0x55bd4ff52baf in test_and_print tests/builtin-test.c:388
#7 0x55bd4ff543fe in __cmd_test tests/builtin-test.c:583
#8 0x55bd4ff5572f in cmd_test tests/builtin-test.c:722
#9 0x55bd4ffc4087 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
#10 0x55bd4ffc45c6 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
#11 0x55bd4ffc49ca in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
#12 0x55bd4ffc5138 in main /home/changbin/work/linux/tools/perf/perf.c:520
#13 0x7f1b6e34809a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
Indirect leak of 19 byte(s) in 1 object(s) allocated from:
#0 0x7f1b6fc83f30 in __interceptor_malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xedf30)
#1 0x7f1b6e3ac30f in vasprintf (/lib/x86_64-linux-gnu/libc.so.6+0x8830f)
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 6a6cd11d4e ("perf test: Add test for the sched tracepoint format fields")
Link: http://lkml.kernel.org/r/20190316080556.3075-17-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using gcc's ASan, Changbin reports:
=================================================================
==7494==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 48 byte(s) in 1 object(s) allocated from:
#0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
#1 0x5625e5330a5e in zalloc util/util.h:23
#2 0x5625e5330a9b in perf_counts__new util/counts.c:10
#3 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
#4 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
#5 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
#6 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
#7 0x5625e51528e6 in run_test tests/builtin-test.c:358
#8 0x5625e5152baf in test_and_print tests/builtin-test.c:388
#9 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
#10 0x5625e515572f in cmd_test tests/builtin-test.c:722
#11 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
#12 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
#13 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
#14 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
#15 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
Indirect leak of 72 byte(s) in 1 object(s) allocated from:
#0 0x7f0333a89138 in calloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee138)
#1 0x5625e532560d in zalloc util/util.h:23
#2 0x5625e532566b in xyarray__new util/xyarray.c:10
#3 0x5625e5330aba in perf_counts__new util/counts.c:15
#4 0x5625e5330ca0 in perf_evsel__alloc_counts util/counts.c:47
#5 0x5625e520d8e5 in __perf_evsel__read_on_cpu util/evsel.c:1505
#6 0x5625e517a985 in perf_evsel__read_on_cpu /home/work/linux/tools/perf/util/evsel.h:347
#7 0x5625e517ad1a in test__openat_syscall_event tests/openat-syscall.c:47
#8 0x5625e51528e6 in run_test tests/builtin-test.c:358
#9 0x5625e5152baf in test_and_print tests/builtin-test.c:388
#10 0x5625e51543fe in __cmd_test tests/builtin-test.c:583
#11 0x5625e515572f in cmd_test tests/builtin-test.c:722
#12 0x5625e51c3fb8 in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
#13 0x5625e51c44f7 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
#14 0x5625e51c48fb in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
#15 0x5625e51c5069 in main /home/changbin/work/linux/tools/perf/perf.c:520
#16 0x7f033214d09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
His patch took care of evsel->prev_raw_counts, but the above backtraces
are about evsel->counts, so fix that instead.
Reported-by: Changbin Du <changbin.du@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/n/tip-hd1x13g59f0nuhe4anxhsmfp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add function __maps__purge_names() to purge all maps from the names
tree. We need to cleanup the names tree in maps__exit().
Detected with gcc's ASan.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 1e6285699b ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190316080556.3075-12-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are two trees for each map inserted by maps__insert(), so remove
it from the 'names' tree in __maps__remove().
Detected with gcc's ASan.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Saint-Etienne <eric.saint.etienne@oracle.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 1e6285699b ("perf symbols: Fix slowness due to -ffunction-section")
Link: http://lkml.kernel.org/r/20190316080556.3075-11-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to map__put() before returning from failure of
sample__resolve_callchain().
Detected with gcc's ASan.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 9c68ae98c6 ("perf callchain: Reference count maps")
Link: http://lkml.kernel.org/r/20190316080556.3075-10-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We should go to the cleanup path, to avoid leaks, detected using gcc's
ASan.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190316080556.3075-9-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Detected with gcc's ASan:
Direct leak of 4356 byte(s) in 120 object(s) allocated from:
#0 0x7ff1a2b5a070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
#1 0x55719aef4814 in build_id_cache__origname util/build-id.c:215
#2 0x55719af649b6 in print_sdt_events util/parse-events.c:2339
#3 0x55719af66272 in print_events util/parse-events.c:2542
#4 0x55719ad1ecaa in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
#5 0x55719aec745d in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
#6 0x55719aec7d1a in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
#7 0x55719aec8184 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
#8 0x55719aeca41a in main /home/changbin/work/linux/tools/perf/perf.c:520
#9 0x7ff1a07ae09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 40218daea1 ("perf list: Show SDT and pre-cached events")
Link: http://lkml.kernel.org/r/20190316080556.3075-7-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Detected with gcc's ASan:
Direct leak of 66 byte(s) in 5 object(s) allocated from:
#0 0x7ff3b1f32070 in __interceptor_strdup (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x3b070)
#1 0x560c8761034d in collect_config util/config.c:597
#2 0x560c8760d9cb in get_value util/config.c:169
#3 0x560c8760dfd7 in perf_parse_file util/config.c:285
#4 0x560c8760e0d2 in perf_config_from_file util/config.c:476
#5 0x560c876108fd in perf_config_set__init util/config.c:661
#6 0x560c87610c72 in perf_config_set__new util/config.c:709
#7 0x560c87610d2f in perf_config__init util/config.c:718
#8 0x560c87610e5d in perf_config util/config.c:730
#9 0x560c875ddea0 in main /home/changbin/work/linux/tools/perf/perf.c:442
#10 0x7ff3afb8609a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Fixes: 20105ca124 ("perf config: Introduce perf_config_set class")
Link: http://lkml.kernel.org/r/20190316080556.3075-6-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The option 'sort-order' should be 'sort_order'.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 893c5c798b ("perf config: Show default report configuration in example and docs")
Link: http://lkml.kernel.org/r/20190316080556.3075-5-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Optimization level '-Og' offers a reasonable level of optimization while
maintaining fast compilation and a good debugging experience. This patch
tries to make it work.
$ make DEBUG=1 EXTRA_CFLAGS='-Og'
bench/epoll-ctl.c: In function ‘do_threads’:
bench/epoll-ctl.c:274:9: error: ‘ret’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
return ret;
^~~
...
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190316080556.3075-4-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Detected via gcc's ASan:
Direct leak of 2048 byte(s) in 64 object(s) allocated from:
6 #0 0x7f606512e370 in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xee370)
7 #1 0x556b0f1d7ddd in thread_map__realloc util/thread_map.c:43
8 #2 0x556b0f1d84c7 in thread_map__new_by_tid util/thread_map.c:85
9 #3 0x556b0f0e045e in is_event_supported util/parse-events.c:2250
10 #4 0x556b0f0e1aa1 in print_hwcache_events util/parse-events.c:2382
11 #5 0x556b0f0e3231 in print_events util/parse-events.c:2514
12 #6 0x556b0ee0a66e in cmd_list /home/changbin/work/linux/tools/perf/builtin-list.c:58
13 #7 0x556b0f01e0ae in run_builtin /home/changbin/work/linux/tools/perf/perf.c:302
14 #8 0x556b0f01e859 in handle_internal_command /home/changbin/work/linux/tools/perf/perf.c:354
15 #9 0x556b0f01edc8 in run_argv /home/changbin/work/linux/tools/perf/perf.c:398
16 #10 0x556b0f01f71f in main /home/changbin/work/linux/tools/perf/perf.c:520
17 #11 0x7f6062ccf09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 89896051f8 ("perf tools: Do not put a variable sized type not at the end of a struct")
Link: http://lkml.kernel.org/r/20190316080556.3075-3-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
AddressSanitizer (or ASan) and UndefinedBehaviorSanitizer (or UBSan) are
very useful tools to detect program bugs:
- AddressSanitizer (or ASan) is a GCC feature that detects memory
corruption bugs such as buffer overflows and memory leaks.
- UndefinedBehaviorSanitizer (or UBSan) is a fast undefined behavior
detector supported by GCC. UBSan detects undefined behaviors of programs
at runtime.
This patch adds a document about how to use them on perf. Later patches will fix
some of the issues disclosed by them.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20190316080556.3075-2-changbin.du@gmail.com
[ Make some changes based on comments made by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -c option to enable multiplex scaling has been useless for quite
some time because scaling is default.
It's only useful as --no-scale to disable scaling. But the non scaling
code path has bitrotted and doesn't print anything because perf output
code relies on value run/ena information.
Also even when we don't want to scale a value it's still useful to show
its multiplex percentage.
This patch:
- Fixes help and documentation to show --no-scale instead of -c
- Removes -c, only keeps the long option because -c doesn't support negatives.
- Enables running/enabled even with --no-scale
- And fixes some other problems in the no-scale output.
Before:
$ perf stat --no-scale -e cycles true
Performance counter stats for 'true':
<not counted> cycles
0.000984154 seconds time elapsed
After:
$ ./perf stat --no-scale -e cycles true
Performance counter stats for 'true':
706,070 cycles
0.001219821 seconds time elapsed
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LPU-Reference: 20190314225002.30108-9-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-xggjvwcdaj2aqy8ib3i4b1g6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When comparing time stamps in 'perf script' traces it can be annoying to
work with the full perf time stamps.
Add a --reltime option that displays time stamps relative to the trace
start to make it easier to read the traces.
Note: not currently supported for --time. Report an error in this
case.
Before:
% perf script
swapper 0 [000] 245402.891216: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 245402.891223: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 245402.891227: 5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 245402.891231: 41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
swapper 0 [000] 245402.891235: 355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
swapper 0 [000] 245402.891239: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
After:
% perf script --reltime
swapper 0 [000] 0.000000: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 0.000006: 1 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 0.000010: 5 cycles:ppp: ffffffffa0068814 native_write_msr+0x4 ([kernel.kallsyms])
swapper 0 [000] 0.000014: 41 cycles:ppp: ffffffffa0068816 native_write_msr+0x6 ([kernel.kallsyms])
swapper 0 [000] 0.000018: 355 cycles:ppp: ffffffffa000dd51 intel_bts_enable_local+0x21 ([kernel.kallsyms])
swapper 0 [000] 0.000022: 3084 cycles:ppp: ffffffffa0a0150a end_repeat_nmi+0x48 ([kernel.kallsyms])
Committer notes:
Do not use 'time' as the name of a variable, as this breaks the build on
older glibcs:
cc1: warnings being treated as errors
builtin-script.c: In function 'perf_sample__fprintf_start':
builtin-script.c:691: warning: declaration of 'time' shadows a global declaration
/usr/include/time.h:187: warning: shadowed declaration is here
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-8-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-bpahyi6pr9r399mvihu65fvc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show all the supported sort keys in the command line help output, so
that it's not needed to refer to the manpage.
Before:
% perf report -h
...
-s, --sort <key[,key2...]>
sort by key(s): pid, comm, dso, symbol, parent, cpu, srcline, ... Please refer the man page for the complete list.
After:
% perf report -h
...
-s, --sort <key[,key2...]>
sort by key(s): overhead overhead_sys overhead_us overhead_guest_sys overhead_guest_us overhead_children sample period pid comm dso symbol parent cpu ...
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-5-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-9r3uz2ch4izoi1uln3f889co@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The help description for --switch-output looks like there are multiple
comma separated fields. But it's actually a choice of different options.
Make it clear and less confusing.
Before:
% perf record -h
...
--switch-output[=<signal,size,time>]
Switch output when receive SIGUSR2 or cross size,time threshold
After:
% perf record -h
...
--switch-output[=<signal or size[BKMG] or time[smhd]>]
Switch output when receiving SIGUSR2 (signal) or cross a size or time threshold
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-4-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-9yecyuha04nyg8toyd1b2pgi@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
turbostat failed to return a non-zero exit status even though the
supplied command (turbostat <command>) failed. Currently when turbostat
forks a command it returns zero instead of the actual exit status of the
command. Modify the code to return the exit status.
Signed-off-by: David Arcari <darcari@redhat.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When doing long term recording and waiting for some event to snapshot
on, we often only care about the last minute or so.
The --switch-output command line option supports rotating the perf.data
file when the size exceeds a threshold. But the disk would still be
filled with unnecessary old files.
Add a new option to only keep a number of rotated files, so that the
disk space usage can be limited.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-3-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-y5u2lik0ragt4vlktz6qc9ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a filter is specified on the command line, filter the metrics too.
Before:
% perf list foo
List of pre-defined events (to be used in -e):
Metric Groups:
DSB:
DSB_Coverage
[Fraction of Uops delivered by the DSB (aka Decoded Icache; or Uop Cache)]
... more metrics ...
After:
% perf list foo
List of pre-defined events (to be used in -e):
Metric Groups:
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
LPU-Reference: 20190314225002.30108-1-andi@firstfloor.org
Link: https://lkml.kernel.org/n/tip-1y8oi2s8c4jhjtykgs5zvda1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, herdtools version information appears no fewer than three
times in the LKMM source, which is difficult to maintain. This commit
therefore places the required version in one place, namely the
tools/memory-model/README file.
Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
This commit checks that the return value of srcu_read_lock() is passed
to the matching srcu_read_unlock(), where "matching" is determined by
nesting. This check operates as follows:
1. srcu_read_lock() creates an integer token, which is stored into
the generated events.
2. srcu_read_unlock() records its second (token) argument into the
generated event.
3. A new herd primitive 'different-values' filters out pairs of events
with identical values from the relation passed as its argument.
4. The bell file applies the above primitive to the (srcu)
read-side-critical-section relation 'srcu-rscs' and flags non-empty
results.
BEWARE: Works only with herd version 7.51+6 and onwards.
Signed-off-by: Luc Maranget <Luc.Maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Apply Andrea Parri's off-list feedback. ]
Acked-by: Alan Stern <stern@rowland.harvard.edu>
The recent commit adding support for SRCU to the Linux Kernel Memory
Model ended up changing the names and meanings of several relations.
This patch updates the explanation.txt documentation file to reflect
those changes.
It also revises the statement of the RCU Guarantee to a more accurate
form, and it adds a short paragraph mentioning the new support for SRCU.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Akira Yokosawa <akiyks@gmail.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Daniel Lustig <dlustig@nvidia.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jade Alglave <j.alglave@ucl.ac.uk>
Cc: Luc Maranget <luc.maranget@inria.fr>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Parri <andrea.parri@amarulasolutions.com>
This commit updates the section on LKMM limitations to no longer say
that SRCU is not modeled, but instead describe how LKMM's modeling of
SRCU departs from the Linux-kernel implementation.
TL;DR: There is no known valid use case that cares about the Linux
kernel's ability to have partially overlapping SRCU read-side critical
sections.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Acked-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Add support for SRCU. Herd creates srcu events and linux-kernel.def
associates them with three possible annotations (srcu-lock,
srcu-unlock, and sync-srcu) corresponding to the API routines
srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu().
The linux-kernel.bell file now declares the annotations
and determines matching lock/unlock pairs delimiting SRCU read-side
critical sections, and it also checks for synchronize_srcu() calls
inside an RCU critical section (which would generate a "sleeping in
atomic context" error in real kernel code). The linux-kernel.cat file
now adds SRCU-induced ordering, analogous to the existing RCU-induced
ordering, to the gp and rcu-fence relations.
Curiously enough, these small changes to the model's .cat code are all
that is needed to describe SRCU.
Portions of this patch (linux-kernel.def and the first hunk in
linux-kernel.bell) were written by Luc Maranget.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: Luc Maranget <luc.maranget@inria.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Andrea Parri <andrea.parri@amarulasolutions.com>
In preparation for adding support for SRCU, refactor the definitions
of rcu-fence, rcu-rscsi, rcu-link, and rb by moving the po and po?
terms from the first two to the second two. An rcu-gp relation is
added; it is equivalent to gp with the po and po? terms removed.
This is necessary because for SRCU, we will have to use the loc
relation to check that the terms at the start and end of each disjunct
in the definition of rcu-fence refer to the same srcu_struct
location. If these terms are hidden behind po and po?, there's no way
to carry out this check.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Andrea Parri <andrea.parri@amarulasolutions.com>
In preparation for adding support for SRCU, rename "crit" to
"rcu-rscs", rename "rscs" to "rcu-rscsi", and remove the restriction
to only the outermost level of nesting.
The name change is needed for disambiguating RCU read-side critical
sections from SRCU read-side critical sections. Adding the "i" at the
end of "rcu-rscsi" emphasizes that the relation is inverted; it links
rcu_read_unlock() events to their corresponding preceding
rcu_read_lock() events.
The restriction to outermost nesting levels was never essential; it
was included mostly to show that it could be done. Rather than add
equivalent unnecessary code for SRCU lock nesting, it seemed better to
remove the existing code.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Tested-by: Andrea Parri <andrea.parri@amarulasolutions.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlx+nn4ACgkQjrBW1T7s
sS2kwg//aJUCwLIhV91gXUFN2jHTCf0/+5fnigEk7JhAT5wmAykxLM8tprLlIlyp
HtwNQx54hq/6p010Ulo9K50VS6JRii+2lNSpC6IkqXXdHXXm0ViH+5I9Nru8SVJ+
avRCYWNjW9Gn1EtcB2yv6KP3XffgnQ6ZLIr4QJwglOxgAqUaWZ68woSUlrIR5yFj
j48wAxjsC3g2qwGLvXPeiwYZHwk6VnYmrZ3eWXPDthWRDC4zkjyBdchZZzFJagSC
6sX8T9s5ua5juZMokEJaWjuBQQyfg0NYu41hupSdVjV7/0D3E+5/DiReInvLmSup
63bZ85uKRqWTNgl4cmJ1W3aVe2RYYemMZCXVVYYvU+IKpvTSzzYY7us+FyMAIRUV
bT+XPGzTWcGrChzv9bHZcBrkL91XGqyxRJz56jLl6EhRtqxmzmywf6mO6pS2WK4N
r+aBDgXeJbG39KguCzwUgVX8hC6YlSxSP8Md+2sK+UoAdfTUvFtdCYnjhuACofCt
saRvDIPF8N9qn4Ch3InzCKkrUTL/H3BZKBl2jo6tYQ9smUsFZW7lQoip5Ui/0VS+
qksJ91djOc9facGoOorPazojY5fO5Lj3Hg+cGIoxUV0jPH483z7hWH0ALynb0f6z
EDsgNyEUpIO2nJMJJfm37ysbU/j1gOpzQdaAEaWeknwtfecFPzM=
=yOWp
-----END PGP SIGNATURE-----
Merge tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd system call from Christian Brauner:
"This introduces the ability to use file descriptors from /proc/<pid>/
as stable handles on struct pid. Even if a pid is recycled the handle
will not change. For a start these fds can be used to send signals to
the processes they refer to.
With the ability to use /proc/<pid> fds as stable handles on struct
pid we can fix a long-standing issue where after a process has exited
its pid can be reused by another process. If a caller sends a signal
to a reused pid it will end up signaling the wrong process.
With this patchset we enable a variety of use cases. One obvious
example is that we can now safely delegate an important part of
process management - sending signals - to processes other than the
parent of a given process by sending file descriptors around via scm
rights and not fearing that the given process will have been recycled
in the meantime. It also allows for easy testing whether a given
process is still alive or not by sending signal 0 to a pidfd which is
quite handy.
There has been some interest in this feature e.g. from systems
management (systemd, glibc) and container managers. I have requested
and gotten comments from glibc to make sure that this syscall is
suitable for their needs as well. In the future I expect it to take on
most other pid-based signal syscalls. But such features are left for
the future once they are needed.
This has been sitting in linux-next for quite a while and has not
caused any issues. It comes with selftests which verify basic
functionality and also test that a recycled pid cannot be signaled via
a pidfd.
Jon has written about a prior version of this patchset. It should
cover the basic functionality since not a lot has changed since then:
https://lwn.net/Articles/773459/
The commit message for the syscall itself is extensively documenting
the syscall, including it's functionality and extensibility"
* tag 'pidfd-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests: add tests for pidfd_send_signal()
signal: add pidfd_send_signal() syscall
* Replace the /sys/class/dax device model with /sys/bus/dax, and include
a compat driver so distributions can opt-in to the new ABI.
* Allow for an alternative driver for the device-dax address-range
* Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
* Arrange for the device-dax target-node to be onlined so that the newly
added memory range can be uniquely referenced by numa apis.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJchWpGAAoJEB7SkWpmfYgCJk8P/0Q1DINszUDO/vKjJ09cDs9P
Jw3it6GBIL50rDOu9QdcprSpwYDD0h1mLAV/m6oa3bVO+p4uWGvnxaxRx2HN2c/v
vhZFtUDpHlqR63vzWMNVKRprYixCRJDUr6xQhhCcE3ak/ELN6w7LWfikKVWv15UL
MfR96IQU38f+xRda/zSXnL9606Dvkvu/inEHj84lRcHIwj3sQAUalrE8bR3O32gZ
bDg/l5kzT49o8ZXUo/TegvRSSSZpJmOl2DD0RW+ax5q3NI2bOXFrVDUKBKxf/hcQ
E/V9i57TrqQx0GqRhnU7rN/v53cFZGGs31TEEIB/xs3bzCnADxwXcjL5b5K005J6
vJjBA2ODBewHFK3uVx46Hy1iV4eCtZWj4QrMnrjdSrjXOfbF5GTbWOhPFgoq7TWf
S7VqFEf3I2gDPaMq4o8Ej1kLH4HMYeor2NSOZjyvGn87rSZ3ZIQguwbaNIVl+itz
gdDt0ZOU0BgOBkV+rZIeZDaGdloWCHcDPL15CkZaOZyzdWhfEZ7dod6ad+9udilU
EUPH62RgzXZtfm5zpebYyjNVLbb9pLZ0nT+UypyGR6zqWx1SqU3mXi63NFXPco+x
XA9j//edPeI6NHg2CXLEh8DLuCg3dG1zWRJANkiF+niBwyCR8CHtGWAoY6soXbKe
2UrXGcIfXxyJ8V9v8v4q
=hfa3
-----END PGP SIGNATURE-----
Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".
Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.
One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.
Summary:
- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.
- Allow for an alternative driver for the device-dax address-range
- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"
NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.
And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.
Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.
I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"
* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-16
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a umem memory leak on cleanup in AF_XDP, from Björn.
2) Fix BTF to properly resolve forward-declared enums into their corresponding
full enum definition types during deduplication, from Andrii.
3) Fix libbpf to reject invalid flags in xsk_socket__create(), from Magnus.
4) Fix accessing invalid pointer returned from bpf_tcp_sock() and
bpf_sk_fullsock() after bpf_sk_release() was called, from Martin.
5) Fix generation of load/store DW instructions in PPC JIT, from Naveen.
6) Various fixes in BPF helper function documentation in bpf.h UAPI header
used to bpf-helpers(7) man page, from Quentin.
7) Fix segfault in BPF test_progs when prog loading failed, from Yonghong.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
for 32-bit guests
s390: interrupt cleanup, introduction of the Guest Information Block,
preparation for processor subfunctions in cpu models
PPC: bug fixes and improvements, especially related to machine checks
and protection keys
x86: many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations; plus AVIC fixes.
Generic: memcg accounting
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
=XIzU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- some cleanups
- direct physical timer assignment
- cache sanitization for 32-bit guests
s390:
- interrupt cleanup
- introduction of the Guest Information Block
- preparation for processor subfunctions in cpu models
PPC:
- bug fixes and improvements, especially related to machine checks
and protection keys
x86:
- many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations
- AVIC fixes
Generic:
- memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
kvm: vmx: fix formatting of a comment
KVM: doc: Document the life cycle of a VM and its resources
MAINTAINERS: Add KVM selftests to existing KVM entry
Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
KVM: PPC: Fix compilation when KVM is not enabled
KVM: Minor cleanups for kvm_main.c
KVM: s390: add debug logging for cpu model subfunctions
KVM: s390: implement subfunction processor calls
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
KVM: arm/arm64: Remove unused timer variable
KVM: PPC: Book3S: Improve KVM reference counting
KVM: PPC: Book3S HV: Fix build failure without IOMMU support
Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
x86: kvmguest: use TSC clocksource if invariant TSC is exposed
KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
...
Merge misc patches from Andrew Morton:
- a little bit more MM
- a few fixups
[ The "little bit more MM" is actually just one of the three patches
Andrew sent for mm/filemap.c, I'm still mulling over two more of them
from Josef Bacik - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
tools/testing/selftests/proc/proc-pid-vm.c: test with vsyscall in mind
zram: default to lzo-rle instead of lzo
filemap: pass vm_fault to the mmap ra helpers
Synchronise the bpf.h header under tools, to report the latest fixes and
additions to the documentation for the BPF helpers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>