- PERAMAENT flag to ftrace_ops when attaching a callback to a function
As /proc/sys/kernel/ftrace_enabled when set to zero will disable all
attached callbacks in ftrace, this has a detrimental impact on live
kernel tracing, as it disables all that it patched. If a ftrace_ops
is registered to ftrace with the PERMANENT flag set, it will prevent
ftrace_enabled from being disabled, and if ftrace_enabled is already
disabled, it will prevent a ftrace_ops with PREMANENT flag set from
being registered.
- New register_ftrace_direct(). As eBPF would like to register its own
trampolines to be called by the ftrace nop locations directly,
without going through the ftrace trampoline, this function has been
added. This allows for eBPF trampolines to live along side of
ftrace, perf, kprobe and live patching. It also utilizes the ftrace
enabled_functions file that keeps track of functions that have been
modified in the kernel, to allow for security auditing.
- Allow for kernel internal use of ftrace instances. Subsystems in
the kernel can now create and destroy their own tracing instances
which allows them to have their own tracing buffer, and be able
to record events without worrying about other users from writing over
their data.
- New seq_buf_hex_dump() that lets users use the hex_dump() in their
seq_buf usage.
- Notifications now added to tracing_max_latency to allow user space
to know when a new max latency is hit by one of the latency tracers.
- Wider spread use of generic compare operations for use of bsearch and
friends.
- More synthetic event fields may be defined (32 up from 16)
- Use of xarray for architectures with sparse system calls, for the
system call trace events.
This along with small clean ups and fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXdwv4BQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnB5AP91vsdHQjwE1+/UWG/cO+qFtKvn2QJK
QmBRIJNH/s+1TAD/fAOhgw+ojSK3o/qc+NpvPTEW9AEwcJL1wacJUn+XbQc=
=ztql
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"New tracing features:
- New PERMANENT flag to ftrace_ops when attaching a callback to a
function.
As /proc/sys/kernel/ftrace_enabled when set to zero will disable
all attached callbacks in ftrace, this has a detrimental impact on
live kernel tracing, as it disables all that it patched. If a
ftrace_ops is registered to ftrace with the PERMANENT flag set, it
will prevent ftrace_enabled from being disabled, and if
ftrace_enabled is already disabled, it will prevent a ftrace_ops
with PREMANENT flag set from being registered.
- New register_ftrace_direct().
As eBPF would like to register its own trampolines to be called by
the ftrace nop locations directly, without going through the ftrace
trampoline, this function has been added. This allows for eBPF
trampolines to live along side of ftrace, perf, kprobe and live
patching. It also utilizes the ftrace enabled_functions file that
keeps track of functions that have been modified in the kernel, to
allow for security auditing.
- Allow for kernel internal use of ftrace instances.
Subsystems in the kernel can now create and destroy their own
tracing instances which allows them to have their own tracing
buffer, and be able to record events without worrying about other
users from writing over their data.
- New seq_buf_hex_dump() that lets users use the hex_dump() in their
seq_buf usage.
- Notifications now added to tracing_max_latency to allow user space
to know when a new max latency is hit by one of the latency
tracers.
- Wider spread use of generic compare operations for use of bsearch
and friends.
- More synthetic event fields may be defined (32 up from 16)
- Use of xarray for architectures with sparse system calls, for the
system call trace events.
This along with small clean ups and fixes"
* tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits)
tracing: Enable syscall optimization for MIPS
tracing: Use xarray for syscall trace events
tracing: Sample module to demonstrate kernel access to Ftrace instances.
tracing: Adding new functions for kernel access to Ftrace instances
tracing: Fix Kconfig indentation
ring-buffer: Fix typos in function ring_buffer_producer
ftrace: Use BIT() macro
ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured
ftrace: Rename ftrace_graph_stub to ftrace_stub_graph
ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization
ftrace: Add helper find_direct_entry() to consolidate code
ftrace: Add another check for match in register_ftrace_direct()
ftrace: Fix accounting bug with direct->count in register_ftrace_direct()
ftrace/selftests: Fix spelling mistake "wakeing" -> "waking"
tracing: Increase SYNTH_FIELDS_MAX for synthetic_events
ftrace/samples: Add a sample module that implements modify_ftrace_direct()
ftrace: Add modify_ftrace_direct()
tracing: Add missing "inline" in stub function of latency_fsnotify()
tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text
tracing: Use seq_buf_hex_dump() to dump buffers
...
New features:
- SECCOMP support
- nommu support
- SBI-less system support
- M-Mode support
- TLB flush optimizations
Other improvements:
- Pass the complete RISC-V ISA string supported by the CPU cores to
userspace, rather than redacting parts of it in the kernel
- Add platform DMA IP block data to the HiFive Unleashed board DT file
- Add Makefile support for BZ2, LZ4, LZMA, LZO kernel image
compression formats, in line with other architectures
Cleanups:
- Remove unnecessary PTE_PARENT_SIZE macro
- Standardize include guard naming across arch/riscv
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl3dkMcACgkQx4+xDQu9
KksoQA/+MZdtMrSe+j5yep7ZO803ivdS5tPCjNI06An2Ps8OJmJKoU8zSCIbygfM
xIQ8R5KVLjd1DTW3SKNrL3bUsH0PdwGHKj69z0gMA01kGBe5CfJEzHw03U4It++t
xaPAYJje99xtdEzZMro/Z+/Gsz/XM5SDZmY4i6quSlHOmL0HP6zftk3jPvxO+XtD
7unQJSHuZElqs984OmR+6DUTiaIlJ4P+/xxlFydswq19/tDNCXpukLhMHkOND3W+
3dBHAp9U18dz2I8lD3xlppVbvgCnud7z+xa+XY1Q/BluoYIEe6714S+9lr8TCdTP
43wL9hSvbDPyaydnmKU7VqwdcWrbVQkA7H3GtxeiUZYvyDdidk9sARqg0n6uoedw
48uRfeI0jCQXh6XdFNMCqNgYPpcpHY3isMKVCi2U6TT0fNdUUC3f62qsOIP0vQXh
4hDys5JzIoBKv8nz/ap+1xzUluBt6FGmySPlTbd8ryn2m2Yd0EsHjywcEGRJMrlM
KcOYgLAKcr9rRF4ap/NqQy74AvC+pOcDVeUHPD0WXYDRQn6L0wbp8YElzRf8W1BC
b7osrRSNLfzC2LsEUEK+qYHrT4+cuWBbkuobAEHMY/OnLW1OKEifZQIdknAE1/yg
n2qy4DuA6qta+Q1GiH8OOt9AEYHPw+SDP3+qFaJK5ZZMb3v565A=
=nbwY
-----END PGP SIGNATURE-----
Merge tag 'riscv/for-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"New features:
- SECCOMP support
- nommu support
- SBI-less system support
- M-Mode support
- TLB flush optimizations
Other improvements:
- Pass the complete RISC-V ISA string supported by the CPU cores to
userspace, rather than redacting parts of it in the kernel
- Add platform DMA IP block data to the HiFive Unleashed board DT
file
- Add Makefile support for BZ2, LZ4, LZMA, LZO kernel image
compression formats, in line with other architectures
Cleanups:
- Remove unnecessary PTE_PARENT_SIZE macro
- Standardize include guard naming across arch/riscv"
* tag 'riscv/for-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (22 commits)
riscv: provide a flat image loader
riscv: add nommu support
riscv: clear the instruction cache and all registers when booting
riscv: read the hart ID from mhartid on boot
riscv: provide native clint access for M-mode
riscv: dts: add support for PDMA device of HiFive Unleashed Rev A00
riscv: add support for MMIO access to the timer registers
riscv: implement remote sfence.i using IPIs
riscv: cleanup the default power off implementation
riscv: poison SBI calls for M-mode
riscv: don't allow selecting SBI based drivers for M-mode
RISC-V: Add multiple compression image format.
riscv: clean up the macro format in each header file
riscv: Use PMD_SIZE to replace PTE_PARENT_SIZE
riscv: abstract out CSR names for supervisor vs machine mode
riscv: separate MMIO functions into their own header file
riscv: enter WFI in default_power_off() if SBI does not shutdown
RISC-V: Issue a tlb page flush if possible
RISC-V: Issue a local tlbflush if possible.
RISC-V: Do not invoke SBI call if cpumask is empty
...
Pull RCU updates from Ingo Molnar:
"The main changes in this cycle were:
- Dynamic tick (nohz) updates, perhaps most notably changes to force
the tick on when needed due to lengthy in-kernel execution on CPUs
on which RCU is waiting.
- Linux-kernel memory consistency model updates.
- Replace rcu_swap_protected() with rcu_prepace_pointer().
- Torture-test updates.
- Documentation updates.
- Miscellaneous fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
rcu: Suppress levelspread uninitialized messages
rcu: Fix uninitialized variable in nocb_gp_wait()
rcu: Update descriptions for rcu_future_grace_period tracepoint
rcu: Update descriptions for rcu_nocb_wake tracepoint
rcu: Remove obsolete descriptions for rcu_barrier tracepoint
rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
workqueue: Convert for_each_wq to use built-in list check
rcu: Several rcu_segcblist functions can be static
rcu: Remove unused function hlist_bl_del_init_rcu()
Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
...
Pull x86 iopl updates from Ingo Molnar:
"This implements a nice simplification of the iopl and ioperm code that
Thomas Gleixner discovered: we can implement the IO privilege features
of the iopl system call by using the IO permission bitmap in
permissive mode, while trapping CLI/STI/POPF/PUSHF uses in user-space
if they change the interrupt flag.
This implements that feature, with testing facilities and related
cleanups"
[ "Simplification" may be an over-statement. The main goal is to avoid
the cli/sti of iopl by effectively implementing the IO port access
parts of iopl in terms of ioperm.
This may end up not workign well in case people actually depend on
cli/sti being available, or if there are mixed uses of iopl and
ioperm. We will see.. - Linus ]
* 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/ioperm: Fix use of deprecated config option
x86/entry/32: Clarify register saving in __switch_to_asm()
selftests/x86/iopl: Extend test to cover IOPL emulation
x86/ioperm: Extend IOPL config to control ioperm() as well
x86/iopl: Remove legacy IOPL option
x86/iopl: Restrict iopl() permission scope
x86/iopl: Fixup misleading comment
selftests/x86/ioperm: Extend testing so the shared bitmap is exercised
x86/ioperm: Share I/O bitmap if identical
x86/ioperm: Remove bitmap if all permissions dropped
x86/ioperm: Move TSS bitmap update to exit to user work
x86/ioperm: Add bitmap sequence number
x86/ioperm: Move iobitmap data into a struct
x86/tss: Move I/O bitmap data into a seperate struct
x86/io: Speedup schedule out of I/O bitmap user
x86/ioperm: Avoid bitmap allocation if no permissions are set
x86/ioperm: Simplify first ioperm() invocation logic
x86/iopl: Cleanup include maze
x86/tss: Fix and move VMX BUILD_BUG_ON()
x86/cpu: Unify cpu_init()
...
Pull x86 fixes from Ingo Molnar:
"These are the fixes left over from the v5.4 cycle:
- Various low level 32-bit entry code fixes and improvements by Andy
Lutomirski, Peter Zijlstra and Thomas Gleixner.
- Fix 32-bit Xen PV breakage, by Jan Beulich"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/32: Fix FIXUP_ESPFIX_STACK with user CR3
x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise
selftests/x86/sigreturn/32: Invalidate DS and ES when abusing the kernel
selftests/x86/mov_ss_trap: Fix the SYSENTER test
x86/entry/32: Fix NMI vs ESPFIX
x86/entry/32: Unwind the ESPFIX stack earlier on exception entry
x86/entry/32: Move FIXUP_FRAME after pushing %fs in SAVE_ALL
x86/entry/32: Use %ss segment where required
x86/entry/32: Fix IRET exception
x86/cpu_entry_area: Add guard page for entry stack on 32bit
x86/pti/32: Size initial_page_table correctly
x86/doublefault/32: Fix stack canaries in the double fault handler
x86/xen/32: Simplify ring check in xen_iret_crit_fixup()
x86/xen/32: Make xen_iret_crit_fixup() independent of frame layout
x86/stackframe/32: Repair 32-bit Xen PV
Pull networking updates from David Miller:
"Another merge window, another pull full of stuff:
1) Support alternative names for network devices, from Jiri Pirko.
2) Introduce per-netns netdev notifiers, also from Jiri Pirko.
3) Support MSG_PEEK in vsock/virtio, from Matias Ezequiel Vara
Larsen.
4) Allow compiling out the TLS TOE code, from Jakub Kicinski.
5) Add several new tracepoints to the kTLS code, also from Jakub.
6) Support set channels ethtool callback in ena driver, from Sameeh
Jubran.
7) New SCTP events SCTP_ADDR_ADDED, SCTP_ADDR_REMOVED,
SCTP_ADDR_MADE_PRIM, and SCTP_SEND_FAILED_EVENT. From Xin Long.
8) Add XDP support to mvneta driver, from Lorenzo Bianconi.
9) Lots of netfilter hw offload fixes, cleanups and enhancements,
from Pablo Neira Ayuso.
10) PTP support for aquantia chips, from Egor Pomozov.
11) Add UDP segmentation offload support to igb, ixgbe, and i40e. From
Josh Hunt.
12) Add smart nagle to tipc, from Jon Maloy.
13) Support L2 field rewrite by TC offloads in bnxt_en, from Venkat
Duvvuru.
14) Add a flow mask cache to OVS, from Tonghao Zhang.
15) Add XDP support to ice driver, from Maciej Fijalkowski.
16) Add AF_XDP support to ice driver, from Krzysztof Kazimierczak.
17) Support UDP GSO offload in atlantic driver, from Igor Russkikh.
18) Support it in stmmac driver too, from Jose Abreu.
19) Support TIPC encryption and auth, from Tuong Lien.
20) Introduce BPF trampolines, from Alexei Starovoitov.
21) Make page_pool API more numa friendly, from Saeed Mahameed.
22) Introduce route hints to ipv4 and ipv6, from Paolo Abeni.
23) Add UDP segmentation offload to cxgb4, Rahul Lakkireddy"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1857 commits)
libbpf: Fix usage of u32 in userspace code
mm: Implement no-MMU variant of vmalloc_user_node_flags
slip: Fix use-after-free Read in slip_open
net: dsa: sja1105: fix sja1105_parse_rgmii_delays()
macvlan: schedule bc_work even if error
enetc: add support Credit Based Shaper(CBS) for hardware offload
net: phy: add helpers phy_(un)lock_mdio_bus
mdio_bus: don't use managed reset-controller
ax88179_178a: add ethtool_op_get_ts_info()
mlxsw: spectrum_router: Fix use of uninitialized adjacency index
mlxsw: spectrum_router: After underlay moves, demote conflicting tunnels
bpf: Simplify __bpf_arch_text_poke poke type handling
bpf: Introduce BPF_TRACE_x helper for the tracing tests
bpf: Add bpf_jit_blinding_enabled for !CONFIG_BPF_JIT
bpf, testing: Add various tail call test cases
bpf, x86: Emit patchable direct jump as tail call
bpf: Constant map key tracking for prog array pokes
bpf: Add poke dependency tracking for prog array maps
bpf: Add initial poke descriptor table for jit images
bpf: Move owner type, jited info into array auxiliary data
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl3bz1gACgkQUqAMR0iA
lPI5fw//db5dOqAvBu/fz4k38Mc30okgCjtRh0+vhFXCCUXauQv2IhI19J2IiPpy
4t/CjaUk2QSB06NDNUxt7XsuR0yAF4E0nJHUmkDKkN8UFsi7jAjxJ/92zH3x/LE0
YWtVoWjGduO+QfLVlP22VVYh1pX5kOxXG2WTEiJtnkWYdkqtkEy7Cw2Rlzzrrym6
6kIVi3nEPtn/hOnlF/Ii5SJWh+jJrSf+XwXiuIiBupT49Ujoa4KscmhkiHnAccXb
ICJpsxBIdZLxHLe/c0YO3b8r4Hvms124vlIC19Z0l9VEJ++sphDOWRrL3Zq9tFw8
FwIKq8Ex9UFfldOnpD5q/PRhM9Xgfsw8UYYZZpXQeW6z7qzv1iVpM6BQlt3dYQaL
pb21UXIfzaTdZIsUgnetypbqWIFNZxovrhORpjqxo6WV4lSaVx4MPE2/Le/G8xPR
DT+a6yyzTyM0XbZ0MCVDfuZ+ZRaA1zfKEcsYEu883b7yK4z+TZbT61vnEKqq8+40
EgOZnNjBENZLRQY0RofQsO5zGwcaanVOtBOmYDXtP/fup8/1SRZ25zmmIVhvChJG
iQwCDw6IMqnae/FsMb+kBTDCRJXpN028iYGY7UAoiCuvzc0qm0gJXsGdZLqMvjEh
nEdKKN2ze+03s6I9AcISKdnbUVphhb/xeDKRBkMgcooWLrfWj5E=
=i37E
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- New API to track system state changes done be livepatch callbacks. It
helps to maintain compatibility between livepatches.
- Update Kconfig help text. ORC is another reliable unwinder.
- Disable generic selftest timeout. Livepatch selftests have their own
per-operation fine-grained timeouts.
* tag 'livepatching-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
x86/stacktrace: update kconfig help text for reliable unwinders
livepatch: Selftests of the API for tracking system state changes
livepatch: Documentation of the new API for tracking system state changes
livepatch: Allow to distinguish different version of system state changes
livepatch: Basic API to track system state changes
livepatch: Keep replaced patches until post_patch callback is called
selftests/livepatch: Disable the timeout
Pull cgroup updates from Tejun Heo:
"There are several notable changes here:
- Single thread migrating itself has been optimized so that it
doesn't need threadgroup rwsem anymore.
- Freezer optimization to avoid unnecessary frozen state changes.
- cgroup ID unification so that cgroup fs ino is the only unique ID
used for the cgroup and can be used to directly look up live
cgroups through filehandle interface on 64bit ino archs. On 32bit
archs, cgroup fs ino is still the only ID in use but it is only
unique when combined with gen.
- selftest and other changes"
* 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
writeback: fix -Wformat compilation warnings
docs: cgroup: mm: Fix spelling of "list"
cgroup: fix incorrect WARN_ON_ONCE() in cgroup_setup_root()
cgroup: use cgrp->kn->id as the cgroup ID
kernfs: use 64bit inos if ino_t is 64bit
kernfs: implement custom exportfs ops and fid type
kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id()
kernfs: convert kernfs_node->id from union kernfs_node_id to u64
kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes
kernfs: use dumber locking for kernfs_find_and_get_node_by_ino()
netprio: use css ID instead of cgroup ID
writeback: use ino_t for inodes in tracepoints
kernfs: fix ino wrap-around detection
kselftests: cgroup: Avoid the reuse of fd after it is deallocated
cgroup: freezer: don't change task and cgroups status unnecessarily
cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency
cgroup: remove cgroup_enable_task_cg_lists() optimization
cgroup: pids: use atomic64_t for pids->limit
selftests: cgroup: Run test_core under interfering stress
selftests: cgroup: Add task migration tests
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXdfjBwAKCRCRxhvAZXjc
onCBAP47WZ/ie7yjoDWhOI1QB7II3NGSzToakxpgJaWoB+NjTwEA7PGrSYVEbPrf
pUhiEaEJ29t+cWUxX3+yDO+k7SA6BAY=
=Ra58
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread management updates from Christian Brauner:
- A pidfd's fdinfo file currently contains the field "Pid:\t<pid>"
where <pid> is the pid of the process in the pid namespace of the
procfs instance the fdinfo file for the pidfd was opened in.
The fdinfo file has now gained a new "NSpid:\t<ns-pid1>[\t<ns-pid2>[...]]"
field which lists the pids of the process in all child pid namespaces
provided the pid namespace of the procfs instance it is looked up
under has an ancestoral relationship with the pid namespace of the
process. If it does not 0 will be shown and no further pid namespaces
will be listed. Tests included. (Christian Kellner)
- If the process the pidfd references has already exited, print -1 for
the Pid and NSpid fields in the pidfd's fdinfo file. Tests included.
(me)
- Add CLONE_CLEAR_SIGHAND. This lets callers clear all signal handler
that are not SIG_DFL or SIG_IGN at process creation time. This
originated as a feature request from glibc to improve performance and
elimate races in their posix_spawn() implementation. Tests included.
(me)
- Add support for choosing a specific pid for a process with clone3().
This is the feature which was part of the thread update for v5.4 but
after a discussion at LPC in Lisbon we decided to delay it for one
more cycle in order to make the interface more generic. This has now
done. It is now possible to choose a specific pid in a whole pid
namespaces (sub)hierarchy instead of just one pid namespace. In order
to choose a specific pid the caller must have CAP_SYS_ADMIN in all
owning user namespaces of the target pid namespaces. Tests included.
(Adrian Reber)
- Test improvements and extensions. (Andrei Vagin, me)
* tag 'threads-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: skip if clone3() is ENOSYS
selftests/clone3: check that all pids are released on error paths
selftests/clone3: report a correct number of fails
selftests/clone3: flush stdout and stderr before clone3() and _exit()
selftests: add tests for clone3() with *set_tid
fork: extend clone3() to support setting a PID
selftests: add tests for clone3()
tests: test CLONE_CLEAR_SIGHAND
clone3: add CLONE_CLEAR_SIGHAND
pid: use pid_has_task() in pidfd_open()
exit: use pid_has_task() in do_wait()
pid: use pid_has_task() in __change_pid()
test: verify fdinfo for pidfd of reaped process
pidfd: check pid has attached task in fdinfo
pidfd: add tests for NSpid info in fdinfo
pidfd: add NSpid entries to fdinfo
- Data abort report and injection
- Steal time support
- GICv4 performance improvements
- vgic ITS emulation fixes
- Simplify FWB handling
- Enable halt polling counters
- Make the emulated timer PREEMPT_RT compliant
s390:
- Small fixes and cleanups
- selftest improvements
- yield improvements
PPC:
- Add capability to tell userspace whether we can single-step the guest.
- Improve the allocation of XIVE virtual processor IDs
- Rewrite interrupt synthesis code to deliver interrupts in virtual
mode when appropriate.
- Minor cleanups and improvements.
x86:
- XSAVES support for AMD
- more accurate report of nested guest TSC to the nested hypervisor
- retpoline optimizations
- support for nested 5-level page tables
- PMU virtualization optimizations, and improved support for nested
PMU virtualization
- correct latching of INITs for nested virtualization
- IOAPIC optimization
- TSX_CTRL virtualization for more TAA happiness
- improved allocation and flushing of SEV ASIDs
- many bugfixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJd27PMAAoJEL/70l94x66DspsH+gPc6YWtKJFJH58Zj8NrNh6y
t0FwDFcvUa51+m4jaY4L5Y8+zqu1dZFnPPhFGqNWpxrjCEvE/glQJv3BiUX06Seh
aYUHNymGoYCTJOHaaGhV+NlgQaDuZOCOkIsOLAPehyFd1KojwB+FRC0xmO6aROPw
9yQgYrKuK1UUn5HwxBNrMS4+Xv+2iKv/9sTnq1G4W2qX2NZQg84LVPg1zIdkCh3D
3GOvoCBEk3ivQqjmdE7rP/InPr0XvW0b6TFhchIk8J6jEIQFHsmOUefiTvTxsIHV
OKAZwvyeYPrYHA/aDZpaBmY2aR0ydfKDUQcviNIJoF1vOktGs0hvl3VbsmG8QCg=
=OSI1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- data abort report and injection
- steal time support
- GICv4 performance improvements
- vgic ITS emulation fixes
- simplify FWB handling
- enable halt polling counters
- make the emulated timer PREEMPT_RT compliant
s390:
- small fixes and cleanups
- selftest improvements
- yield improvements
PPC:
- add capability to tell userspace whether we can single-step the
guest
- improve the allocation of XIVE virtual processor IDs
- rewrite interrupt synthesis code to deliver interrupts in virtual
mode when appropriate.
- minor cleanups and improvements.
x86:
- XSAVES support for AMD
- more accurate report of nested guest TSC to the nested hypervisor
- retpoline optimizations
- support for nested 5-level page tables
- PMU virtualization optimizations, and improved support for nested
PMU virtualization
- correct latching of INITs for nested virtualization
- IOAPIC optimization
- TSX_CTRL virtualization for more TAA happiness
- improved allocation and flushing of SEV ASIDs
- many bugfixes and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints
KVM: x86: Grab KVM's srcu lock when setting nested state
KVM: x86: Open code shared_msr_update() in its only caller
KVM: Fix jump label out_free_* in kvm_init()
KVM: x86: Remove a spurious export of a static function
KVM: x86: create mmu/ subdirectory
KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page
KVM: x86: remove set but not used variable 'called'
KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning
KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it
KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality
KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID
KVM: x86: do not modify masked bits of shared MSRs
KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES
KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT
KVM: x86: Unexport kvm_vcpu_reload_apic_access_page()
KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1
KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization
...
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The patches
introduce an arch_faults_on_old_pte() macro, defined as false on x86.
When true, cow_user_page() makes the pte young before attempting
__copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a MAINTAINERS
update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with the
wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in the
IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale
macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl3YJswACgkQa9axLQDI
XvFwYg//aTGhNLew3ADgW2TYal7LyqetRROixPBrzqHLu2A8No1+QxHMaKxpZVyf
pt25tABuLtPHql3qBzE0ltmfbLVsPj/3hULo404EJb9HLRfUnVGn7gcPkc+p4YAr
IYkYPXJbk6OlJ84vI+4vXmDEF12bWCqamC9qZ+h99qTpMjFXFO17DSJ7xQ8Xic3A
HHgCh4uA7gpTVOhLxaS6KIw+AZNYwvQxLXch2+wj6agbGX79uw9BeMhqVXdkPq8B
RTDJpOdS970WOT4cHWOkmXwsqqGRqgsgyu+bRUJ0U72+0y6MX0qSHIUnVYGmNc5q
Dtox4rryYLvkv/hbpkvjgVhv98q3J1mXt/CalChWB5dG4YwhJKN2jMiYuoAvB3WS
6dR7Dfupgai9gq1uoKgBayS2O6iFLSa4g58vt3EqUBqmM7W7viGFPdLbuVio4ycn
CNF2xZ8MZR6Wrh1JfggO7Hc11EJdSqESYfHO6V/pYB4pdpnqJLDoriYHXU7RsZrc
HvnrIvQWKMwNbqBvpNbWvK5mpBMMX2pEienA3wOqKNH7MbepVsG+npOZTVTtl9tN
FL0ePb/mKJu/2+gW8ntiqYn7EzjKprRmknOiT2FjWWo0PxgJ8lumefuhGZZbaOWt
/aTAeD7qKd/UXLKGHF/9v3q4GEYUdCFOXP94szWVPyLv+D9h8L8=
=TPL9
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Apart from the arm64-specific bits (core arch and perf, new arm64
selftests), it touches the generic cow_user_page() (reviewed by
Kirill) together with a macro for x86 to preserve the existing
behaviour on this architecture.
Summary:
- On ARMv8 CPUs without hardware updates of the access flag, avoid
failing cow_user_page() on PFN mappings if the pte is old. The
patches introduce an arch_faults_on_old_pte() macro, defined as
false on x86. When true, cow_user_page() makes the pte young before
attempting __copy_from_user_inatomic().
- Covert the synchronous exception handling paths in
arch/arm64/kernel/entry.S to C.
- FTRACE_WITH_REGS support for arm64.
- ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4
- Several kselftest cases specific to arm64, together with a
MAINTAINERS update for these files (moved to the ARM64 PORT entry).
- Workaround for a Neoverse-N1 erratum where the CPU may fetch stale
instructions under certain conditions.
- Workaround for Cortex-A57 and A72 errata where the CPU may
speculatively execute an AT instruction and associate a VMID with
the wrong guest page tables (corrupting the TLB).
- Perf updates for arm64: additional PMU topologies on HiSilicon
platforms, support for CCN-512 interconnect, AXI ID filtering in
the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2.
- GICv3 optimisation to avoid a heavy barrier when accessing the
ICC_PMR_EL1 register.
- ELF HWCAP documentation updates and clean-up.
- SMC calling convention conduit code clean-up.
- KASLR diagnostics printed during boot
- NVIDIA Carmel CPU added to the KPTI whitelist
- Some arm64 mm clean-ups: use generic free_initrd_mem(), remove
stale macro, simplify calculation in __create_pgd_mapping(), typos.
- Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for
endinanness to help with allmodconfig"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: Kconfig: add a choice for endianness
kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous"
arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE
MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry
arm64: kaslr: Check command line before looking for a seed
arm64: kaslr: Announce KASLR status on boot
kselftest: arm64: fake_sigreturn_misaligned_sp
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: add helper get_current_context
kselftest: arm64: extend test_init functionalities
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: extend toplevel skeleton Makefile
drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform
arm64: mm: reserve CMA and crashkernel in ZONE_DMA32
...
This kselftest update for Linux 5.5-rc1 adds KUnit, a lightweight unit
testing and mocking framework for the Linux kernel from Brendan Higgins.
KUnit is not an end-to-end testing framework. It is currently supported
on UML and sub-systems can write unit tests and run them in UML env.
KUnit documentation is included in this update.
In addition, this Kunit update adds 3 new kunit tests:
- kunit test for proc sysctl from Iurii Zaikin
- kunit test for the 'list' doubly linked list from David Gow
- ext4 kunit test for decoding extended timestamps from Iurii Zaikin
In the future KUnit will be linked to Kselftest framework to provide
a way to trigger KUnit tests from user-space.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl3YfBQACgkQCwJExA0N
QxzTjxAAiFhaDMhlpLhn1DpIUNvfKrIgDjJgajQAyMMs6TJK3OrD6J4WbpVD7wGo
aqF9l6o64sY18JAo3s00j6AcAmVwNH7qzEEuzIQPjJvQ8C4sCWL3esEP4JHgFb2F
snlSn5KjSsdC1D9N7uQIhgW76xPSyDrTwWQpglvmB9TwmJVBIl9zhu+unp73ufFJ
N+ieDg8A6W/wDGYSq5JBSkJbuI0gL+daNwUYzxEEZIskndhpovOc82WAldECRm6x
TfI0u39zTbrEO0DHgmYpyGbTN8TB2mXjH5HMjwg+KbHfKVTKKGvTK7XFs8mWGQpO
n2meypZuwuIsRPOPcAVs+Gt2dc0jFODJVIV1EzA0WSv6TEdPqyhM/d13tHdCqjm9
ic5wQ/hQQNEB1Dvg5ereXBaGGaoqP95y61ZpCS9vCXFXH+28E/B63Ebfs2IBIuqS
Jv2KcoxabyZq3uGdjnn+mD7IM8rkvscRP4Ba31nXRgJIYDHAzqe7APN7y3on4NGx
1q7lBlA3XZZ8qgo0zpLST20ck/qaL3tk4k8E1f8emh6CuyrCWtazgrWkMIlyEX0O
8nre3uEAF9xUzB4+gZK4YmelN9Bld3Uv7Ippt1zTCiQ0FkEABQIMUrTZygy7Wfg6
6qi4dk8frWW4Kt63gOXsMxr9FWTqDk+Ys4GPAVDVm1d0dzERn8k=
=0zEe
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest KUnit support gtom Shuah Khan:
"This adds KUnit, a lightweight unit testing and mocking framework for
the Linux kernel from Brendan Higgins.
KUnit is not an end-to-end testing framework. It is currently
supported on UML and sub-systems can write unit tests and run them in
UML env. KUnit documentation is included in this update.
In addition, this Kunit update adds 3 new kunit tests:
- proc sysctl test from Iurii Zaikin
- the 'list' doubly linked list test from David Gow
- ext4 tests for decoding extended timestamps from Iurii Zaikin
In the future KUnit will be linked to Kselftest framework to provide a
way to trigger KUnit tests from user-space"
* tag 'linux-kselftest-5.5-rc1-kunit' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (23 commits)
lib/list-test: add a test for the 'list' doubly linked list
ext4: add kunit test for decoding extended timestamps
Documentation: kunit: Fix verification command
kunit: Fix '--build_dir' option
kunit: fix failure to build without printk
MAINTAINERS: add proc sysctl KUnit test to PROC SYSCTL section
kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()
MAINTAINERS: add entry for KUnit the unit testing framework
Documentation: kunit: add documentation for KUnit
kunit: defconfig: add defconfigs for building KUnit tests
kunit: tool: add Python wrappers for running KUnit tests
kunit: test: add tests for KUnit managed resources
kunit: test: add the concept of assertions
kunit: test: add tests for kunit test abort
kunit: test: add support for test abort
objtool: add kunit_try_catch_throw to the noreturn list
kunit: test: add initial tests
lib: enable building KUnit in lib/
kunit: test: add the concept of expectations
kunit: test: add assertion printing library
...
This kselftest fixes update for Linux 5.5-rc1 consists of several
fixes to tests and framework. Masami Hiramatsu fixed several tests
to build and run correctly on arm and other 32bit architectures.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl3YLAsACgkQCwJExA0N
QxySoxAAwTJgNDhyxWGLFzlReuILBitWuTbEg1KzlOMjKWHnDBHhO9GYqxScKt5Y
fgjjwukvQyai0PGMKd4U+AXtxeSB+Wr9kID7AXuTeDyBOK1iu9Nr++vSkAByJla4
qSVYf+Ja3Zu8kWs3S9rC1zs8dvMutA4FrjW7ig77quFxfSncwqIsTufo53gpBo5e
tJ/VakouWiMV7oyDDPzqrs7Uz01CMmEsJXAkyHzb/nc+6NjJUpewRhZGkYXi89Le
2nBvEUUpdLMphds+KxO51+e+5pnU0ZD4PtjGcQ4/SABWWrCvNKJ9Tn398AdcndYX
NGQe9eohcSB47YKA+sNagi+B4UBfgz6GtYEsdAy8oNxxEGTg8apKygvIlfjmp8in
EMcFF/r6Wwxufx4F9uRFLn48t4d/SAYqCne/RaSwb+IkAe+wyFKeTNP7bwqAwakE
Uvfpjoya0n5IXBKV9v6MrJJZKwKw6J9mu51wahmx2MQveW8p4nqnzy3xZ8AWOqb/
BezRWOgs85Yank8ovXfbAtxcdqKfw3BpDDO379OM7syvTMWNbLcfhpZ9wbyxB108
lcPsjrvu2SNzMs21vsyoDkTE/CPl0byr2v4SWOfUq1YTbsxVIveP4qcy9B12C3aR
m7vRGLG2HwMrlVZ1mvXY2Tz0Jjhx7S+0eyYYzEBMHGT4zRqxjwU=
=KGOl
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-5.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This consists of several fixes to tests and framework.
Masami Hiramatsu fixed several tests to build and run correctly on arm
and other 32bit architectures"
* tag 'linux-kselftest-5.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: sync: Fix cast warnings on arm
selftests: net: Fix printf format warnings on arm
selftests: net: Use size_t and ssize_t for counting file size
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: proc: Make va_max 1MB
kselftest: Fix NULL INSTALL_PATH for TARGETS runlist
selftests: Move kselftest_module.sh into kselftest/
selftests: gen_kselftest_tar.sh: Do not clobber kselftest/
selftests: breakpoints: Fix a typo of function name
selftests: Fix O= and KBUILD_OUTPUT handling for relative paths
For BPF_PROG_TYPE_TRACING, the bpf_prog's ctx is an array of u64.
This patch borrows the idea from BPF_CALL_x in filter.h to
convert a u64 to the arg type of the traced function.
The new BPF_TRACE_x has an arg to specify the return type of a bpf_prog.
It will be used in the future TCP-ops bpf_prog that may return "void".
The new macros are defined in the new header file "bpf_trace_helpers.h".
It is under selftests/bpf/ for now. It could be moved to libbpf later
after seeing more upcoming non-tracing use cases.
The tests are changed to use these new macros also. Hence,
the k[s]u8/16/32/64 are no longer needed and they are removed
from the bpf_helpers.h.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191123202504.1502696-1-kafai@fb.com
Add several BPF kselftest cases for tail calls which test the various
patch directions, and that multiple locations are patched in same and
different programs.
# ./test_progs -n 45
#45/1 tailcall_1:OK
#45/2 tailcall_2:OK
#45/3 tailcall_3:OK
#45/4 tailcall_4:OK
#45/5 tailcall_5:OK
#45 tailcalls:OK
Summary: 1/5 PASSED, 0 SKIPPED, 0 FAILED
I've also verified the JITed dump after each of the rewrite cases that
it matches expectations.
Also regular test_verifier suite passes fine which contains further tail
call tests:
# ./test_verifier
[...]
Summary: 1563 PASSED, 0 SKIPPED, 0 FAILED
Checked under JIT, interpreter and JIT + hardening.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/3d6cbecbeb171117dccfe153306e479798fb608d.1574452833.git.daniel@iogearbox.net
Add a test that benchmarks different ways of attaching BPF program to a kernel function.
Here are the results for 2.4Ghz x86 cpu on a kernel without mitigations:
$ ./test_progs -n 49 -v|grep events
task_rename base 2743K events per sec
task_rename kprobe 2419K events per sec
task_rename kretprobe 1876K events per sec
task_rename raw_tp 2578K events per sec
task_rename fentry 2710K events per sec
task_rename fexit 2685K events per sec
On a kernel with retpoline:
$ ./test_progs -n 49 -v|grep events
task_rename base 2401K events per sec
task_rename kprobe 1930K events per sec
task_rename kretprobe 1485K events per sec
task_rename raw_tp 2053K events per sec
task_rename fentry 2351K events per sec
task_rename fexit 2185K events per sec
All 5 approaches:
- kprobe/kretprobe in __set_task_comm()
- raw tracepoint in trace_task_rename()
- fentry/fexit in __set_task_comm()
are roughly equivalent.
__set_task_comm() by itself is quite fast, so any extra instructions add up.
Until BPF trampoline was introduced the fastest mechanism was raw tracepoint.
kprobe via ftrace was second best. kretprobe is slow due to trap. New
fentry/fexit methods via BPF trampoline are clearly the fastest and the
difference is more pronounced with retpoline on, since BPF trampoline doesn't
use indirect jumps.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191122011515.255371-1-ast@kernel.org
Three test cases are added.
Test 1: jmp32 'reg op imm'.
Test 2: jmp32 'reg op reg' where dst 'reg' has unknown constant
and src 'reg' has known constant
Test 3: jmp32 'reg op reg' where dst 'reg' has known constant
and src 'reg' has unknown constant
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121170651.449096-1-yhs@fb.com
test_core_reloc_kernel.c selftest is the only CO-RE test that reads and
returns for validation calling thread's information (pid, tgid, comm). Thus it
has to make sure that only test_prog's invocations are honored.
Fixes: df36e62141 ("selftests/bpf: add CO-RE relocs testing setup")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20191121175900.3486133-1-andriin@fb.com
Initialized global variables are no different in ELF from static variables,
and don't require any extra support from libbpf. But they are matching
semantics of global data (backed by BPF maps) more closely, preventing
LLVM/Clang from aggressively inlining constant values and not requiring
volatile incantations to prevent those. This patch enables global variables.
It still disables uninitialized variables, which will be put into special COM
(common) ELF section, because BPF doesn't allow uninitialized data to be
accessed.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121070743.1309473-5-andriin@fb.com
Add -mattr=dwarfris attribute to llc to avoid having relocations against DWARF
data. These relocations make it impossible to inspect DWARF contents: all
strings are invalid.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191121070743.1309473-2-andriin@fb.com
Add exra level of verboseness, activated by -vvv argument. When -vv is
specified, verbose libbpf and verifier log (level 1) is output, even for
successful tests. With -vvv, verifier log goes to level 2.
This is extremely useful to debug verifier failures, as well as just see the
state and flow of verification. Before this, you'd have to go and modify
load_program()'s source code inside libbpf to specify extra log_level flags,
which is suboptimal to say the least.
Currently -vv and -vvv triggering verifier output is integrated into
test_stub's bpf_prog_load as well as bpf_verif_scale.c tests.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191120003548.4159797-1-andriin@fb.com
If selftests are copied over to another machine/location
for execution the build test of bpftool will obviously
not work, since the sources are not copied.
Skip it if we can't find bpftool's Makefile.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191119105010.19189-3-quentin.monnet@netronome.com
The trap on EXIT is used to clean up any temporary directory left by the
build attempts. It is not needed when the user simply calls the script
with its --help option, and may not be needed either if we add checks
(e.g. on the availability of bpftool files) before the build attempts.
Let's move this trap and related variables lower down in the code, so
that we don't accidentally change the value returned from the script
on early exits at pre-checks.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Link: https://lore.kernel.org/bpf/20191119105010.19189-2-quentin.monnet@netronome.com
If the kernel accidentally uses DS or ES while the user values are
loaded, it will work fine for sane userspace. In the interest of
simulating maximally insane userspace, make sigreturn_32 zero out DS
and ES for the nasty parts so that inadvertent use of these segments
will crash.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
For reasons that I haven't quite fully diagnosed, running
mov_ss_trap_32 on a 32-bit kernel results in an infinite loop in
userspace. This appears to be because the hacky SYSENTER test
doesn't segfault as desired; instead it corrupts the program state
such that it infinite loops.
Fix it by explicitly clearing EBP before doing SYSENTER. This will
give a more reliable segfault.
Fixes: 59c2a7226f ("x86/selftests: Add mov_to_ss test")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-20
The following pull-request contains BPF updates for your *net-next* tree.
We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).
There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca748:
<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca748
<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca748
<<<<<<< HEAD
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca748
The main changes are:
1) Addition of BPF trampoline which works as a bridge between kernel functions,
BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
BPF programs for tracing with practically zero overhead to call into BPF (as
opposed to k[ret]probes) and ii) attachment of the former to networking related
programs to see input/output of networking programs (covering xdpdump use case),
from Alexei Starovoitov.
2) BPF array map mmap support and use in libbpf for global data maps; also a big
batch of libbpf improvements, among others, support for reading bitfields in a
relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.
3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.
4) Add BPF audit support and emit messages upon successful prog load and unload in
order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.
5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
(XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.
6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
call named bpf_get_link_xdp_info() for retrieving the full set of prog
IDs attached to XDP, from Toke Høiland-Jørgensen.
7) Add BTF support for array of int, array of struct and multidimensional arrays
and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.
8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.
9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
xdping to be run as standalone, from Jiri Benc.
10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.
11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.
12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
With the most recent Clang, alu32 is enabled by default if -mcpu=probe or
-mcpu=v3 is specified. Use a separate build rule with -mcpu=v2 to enforce no
ALU32 mode.
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20191120002510.4130605-1-andriin@fb.com
Check configurations and packets transference with different variations
of autoneg and speed.
Test plan:
1. Test force of same speed with autoneg off
2. Test force of different speeds with autoneg off (should fail)
3. One side is autoneg on and other side sets force of common speeds
4. One side is autoneg on and other side only advertises a subset of the
common speeds (one speed of the subset)
5. One side is autoneg on and other side only advertises a subset of the
common speeds. Check that highest speed is negotiated
6. Test autoneg on, but each side advertises different speeds (should
fail)
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a function that waits for device with maximum number of iterations.
It enables to limit the waiting and prevent infinite loop.
This will be used by the subsequent patch which will set two ports to
different speeds in order to make sure they cannot negotiate a link.
Waiting for all the setup is limited with 10 minutes for each device.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Functions:
1. speeds_arr_get
The function returns an array of speed values from
/usr/include/linux/ethtool.h The array looks as follows:
[10baseT/Half] = 0,
[10baseT/Full] = 1,
...
2. ethtool_set:
params: cmd
The function runs ethtool by cmd (ethtool -s cmd) and checks if
there was an error in configuration
3. dev_speeds_get:
params: dev, with_mode (0 or 1), adver (0 or 1)
return value: Array of supported/Advertised link modes
with/without mode
* Example 1:
speeds_get swp1 0 0
return: 1000 10000 40000
* Example 2:
speeds_get swp1 1 1
return: 1000baseKX/Full 10000baseKR/Full 40000baseCR4/Full
4. common_speeds_get:
params: dev1, dev2, with_mode (0 or 1), adver (0 or 1)
return value: Array of common speeds of dev1 and dev2
* Example:
common_speeds_get swp1 swp2 0 0
return: 1000 10000
Assuming that swp1 supports 1000 10000 40000 and swp2 supports
1000 10000
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scale test for Spectrum-2 should only be invoked for Spectrum-2.
Skip the test otherwise.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as for Spectrum-1, test the ability to add the maximum number of
routes possible to the switch.
Invoke the test from the 'resource_scale' wrapper script.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, with latest llvm trunk, selftest test_progs failed obj
file test_seg6_loop.o with the following error in verifier:
infinite loop detected at insn 76
The byte code sequence looks like below, and noted that alu32 has been
turned off by default for better generated codes in general:
48: w3 = 100
49: *(u32 *)(r10 - 68) = r3
...
; if (tlv.type == SR6_TLV_PADDING) {
76: if w3 == 5 goto -18 <LBB0_19>
...
85: r1 = *(u32 *)(r10 - 68)
; for (int i = 0; i < 100; i++) {
86: w1 += -1
87: if w1 == 0 goto +5 <LBB0_20>
88: *(u32 *)(r10 - 68) = r1
The main reason for verification failure is due to partial spills at
r10 - 68 for induction variable "i".
Current verifier only handles spills with 8-byte values. The above 4-byte
value spill to stack is treated to STACK_MISC and its content is not
saved. For the above example:
w3 = 100
R3_w=inv100 fp-64_w=inv1086626730498
*(u32 *)(r10 - 68) = r3
R3_w=inv100 fp-64_w=inv1086626730498
...
r1 = *(u32 *)(r10 - 68)
R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
fp-64=inv1086626730498
To resolve this issue, verifier needs to be extended to track sub-registers
in spilling, or llvm needs to enhanced to prevent sub-register spilling
in register allocation phase. The former will increase verifier complexity
and the latter will need some llvm "hacking".
Let us workaround this issue by declaring the induction variable as "long"
type so spilling will happen at non sub-register level. We can revisit this
later if sub-register spilling causes similar or other verification issues.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20191117214036.1309510-1-yhs@fb.com
When run_kselftests.sh is run, it hangs after test_tc_tunnel.sh. The reason
is test_tc_tunnel.sh ensures the server ('nc -l') is run all the time,
starting it again every time it is expected to terminate. The exception is
the final client_connect: the server is not started anymore, which ensures
no process is kept running after the test is finished.
For a sit test, though, the script is terminated prematurely without the
final client_connect and the 'nc' process keeps running. This in turn causes
the run_one function in kselftest/runner.sh to hang forever, waiting for the
runaway process to finish.
Ensure a remaining server is terminated on cleanup.
Fixes: f6ad6accaa ("selftests/bpf: expand test_tc_tunnel with SIT encap")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/60919291657a9ee89c708d8aababc28ebe1420be.1573821780.git.jbenc@redhat.com
The actual test to run is test_xdping.sh, which is already in TEST_PROGS.
The xdping program alone is not runnable with 'make run_tests', it
immediatelly fails due to missing arguments.
Move xdping to TEST_GEN_PROGS_EXTENDED in order to be built but not run.
Fixes: cd5385029f ("selftests/bpf: measure RTT from xdp using xdping")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/4365c81198f62521344c2215909634407184387e.1573821726.git.jbenc@redhat.com
Add selftests validating mmap()-ing BPF array maps: both single-element and
multi-element ones. Check that plain bpf_map_update_elem() and
bpf_map_lookup_elem() work correctly with memory-mapped array. Also convert
CO-RE relocation tests to use memory-mapped views of global data.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20191117172806.2195367-6-andriin@fb.com
If the clone3() syscall is not implemented we should skip the tests.
Fixes: 41585bbeee ("selftests: add tests for clone3() with *set_tid")
Fixes: 17a810699c ("selftests: add tests for clone3()")
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This is a regression test case for an issue when pids have not been
released on error paths.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191118064750.408003-3-avagin@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
In clone3_set_tid, a few test cases are running in a child process. And
right now, if one of these test cases fails, the whole test will exit
with the success status.
Fixes: 41585bbeee ("selftests: add tests for clone3() with *set_tid")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191118064750.408003-2-avagin@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Buffers have to be flushed before clone3() to avoid double messages in
the log.
Fixes: 41585bbeee ("selftests: add tests for clone3() with *set_tid")
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Link: https://lore.kernel.org/r/20191118064750.408003-1-avagin@gmail.com
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Pull networking fixes from David Miller:
1) Fix memory leak in xfrm_state code, from Steffen Klassert.
2) Fix races between devlink reload operations and device
setup/cleanup, from Jiri Pirko.
3) Null deref in NFC code, from Stephan Gerhold.
4) Refcount fixes in SMC, from Ursula Braun.
5) Memory leak in slcan open error paths, from Jouni Hogander.
6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.
7) Info leak on short USB request answers in ax88172a driver, from
Oliver Neukum.
8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.
9) PTP config timestamp flags validation, from Richard Cochran.
10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.
11) Missing free_netdev() in gemini driver, from Chuhong Yuan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
ipmr: Fix skb headroom in ipmr_get_route().
net: hns3: cleanup of stray struct hns3_link_mode_mapping
net/smc: fix fastopen for non-blocking connect()
rds: ib: update WR sizes when bringing up connection
net: gemini: add missed free_netdev
net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
seg6: fix skb transport_header after decap_and_validate()
seg6: fix srh pointer in get_srh()
net: stmmac: Use the correct style for SPDX License Identifier
octeontx2-af: Use the correct style for SPDX License Identifier
ptp: Extend the test program to check the external time stamp flags.
mlx5: Reject requests to enable time stamping on both edges.
igb: Reject requests that fail to enable time stamping on both edges.
dp83640: Reject requests to enable time stamping on both edges.
mv88e6xxx: Reject requests to enable time stamping on both edges.
ptp: Introduce strict checking of external time stamp options.
renesas: reject unsupported external timestamp flags
mlx5: reject unsupported external timestamp flags
igb: reject unsupported external timestamp flags
dp83640: reject unsupported external timestamp flags
...
tcp_mmap is used as a reference program for TCP rx zerocopy,
so it is important to point out some potential issues.
If multiple threads are concurrently using getsockopt(...
TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
functions compete on shared ptl lock, if vma are arbitrary placed.
Instead of letting the mm layer place the chunks back to back,
this patch enforces an alignment so that each thread uses
a different ptl lock.
Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
launched at the same time :
$ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done
In the following run, we reproduce the old behavior by requesting no alignment :
$ tcp_mmap -sz -C $((128*1024)) -a 4096
received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches
New behavior (automatic alignment based on Hugepagesize),
we can see the system overhead being dramatically reduced.
$ tcp_mmap -sz -C $((128*1024))
received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests that the now emulated iopl() functionality:
- does not longer allow user space to disable interrupts.
- does restore a I/O bitmap when IOPL is dropped
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add code to the fork path which forces the shared bitmap to be duplicated
and the reference count to be dropped. Verify that the child modifications
did not affect the parent.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>