linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-25 21:40:53 +07:00

Daniel Borkmann e0cea7ce98 bpf: implement ld_abs/ld_ind in native bpf

The main part of this work is to finally allow removal of LD_ABS and LD_IND from the BPF core by reimplementing them through native eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF and keeping them around in native eBPF caused way more trouble than actually worth it. To just list some of the security issues in the past:

* `fdfaf64e75` ("x86: bpf_jit: support negative offsets")
* `35607b02db` ("sparc: bpf_jit: fix loads from negative offsets")
* `e0ee9c1215` ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
* `07aee94394` ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
* `6d59b7dbf7` ("bpf, s390x: do not reload skb pointers in non-skb context")
* `87338c8e2c` ("bpf, ppc64: do not reload skb pointers in non-skb context")

For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy these days due to their limitations and more efficient/flexible alternatives that have been developed over time such as direct packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a register, the load happens in host endianness and its exception handling can yield unexpected behavior. The latter is explained in depth in `f6b1b3bf0d` ("bpf: fix subprog verifier bypass by div/mod by 0 exception") with similar cases of exceptions we had. In native eBPF more recent program types will disable LD_ABS/LD_IND altogether through may_access_skb() in verifier, and given the limitations in terms of exception handling, it's also disabled in programs that use BPF to BPF calls.

In terms of cBPF, the LD_ABS/LD_IND is used in networking programs to access packet data. It is not used in seccomp-BPF but programs that use it for socket filtering or reuseport for demuxing with cBPF. This is mostly relevant for applications that have not yet migrated to native eBPF.

The main complexity and source of bugs in LD_ABS/LD_IND is coming from their implementation in the various JITs. Most of them keep the model around from cBPF times by implementing a fastpath written in asm. They use typically two from the BPF program hidden CPU registers for caching the skb's headlen (skb->len - skb->data_len) and skb->data. Throughout the JIT phase this requires to keep track whether LD_ABS/LD_IND are used and if so, the two registers need to be recached each time a BPF helper would change the underlying packet data in native eBPF case. At least in eBPF case, available CPU registers are rare and the additional exit path out of the asm written JIT helper makes it also inflexible since not all parts of the JITer are in control from plain C. A LD_ABS/LD_IND implementation in eBPF therefore allows to significantly reduce the complexity in JITs with comparable performance results for them, e.g.:

  test_bpf            tcpdump port 22    tcpdump complex
  x64    - before     15 21 10           14 19  18
         - after       7 10 10            7 10  15
  arm64  - before     40 91 92           40 91 151
         - after      51 64 73           51 62 113

For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter() and cache the skb's headlen and data in the cBPF prologue. The BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just used as a local temporary variable. This allows to shrink the image on x86_64 also for seccomp programs slightly since mapping to %rsi is not an ereg. In callee-saved R8 and R9 we now track skb data and headlen, respectively. For normal prologue emission in the JITs this does not add any extra instructions since R8, R9 are pushed to stack in any case from eBPF side. cBPF uses the convert_bpf_ld_abs() emitter which probes the fast path inline already and falls back to bpf_skb_load_helper_{8,16,32}() helper relying on the cached skb data and headlen as well. R8 and R9 never need to be reloaded due to bpf_helper_changes_pkt_data() since all skb access in cBPF is read-only.

Then, for the case of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally, does neither cache skb data and headlen nor has an inlined fast path. The reason for the latter is that native eBPF does not have any extra registers available anyway, but even if there were, it avoids any reload of skb data and headlen in the first place. Additionally, for the negative offsets, we provide an alternative bpf_skb_load_bytes_relative() helper in eBPF which operates similarly as bpf_skb_load_bytes() and allows for more flexibility.

Tested myself on x64, arm64, s390x, from Sandipan on ppc64.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

2018-05-03 16:49:19 -07:00
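
The fast-path/fallback split that the commit message describes for cBPF loads can be sketched in user-space C. This is only a simplified model under stated assumptions, not the kernel code: `struct pkt_model`, `model_slow_copy()` and `model_load_half()` are hypothetical names, the non-linear part of the packet is reduced to a single second buffer, and negative offsets are rejected rather than handled.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two cached values (data, headlen)
 * plus one extra buffer modelling the non-linear part of a packet. */
struct pkt_model {
    const uint8_t *data;   /* linear part, analogous to skb->data */
    uint32_t headlen;      /* bytes available in the linear part */
    const uint8_t *frag;   /* remainder of the packet (simplified) */
    uint32_t fraglen;      /* bytes available in the remainder */
};

/* Slow path: byte-wise copy that may cross from the linear part
 * into the remainder. Returns 0 on success, -1 if out of bounds. */
static int model_slow_copy(const struct pkt_model *p, uint32_t off,
                           void *to, uint32_t len)
{
    uint8_t *dst = to;
    uint32_t i;

    if ((uint64_t)off + len > (uint64_t)p->headlen + p->fraglen)
        return -1;
    for (i = 0; i < len; i++) {
        uint32_t pos = off + i;

        dst[i] = pos < p->headlen ? p->data[pos]
                                  : p->frag[pos - p->headlen];
    }
    return 0;
}

/* 16-bit load at a non-negative offset: try the linear part first,
 * fall back to the slow copy otherwise. The two bytes are read in
 * network (big-endian) order and returned as a host integer. */
static int model_load_half(const struct pkt_model *p, int32_t off,
                           uint16_t *out)
{
    uint8_t buf[2];

    if (off < 0)
        return -1; /* negative offsets are not modelled here */

    if ((uint32_t)off <= p->headlen &&
        p->headlen - (uint32_t)off >= sizeof(buf)) {
        /* Fast path: the whole value lies in the linear part. */
        memcpy(buf, p->data + off, sizeof(buf));
    } else if (model_slow_copy(p, (uint32_t)off, buf, sizeof(buf)) < 0) {
        return -1;
    }

    *out = (uint16_t)(buf[0] << 8 | buf[1]);
    return 0;
}
```

In this model the caller only ever reads packet bytes, which mirrors the point made above for cBPF: because nothing modifies the packet, the cached `data` and `headlen` values stay valid for the lifetime of the program.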
..
bpf	bpf: implement ld_abs/ld_ind in native bpf	2018-05-03 16:49:19 -07:00
cgroup	Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2018-04-03 18:00:13 -07:00
configs	KVM changes for 4.16	2018-02-10 13:16:35 -08:00
debug	* Fix 2032 time access issues and new compiler warnings	2018-04-12 10:21:19 -07:00
events	perf: Remove superfluous allocation error check	2018-04-17 09:47:40 -03:00
gcov	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
irq	genirq/affinity: Spread irq vectors among present CPUs as far as possible	2018-04-06 12:19:51 +02:00
livepatch	livepatch: Allow to call a custom callback when freeing shadow variables	2018-04-17 13:42:48 +02:00
locking	locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches	2018-03-31 07:30:50 +02:00
power	PM / QoS: mark expected switch fall-throughs	2018-04-09 13:49:40 +02:00
printk	New features:	2018-04-10 11:27:30 -07:00
rcu	Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD	2018-02-23 15:15:41 -08:00
sched	Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2018-04-15 12:43:30 -07:00
time	posix-cpu-timers: Ensure set_process_cpu_timer is always evaluated	2018-04-19 12:54:57 +02:00
trace	bpf: Allow bpf_current_task_under_cgroup in interrupt	2018-04-29 09:18:04 -07:00
.gitignore
acct.c	kernel/acct.c: fix the acct->needcheck check in check_free_space()	2018-01-04 16:45:09 -08:00
async.c	kernel/async.c: revert "async: simplify lowest_in_progress()"	2018-02-06 18:32:44 -08:00
audit_fsnotify.c
audit_tree.c	audit: track the owner of the command mutex ourselves	2018-02-23 11:22:22 -05:00
audit_watch.c	audit/stable-4.13 PR 20170816	2017-08-16 16:48:34 -07:00
audit.c	audit/stable-4.17 PR 20180403	2018-04-06 15:01:25 -07:00
audit.h	audit: track the owner of the command mutex ourselves	2018-02-23 11:22:22 -05:00
auditfilter.c	audit: deprecate the AUDIT_FILTER_ENTRY filter	2018-02-15 14:36:29 -05:00
auditsc.c	audit: bail before bug check if audit disabled	2018-02-15 14:40:25 -05:00
backtracetest.c
bounds.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
capability.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
compat.c	mm: add kernel_move_pages() helper, move compat syscall to mm/migrate.c	2018-04-02 20:15:32 +02:00
configs.c
context_tracking.c
cpu_pm.c	PM / CPU: replace raw_notifier with atomic_notifier	2017-07-31 13:09:49 +02:00
cpu.c	cpu/hotplug: Fix unused function warning	2018-03-15 20:34:40 +01:00
crash_core.c	kexec: export PG_swapbacked to VMCOREINFO	2018-04-13 17:10:27 -07:00
crash_dump.c
cred.c
delayacct.c	delayacct: Account blkio completion on the correct task	2018-01-16 03:29:36 +01:00
dma.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
elfcore.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
exec_domain.c	get rid of pointless includes of fs_struct.h	2018-02-22 14:28:50 -05:00
exit.c	kernel: use kernel_wait4() instead of sys_wait4()	2018-04-02 20:14:51 +02:00
extable.c	extable: Make init_kernel_text() global	2018-02-21 16:54:06 +01:00
fail_function.c	error-injection: Fix to prohibit jump optimization	2018-03-12 16:16:00 +01:00
fork.c	fork: unconditionally clear stack on fork	2018-04-20 17:18:35 -07:00
freezer.c
futex_compat.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
futex.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
groups.c	kernel: make groups_sort calling a responsibility group_info allocators	2017-12-14 16:00:49 -08:00
hung_task.c
irq_work.c	irq/work: Improve the flag definitions	2018-01-08 19:43:15 +01:00
jump_label.c	jump_label: Disable jump labels in __exit code	2018-03-20 08:57:17 +01:00
kallsyms.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk	2018-02-01 13:36:15 -08:00
kcmp.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c	kcov: detect double association with a single task	2018-02-06 18:32:46 -08:00
kexec_core.c	x86/mm, kexec: Allow kexec to be used with SME	2017-07-18 11:38:04 +02:00
kexec_file.c	kernel/kexec_file.c: allow archs to set purgatory load address	2018-04-13 17:10:28 -07:00
kexec_internal.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
kexec.c	kexec: call do_kexec_load() in compat syscall directly	2018-04-02 20:15:01 +02:00
kmod.c	kmod: move #ifdef CONFIG_MODULES wrapper to Makefile	2017-09-08 18:26:51 -07:00
kprobes.c	kprobes: Propagate error from disarm_kprobe_ftrace()	2018-02-16 09:12:58 +01:00
ksysfs.c	kexec: move vmcoreinfo out of the kernel's .bss section	2017-07-12 16:25:59 -07:00
kthread.c	treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts	2017-11-21 16:35:54 -08:00
latencytop.c
Makefile	error-injection: Support fault injection framework	2018-01-12 17:33:38 -08:00
memremap.c	kernel/memremap: Remove stale devres_free() call	2018-03-06 10:58:54 -08:00
module_signing.c
module-internal.h
module.c	arch: remove obsolete architecture ports	2018-04-02 20:20:12 -07:00
notifier.c
nsproxy.c
padata.c	padata: add SPDX identifier	2018-01-05 18:43:00 +11:00
panic.c	taint: add taint for randstruct	2018-04-11 10:28:35 -07:00
params.c	kernel/params.c: downgrade warning for unsafe parameters	2018-04-11 10:28:37 -07:00
pid_namespace.c	Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2018-04-03 19:15:32 -07:00
pid.c	xarray: add the xa_lock to the radix_tree_root	2018-04-11 10:28:39 -07:00
profile.c
ptrace.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
range.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
reboot.c	kernel/reboot.c: add devm_register_reboot_notifier()	2017-11-17 16:10:04 -08:00
relay.c	kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE	2018-02-21 15:35:43 -08:00
resource.c	resource: fix integer overflow at reallocation	2018-04-13 17:10:27 -07:00
seccomp.c	- Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen)	2018-02-22 10:50:24 -08:00
signal.c	Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security	2018-04-07 11:11:41 -07:00
smp.c	smp/core: Use lockdep to assert IRQs are disabled/enabled	2017-11-08 11:13:50 +01:00
smpboot.c	watchdog/core, powerpc: Lock cpus across reconfiguration	2017-10-04 10:53:54 +02:00
smpboot.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
softirq.c	softirq: Consolidate common code in tasklet_[hi]_action()	2018-03-09 11:50:55 +01:00
stacktrace.c
stop_machine.c
sys_ni.c	syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls	2018-04-05 16:59:38 +02:00
sys.c	kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()	2018-04-02 20:16:06 +02:00
sysctl_binary.c	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sysctl.c	kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param	2018-04-11 10:28:38 -07:00
task_work.c	locking/barriers: Convert users of lockless_dereference() to READ_ONCE()	2017-12-17 13:57:15 +01:00
taskstats.c	pids: introduce find_get_task_by_vpid() helper	2018-02-06 18:32:46 -08:00
test_kprobes.c	kprobes: Disable the jprobes test code	2017-10-20 11:02:54 +02:00
torture.c	torture: Save a line in stutter_wait(): while -> for	2017-12-11 09:18:30 -08:00
tracepoint.c	tracepoint: Remove smp_read_barrier_depends() from comment	2017-12-04 10:52:56 -08:00
tsacct.c
ucount.c	headers: untangle kmemleak.h from mm.h	2018-04-05 21:36:27 -07:00
uid16.c	fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers	2018-04-02 20:15:59 +02:00
uid16.h	kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c	2018-04-02 20:15:30 +02:00
umh.c	kernel: use kernel_wait4() instead of sys_wait4()	2018-04-02 20:14:51 +02:00
up.c	smp: Avoid using two cache lines for struct call_single_data	2017-08-29 15:14:38 +02:00
user_namespace.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace	2017-11-16 12:20:15 -08:00
user-return-notifier.c
user.c	efivarfs: Limit the rate for non-root to read files	2018-02-22 10:21:02 -08:00
utsname_sysctl.c
utsname.c	uts: create "struct uts_namespace" from kmem_cache	2018-04-11 10:28:35 -07:00
watchdog_hld.c	Merge branch 'linus' into core/urgent, to pick up dependent commits	2017-11-04 08:53:04 +01:00
watchdog.c	Merge branch 'linus' into sched/core, to pick up fixes	2017-11-08 10:17:15 +01:00
workqueue_internal.h	Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2017-11-06 12:26:49 -08:00
workqueue.c	Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq	2018-04-03 18:00:13 -07:00