linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-24 16:30:52 +07:00

Daniel Borkmann 5fc6ed1831 bpf: Fix leakage under speculation on mispredicted branches

[ Upstream commit 9183671af6dbf60a1219371d4ed73e23f43b49db ]

The verifier only enumerates valid control-flow paths and skips paths that are unreachable in the non-speculative domain. And so it can miss issues under speculative execution on mispredicted branches.

For example, a type confusion has been demonstrated with the following crafted program:

  // r0 = pointer to a map array entry
  // r6 = pointer to readable stack slot
  // r9 = scalar controlled by attacker
  1: r0 = *(u64 *)(r0) // cache miss
  2: if r0 != 0x0 goto line 4
  3: r6 = r9
  4: if r0 != 0x1 goto line 6
  5: r9 = *(u8 *)(r6)
  6: // leak r9

Since line 3 runs iff r0 == 0 and line 5 runs iff r0 == 1, the verifier concludes that the pointer dereference on line 5 is safe. But: if the attacker trains both the branches to fall-through, such that the following is speculatively executed ...

  r6 = r9
  r9 = *(u8 *)(r6)
  // leak r9

... then the program will dereference an attacker-controlled value and could leak its content under speculative execution via side-channel. This requires to mistrain the branch predictor, which can be rather tricky, because the branches are mutually exclusive. However such training can be done at congruent addresses in user space using different branches that are not mutually exclusive. That is, by training branches in user space ...

  A:  if r0 != 0x0 goto line C
  B:  ...
  C:  if r0 != 0x0 goto line D
  D:  ...

... such that addresses A and C collide to the same CPU branch prediction entries in the PHT (pattern history table) as those of the BPF program's lines 2 and 4, respectively. A non-privileged attacker could simply brute force such collisions in the PHT until observing the attack succeeding.

Alternative methods to mistrain the branch predictor are also possible that avoid brute forcing the collisions in the PHT. A reliable attack has been demonstrated, for example, using the following crafted program:

  // r0 = pointer to a [control] map array entry
  // r7 = *(u64 *)(r0 + 0), training/attack phase
  // r8 = *(u64 *)(r0 + 8), oob address
  // [...]
  // r0 = pointer to a [data] map array entry
  1: if r7 == 0x3 goto line 3
  2: r8 = r0
  // crafted sequence of conditional jumps to separate the conditional
  // branch in line 193 from the current execution flow
  3: if r0 != 0x0 goto line 5
  4: if r0 == 0x0 goto exit
  5: if r0 != 0x0 goto line 7
  6: if r0 == 0x0 goto exit
  [...]
  187: if r0 != 0x0 goto line 189
  188: if r0 == 0x0 goto exit
  // load any slowly-loaded value (due to cache miss in phase 3) ...
  189: r3 = *(u64 *)(r0 + 0x1200)
  // ... and turn it into known zero for verifier, while preserving slowly-
  // loaded dependency when executing:
  190: r3 &= 1
  191: r3 &= 2
  // speculatively bypassed phase dependency
  192: r7 += r3
  193: if r7 == 0x3 goto exit
  194: r4 = *(u8 *)(r8 + 0)
  // leak r4

As can be seen, in training phase (phase != 0x3), the condition in line 1 turns into false and therefore r8 with the oob address is overridden with the valid map value address, which in line 194 we can read out without issues. However, in attack phase, line 2 is skipped, and due to the cache miss in line 189 where the map value is (zeroed and later) added to the phase register, the condition in line 193 takes the fall-through path due to prior branch predictor training, where under speculation, it'll load the byte at oob address r8 (unknown scalar type at that point) which could then be leaked via side-channel.

One way to mitigate these is to 'branch off' an unreachable path, meaning, the current verification path keeps following the is_branch_taken() path and we push the other branch to the verification stack. Given this is unreachable from the non-speculative domain, this branch's vstate is explicitly marked as speculative. This is needed for two reasons: i) if this path is solely seen from speculative execution, then we later on still want the dead code elimination to kick in in order to sanitize these instructions with jmp-1s, and ii) to ensure that paths walked in the non-speculative domain are not pruned from earlier walks of paths walked in the speculative domain.

Additionally, for robustness, we mark the registers which have been part of the conditional as unknown in the speculative path given there should be no assumptions made on their content.

The fix in here mitigates type confusion attacks described earlier due to i) all code paths in the BPF program being explored and ii) existing verifier logic already ensuring that given memory access instruction references one specific data structure.

An alternative to this fix that has also been looked at in this scope was to mark aux->alu_state at the jump instruction with a BPF_JMP_TAKEN state as well as direction encoding (always-goto, always-fallthrough, unknown), such that mixing of different always-* directions themselves as well as mixing of always-* with unknown directions would cause a program rejection by the verifier, e.g. programs with constructs like 'if ([...]) { x = 0; } else { x = 1; }' with subsequent 'if (x == 1) { [...] }'. For unprivileged, this would result in only single direction always-* taken paths, and unknown taken paths being allowed, such that the former could be patched from a conditional jump to an unconditional jump (ja). Compared to this approach here, it would have two downsides: i) valid programs that otherwise are not performing any pointer arithmetic, etc, would potentially be rejected/broken, and ii) we are required to turn off path pruning for unprivileged, where both can be avoided in this work through pushing the invalid branch to the verification stack.

The issue was originally discovered by Adam and Ofek, and later independently discovered and reported as a result of Benedict and Piotr's research work.

Fixes: `b2157399cc` ("bpf: prevent out-of-bounds speculation")
Reported-by: Adam Morrison <mad@cs.tau.ac.il>
Reported-by: Ofek Kirzner <ofekkir@gmail.com>
Reported-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reported-by: Piotr Krysiuk <piotras@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Benedict Schlueter <benedict.schlueter@rub.de>
Reviewed-by: Piotr Krysiuk <piotras@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>

2021-06-23 14:42:45 +02:00
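
The 'branch off' mitigation described in the commit message can be pictured with a toy model. The sketch below is not the kernel's verifier code: every name in it (`vstate`, `vstack`, `push_state`, `pop_state`, `jne_taken`, `check_jne`) is hypothetical, and it tracks only a single register. It shows the idea under those assumptions: when a conditional jump has a statically known outcome, the current path follows that outcome, and the architecturally dead path is still queued for verification, flagged as speculative, with the register used in the condition reset to unknown.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are the kernel's own. */
enum reg_kind { REG_UNKNOWN, REG_CONST };

struct reg {
    enum reg_kind kind;
    long value;            /* meaningful only when kind == REG_CONST */
};

struct vstate {
    int insn_idx;          /* next instruction to verify */
    bool speculative;      /* path reachable only under misprediction */
    struct reg r0;         /* the single register the toy branch tests */
};

#define STACK_MAX 16

struct vstack {
    struct vstate slots[STACK_MAX];
    int depth;
};

/* Queue a state for later verification; fails when the stack is full. */
static bool push_state(struct vstack *st, struct vstate s)
{
    if (st->depth >= STACK_MAX)
        return false;
    st->slots[st->depth++] = s;
    return true;
}

/* Take the most recently queued state off the stack. */
static struct vstate pop_state(struct vstack *st)
{
    assert(st->depth > 0);
    return st->slots[--st->depth];
}

/* Outcome of "if r0 != imm": 1 = always taken, 0 = never, -1 = unknown. */
static int jne_taken(const struct reg *r, long imm)
{
    if (r->kind != REG_CONST)
        return -1;
    return r->value != imm ? 1 : 0;
}

/*
 * Handle "if r0 != imm goto target" at cur->insn_idx. The current state
 * advances along the path that is followed architecturally; the other
 * path is pushed onto the verification stack.
 */
static bool check_jne(struct vstack *st, struct vstate *cur, long imm, int target)
{
    int taken = jne_taken(&cur->r0, imm);
    int fallthrough = cur->insn_idx + 1;
    struct vstate other = *cur;

    if (taken < 0) {
        /* Both paths are reachable: queue the jump target unchanged. */
        other.insn_idx = target;
        cur->insn_idx = fallthrough;
        return push_state(st, other);
    }

    /* The other path is dead architecturally: queue it as speculative
     * and drop every assumption about the register in the condition. */
    other.insn_idx = taken ? fallthrough : target;
    other.speculative = true;
    other.r0.kind = REG_UNKNOWN;
    other.r0.value = 0;
    cur->insn_idx = taken ? target : fallthrough;
    return push_state(st, other);
}
```

With `r0` known to be 0 at "2: if r0 != 0x0 goto line 4", the current state moves on to instruction 3, while a state for instruction 4 is queued with `speculative` set and `r0` unknown, so the mispredicted path is still checked.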
..
bpf	bpf: Fix leakage under speculation on mispredicted branches	2021-06-23 14:42:45 +02:00
cgroup	cgroup1: don't allow '\n' in renaming	2021-06-16 12:01:40 +02:00
configs
debug	kgdb: fix to kill breakpoints on initmem after boot	2021-03-04 11:38:46 +01:00
dma	swiotlb: Fix the type of index	2021-05-19 10:13:04 +02:00
entry	x86/entry: Move nmi entry/exit into common code	2021-03-17 17:06:36 +01:00
events	perf: Fix data race between pin_count increment/decrement	2021-06-16 12:01:45 +02:00
gcov	gcov: re-fix clang-11+ support	2021-04-14 08:41:58 +02:00
irq	genirq/matrix: Prevent allocation counter corruption	2021-05-11 14:47:17 +02:00
kcsan	kcsan: Fix debugfs initcall return type	2021-05-26 12:06:54 +02:00
livepatch	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
locking	locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal	2021-05-26 12:06:50 +02:00
power	PM: EM: postpone creating the debugfs dir till fs_initcall	2021-03-30 14:32:04 +02:00
printk	printk: fix deadlock when kernel panic	2021-03-04 11:38:41 +01:00
rcu	rcu: Remove spurious instrumentation_end() in rcu_nmi_enter()	2021-05-14 09:50:22 +02:00
sched	sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling	2021-06-16 12:01:46 +02:00
time	posix-timers: Preserve return value in clock_adjtime32()	2021-05-11 14:47:16 +02:00
trace	tracing: Correct the length check which causes memory corruption	2021-06-16 12:01:47 +02:00
.gitignore	kbuild: update config_data.gz only when the content of .config is changed	2021-05-11 14:47:37 +02:00
acct.c	kernel: acct.c: fix some kernel-doc nits	2020-10-16 11:11:19 -07:00
async.c	treewide: Remove uninitialized_var() usage	2020-07-16 12:35:15 -07:00
audit_fsnotify.c	fsnotify: generalize handle_inode_event()	2020-12-30 11:54:18 +01:00
audit_tree.c	fsnotify: generalize handle_inode_event()	2020-12-30 11:54:18 +01:00
audit_watch.c	fsnotify: generalize handle_inode_event()	2020-12-30 11:54:18 +01:00
audit.c	audit: Remove redundant null check	2020-08-26 09:10:39 -04:00
audit.h	audit: change unnecessary globals into statics	2020-08-17 20:26:58 -04:00
auditfilter.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
auditsc.c	audit/stable-5.9 PR 20200803	2020-08-04 14:20:26 -07:00
backtracetest.c	treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()	2020-07-30 11:15:58 -07:00
bounds.c
capability.c	LSM: Signal to SafeSetID when setting group IDs	2020-10-13 09:17:34 -07:00
compat.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c
cpu_pm.c	notifier: Fix broken error handling pattern	2020-09-01 09:58:03 +02:00
cpu.c	kernel/cpu: add arch override for clear_tasks_mm_cpumask() mm handling	2020-11-27 00:10:39 +11:00
crash_core.c	kdump: append kernel build-id string to VMCOREINFO	2020-08-12 10:58:01 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c	kernel/io_uring: cancel io_uring before task works	2021-01-30 13:55:18 +01:00
extable.c
fail_function.c	fail_function: Remove a redundant mutex unlock	2020-11-19 11:58:16 -08:00
fork.c	mm/fork: clear PASID for new mm	2021-03-30 14:31:52 +02:00
freezer.c	Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"	2021-04-07 15:00:14 +02:00
futex.c	futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI	2021-05-11 14:47:37 +02:00
gen_kheaders.sh
groups.c	LSM: Signal to SafeSetID when setting group IDs	2020-10-13 09:17:34 -07:00
hung_task.c	kernel/hung_task.c: make type annotations consistent	2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c
jump_label.c	static_call: Fix static_call_update() sanity check	2021-03-25 09:04:18 +01:00
kallsyms.c	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
kcmp.c	exec: Transform exec_update_mutex into a rw_semaphore	2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c	kcov: make some symbols static	2020-08-12 10:58:02 -07:00
kexec_core.c	kernel: kexec: remove the lock operation of system_transition_mutex	2021-02-03 23:28:37 +01:00
kexec_elf.c
kexec_file.c	kernel: kexec_file: fix error return code of kexec_calculate_store_digests()	2021-05-19 10:13:09 +02:00
kexec_internal.h
kexec.c	LSM: Introduce kernel_post_load_data() hook	2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c	kmod: remove redundant "be an" in the comment	2020-08-12 10:58:01 -07:00
kprobes.c	kprobes: Fix to delay the kprobes jump optimization	2021-03-04 11:38:35 +01:00
ksysfs.c
kthread.c	kthread: Extract KTHREAD_IS_PER_CPU	2021-02-07 15:37:17 +01:00
latencytop.c
Makefile	kbuild: update config_data.gz only when the content of .config is changed	2021-05-11 14:47:37 +02:00
module_signature.c	module: harden ELF info handling	2021-03-25 09:04:11 +01:00
module_signing.c	module: harden ELF info handling	2021-03-25 09:04:11 +01:00
module-internal.h
module.c	module: harden ELF info handling	2021-03-25 09:04:11 +01:00
notifier.c	notifier: Fix broken error handling pattern	2020-09-01 09:58:03 +02:00
nsproxy.c	nsproxy: support CLONE_NEWTIME with setns()	2020-07-08 11:14:22 +02:00
padata.c	padata: fix possible padata_works_lock deadlock	2020-09-04 17:51:55 +10:00
panic.c	panic: don't dump stack twice on warn	2020-11-14 11:26:04 -08:00
params.c	params: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
pid_namespace.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
pid.c	exec: Transform exec_update_mutex into a rw_semaphore	2021-01-09 13:46:24 +01:00
profile.c
ptrace.c	ptrace: make ptrace() fail if the tracee changed its pid unexpectedly	2021-05-26 12:06:49 +02:00
range.c	kernel.h: split out min()/max() et al. helpers	2020-10-16 11:11:19 -07:00
reboot.c	reboot: fix overflow parsing reboot cpu number	2020-11-14 11:26:03 -08:00
regset.c	regset: kill ->get()	2020-07-27 14:31:12 -04:00
relay.c	kernel/relay.c: drop unneeded initialization	2020-10-16 11:11:22 -07:00
resource.c	kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources	2021-05-19 10:13:09 +02:00
rseq.c
scftorture.c	scftorture: Add cond_resched() to test loop	2020-08-24 18:38:38 -07:00
scs.c	mm: memcontrol: account kernel stack per node	2020-08-07 11:33:25 -07:00
seccomp.c	seccomp: Refactor notification handler to prepare for new semantics	2021-06-03 09:00:31 +02:00
signal.c	ptrace: fix task_join_group_stop() for the case when current is traced	2020-11-02 12:14:19 -08:00
smp.c	smp: Fix smp_call_function_single_async prototype	2021-05-14 09:50:46 +02:00
smpboot.c	kthread: Extract KTHREAD_IS_PER_CPU	2021-02-07 15:37:17 +01:00
smpboot.h
softirq.c	softirq: Add debug check to __raise_softirq_irqoff()	2020-09-16 15:18:56 +02:00
stackleak.c	stackleak: let stack_erasing_sysctl take a kernel pointer buffer	2020-09-19 13:13:39 -07:00
stacktrace.c	stacktrace: Remove reliable argument from arch_stack_walk() callback	2020-09-18 14:24:16 +01:00
static_call.c	static_call: Align static_call_is_init() patching condition	2021-04-07 15:00:06 +02:00
stop_machine.c	stop_machine, rcu: Mark functions as notrace	2020-10-26 12:12:27 +01:00
sys_ni.c	mm/madvise: introduce process_madvise() syscall: an external memory hinting API	2020-10-18 09:27:10 -07:00
sys.c	kernel/sys.c: fix prototype of prctl_get_tid_address()	2020-10-25 11:44:16 -07:00
sysctl-test.c
sysctl.c	sysctl.c: fix underflow value setting risk in vm_table	2021-03-17 17:06:25 +01:00
task_work.c	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
taskstats.c	taskstats: move specifying netlink policy back to ops	2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c	torture: Dump ftrace at shutdown only if requested	2020-06-29 12:01:45 -07:00
tracepoint.c	tracepoint: Do not fail unregistering a probe due to memory failure	2021-03-04 11:38:03 +01:00
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c	usermodehelper: reset umask to default before executing user process	2020-10-06 10:31:52 -07:00
up.c	smp: Fix smp_call_function_single_async prototype	2021-05-14 09:50:46 +02:00
user_namespace.c	capabilities: require CAP_SETFCAP to map uid 0	2021-05-07 11:04:31 +02:00
user-return-notifier.c
user.c
usermode_driver.c	bpf: Fix umd memory leak in copy_process()	2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c
watch_queue.c	watch_queue: Limit the number of watches a user can hold	2020-08-17 09:39:18 -07:00
watchdog_hld.c
watchdog.c	watchdog: fix barriers when printing backtraces from all CPUs	2021-05-19 10:13:00 +02:00
workqueue_internal.h
workqueue.c	wq: handle VM suspension in stall detection	2021-06-16 12:01:36 +02:00