linux_dsm_epyc7002/kernel
Andrey Ignatov 7dd68b3279 bpf: Support replacing cgroup-bpf program in MULTI mode
The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.

The order of programs usually matters. For example, imagine two egress
programs: the first one drops packets and the second one counts packets.
If they're swapped, the result of the counting program will be different.

This brings operational challenges when updating cgroup-bpf program(s)
attached with BPF_F_ALLOW_MULTI since there is no way to replace a
program:

* One way to update is to detach all programs first and then attach the
  new version(s) again in the right order. This introduces an
  interruption in the work a program is doing and may not be acceptable
  (e.g. if it's an egress firewall);

* Another way is to attach the new version of a program first and only
  then detach the old version. This introduces a time interval when two
  versions of the same program are running, which may not be acceptable
  if a program is not idempotent. It also imposes an additional burden
  on program developers to make sure that two versions of their program
  can co-exist.

Solve the problem by introducing a "replace" mode in the BPF_PROG_ATTACH
command for cgroup-bpf programs being attached with the
BPF_F_ALLOW_MULTI flag. This mode is enabled by the newly introduced
BPF_F_REPLACE attach flag and the bpf_attr.replace_bpf_fd attribute,
which passes the fd of the old program to replace.

That way the user can replace any program among those attached with
BPF_F_ALLOW_MULTI flag without the problems described above.
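As a rough userspace illustration of the replace mode, the sketch below
fills bpf_attr for BPF_PROG_ATTACH with both flags set. It assumes UAPI
headers recent enough to define BPF_F_REPLACE and replace_bpf_fd; the
helper name cgroup_replace_prog and the choice of the egress attach type
are assumptions made for the example, not part of this patch.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/*
 * Hypothetical helper: ask the kernel to swap old_prog_fd for
 * new_prog_fd on the cgroup referred to by cgroup_fd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int cgroup_replace_prog(int cgroup_fd, int new_prog_fd,
			       int old_prog_fd)
{
	union bpf_attr attr;

	/* Unused fields of the attribute must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.target_fd = cgroup_fd;
	attr.attach_bpf_fd = new_prog_fd;
	/* Attach type picked for illustration only. */
	attr.attach_type = BPF_CGROUP_INET_EGRESS;
	/* Replace mode is only meaningful together with ALLOW_MULTI. */
	attr.attach_flags = BPF_F_ALLOW_MULTI | BPF_F_REPLACE;
	attr.replace_bpf_fd = old_prog_fd;

	if (syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr)) < 0)
		return -errno;
	return 0;
}
```

A caller would pass an fd obtained by opening the cgroup directory plus
the fds of the new and old programs, and treat any negative return
value as "nothing was changed".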

Details of the new API:

* If BPF_F_REPLACE is set but replace_bpf_fd doesn't hold a valid
  descriptor of a BPF program, BPF_PROG_ATTACH will return the
  corresponding error (EINVAL or EBADF).

* If replace_bpf_fd holds a valid descriptor of a BPF program but that
  program is not attached to the specified cgroup, BPF_PROG_ATTACH will
  return ENOENT.
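The intended semantics can be pictured with a toy userspace model. This
is only a sketch and not the kernel implementation: struct prog_list,
list_attach and list_replace are hypothetical names, and integer ids
stand in for programs. It shows that a replace keeps the position of
the old program and that a program missing from the list yields ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PROGS 8

/* Toy stand-in for the ordered programs attached to one cgroup. */
struct prog_list {
	int ids[MAX_PROGS];	/* hypothetical program ids, in run order */
	size_t len;
};

/* Append a program at the end of the list (plain MULTI attach). */
static int list_attach(struct prog_list *l, int id)
{
	assert(l->len <= MAX_PROGS);
	if (l->len == MAX_PROGS)
		return -E2BIG;
	l->ids[l->len++] = id;
	return 0;
}

/* Swap old_id for new_id in place, so the run order is unchanged. */
static int list_replace(struct prog_list *l, int old_id, int new_id)
{
	for (size_t i = 0; i < l->len; i++) {
		if (l->ids[i] == old_id) {
			l->ids[i] = new_id;
			return 0;
		}
	}
	return -ENOENT;	/* old program is not in this list */
}
```

In this model, replacing the first of two attached ids leaves the
second one where it was, which is the property that detach-then-attach
cannot give without an interruption.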

BPF_F_REPLACE is introduced to make the user intent clear, since
replace_bpf_fd alone can't be used for this (its default value, 0, is a
valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Narkyiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com
2019-12-19 21:22:25 -08:00
bpf bpf: Support replacing cgroup-bpf program in MULTI mode 2019-12-19 21:22:25 -08:00
cgroup bpf: Support replacing cgroup-bpf program in MULTI mode 2019-12-19 21:22:25 -08:00
configs
debug kdb: Tweak escape handling for vi users 2019-10-28 12:08:29 +00:00
dma lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr 2019-12-04 19:44:13 -08:00
events mm/mmap.c: use IS_ERR_VALUE to check return value of get_unmapped_area 2019-12-01 06:29:19 -08:00
gcov
irq irqchip updates for Linux 5.5 2019-11-20 14:16:34 +01:00
livepatch New tracing features: 2019-11-27 11:42:01 -08:00
locking Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 16:02:40 -08:00
power Additional power management updates for 5.5-rc1 2019-12-04 10:48:09 -08:00
printk Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 09:29:50 -08:00
rcu Merge branches 'doc.2019.10.29a', 'fixes.2019.10.30a', 'nohz.2019.10.28a', 'replace.2019.10.30a', 'torture.2019.10.05a' and 'lkmm.2019.10.05a' into HEAD 2019-10-30 08:47:13 -07:00
sched Merge branch 'thermal/next' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux 2019-12-05 11:21:24 -08:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 12:20:25 -08:00
trace Two fixes and one patch that was missed: 2019-12-04 19:13:52 -08:00
.gitignore
acct.c
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c audit_get_nd(): don't unlock parent too early 2019-11-10 11:56:55 -05:00
audit.c audit: remove redundant condition check in kauditd_thread() 2019-10-25 11:48:14 -04:00
audit.h
auditfilter.c
auditsc.c Revert "bpf: Emit audit messages upon successful prog load and unload" 2019-11-23 09:56:02 -08:00
backtracetest.c
bounds.c
capability.c
compat.c y2038: itimer: compat handling to itimer.c 2019-11-15 14:38:30 +01:00
configs.c
context_tracking.c context_tracking: Rename context_tracking_is_enabled() => context_tracking_enabled() 2019-10-29 10:01:12 +01:00
cpu_pm.c
cpu.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 16:02:40 -08:00
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c Pipework for general notification queue 2019-11-30 14:12:13 -08:00
extable.c bpf: Add support for BTF pointers to x86 JIT 2019-10-17 16:44:36 +02:00
fail_function.c
fork.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 12:20:25 -08:00
freezer.c Revert "libata, freezer: avoid block device removal while system is frozen" 2019-10-06 09:11:37 -06:00
futex.c futex: Prevent exit livelock 2019-11-20 09:40:38 +01:00
gen_kheaders.sh kheaders: explain why include/config/autoconf.h is excluded from md5sum 2019-11-11 20:10:01 +09:00
groups.c
hung_task.c
iomem.c
irq_work.c irq_work: Fix IRQ_WORK_BUSY bit clearing 2019-11-15 10:48:37 +01:00
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt sched/Kconfig: Fix spelling mistake in user-visible help text 2019-11-12 11:35:32 +01:00
kcov.c kcov: remote coverage support 2019-12-04 19:44:14 -08:00
kexec_core.c
kexec_elf.c
kexec_file.c kexec: Fix pointer-to-int-cast warnings 2019-11-01 21:42:58 +01:00
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c
ksysfs.c
kthread.c kthread: make __kthread_queue_delayed_work static 2019-10-16 09:20:58 -07:00
latencytop.c
Makefile Kbuild updates for v5.5 2019-12-02 17:35:04 -08:00
module_signature.c
module_signing.c
module-internal.h
module.c Modules updates for v5.5 2019-12-05 12:27:16 -08:00
notifier.c kernel/notifier.c: remove blocking_notifier_chain_cond_register() 2019-12-04 19:44:12 -08:00
nsproxy.c
padata.c
panic.c locking/refcount: Remove unused 'refcount_error_report()' function 2019-11-25 09:15:42 +01:00
params.c
pid_namespace.c fork: extend clone3() to support setting a PID 2019-11-15 23:49:22 +01:00
pid.c fork: extend clone3() to support setting a PID 2019-11-15 23:49:22 +01:00
profile.c kernel/profile.c: use cpumask_available to check for NULL cpumask 2019-12-04 19:44:12 -08:00
ptrace.c
range.c
reboot.c
relay.c
resource.c
rseq.c
seccomp.c seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE 2019-10-10 14:45:51 -07:00
signal.c cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop() 2019-10-11 08:39:57 -07:00
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c stacktrace: Get rid of unneeded '!!' pattern 2019-11-11 10:30:59 +01:00
stop_machine.c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2019-10-31 09:33:19 +01:00
sys_ni.c y2038: allow disabling time32 system calls 2019-11-15 14:38:30 +01:00
sys.c kernel/sys.c: avoid copying possible padding bytes in copy_to_user 2019-12-04 19:44:12 -08:00
sysctl_binary.c sysctl: Remove the sysctl system call 2019-11-26 13:03:56 -06:00
sysctl-test.c
sysctl.c kernel: sysctl: make drop_caches write-only 2019-12-01 12:59:07 -08:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog_hld.c
watchdog.c
workqueue_internal.h
workqueue.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 15:42:43 -08:00