linux_dsm_epyc7002/kernel/bpf
Yonghong Song cf28f3bbfc bpf: Use get_file_rcu() instead of get_file() for task_file iterator
With latest `bpftool prog` command, we observed the following kernel
panic.
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    PGD dfe894067 P4D dfe894067 PUD deb663067 PMD 0
    Oops: 0010 [#1] SMP
    CPU: 9 PID: 6023 ...
    RIP: 0010:0x0
    Code: Bad RIP value.
    RSP: 0000:ffffc900002b8f18 EFLAGS: 00010286
    RAX: ffff8883a405f400 RBX: ffff888e46a6bf00 RCX: 000000008020000c
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883a405f400
    RBP: ffff888e46a6bf50 R08: 0000000000000000 R09: ffffffff81129600
    R10: ffff8883a405f300 R11: 0000160000000000 R12: 0000000000002710
    R13: 000000e9494b690c R14: 0000000000000202 R15: 0000000000000009
    FS:  00007fd9187fe700(0000) GS:ffff888e46a40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 0000000de5d33002 CR4: 0000000000360ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <IRQ>
     rcu_core+0x1a4/0x440
     __do_softirq+0xd3/0x2c8
     irq_exit+0x9d/0xa0
     smp_apic_timer_interrupt+0x68/0x120
     apic_timer_interrupt+0xf/0x20
     </IRQ>
    RIP: 0033:0x47ce80
    Code: Bad RIP value.
    RSP: 002b:00007fd9187fba40 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
    RAX: 0000000000000002 RBX: 00007fd931789160 RCX: 000000000000010c
    RDX: 00007fd9308cdfb4 RSI: 00007fd9308cdfb4 RDI: 00007ffedd1ea0a8
    RBP: 00007fd9187fbab0 R08: 000000000000000e R09: 000000000000002a
    R10: 0000000000480210 R11: 00007fd9187fc570 R12: 00007fd9316cc400
    R13: 0000000000000118 R14: 00007fd9308cdfb4 R15: 00007fd9317a9380

After further analysis, the bug is triggered by
Commit eaaacd2391 ("bpf: Add task and task/file iterator targets")
which introduced task_file bpf iterator, which traverses all open file
descriptors for all tasks in the current namespace.
The latest `bpftool prog` calls a task_file bpf program to traverse
all files in the system in order to associate processes with progs/maps, etc.
When traversing files for a given task, rcu read_lock is taken to
access all files in a file_struct. But it used get_file() to grab
a file, which is not right. It is possible file->f_count is 0 and
get_file() will unconditionally increase it.
Later put_file() may cause all kind of issues with the above
as one of sympotoms.

The failure can be reproduced with the following steps in a few seconds:
    $ cat t.c
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define N 10000
    int fd[N];
    int main() {
      int i;

      for (i = 0; i < N; i++) {
        fd[i] = open("./note.txt", 'r');
        if (fd[i] < 0) {
           fprintf(stderr, "failed\n");
           return -1;
        }
      }
      for (i = 0; i < N; i++)
        close(fd[i]);

      return 0;
    }
    $ gcc -O2 t.c
    $ cat run.sh
    #/bin/bash
    for i in {1..100}
    do
      while true; do ./a.out; done &
    done
    $ ./run.sh
    $ while true; do bpftool prog >& /dev/null; done

This patch used get_file_rcu() which only grabs a file if the
file->f_count is not zero. This is to ensure the file pointer
is always valid. The above reproducer did not fail for more
than 30 minutes.

Fixes: eaaacd2391 ("bpf: Add task and task/file iterator targets")
Suggested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/bpf/20200817174214.252601-1-yhs@fb.com
2020-08-17 14:42:58 -07:00
..
arraymap.c bpf: Implement bpf iterator for array maps 2020-07-25 20:16:33 -07:00
bpf_iter.c bpf: Change uapi for bpf iterator map elements 2020-08-06 16:39:14 -07:00
bpf_lru_list.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
bpf_lru_list.h bpf: Fix a typo "inacitve" -> "inactive" 2020-04-06 21:54:10 +02:00
bpf_lsm.c bpf: Use tracing helpers for lsm programs 2020-06-01 15:08:04 -07:00
bpf_struct_ops_types.h bpf: tcp: Support tcp_congestion_ops in bpf 2020-01-09 08:46:18 -08:00
bpf_struct_ops.c bpf: Set map_btf_{name, id} for all map types 2020-06-22 22:22:58 +02:00
btf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
cgroup.c bpf: Add support for forced LINK_DETACH command 2020-08-01 20:38:28 -07:00
core.c bpf: Delete repeated words in comments 2020-08-07 18:57:24 +02:00
cpumap.c bpf: cpumap: Fix possible rcpu kthread hung 2020-07-21 09:15:28 -07:00
devmap.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-07-04 17:48:34 -07:00
disasm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
disasm.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
dispatcher.c bpf: Remove bpf_image tree 2020-03-13 12:49:52 -07:00
hashtab.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-08-03 18:27:40 -07:00
helpers.c bpf: Implement BPF ring buffer and verifier support for it 2020-06-01 14:38:22 -07:00
inode.c bpf: Create file bpf iterator 2020-05-09 17:05:26 -07:00
local_storage.c bpf/local_storage: Fix build without CONFIG_CGROUP 2020-07-25 20:16:36 -07:00
lpm_trie.c bpf: Remove redundant synchronize_rcu. 2020-07-01 08:07:13 -07:00
Makefile bpf: Add bpf_prog iterator 2020-07-25 20:16:32 -07:00
map_in_map.c bpf: Implement CAP_BPF 2020-05-15 17:29:41 +02:00
map_in_map.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
map_iter.c bpf: Change uapi for bpf iterator map elements 2020-08-06 16:39:14 -07:00
net_namespace.c bpf: Add support for forced LINK_DETACH command 2020-08-01 20:38:28 -07:00
offload.c bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill 2020-02-17 16:53:49 +01:00
percpu_freelist.c bpf: Dont iterate over possible CPUs with interrupts disabled 2020-02-24 16:18:20 -08:00
percpu_freelist.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
prog_iter.c bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
queue_stack_maps.c bpf: Remove redundant synchronize_rcu. 2020-07-01 08:07:13 -07:00
reuseport_array.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-11 00:46:00 -07:00
ringbuf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-07-11 00:46:00 -07:00
stackmap.c bpf: Iterate through all PT_NOTE sections when looking for build id 2020-08-12 18:14:49 -07:00
syscall.c bpf: Change uapi for bpf iterator map elements 2020-08-06 16:39:14 -07:00
sysfs_btf.c bpf: Support llvm-objcopy for vmlinux BTF 2020-03-19 12:32:38 +01:00
task_iter.c bpf: Use get_file_rcu() instead of get_file() for task_file iterator 2020-08-17 14:42:58 -07:00
tnum.c bpf: Verifier, do explicit ALU32 bounds tracking 2020-03-30 14:59:53 -07:00
trampoline.c bpf: lsm: Implement attach, detach and execution 2020-03-30 01:34:00 +02:00
verifier.c bpf: Delete repeated words in comments 2020-08-07 18:57:24 +02:00