linux_dsm_epyc7002/net
Daniel Borkmann 74451e66d5 bpf: make jited programs visible in traces
Long standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.

This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file done via RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aide
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that still a lot
of systems ship with world read permissions on kallsyms thus addresses
should not get suddenly exposed for them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such programs types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.

Before:

  7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
         f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)

After:

  7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
  [...]
  7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
         f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-17 13:40:05 -05:00
..
6lowpan 6lowpan: use rb_entry() 2017-01-22 16:46:13 -05:00
9p IB/core: add support to create a unsafe global rkey to ib_create_pd 2016-09-23 13:47:44 -04:00
802 Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
8021q net: remove ndo_neigh_{construct, destroy} from stacked devices 2017-02-06 11:25:57 -05:00
appletalk
atm net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FS 2016-12-28 15:11:32 -05:00
ax25 ax25: Fix segfault after sock connection timeout 2017-01-16 14:39:58 -05:00
batman-adv Here are two fixes for batman-adv for net-next: 2017-01-29 19:21:26 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2016-12-16 10:24:44 -08:00
bridge bridge: vlan_tunnel: explicitly reset metadata attrs to NULL on failure 2017-02-17 13:33:41 -05:00
caif net: caif: Remove unused stats member from struct chnl_net 2017-01-19 11:45:21 -05:00
can can: bcm: fix hrtimer/tasklet termination in bcm op removal 2017-01-30 11:05:04 +01:00
ceph libceph: make sure ceph_aes_crypt() IV is aligned 2017-01-18 17:58:45 +01:00
core bpf: make jited programs visible in traces 2017-02-17 13:40:05 -05:00
dcb net: dcb: set error code on failures 2016-12-03 23:54:25 -05:00
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-28 10:33:06 -05:00
decnet Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
dns_resolver
dsa Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
ethernet Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
hsr net-next: treewide use is_vlan_dev() helper function. 2017-02-06 16:33:29 -05:00
ieee802154 Makefile: drop -D__CHECK_ENDIAN__ from cflags 2016-12-16 00:13:43 +02:00
ife net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ipv4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
ipv6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
ipx ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
irda Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
iucv net/af_iucv: don't use paged skbs for TX on HiperSockets 2017-01-10 21:08:29 -05:00
kcm kcm: fix a null pointer dereference in kcm_sendmsg() 2017-02-14 13:06:37 -05:00
key netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
l2tp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-11 02:31:11 -05:00
l3mdev
lapb Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
llc net/llc: avoid BUG_ON() in skb_orphan() 2017-02-12 22:14:49 -05:00
mac80211 Some more updates: 2017-02-10 14:31:51 -05:00
mac802154 ktime: Cleanup ktime_set() usage 2016-12-25 17:21:22 +01:00
mpls lwtunnel: remove device arg to lwtunnel_build_state 2017-01-30 15:14:22 -05:00
ncsi net/ncsi: Improve HNCDSC AEN handler 2016-10-20 11:23:08 -04:00
netfilter netfilter: nf_tables: honor NFT_SET_OBJECT in set backend selection 2017-02-12 14:45:14 +01:00
netlabel netlabel: add CALIPSO to the list of built-in protocols 2017-01-06 22:20:45 -05:00
netlink net: adjust skb->truesize in pskb_expand_head() 2017-01-27 12:03:29 -05:00
netrom
nfc genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
openvswitch openvswitch: Set internal device max mtu to ETH_MAX_MTU. 2017-02-15 12:40:27 -05:00
packet Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-16 19:34:01 -05:00
phonet netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
psample net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
qrtr net: qrtr: Mark 'buf' as little endian 2017-01-10 20:45:04 -05:00
rds RDS: validate the requested traces user input against max supported 2017-01-06 22:14:26 -05:00
rfkill rfkill: remove rfkill-regulator 2017-01-24 11:07:35 +01:00
rose Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
rxrpc rxrpc: Allow listen(sock, 0) to be used to disable listening 2017-01-09 11:10:02 +00:00
sched net/sched: cls_bpf: Reflect HW offload status 2017-02-17 12:08:06 -05:00
sctp sctp: implement sender-side procedures for Add Incoming/Outgoing Streams Request Parameter 2017-02-09 16:57:38 -05:00
smc smc: some potential use after free bugs 2017-01-30 16:37:55 -05:00
strparser strparser: Propagate correct error code in strp_recv() 2016-10-12 01:51:49 -04:00
sunrpc net: sunrpc: fix build errors when linux/phy*.h is removed from net/dsa.h 2017-02-10 13:51:01 -05:00
switchdev Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-10-30 12:42:58 -04:00
tipc tipc: Fix tipc_sk_reinit race conditions 2017-02-17 12:28:35 -05:00
unix unix: add ioctl to open a unix socket file with O_PATH 2017-02-02 21:58:02 -05:00
vmw_vsock Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-17 20:17:04 -08:00
wimax genetlink: mark families as __ro_after_init 2016-10-27 16:16:09 -04:00
wireless Some more updates: 2017-02-10 14:31:51 -05:00
x25 Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2017-02-16 21:25:49 -05:00
compat.c net: Assert at build time the assumptions we make about the CMSG header. 2017-01-04 13:24:19 -05:00
Kconfig bpf: make jited programs visible in traces 2017-02-17 13:40:05 -05:00
Makefile net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-11 14:43:39 -05:00
sysctl_net.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2016-10-06 09:52:23 -07:00