linux_dsm_epyc7002/net
Daniel Borkmann e0cea7ce98 bpf: implement ld_abs/ld_ind in native bpf
The main part of this work is to finally allow removal of LD_ABS
and LD_IND from the BPF core by reimplementing them through native
eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF and
keeping them around in native eBPF caused way more trouble than
actually worth it. To just list some of the security issues in
the past:

  * fdfaf64e75 ("x86: bpf_jit: support negative offsets")
  * 35607b02db ("sparc: bpf_jit: fix loads from negative offsets")
  * e0ee9c1215 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
  * 07aee94394 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
  * 6d59b7dbf7 ("bpf, s390x: do not reload skb pointers in non-skb context")
  * 87338c8e2c ("bpf, ppc64: do not reload skb pointers in non-skb context")

For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy
these days due to their limitations and more efficient/flexible
alternatives that have been developed over time such as direct
packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a
register, the load happens in host endianness and its exception
handling can yield unexpected behavior. The latter is explained
in depth in f6b1b3bf0d ("bpf: fix subprog verifier bypass by
div/mod by 0 exception") with similar cases of exceptions we had.
In native eBPF more recent program types will disable LD_ABS/LD_IND
altogether through may_access_skb() in verifier, and given the
limitations in terms of exception handling, it's also disabled
in programs that use BPF to BPF calls.

In terms of cBPF, the LD_ABS/LD_IND is used in networking programs
to access packet data. It is not used in seccomp-BPF but programs
that use it for socket filtering or reuseport for demuxing with
cBPF. This is mostly relevant for applications that have not yet
migrated to native eBPF.

The main complexity and source of bugs in LD_ABS/LD_IND is coming
from their implementation in the various JITs. Most of them keep
the model around from cBPF times by implementing a fastpath written
in asm. They use typically two from the BPF program hidden CPU
registers for caching the skb's headlen (skb->len - skb->data_len)
and skb->data. Throughout the JIT phase this requires to keep track
whether LD_ABS/LD_IND are used and if so, the two registers need
to be recached each time a BPF helper would change the underlying
packet data in native eBPF case. At least in eBPF case, available
CPU registers are rare and the additional exit path out of the
asm written JIT helper makes it also inflexible since not all
parts of the JITer are in control from plain C. A LD_ABS/LD_IND
implementation in eBPF therefore allows to significantly reduce
the complexity in JITs with comparable performance results for
them, e.g.:

test_bpf             tcpdump port 22             tcpdump complex
x64      - before    15 21 10                    14 19  18
         - after      7 10 10                     7 10  15
arm64    - before    40 91 92                    40 91 151
         - after     51 64 73                    51 62 113

For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter()
and cache the skb's headlen and data in the cBPF prologue. The
BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just
used as a local temporary variable. This allows to shrink the
image on x86_64 also for seccomp programs slightly since mapping
to %rsi is not an ereg. In callee-saved R8 and R9 we now track
skb data and headlen, respectively. For normal prologue emission
in the JITs this does not add any extra instructions since R8, R9
are pushed to stack in any case from eBPF side. cBPF uses the
convert_bpf_ld_abs() emitter which probes the fast path inline
already and falls back to bpf_skb_load_helper_{8,16,32}() helper
relying on the cached skb data and headlen as well. R8 and R9
never need to be reloaded due to bpf_helper_changes_pkt_data()
since all skb access in cBPF is read-only. Then, for the case
of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls
the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally,
does neither cache skb data and headlen nor has an inlined fast
path. The reason for the latter is that native eBPF does not have
any extra registers available anyway, but even if there were, it
avoids any reload of skb data and headlen in the first place.
Additionally, for the negative offsets, we provide an alternative
bpf_skb_load_bytes_relative() helper in eBPF which operates
similarly as bpf_skb_load_bytes() and allows for more flexibility.
Tested myself on x64, arm64, s390x, from Sandipan on ppc64.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 16:49:19 -07:00
..
6lowpan License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
9p net/9p/client.c: fix potential refcnt problem of trans module 2018-04-05 21:36:23 -07:00
802 treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
8021q vlan: also check phy_driver ts_info for vlan's real device 2018-04-01 20:53:50 -04:00
appletalk net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
atm net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
ax25 net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2018-04-08 17:19:15 -04:00
bpf bpf: making bpf_prog_test run aware of possible data_end ptr change 2018-04-18 23:34:16 +02:00
bridge netfilter: ebtables: don't attempt to allocate 0-sized compat array 2018-04-09 17:05:48 +02:00
caif net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN" 2018-04-19 13:37:10 -04:00
can net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ceph The big ticket items are: 2018-04-10 12:25:30 -07:00
core bpf: implement ld_abs/ld_ind in native bpf 2018-05-03 16:49:19 -07:00
dcb rtnetlink: make rtnl_register accept a flags parameter 2017-08-09 16:57:38 -07:00
dccp dccp: initialize ireq->ir_mark 2018-04-07 22:32:31 -04:00
decnet net: fib_rules: add extack support 2018-04-23 10:21:24 -04:00
dns_resolver KEYS: DNS: limit the length of option strings 2018-04-17 15:17:41 -04:00
dsa net: dsa: Discard frames from unused ports 2018-04-08 10:34:49 -04:00
ethernet networking: make skb_push & __skb_push return void pointers 2017-06-16 11:48:40 -04:00
hsr net: hsr: Convert timers to use timer_setup() 2017-10-25 13:00:27 +09:00
ieee802154 inet: frags: fix ip6frag_low_thresh boundary 2018-04-04 12:04:59 -04:00
ife net: sched: ife: check on metadata length 2018-04-22 21:12:00 -04:00
ipv4 udp: add gso segment cmsg 2018-04-26 15:08:51 -04:00
ipv6 udp: add gso segment cmsg 2018-04-26 15:08:51 -04:00
iucv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-03-23 11:31:58 -04:00
kcm net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
key net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
l2tp l2tp: check sockaddr length in pppol2tp_connect() 2018-04-23 21:10:43 -04:00
l3mdev
lapb treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
llc llc: fix NULL pointer deref for SOCK_ZAPPED 2018-04-22 14:56:22 -04:00
mac80211 We have a fair number of patches, but many of them are from the 2018-03-29 16:23:26 -04:00
mac802154 net/mac802154: disambiguate mac80215 vs mac802154 trace events 2018-03-28 22:55:18 +02:00
mpls net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
ncsi net/ncsi: Refactor MAC, VLAN filters 2018-04-17 13:50:58 -04:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-24 23:59:11 -04:00
netlabel netlabel: If PF_INET6, check sk_buff ip header version 2018-02-14 14:01:41 -05:00
netlink netlink: fix uninit-value in netlink_sendmsg 2018-04-07 22:32:31 -04:00
netrom net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-19 18:46:11 -05:00
nsh openvswitch: enable NSH support 2017-11-08 16:12:33 +09:00
openvswitch ovs: Remove rtnl_lock() from ovs_exit_net() 2018-03-29 13:47:54 -04:00
packet dev: packet: make packet_direct_xmit a common function 2018-05-03 15:55:24 -07:00
phonet net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00
psample MAINTAINERS: Update Yotam's E-mail 2017-11-01 12:19:03 +09:00
qrtr net: qrtr: add MODULE_ALIAS_NETPROTO macro 2018-04-17 09:58:00 -04:00
rds rds: ib: Fix missing call to rds_ib_dev_put in rds_ib_setup_qp 2018-04-25 14:34:08 -04:00
rfkill vfs: do bulk POLL* -> EPOLL* replacement 2018-02-11 14:34:03 -08:00
rose net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
rxrpc rxrpc: Fix undefined packet handling 2018-04-04 11:04:08 -04:00
sched net: sched: ife: handle malformed tlv length 2018-04-22 21:12:00 -04:00
sctp sctp: remove the unused sctp_assoc_is_match function 2018-04-25 14:09:23 -04:00
smc net/smc: keep clcsock reference in smc_tcp_listen_work() 2018-04-25 14:13:41 -04:00
strparser strparser: Do not call mod_delayed_work with a timeout of LONG_MAX 2018-04-22 21:09:16 -04:00
sunrpc rpc_pipefs: fix double-dput() 2018-04-15 23:49:27 -04:00
switchdev net: bridge: Add/del switchdev object on host join/leave 2017-11-10 13:41:40 +09:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-21 16:32:48 -04:00
tls net/tls: remove redundant second null check on sgout 2018-04-24 16:02:10 -04:00
unix af_unix: remove redundant lockdep class 2018-04-04 11:13:40 -04:00
vmw_vsock VSOCK: make af_vsock.ko removable again 2018-04-17 09:44:30 -04:00
wimax License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-03-31 23:33:04 -04:00
x25 net: Use octal not symbolic permissions 2018-03-26 12:07:48 -04:00
xdp xsk: statistics support 2018-05-03 15:55:25 -07:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-04-01 19:49:34 -04:00
compat.c net: socket: add __compat_sys_...msg() helpers; remove in-kernel calls to compat syscalls 2018-04-02 20:15:20 +02:00
Kconfig net: initial AF_XDP skeleton 2018-05-03 15:55:23 -07:00
Makefile xsk: add user memory registration support sockopt 2018-05-03 15:55:23 -07:00
socket.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2018-04-05 11:56:35 -07:00
sysctl_net.c net: Drop pernet_operations::async 2018-03-27 13:18:09 -04:00