linux_dsm_epyc7002/include/net
Lawrence Brakmo 40304b2a15 bpf: BPF support for sock_ops
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connections parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it oes not require
application changes and it can be updated easily at any time.

Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called
only once during the connections's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.

This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel one for the user/BPF
program):

/* kernel version */
struct bpf_sock_ops_kern {
        struct sock *sk;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version
 * Some fields are in network byte order reflecting the sock struct
 * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
 * convert them to host byte order.
 */
struct bpf_sock_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;     /* In network byte order */
        __u32 local_ip4;      /* In network byte order */
        __u32 remote_ip6[4];  /* In network byte order */
        __u32 local_ip6[4];   /* In network byte order */
        __u32 remote_port;    /* In network byte order */
        __u32 local_port;     /* In host byte horder */
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_sockt_ops struct are there in case a bpf
program needs to return a value larger than an integer.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
..
9p 9p: constify ->d_name handling 2017-01-12 04:01:17 -05:00
bluetooth Bluetooth: Set LE Default PHY preferences 2017-05-18 13:52:49 +02:00
caif
irda scripts/spelling.txt: add "overide" pattern and fix typo instances 2017-03-09 17:01:09 -08:00
iucv s390/iucv: do not use arrays as argument 2015-09-21 16:03:04 -07:00
netfilter net: convert nf_bridge_info.use from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
netns tcp: Namespaceify sysctl_tcp_timestamps 2017-06-08 10:53:29 -04:00
nfc NFC: Add nfc_dbg() macro 2017-04-05 10:15:20 +02:00
phonet sock: struct proto hash function may error 2016-02-11 03:54:14 -05:00
sctp sctp: remove the typedef sctp_init_chunk_t 2017-07-01 09:08:42 -07:00
tc_act net: sched: introduce helper to identify gact trap action 2017-06-06 12:45:23 -04:00
6lowpan.h 6lowpan: Fix IID format for Bluetooth 2017-04-12 22:02:36 +02:00
act_api.h net: sched: add termination action to allow goto chain 2017-05-17 15:22:13 -04:00
addrconf.h Ipvlan should return an error when an address is already in use. 2017-06-09 12:26:07 -04:00
af_ieee802154.h ieee802154: af_ieee802154: fix typo in comment. 2015-09-17 13:20:05 +02:00
af_rxrpc.h rxrpc: Provide a cmsg to specify the amount of Tx data for a call 2017-06-07 17:15:46 +01:00
af_unix.h net: convert unix_address.refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
af_vsock.h VSOCK: Add vsockmon tap functions 2017-04-24 12:35:56 -04:00
ah.h
arp.h net: convert neighbour.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
atmclip.h
ax25.h ax25: Stop using sock->sk_protinfo. 2015-06-28 16:55:44 -07:00
ax88796.h
bond_3ad.h bonding: 3ad: apply ad_actor settings changes immediately 2016-02-09 04:45:49 -05:00
bond_alb.h
bond_options.h bonding: Prevent duplicate userspace notification 2017-05-27 18:51:41 -04:00
bonding.h bonding: fix wq initialization for links created via netlink 2017-04-21 15:28:37 -04:00
busy_poll.h net: Commonize busy polling code to focus on napi_id instead of socket 2017-03-24 20:49:31 -07:00
calipso.h calipso: Add a label cache. 2016-06-27 15:06:17 -04:00
cfg80211-wext.h
cfg80211.h nl80211: add authorized flag to ROAM event 2017-06-13 11:04:37 +02:00
cfg802154.h ieee802154: add netns support 2016-07-08 12:20:57 +02:00
checksum.h csum: eliminate sparse warning in remcsum_unadjust() 2017-01-20 12:12:13 -05:00
cipso_ipv4.h netlabel: out of bound access in cipso_v4_validate() 2017-02-04 19:44:22 -05:00
cls_cgroup.h cls_cgroup: get sk_classid only from full sockets 2016-04-19 20:09:25 -04:00
codel_impl.h codel: split into multiple files 2016-04-25 16:44:27 -04:00
codel_qdisc.h net_sched: fq_codel: cache skb->truesize into skb->cb 2016-06-25 12:19:35 -04:00
codel.h codel: split into multiple files 2016-04-25 16:44:27 -04:00
compat.h packet: compat support for sock_fprog 2016-06-09 23:41:03 -07:00
datalink.h
dcbevent.h
dcbnl.h
devlink.h net/devlink: Add E-Switch encapsulation control 2017-04-22 20:26:37 +03:00
dn_dev.h
dn_fib.h
dn_neigh.h netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
dn_nsp.h
dn_route.h
dn.h
dsa.h net: dsa: Associate slave network device with CPU port 2017-06-13 16:35:03 -04:00
dsfield.h
dst_cache.h net: add dst_cache support 2016-02-16 20:21:48 -05:00
dst_metadata.h net: store port/representator id in metadata_dst 2017-06-25 11:42:01 -04:00
dst_ops.h net: add confirm_neigh method to dst_ops 2017-02-07 13:07:46 -05:00
dst.h net: add debug atomic_inc_not_zero() in dst_hold() 2017-06-17 22:54:01 -04:00
esp.h esp6: Reorganize esp_output 2017-04-14 10:06:42 +02:00
ethoc.h net/ethoc: support big-endian register layout 2015-09-23 15:33:15 -07:00
fib_rules.h net: convert fib_rule.refcnt from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
firewire.h
flow_dissector.h net/flow_dissector: add support for dissection of misc ip header fields 2017-06-04 18:12:23 -04:00
flow.h flowcache: make flow_key_size() return "unsigned int" 2017-04-03 19:04:48 -07:00
flowcache.h flowcache: more "unsigned int" 2017-04-03 19:04:48 -07:00
fou.h fou: Add encap ops for IPv6 tunnels 2016-05-20 18:03:16 -04:00
fq_impl.h fq.h: Port memory limit mechanism from fq_codel 2016-09-30 13:29:21 +02:00
fq.h fq.h: Port memory limit mechanism from fq_codel 2016-09-30 13:29:21 +02:00
garp.h
gen_stats.h net_sched: gen_estimator: complete rewrite of rate estimators 2016-12-05 15:21:59 -05:00
genetlink.h genetlink: remove ops_list from genetlink header. 2017-06-05 10:54:55 -04:00
geneve.h net: Remove deprecated tunnel specific UDP offload functions 2016-06-17 20:23:32 -07:00
gre.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-08-18 01:17:32 -04:00
gro_cells.h gro_cells: move to net/core/gro_cells.c 2017-02-08 14:38:18 -05:00
gtp.h gtp: #define #define _GTP_H_ and not #define _GTP_H 2016-07-25 17:55:43 -07:00
gue.h
hwbm.h net: add a hardware buffer management helper API 2016-03-14 12:19:46 -04:00
icmp.h net: snmp: kill STATS_BH macros 2016-04-27 22:48:25 -04:00
ieee80211_radiotap.h wireless: radiotap: rewrite the radiotap header file 2017-01-25 16:00:33 +01:00
ieee802154_netdev.h mac802154: constify ieee802154_llsec_ops structure 2016-01-04 20:40:41 +01:00
if_inet6.h net/ipv6: allow sysctl to change link-local address generation mode 2017-01-27 10:25:34 -05:00
ife.h net: Introduce ife encapsulation module 2017-02-03 15:16:45 -05:00
ila.h ila: Add generic ILA translation facility 2015-12-15 23:25:20 -05:00
inet6_connection_sock.h inet: drop ->bind_conflict 2017-01-18 13:04:28 -05:00
inet6_hashtables.h tcp/dccp: do not touch listener sk_refcnt under synflood 2016-04-04 22:11:20 -04:00
inet_common.h net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
inet_connection_sock.h tcp: ULP infrastructure 2017-06-15 12:12:40 -04:00
inet_ecn.h ipv6: suppress sparse warnings in IP6_ECN_set_ce() 2016-08-13 15:08:00 -07:00
inet_frag.h net: convert inet_frag_queue.refcnt from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
inet_hashtables.h net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
inet_sock.h net/tcp-fastopen: Add new API support 2017-01-25 14:04:38 -05:00
inet_timewait_sock.h ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob 2016-12-29 11:38:31 -05:00
inetpeer.h net: convert inet_peer.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
ip6_checksum.h ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short 2016-03-13 23:55:13 -04:00
ip6_fib.h net: remove DST_NOCACHE flag 2017-06-17 22:54:01 -04:00
ip6_route.h ipv6: get rid of icmp6 dst garbage collector 2017-06-17 22:54:00 -04:00
ip6_tunnel.h ip6_tunnel: Allow policy-based routing through tunnels 2017-04-21 13:21:30 -04:00
ip_fib.h net: ipv4: Add extack message for invalid prefix or length 2017-05-30 11:55:31 -04:00
ip_tunnels.h ip_tunnel: Allow policy-based routing through tunnels 2017-04-21 13:21:31 -04:00
ip_vs.h ipvs: remove unused function ip_vs_set_state_timeout 2017-04-28 12:00:10 +02:00
ip.h net: ipv4: Refine the ipv4_default_advmss 2017-04-13 13:19:48 -04:00
ipcomp.h
ipconfig.h
ipv6.h net: ping: do not abuse udp_poll() 2017-06-04 22:56:55 -04:00
ipx.h
iw_handler.h wext: uninline stream addition functions 2017-01-13 09:38:42 +01:00
kcm.h kcm: Use stream parser 2016-08-17 19:36:23 -04:00
l3mdev.h net: ipv4: Do not drop to make_route if oif is l3mdev 2016-10-13 12:05:26 -04:00
lapb.h
lib80211.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
llc.h
lwtunnel.h net: add extack arg to lwtunnel build state 2017-05-30 11:55:32 -04:00
mac80211.h mac80211: manage RX BA session offload without SKB queue 2017-06-08 14:16:29 +02:00
mac802154.h ieee802154: cleanup WARN_ON for fc fetch 2016-07-08 13:23:12 +02:00
mip6.h
mld.h
mpls_iptunnel.h net: mpls: Increase max number of labels for lwt encap 2017-04-01 20:21:44 -07:00
mpls.h openvswitch: use mpls_hdr 2016-10-03 02:00:22 -04:00
mrp.h
ncsi.h net/ncsi: Introduce ncsi_stop_dev() 2016-10-04 02:11:51 -04:00
ndisc.h net: convert neighbour.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
neighbour.h net: convert neigh_params.refcnt from atomic_t to refcount_t 2017-07-01 07:39:07 -07:00
net_namespace.h net: convert net.passive from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
net_ratelimit.h
netevent.h neigh: Send a notification when DELAY_PROBE_TIME changes 2016-07-05 09:06:29 -07:00
netlabel.h net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t 2017-07-01 07:39:09 -07:00
netlink.h netlink: pass extended ACK struct to parsing functions 2017-04-13 13:58:22 -04:00
netprio_cgroup.h net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct 2015-12-08 22:02:33 -05:00
netrom.h
nexthop.h
nl802154.h ieee802154: add netns support 2016-07-08 12:20:57 +02:00
p8022.h
ping.h net: ping: make ping_v6_sendmsg static 2016-03-23 22:09:58 -04:00
pkt_cls.h sched: add helper for updating statistics on all actions 2017-05-31 17:58:13 -04:00
pkt_sched.h net: sched: move tc_classify function to cls_api.c 2017-05-17 15:22:13 -04:00
pptp.h pptp: Refactor the struct and macros of PPTP codes 2016-08-15 10:55:53 -07:00
protocol.h net: Add sysctl to toggle early demux for tcp and udp 2017-03-24 13:17:07 -07:00
psample.h net: Introduce psample, a new genetlink channel for packet sampling 2017-01-24 13:44:28 -05:00
psnap.h
raw.h net: ip, diag -- Add diag interface for raw sockets 2016-10-23 19:35:24 -04:00
rawv6.h net: ip, diag -- Add diag interface for raw sockets 2016-10-23 19:35:24 -04:00
red.h ktime: Get rid of the union 2016-12-25 17:21:22 +01:00
regulatory.h
request_sock.h net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
rose.h
route.h ipv4: call dst_hold_safe() properly 2017-06-17 22:54:00 -04:00
rtnetlink.h net: add netlink_ext_ack argument to rtnl_link_ops.slave_validate 2017-06-26 23:13:22 -04:00
sch_generic.h net: sched: add termination action to allow goto chain 2017-05-17 15:22:13 -04:00
scm.h sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
secure_seq.h tcp: Namespaceify sysctl_tcp_timestamps 2017-06-08 10:53:29 -04:00
seg6_hmac.h ipv6: sr: add core files for SR HMAC support 2016-11-09 20:40:06 -05:00
seg6.h ipv6: sr: add core files for SR HMAC support 2016-11-09 20:40:06 -05:00
slhc_vj.h
smc.h smc: netlink interface for SMC sockets 2017-01-09 16:07:41 -05:00
snmp.h net: snmp: fix 64bit stats on 32bit arches 2016-04-28 11:49:45 -04:00
sock_reuseport.h soreuseport: fix NULL ptr dereference SO_REUSEPORT after bind 2016-01-19 14:44:23 -05:00
sock.h net: convert sock.sk_refcnt from atomic_t to refcount_t 2017-07-01 07:39:08 -07:00
Space.h
stp.h
strparser.h kcm: Remove TCP specific references from kcm and strparser 2016-08-28 23:32:41 -04:00
switchdev.h net: switchdev: add SET_SWITCHDEV_OPS helper 2017-07-01 08:51:32 -07:00
tcp_states.h
tcp.h bpf: BPF support for sock_ops 2017-07-01 16:15:13 -07:00
timewait_sock.h inet: remove BUG_ON() in twsk_destructor() 2015-07-09 15:12:20 -07:00
tls.h tls: kernel TLS support 2017-06-15 12:12:40 -04:00
transp_v6.h ipv6: add new struct ipcm6_cookie 2016-05-03 16:08:14 -04:00
tso.h net: tso: add support for IPv6 2015-10-26 22:24:22 -07:00
udp_tunnel.h vxlan: Add new UDP encapsulation offload type for VXLAN-GPE 2016-06-17 20:23:32 -07:00
udp.h udp: move scratch area helpers into the include file 2017-06-27 15:43:56 -04:00
udplite.h udp: use a separate rx queue for packet reception 2017-05-16 15:41:29 -04:00
vsock_addr.h
vxlan.h vxlan: check valid combinations of address scopes 2017-06-20 13:37:02 -04:00
wext.h dev_ioctl: copy only the smaller struct iwreq for wext 2017-06-14 13:52:44 +02:00
wimax.h
x25.h net: x25: fix one potential use-after-free issue 2017-05-18 10:05:40 -04:00
x25device.h
xfrm.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-06-30 12:43:08 -04:00