linux_dsm_epyc7002/net
Eric Dumazet 93f154b594 net: release dst entry in dev_hard_start_xmit()
One point of contention in high network loads is the dst_release() performed
when a transmited skb is freed. This is because NIC tx completion calls
dev_kree_skb() long after original call to dev_queue_xmit(skb).

CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
quite visible if one CPU is 100% handling softirqs for a network device,
since dst_clone() is done by other cpus, involving cache line ping pongs.

It seems right place to release dst is in dev_hard_start_xmit(), for most
devices but ones that are virtual, and some exceptions.

David Miller suggested to define a new device flag, set in alloc_netdev_mq()
(so that most devices set it at init time), and carefuly unset in devices
which dont want a NULL skb->dst in their ndo_start_xmit().

List of devices that must clear this flag is :

- loopback device, because it calls netif_rx() and quoting Patrick :
    "ip_route_input() doesn't accept loopback addresses, so loopback packets
     already need to have a dst_entry attached."
- appletalk/ipddp.c : needs skb->dst in its xmit function

- And all devices that call again dev_queue_xmit() from their xmit function
(as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-05-18 22:19:19 -07:00
..
9p net/9p: handle correctly interrupted 9P requests 2009-04-05 16:54:53 -05:00
802 tr: fix leakage of device in net/802/tr.c 2009-04-11 01:43:17 -07:00
8021q net: release dst entry in dev_hard_start_xmit() 2009-05-18 22:19:19 -07:00
appletalk proc 2/2: remove struct proc_dir_entry::owner 2009-03-31 01:14:44 +04:00
atm Subject: [PATCH] br2684: restore net_dev initialization 2009-05-02 13:49:36 -07:00
ax25 ax25: proc uid file misses header 2009-04-20 02:14:59 -07:00
bluetooth Bluetooth: Don't trigger disconnect timeout for security mode 3 pairing 2009-05-09 18:09:52 -07:00
bridge net: Fix bridgeing sysfs handling of rtnl_lock 2009-05-18 22:15:59 -07:00
can can: Network Drop Monitor: Make use of consume_skb() in af_can.c 2009-04-17 01:38:46 -07:00
core net: release dst entry in dev_hard_start_xmit() 2009-05-18 22:19:19 -07:00
dcb DCB: fix kfree(skb) 2009-01-04 17:29:21 -08:00
dccp dccp: Do not let initial option overhead shrink the MPS 2009-03-02 03:07:23 -08:00
decnet ipv4: remove an unused parameter from configure method of fib_rules_ops. 2009-05-17 11:59:45 -07:00
dsa dsa: add switch chip cascading support 2009-03-21 19:06:54 -07:00
econet net: convert usage of packet_type to read_mostly 2009-03-10 05:22:43 -07:00
ethernet eth: Declare an optimized compare_ether_addr_64bits() function 2008-11-23 23:24:32 -08:00
ipv4 net: Fix devinet_sysctl_forward 2009-05-18 22:15:58 -07:00
ipv6 net: FIX ipv6_forward sysctl restart 2009-05-18 22:15:58 -07:00
ipx ipx: use constant for strings and desciptor 2009-03-21 19:06:51 -07:00
irda proc tty: switch ircomm to ->proc_fops 2009-04-01 08:59:10 -07:00
iucv af_iucv: Fix merge. 2009-04-23 06:37:16 -07:00
key af_key: remove some pointless conditionals before kfree_skb() 2009-02-26 23:07:32 -08:00
lapb
llc net: remove needless (now buggy) & from dev->dev_addr (part2) 2009-05-17 11:59:53 -07:00
mac80211 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-18 21:08:20 -07:00
netfilter ipvs: Fix IPv4 FWMARK virtual services 2009-05-08 14:54:47 -07:00
netlabel netlabel: Always remove the correct address selector 2009-04-22 00:46:09 -07:00
netlink Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2009-03-26 15:23:24 -07:00
netrom net/netrom: Fix socket locking 2009-04-22 00:49:51 -07:00
packet net: TX_RING and packet mmap 2009-05-18 22:11:22 -07:00
phonet trivial: fix typos/grammar errors in Kconfig texts 2009-03-30 15:22:01 +02:00
rds Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-18 21:08:20 -07:00
rfkill net/rfkill/rfkill.c: fix build with CONFIG_RFKILL_LEDS=n 2009-05-06 15:14:40 -04:00
rose Revert "rose: zero length frame filtering in af_rose.c" 2009-04-14 20:28:00 -07:00
rxrpc RxRPC: Fix a potential NULL dereference 2009-02-06 21:50:52 -08:00
sched Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-18 21:08:20 -07:00
sctp sctp: add feature bit for SCTP offload in hardware 2009-04-28 01:53:14 -07:00
sunrpc Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux 2009-05-12 17:11:56 -07:00
tipc net: remove needless (now buggy) & from dev->dev_addr 2009-05-17 11:59:47 -07:00
unix New helper - current_umask() 2009-03-31 23:00:26 -04:00
wanrouter wanrouter: fix sparse warnings: context imbalance 2009-02-26 23:13:36 -08:00
wimax Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2009-05-08 02:48:30 -07:00
wireless wext: remove seq_start/stop sparse annotations 2009-05-13 15:44:40 -04:00
x25 af_rose/x25: Sanity check the maximum user frame size 2009-03-27 00:28:21 -07:00
xfrm xfrm: wrong hash value for temporary SA 2009-04-27 02:58:59 -07:00
compat.c net: socket infrastructure for SO_TIMESTAMPING 2009-02-15 22:43:35 -08:00
Kconfig net: remove stale reference to fastroute from Kconfig help text 2009-05-07 16:31:01 -07:00
Makefile RDS: Kconfig and Makefile 2009-02-26 23:43:35 -08:00
nonet.c
socket.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2009-04-06 18:05:43 -07:00
sysctl_net.c net: sysctl_net - use net_eq to compare nets 2009-03-16 16:23:30 +01:00
TUNABLE