linux_dsm_epyc7002/net/sched
Nik Unger 5080f39e8c netem: apply correct delay when rate throttling
I recently reported on the netem list that iperf network benchmarks
show unexpected results when a bandwidth throttling rate has been
configured for netem. Specifically:

1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies significantly across
  machines

The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no
  delay is configured (e.g., "rate 512kbit"). This is because SKBs are
  not orphaned when no delay is configured, so orphaning does not
  occur until *after* the rate-induced delay has been applied. For
  this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
  dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly
  applied, allowing SKB delays to occur in parallel. The intended
  approach is to compute the delay for an SKB and to add this delay to
  the end of the current queue. However, the code does not detect
  existing SKBs in the queue due to improperly testing sch->q.qlen,
  which is nonzero even when packets exist only in the
  rbtree. Consequently, new SKBs do not wait for the current queue to
  empty. When packet delays vary significantly (e.g., if packet sizes
  are different), then this also causes unintended reordering.
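
The effect of the second issue can be illustrated with a minimal user-space
sketch. This is not the kernel code: the function names (tx_time_ns,
send_time_parallel, send_time_serial) are hypothetical, and time is modeled
as plain nanosecond counters. It contrasts scheduling each packet from its
own arrival time (the buggy behavior) with scheduling it after the last
queued packet (the intended behavior).

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds needed to serialize len_bytes at rate_bps (bits per second). */
static uint64_t tx_time_ns(uint64_t len_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	return len_bytes * 8ULL * 1000000000ULL / rate_bps;
}

/*
 * Buggy model: the packet is scheduled relative to its own arrival time,
 * ignoring packets already queued, so rate-induced delays overlap.
 */
static uint64_t send_time_parallel(uint64_t now_ns, uint64_t len_bytes,
				   uint64_t rate_bps)
{
	return now_ns + tx_time_ns(len_bytes, rate_bps);
}

/*
 * Intended model: the packet starts serializing only once the last queued
 * packet (due at last_send_ns) has left, or immediately if the queue has
 * already drained.
 */
static uint64_t send_time_serial(uint64_t now_ns, uint64_t last_send_ns,
				 uint64_t len_bytes, uint64_t rate_bps)
{
	uint64_t start = last_send_ns > now_ns ? last_send_ns : now_ns;

	return start + tx_time_ns(len_bytes, rate_bps);
}
```

At 512kbit, a 1500-byte packet arriving at t=0 is due at 23437500 ns in both
models. A 100-byte packet arriving at t=1 ms is due at 2562500 ns in the
parallel model, ahead of the larger packet (reordering), but at 25000000 ns
in the serial model.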

I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q. I have tested these changes on the latest kernel
(4.11.0-rc1+) and the iperf / ping test results are as expected.
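
The defensive lookup described above can be sketched in user space as
follows. This is a simplified model under stated assumptions, not the patch
itself: struct pkt_cb, latest_scheduled and next_send_time are hypothetical
names, and the two pointer arguments stand in for the tail of sch->q and the
last node of the rbtree, either of which may be absent (NULL).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet scheduling state. */
struct pkt_cb {
	int64_t time_to_send;
};

/*
 * Return whichever candidate has the later delivery time. Either pointer
 * may be NULL when that container is empty; NULL is returned only when
 * both are empty.
 */
static const struct pkt_cb *latest_scheduled(const struct pkt_cb *fifo_tail,
					     const struct pkt_cb *tree_last)
{
	const struct pkt_cb *last = fifo_tail;

	if (tree_last &&
	    (!last || tree_last->time_to_send > last->time_to_send))
		last = tree_last;
	return last;
}

/*
 * Schedule a new packet after the latest queued one. With nothing queued
 * (last == NULL), or if that packet is already due, start from now.
 */
static int64_t next_send_time(int64_t now, int64_t tx_time,
			      const struct pkt_cb *last)
{
	int64_t start = now;

	assert(tx_time >= 0);
	if (last && last->time_to_send > now)
		start = last->time_to_send;
	return start + tx_time;
}
```

Checking both containers rather than trusting a single length counter means
the reference point is still found when packets sit in only one of them.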

Signed-off-by: Nik Unger <njunger@uwaterloo.ca>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:14:06 -07:00
act_api.c net sched actions: do not overwrite status of action creation. 2017-02-26 21:31:32 -05:00
act_bpf.c bpf: rework prog_digest into prog_tag 2017-01-16 14:03:31 -05:00
act_connmark.c act_connmark: avoid crashing on malformed nlattrs with null parms 2017-03-12 23:32:41 -07:00
act_csum.c net/sched: act_csum: compute crc32c on SCTP packets 2017-01-09 14:36:57 -05:00
act_gact.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
act_ife.c net/sched: act_ife: Staticfy find_decode_metaid() 2017-03-16 12:02:14 -07:00
act_ipt.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
act_meta_mark.c Support to encoding decoding skb mark on IFE action 2016-03-01 17:15:23 -05:00
act_meta_skbprio.c Support to encoding decoding skb prio on IFE action 2016-03-01 17:15:23 -05:00
act_meta_skbtcindex.c net sched ife action: Introduce skb tcindex metadata encap decap 2016-09-19 21:55:28 -04:00
act_mirred.c net/sched: act_mirred: remove duplicated include from act_mirred.c 2017-02-07 11:42:34 -05:00
act_nat.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
act_pedit.c net/act_pedit: Introduce 'add' operation 2017-02-10 13:18:33 -05:00
act_police.c net_sched: gen_estimator: complete rewrite of rate estimators 2016-12-05 15:21:59 -05:00
act_sample.c net/sched: act_psample: Remove unnecessary ASSERT_RTNL 2017-02-01 14:10:03 -05:00
act_simple.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
act_skbedit.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
act_skbmod.c net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump 2017-03-07 14:13:03 -08:00
act_tunnel_key.c net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6 2016-12-23 11:59:56 -05:00
act_vlan.c netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
cls_api.c sched: Fix accidental removal of errout goto 2017-02-14 11:44:14 -05:00
cls_basic.c net, sched: respect rcu grace period on cls destruction 2016-11-28 10:47:35 -05:00
cls_bpf.c net/sched: cls_bpf: Reflect HW offload status 2017-02-17 12:08:06 -05:00
cls_cgroup.c net, sched: respect rcu grace period on cls destruction 2016-11-28 10:47:35 -05:00
cls_flow.c skbuff: add and use skb_nfct helper 2017-02-02 14:31:53 +01:00
cls_flower.c net/sched: cls_flower: Reflect HW offload status 2017-02-17 12:08:05 -05:00
cls_fw.c net sched: stylistic cleanups 2016-09-19 22:04:14 -04:00
cls_matchall.c net/sched: cls_matchall: Reflect HW offloading status 2017-02-17 12:08:06 -05:00
cls_route.c net_sched: check NULL on error path in route4_change() 2016-09-23 06:51:49 -04:00
cls_rsvp6.c
cls_rsvp.c
cls_rsvp.h net, sched: respect rcu grace period on cls destruction 2016-11-28 10:47:35 -05:00
cls_tcindex.c net, sched: respect rcu grace period on cls destruction 2016-11-28 10:47:35 -05:00
cls_u32.c net/sched: cls_u32: Reflect HW offload status 2017-02-17 12:08:06 -05:00
em_canid.c net: sched: remove tcf_proto from ematch calls 2014-10-06 18:02:32 -04:00
em_cmp.c
em_ipset.c netfilter: x_tables: move hook state into xt_action_param structure 2016-11-03 10:56:21 +01:00
em_meta.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h> 2017-03-02 08:42:27 +01:00
em_nbyte.c net: sched: remove tcf_proto from ematch calls 2014-10-06 18:02:32 -04:00
em_text.c net: Remove state argument from skb_find_text() 2015-02-22 15:59:54 -05:00
em_u32.c
ematch.c ematch: Fix auto-loading of ematch modules. 2015-02-20 15:30:56 -05:00
Kconfig net/sched: act_ife: Change to use ife module 2017-02-03 15:16:46 -05:00
Makefile net/sched: Introduce sample tc action 2017-01-24 13:44:28 -05:00
sch_api.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_atm.c sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api 2017-02-10 11:38:08 -05:00
sch_blackhole.c net_sched: drop packets after root qdisc lock is released 2016-06-25 12:19:35 -04:00
sch_cbq.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_choke.c sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api 2017-02-10 11:38:08 -05:00
sch_codel.c sched: replace __skb_dequeue with __qdisc_dequeue_head 2016-09-19 01:47:18 -04:00
sch_drr.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_dsmark.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_fifo.c sched: don't use skb queue helpers 2016-09-19 01:47:18 -04:00
sch_fq_codel.c net/sched: fq_codel: Avoid set-but-unused variable 2017-03-16 12:02:14 -07:00
sch_fq.c net_sched: sch_fq: use rb_entry() 2016-12-20 14:22:48 -05:00
sch_generic.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_gred.c net_sched: drop packets after root qdisc lock is released 2016-06-25 12:19:35 -04:00
sch_hfsc.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_hhf.c net_sched: fix error recovery at qdisc creation 2017-02-11 21:38:58 -05:00
sch_htb.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_ingress.c sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api 2017-02-10 11:38:08 -05:00
sch_mq.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_mqprio.c mqprio: Modify mqprio to pass user parameters via ndo_setup_tc. 2017-03-15 15:20:27 -07:00
sch_multiq.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_netem.c netem: apply correct delay when rate throttling 2017-03-16 20:14:06 -07:00
sch_pie.c sched: replace __skb_dequeue with __qdisc_dequeue_head 2016-09-19 01:47:18 -04:00
sch_plug.c net_sched: drop packets after root qdisc lock is released 2016-06-25 12:19:35 -04:00
sch_prio.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_qfq.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_red.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_sfb.c net: sched: make default fifo qdiscs appear in the dump 2017-03-12 22:53:02 -07:00
sch_sfq.c net_sched: fix error recovery at qdisc creation 2017-02-11 21:38:58 -05:00
sch_tbf.c sch_tbf: Remove bogus semicolon in if() conditional. 2017-03-13 00:00:03 -07:00
sch_teql.c net: make ndo_get_stats64 a void function 2017-01-08 17:51:44 -05:00