mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-11-24 18:41:00 +07:00
edb09eb17e
Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host agent [1] are problematic at scale : For each qdisc/class found in the dump, we currently lock the root qdisc spinlock in order to get stats. Sampling stats every 5 seconds from thousands of HTB classes is a challenge when the root qdisc spinlock is under high pressure. Not only the dumps take time, they also slow down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases. An audit of existing qdiscs showed that sch_fq_codel is the only qdisc that might need the qdisc lock in fq_codel_dump_stats() and fq_codel_dump_class_stats() In v2 of this patch, I now use the Qdisc running seqcount to provide consistent reads of packets/bytes counters, regardless of 32/64 bit arches. I also changed rate estimators to use the same infrastructure so that they no longer need to lock root qdisc lock. [1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kevin Athey <kda@google.com> Cc: Xiaotian Pei <xiaotian@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
120 lines
3.6 KiB
Plaintext
120 lines
3.6 KiB
Plaintext
Generic networking statistics for netlink users
|
|
======================================================================
|
|
|
|
Statistic counters are grouped into structs:
|
|
|
|
Struct TLV type Description
|
|
----------------------------------------------------------------------
|
|
gnet_stats_basic TCA_STATS_BASIC Basic statistics
|
|
gnet_stats_rate_est TCA_STATS_RATE_EST Rate estimator
|
|
gnet_stats_queue TCA_STATS_QUEUE Queue statistics
|
|
none TCA_STATS_APP Application specific
|
|
|
|
|
|
Collecting:
|
|
-----------
|
|
|
|
Declare the statistic structs you need:
|
|
struct mystruct {
|
|
struct gnet_stats_basic bstats;
|
|
struct gnet_stats_queue qstats;
|
|
...
|
|
};
|
|
|
|
Update statistics, in dequeue() methods only, (while owning qdisc->running)
|
|
mystruct->tstats.packet++;
|
|
mystruct->qstats.backlog += skb->pkt_len;
|
|
|
|
|
|
Export to userspace (Dump):
|
|
---------------------------
|
|
|
|
my_dumping_routine(struct sk_buff *skb, ...)
|
|
{
|
|
struct gnet_dump dump;
|
|
|
|
if (gnet_stats_start_copy(skb, TCA_STATS2, &mystruct->lock, &dump,
|
|
TCA_PAD) < 0)
|
|
goto rtattr_failure;
|
|
|
|
if (gnet_stats_copy_basic(&dump, &mystruct->bstats) < 0 ||
|
|
gnet_stats_copy_queue(&dump, &mystruct->qstats) < 0 ||
|
|
gnet_stats_copy_app(&dump, &xstats, sizeof(xstats)) < 0)
|
|
goto rtattr_failure;
|
|
|
|
if (gnet_stats_finish_copy(&dump) < 0)
|
|
goto rtattr_failure;
|
|
...
|
|
}
|
|
|
|
TCA_STATS/TCA_XSTATS backward compatibility:
|
|
--------------------------------------------
|
|
|
|
Prior users of struct tc_stats and xstats can maintain backward
|
|
compatibility by calling the compat wrappers to keep providing the
|
|
existing TLV types.
|
|
|
|
my_dumping_routine(struct sk_buff *skb, ...)
|
|
{
|
|
if (gnet_stats_start_copy_compat(skb, TCA_STATS2, TCA_STATS,
|
|
TCA_XSTATS, &mystruct->lock, &dump,
|
|
TCA_PAD) < 0)
|
|
goto rtattr_failure;
|
|
...
|
|
}
|
|
|
|
A struct tc_stats will be filled out during gnet_stats_copy_* calls
|
|
and appended to the skb. TCA_XSTATS is provided if gnet_stats_copy_app
|
|
was called.
|
|
|
|
|
|
Locking:
|
|
--------
|
|
|
|
Locks are taken before writing and released once all statistics have
|
|
been written. Locks are always released in case of an error. You
|
|
are responsible for making sure that the lock is initialized.
|
|
|
|
|
|
Rate Estimator:
|
|
--------------
|
|
|
|
0) Prepare an estimator attribute. Most likely this would be in user
|
|
space. The value of this TLV should contain a tc_estimator structure.
|
|
As usual, such a TLV needs to be 32 bit aligned and therefore the
|
|
length needs to be appropriately set, etc. The estimator interval
|
|
and ewma log need to be converted to the appropriate values.
|
|
tc_estimator.c::tc_setup_estimator() is advisable to be used as the
|
|
conversion routine. It does a few clever things. It takes a time
|
|
interval in microsecs, a time constant also in microsecs and a struct
|
|
tc_estimator to be populated. The returned tc_estimator can be
|
|
transported to the kernel. Transfer such a structure in a TLV of type
|
|
TCA_RATE to your code in the kernel.
|
|
|
|
In the kernel when setting up:
|
|
1) make sure you have basic stats and rate stats setup first.
|
|
2) make sure you have initialized stats lock that is used to setup such
|
|
stats.
|
|
3) Now initialize a new estimator:
|
|
|
|
int ret = gen_new_estimator(my_basicstats,my_rate_est_stats,
|
|
mystats_lock, attr_with_tcestimator_struct);
|
|
|
|
if ret == 0
|
|
success
|
|
else
|
|
failed
|
|
|
|
From now on, every time you dump my_rate_est_stats it will contain
|
|
up-to-date info.
|
|
|
|
Once you are done, call gen_kill_estimator(my_basicstats,
|
|
my_rate_est_stats) Make sure that my_basicstats and my_rate_est_stats
|
|
are still valid (i.e still exist) at the time of making this call.
|
|
|
|
|
|
Authors:
|
|
--------
|
|
Thomas Graf <tgraf@suug.ch>
|
|
Jamal Hadi Salim <hadi@cyberus.ca>
|