linux_dsm_epyc7002/Documentation
David Ahern 58956317c8 neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago gc kicks in
   walks the entire hash table evicting *all* entries not in PERMANENT
   or REACHABLE state and not marked as externally learned. There is no
   discriminator on when the neigh entry was created or if it just moved
   from REACHABLE to another NUD_VALID state (e.g., NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries
   when gc_thresh2 is exceeded is over kill and contributes to trashing
   especially during startup.

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the list head at the tail and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:03:10 -08:00
..
ABI LED fixes for 4.20-rc2 2018-11-08 17:49:04 -06:00
accelerators
accounting psi: cgroup support 2018-10-26 16:26:32 -07:00
acpi ACPI: property: graph: Update graph documentation to use generic references 2018-07-23 12:44:52 +02:00
admin-guide Char/Misc driver fixes for 4.20-rc4 2018-11-22 08:43:06 -08:00
aoe
arm ARM: SoC device tree updates for 4.20 2018-10-29 15:05:20 -07:00
arm64 Documentation/arm64: HugeTLB page implementation 2018-10-10 18:08:36 +01:00
auxdisplay
backlight
block Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
blockdev This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-08-07 11:02:05 -07:00
bus-devices
cdrom Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
cgroup-v1 Merge branch 'for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-10-25 17:15:46 -07:00
cma
connector
console Documentation: corrections to console/console.txt 2018-08-10 16:09:40 -06:00
core-api XArray: Fix Documentation 2018-11-05 16:38:10 -05:00
cpu-freq Documentation: cpu-freq: Frequencies aren't always sorted 2018-11-07 13:29:04 +01:00
cpuidle
crypto KEYS: Implement PKCS#8 RSA Private Key parser [ver #2] 2018-10-26 09:30:46 +01:00
dev-tools docs: dev-tools: coccinelle: Update documentation 2018-08-31 16:51:59 -06:00
device-mapper This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
devicetree net: documentation: build a directory structure for drivers 2018-12-05 11:30:06 -08:00
doc-guide Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
driver-api Char/Misc driver patches for 4.20-rc1 2018-10-26 09:11:43 -07:00
driver-model dmaengine: add a new helper dmaenginem_async_device_register 2018-07-30 10:50:22 +05:30
early-userspace initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
EDID
extcon
fault-injection
fb This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
features ARM: 8777/1: Hook up SYNC_CORE functionality for sys_membarrier() 2018-07-11 11:02:08 +01:00
filesystems This pull request contains updates for UBIFS: 2018-11-04 14:46:04 -08:00
firmware_class
fmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
fpga docs: fpga: add a document for FPGA Device Feature List (DFL) Framework Overview 2018-07-15 13:55:44 +02:00
gpio Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
gpu Merge branch 'drm-next-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-next 2018-09-21 09:52:53 +10:00
hid
hwmon hwmon: (ina3221) Read channel input source info from DT 2018-10-10 20:37:13 -07:00
i2c i2c: add i2c bus driver for NVIDIA GPU 2018-11-09 17:46:43 +01:00
ia64
ide Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
iio
infiniband
input Revert "Input: Add the REL_WHEEL_HI_RES event code" 2018-11-22 08:57:44 +01:00
ioctl drm pull for 4.20-rc1 2018-10-28 17:49:53 -07:00
isdn Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
kbuild kbuild: remove unused cc-fullversion variable 2018-11-02 00:15:26 +09:00
kdump
kernel-hacking doc:it_IT: translation for kernel-hacking 2018-07-26 16:21:09 -06:00
laptops platform-drivers-x86 for v4.20-1 2018-11-01 08:42:21 -07:00
leds Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
lightnvm
livepatch
locking This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
m68k Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
maintainer docs: Fix more broken references 2018-06-15 18:11:26 -03:00
md
media media: docs: Document metadata format in struct v4l2_format 2018-11-06 07:10:12 -05:00
memory-devices
mic
mips Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
misc-devices pci_endpoint_test: Add 2 ioctl commands 2018-07-19 11:46:57 +01:00
mmc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
mtd Documentation: mtd: remove stale pxa3xx NAND controller documentation 2018-09-04 23:37:38 +02:00
namespaces
netlabel Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
networking neighbor: Improve garbage collection 2018-12-07 16:03:10 -08:00
nfc
nios2
nvdimm
nvmem Documentation: nvmem: document cell tables and lookup entries 2018-09-28 15:14:54 +02:00
openrisc
parisc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
PCI pci-v4.20-changes 2018-10-25 06:50:48 -07:00
pcmcia pcmcia: remove long deprecated pcmcia_request_exclusive_irq() function 2018-08-18 12:30:42 -07:00
perf
phy
platform
power This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
powerpc Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
pps
process The Compiler Attributes series 2018-11-01 18:34:46 -07:00
pti
ptp
rapidio
RCU This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
riscv
s390 KVM updates for v4.20 2018-10-25 17:57:35 -07:00
scheduler This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
scsi SCSI misc on 20181024 2018-10-25 07:40:30 -07:00
security Merge branch 'next-keys2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2018-11-01 15:23:59 -07:00
serial TTY/Serial patches for 4.20-rc1 2018-10-29 10:42:20 -07:00
sh
sound ALSA: doc: Brush up the old writing-an-alsa-driver 2018-10-18 10:30:01 +02:00
sparc
sphinx Documentation/sphinx: allow "functions" with no parameters 2018-06-30 07:52:42 -06:00
sphinx-static docs: improve readability for people with poorer eyesight 2018-10-07 09:16:50 -06:00
spi Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
sysctl New gcc plugin: stackleak 2018-11-01 11:46:27 -07:00
target
thermal
timers Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
trace The biggest change here is the updates to kprobes 2018-10-30 09:49:56 -07:00
translations This was a moderately busy cycle for docs, with the usual collection of 2018-08-14 14:29:31 -07:00
usb USB-serial updates for v4.19-rc1 2018-07-20 21:47:15 +02:00
userspace-api
virtual KVM updates for v4.20 2018-10-25 17:57:35 -07:00
vm slub: extend slub debug to handle multiple slabs 2018-10-26 16:25:19 -07:00
w1 Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00
watchdog documentation: watchdog: add documentation for armada-37xx-wdt 2018-10-13 15:19:40 +02:00
wimax
x86 x86/mm: Move LDT remap out of KASLR region on 5-level paging 2018-11-06 21:35:11 +01:00
xilinx Documentation: xilinx: Add documentation for eemi APIs 2018-10-09 13:26:05 +02:00
xtensa
.gitignore
atomic_bitops.txt
atomic_t.txt
bt8xxgpio.txt
btmrvl.txt
bus-virt-phys-mapping.txt
Changes
clearing-warn-once.txt
CodingStyle
conf.py This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt
dell_rbu.txt
digsig.txt
DMA-API-HOWTO.txt
DMA-API.txt
DMA-attributes.txt
DMA-ISA-LPC.txt
docutils.conf
dontdiff
efi-stub.txt efi_stub: update documentation on dtb= parameter 2018-09-09 14:46:44 -06:00
eisa.txt
flexible-arrays.txt
futex-requeue-pi.txt
gcc-plugins.txt
highuid.txt
hw_random.txt
hwspinlock.txt
index.rst docs: tidy up TOCs and refs to license-rules.rst 2018-08-31 16:50:50 -06:00
intel_txt.txt
Intel-IOMMU.txt
io_ordering.txt
io-mapping.txt
iostats.txt block: Track DISCARD statistics and output them in stat and diskstat 2018-07-18 08:44:22 -06:00
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt
IRQ.txt
irqflags-tracing.txt
isa.txt
isapnp.txt
kernel-per-CPU-kthreads.txt doc: Update removal of RCU-bh/sched update machinery 2018-08-30 10:59:48 -07:00
kobject.txt
kprobes.txt kprobes/Documentation: Fix various typos 2018-06-22 11:10:55 +02:00
kref.txt
ldm.txt
lockup-watchdogs.txt
logo.gif
logo.txt
lsm.txt
lzo.txt
mailbox.txt
Makefile
memory-barriers.txt locking/memory-barriers: Replace smp_cond_acquire() with smp_cond_load_acquire() 2018-10-02 10:28:05 +02:00
men-chameleon-bus.txt
nommu-mmap.txt Documentation: nommu-map: Fix duplicate word typo 2018-06-26 09:01:27 -06:00
ntb.txt
numastat.txt
padata.txt
parport-lowlevel.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pnp.txt
preempt-locking.txt Documentation: preempt-locking: Use better example 2018-10-12 11:35:47 -06:00
pwm.txt
rbtree.txt
remoteproc.txt
rfkill.txt rfkill: Fix several typos in documentation 2018-06-15 13:36:08 +02:00
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
sgi-ioc4.txt
siphash.txt
SM501.txt
smsc_ece1099.txt
speculation.txt
static-keys.txt
SubmittingPatches
svga.txt
switchtec.txt NTB: switchtec_ntb: Update switchtec documentation with prerequisites for NTB 2018-10-11 11:28:53 -05:00
sync_file.txt
tee.txt
this_cpu_ops.txt
unaligned-memory-access.txt
vfio-mediated-device.txt vfio/mdev: Check globally for duplicate devices 2018-06-08 10:24:27 -06:00
vfio.txt
video-output.txt
xillybus.txt
xz.txt
zorro.txt