linux_dsm_epyc7002/lib
Hannes Frederic Sowa 809fa972fd reciprocal_divide: update/correction of the algorithm
Jakub Zawadzki noticed that some divisions by reciprocal_divide()
were not correct [1][2], which he could also show with BPF code
after divisions are transformed into reciprocal_value() for runtime
invariance which can be passed to reciprocal_divide() later on;
reverse in BPF dump ended up with a different, off-by-one K in
some situations.

This has been fixed by Eric Dumazet in commit aee636c480
("bpf: do not use reciprocal divide"). This follow-up patch
improves reciprocal_value() and reciprocal_divide() to work in
all cases by using Granlund and Montgomery method, so that also
future use is safe and without any non-obvious side-effects.
Known problems with the old implementation were that division by 1
always returned 0 and some off-by-ones when the dividend and divisor
where very large. This seemed to not be problematic with its
current users, as far as we can tell. Eric Dumazet checked for
the slab usage, we cannot surely say so in the case of flex_array.
Still, in order to fix that, we propose an extension from the
original implementation from commit 6a2d7a955d resp. [3][4],
by using the algorithm proposed in "Division by Invariant Integers
Using Multiplication" [5], Torbjörn Granlund and Peter L.
Montgomery, that is, pseudocode for q = n/d where q, n, d is in
u32 universe:

1) Initialization:

  int l = ceil(log_2 d)
  uword m' = floor((1<<32)*((1<<l)-d)/d)+1
  int sh_1 = min(l,1)
  int sh_2 = max(l-1,0)

2) For q = n/d, all uword:

  uword t = (n*m')>>32
  q = (t+((n-t)>>sh_1))>>sh_2

The assembler implementation from Agner Fog [6] also helped a lot
while implementing. We have tested the implementation on x86_64,
ppc64, i686, s390x; on x86_64/haswell we're still half the latency
compared to normal divide.

Joint work with Daniel Borkmann.

  [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c
  [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c
  [3] https://gmplib.org/~tege/division-paper.pdf
  [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html
  [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556
  [6] http://www.agner.org/optimize/asmlib.zip

Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: Jesse Gross <jesse@nicira.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 23:17:20 -08:00
..
fonts lib: Move fonts from drivers/video/console/ to lib/fonts/ 2013-06-28 10:28:22 +02:00
lz4 lz4: fix compression/decompression signedness mismatch 2013-09-11 15:59:45 -07:00
lzo lib/lzo: Update LZO compression to current upstream version 2013-02-20 19:36:01 +01:00
mpi MPILIB: add module description and license 2013-09-25 17:17:01 +01:00
raid6 md update for v3.12 2013-09-10 13:03:41 -07:00
reed_solomon
xz decompressors: fix typo "POWERPC" 2013-03-13 15:21:48 -07:00
zlib_deflate zlib: slim down zlib_deflate() workspace when possible 2011-03-22 17:44:17 -07:00
zlib_inflate inflate_fast: sout is already a short so ptr arith was off by one. 2010-03-12 15:52:44 -08:00
.gitignore X.509: Implement simple static OID registry 2012-10-08 13:50:18 +10:30
argv_split.c argv_split(): teach it to handle mutable strings 2013-04-29 18:28:19 -07:00
asn1_decoder.c Nothing all that exciting; a new module-from-fd syscall for those who want 2012-12-19 07:55:08 -08:00
assoc_array.c KEYS: Fix multiple key add into associative array 2013-12-02 11:24:18 +00:00
atomic64_test.c atomic64_test: simplify the #ifdef for atomic64_dec_if_positive() test 2012-07-30 17:25:16 -07:00
atomic64.c lib: atomic64: Initialize locks statically to fix early users 2012-12-20 13:50:16 -08:00
audit.c audit: support the "standard" <asm-generic/unistd.h> 2011-05-04 14:41:28 -04:00
average.c lib: Ensure EWMA does not store wrong intermediate values 2014-01-16 23:46:06 -08:00
bcd.c usb/core: use bin2bcd() for bcdDevice in RH 2012-09-10 11:13:16 -07:00
bch.c lib: add shared BCH ECC library 2011-03-11 14:25:50 +00:00
bitmap.c propagate name change to comments in kernel source 2012-12-06 10:39:54 +01:00
bitrev.c lib: export bitrev16 2008-06-06 11:29:10 -07:00
bsearch.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
btree.c btree: catch NULL value before it does harm 2012-06-07 14:43:55 -07:00
bug.c taint: add explicit flag to show whether lock dep is still OK. 2013-01-21 17:17:57 +10:30
build_OID_registry X.509: do not emit any informational output 2013-06-19 17:54:06 +02:00
bust_spinlocks.c printk: Provide a wake_up_klogd() off-case 2013-03-22 16:41:20 -07:00
check_signature.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
checksum.c asm-generic headers: Allow yet more arch overrides in checksum.h 2013-02-11 20:00:33 +05:30
clz_ctz.c lib: add weak clz/ctz functions 2013-07-09 10:33:30 -07:00
clz_tab.c lib: Fix multiple definitions of clz_tab 2012-02-02 10:34:23 +11:00
cmdline.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
cordic.c Docs: wording: functions -> algorithm 2011-10-29 21:20:22 +02:00
cpu_rmap.c Remove GENERIC_HARDIRQ config option 2013-09-13 15:09:52 +02:00
cpu-notifier-error-inject.c cpu: rewrite cpu-notifier-error-inject module 2012-07-30 17:25:22 -07:00
cpumask.c bootmem: fix wrong call parameter for free_bootmem() 2012-12-11 17:22:28 -08:00
crc7.c
crc8.c lib: crc8: add new library module providing crc8 algorithm 2011-06-03 15:01:06 -04:00
crc16.c
crc32.c lib: crc32: reduce number of cases for crc32{, c}_combine 2013-11-04 15:27:08 -05:00
crc32defs.h crc32: select an algorithm via Kconfig 2012-03-23 16:58:38 -07:00
crc-ccitt.c
crc-itu-t.c
crc-t10dif.c crypto: crct10dif - Add fallback for broken initrds 2013-09-12 15:31:34 +10:00
ctype.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
debug_locks.c mutex: Add support for wound/wait style locks 2013-06-26 12:10:56 +02:00
debugobjects.c lib/debugobjects.c: remove unnecessary work pending test 2013-11-13 12:09:22 +09:00
dec_and_lock.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
decompress_bunzip2.c decompress_bunzip2: remove invalid vi modeline 2011-12-06 10:00:05 +01:00
decompress_inflate.c lib/decompressors: fix "no limit" output buffer length 2013-09-11 15:58:38 -07:00
decompress_unlz4.c lib: add support for LZ4-compressed kernel 2013-07-09 10:33:30 -07:00
decompress_unlzma.c treewide: Fix comment and string typo 'bufer' 2011-12-06 09:53:40 +01:00
decompress_unlzo.c lib/lzo: Rename lzo1x_decompress.c to lzo1x_decompress_safe.c 2013-02-20 19:36:00 +01:00
decompress_unxz.c Fix common misspellings 2011-03-31 11:26:23 -03:00
decompress.c lib: add support for LZ4-compressed kernel 2013-07-09 10:33:30 -07:00
devres.c lib/devres.c: fix misplaced #endif 2013-02-27 19:10:09 -08:00
digsig.c lib/digsig.c: use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)) 2013-11-13 12:09:22 +09:00
div64.c math64: New separate div64_u64_rem helper 2013-08-23 09:02:14 -04:00
dma-debug.c dma-debug: update DMA debug API to better handle multiple mappings of a buffer 2013-03-22 16:41:20 -07:00
dump_stack.c x86, asmlinkage: Make dump_stack visible 2013-08-06 14:21:01 -07:00
dynamic_debug.c dynamic debug: line queries failing due to uninitialized local variable 2013-08-28 12:10:53 -07:00
dynamic_queue_limits.c bql: Avoid possible inconsistent calculation. 2012-05-31 18:18:17 -04:00
earlycpio.c earlycpio.c: Fix the confusing comment of find_cpio_data(). 2013-08-14 23:24:01 +02:00
extable.c module: trim exception table on init free. 2009-06-12 21:47:04 +09:30
fault-inject.c debugfs: add get/set for atomic types 2013-06-03 13:55:01 -07:00
fdt_ro.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_rw.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_strerror.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_sw.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt_wip.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
fdt.c of/lib: Allow scripts/dtc/libfdt to be used from kernel code 2012-07-23 13:54:52 +01:00
find_last_bit.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
find_next_bit.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
flex_array.c reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
flex_proportions.c lib/flex_proportions.c: fix corruption of denominator in flexible proportions 2012-09-25 08:59:21 -07:00
gcd.c lib/gcd.c: prevent possible div by 0 2012-10-06 03:04:57 +09:00
gen_crc32table.c sections: fix const sections for crc32 table 2012-10-06 03:04:46 +09:00
genalloc.c lib/genalloc: add a helper function for DMA buffer allocation 2013-11-13 12:09:22 +09:00
halfmd4.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
hash.c lib: hash: follow-up fixups for arch hash 2013-12-19 00:14:53 -05:00
hexdump.c lib: introduce upper case hex ascii helpers 2013-09-20 15:38:26 -04:00
hweight.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
idr.c idr: print a stack dump after ida_remove warning 2013-07-03 16:08:04 -07:00
inflate.c MN10300: Don't try and #include <linux/slab.h> in lib/inflate.c from bootloader 2010-08-12 09:51:35 -07:00
int_sqrt.c lib/int_sqrt.c: optimize square root algorithm 2013-04-29 18:28:19 -07:00
interval_tree_test_main.c random32: rename random32 to prandom 2012-12-17 17:15:26 -08:00
interval_tree.c mm: interval tree updates 2012-10-09 16:22:40 +09:00
iomap_copy.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
iomap.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
iommu-helper.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
ioremap.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
iovec.c Hoist memcpy_fromiovec/memcpy_toiovec into lib/ 2013-05-20 10:24:22 +09:30
irq_regs.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
is_single_threaded.c kernel: is_current_single_threaded: don't use ->mmap_sem 2009-07-17 09:11:31 +10:00
jedec_ddr_data.c ddr: add LPDDR2 data from JESD209-2 2012-05-02 00:04:06 -07:00
kasprintf.c lib/kasprintf.c: use kmalloc_track_caller() to get accurate traces for kvasprintf 2012-10-11 08:50:15 +09:00
Kconfig Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2013-11-21 19:46:00 -08:00
Kconfig.debug percpu: add test module for various percpu operations 2013-11-13 12:09:11 +09:00
Kconfig.kgdb treewide: Fix typo in printk 2013-06-18 13:48:45 +02:00
Kconfig.kmemcheck kmemcheck: depend on HAVE_ARCH_KMEMCHECK 2009-07-01 22:28:44 +02:00
kfifo.c kfifo: kfifo_copy_{to,from}_user: fix copied bytes calculation 2013-11-15 09:32:23 +09:00
klist.c klist: del waiter from klist_remove_waiters before wakeup waitting process 2013-05-21 10:16:39 -07:00
kobject_uevent.c net: fix "queues" uevent between network namespaces 2014-01-19 20:02:02 -08:00
kobject.c Revert "sysfs: drop kobj_ns_type handling" 2013-11-07 20:47:28 +09:00
kstrtox.c kstrto*: add documentation 2012-12-17 17:15:22 -08:00
kstrtox.h lib/kstrtox: common code between kstrto*() and simple_strto*() functions 2011-10-31 17:30:56 -07:00
lcm.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
libcrc32c.c libcrc32c: Fix "crc32c undefined" compilation error 2008-12-25 11:01:42 +11:00
list_debug.c rcu: Fix broken strings in RCU's source code. 2012-07-06 06:01:49 -07:00
list_sort.c lib/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
llist.c llists-move-llist_reverse_order-from-raid5-to-llistc-fix 2013-11-15 09:32:22 +09:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c sched: Introduce preempt_count accessor functions 2013-09-25 14:07:32 +02:00
lockref.c lockref: include mutex.h rather than reinvent arch_mutex_cpu_relax 2013-11-27 20:37:33 -08:00
lru_cache.c lru_cache: introduce lc_get_cumulative() 2013-03-22 22:17:36 -06:00
Makefile lib: introduce arch optimized hash library 2013-12-17 14:27:17 -05:00
md5.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
memory-notifier-error-inject.c memory: memory notifier error injection module 2012-07-30 17:25:22 -07:00
memweight.c string: introduce memweight() 2012-07-30 17:25:16 -07:00
net_utils.c net: core: move mac_pton() to lib/net_utils.c 2013-06-05 12:00:27 -07:00
nlattr.c netlink: add minlen validation for the new signed types 2012-08-30 13:11:46 -04:00
notifier-error-inject.c mode_t, whack-a-mole at 11... 2013-04-09 14:13:05 -04:00
notifier-error-inject.h fault-injection: notifier error injection 2012-07-30 17:25:22 -07:00
of-reconfig-notifier-error-inject.c powerpc+of: Rename and fix OF reconfig notifier error inject module 2012-12-14 10:32:52 +11:00
oid_registry.c Give the OID registry file module info to avoid kernel tainting 2013-05-05 14:38:00 -07:00
parser.c lib/parser.c: fix up comments for valid return values from match_number 2013-02-21 17:22:25 -08:00
pci_iomap.c lib: add NO_GENERIC_PCI_IOPORT_MAP 2012-01-31 23:19:47 +02:00
percpu_counter.c percpu_counter: unbreak __percpu_counter_add() 2014-01-17 11:32:24 +11:00
percpu_ida.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2013-11-22 10:52:03 -08:00
percpu_test.c percpu: add test module for various percpu operations 2013-11-13 12:09:11 +09:00
percpu-refcount.c percpu-refcount: Add EXPORT_SYMBOL to use percpu_ref from modules 2013-11-07 12:14:52 -08:00
plist.c lib/plist.c: make plist test announcements KERN_DEBUG 2012-10-06 03:04:58 +09:00
pm-notifier-error-inject.c PM: PM notifier error injection module 2012-07-30 17:25:22 -07:00
prio_heap.c lib: fix sparse shadowed variable warning 2009-01-06 15:59:11 -08:00
proportions.c locking, lib/proportions: Annotate prop_local_percpu::lock as raw 2011-09-13 11:11:50 +02:00
radix-tree.c lib/radix-tree.c: make radix_tree_node_alloc() work correctly within interrupt 2013-09-11 15:59:36 -07:00
random32.c random32: use msecs_to_jiffies for reseed timer 2013-11-14 16:06:02 -05:00
ratelimit.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
rational.c lib: Change mail address of Oskar Schirmer 2012-05-17 15:18:37 +02:00
rbtree_test.c rbtree_test: add test for postorder iteration 2013-09-11 15:59:20 -07:00
rbtree.c rbtree: add postorder iteration functions 2013-09-11 15:59:19 -07:00
reciprocal_div.c reciprocal_divide: update/correction of the algorithm 2014-01-21 23:17:20 -08:00
scatterlist.c lib/scatterlist.c: don't flush_kernel_dcache_page on slab page 2013-10-31 16:58:13 -07:00
sha1.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
show_mem.c mm: do not walk all of system memory during show_mem 2013-11-13 12:09:09 +09:00
smp_processor_id.c sched: Introduce preempt_count accessor functions 2013-09-25 14:07:32 +02:00
sort.c generic swap(): lib/sort.c: rename swap to swap_func 2009-01-08 08:31:14 -08:00
stmp_device.c lib: add support for stmp-style devices 2012-04-20 23:27:08 +02:00
string_helpers.c lib/string_helpers: introduce generic string_unescape 2013-04-30 17:04:03 -07:00
string.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
strncpy_from_user.c word-at-a-time: make the interfaces truly generic 2012-05-26 11:33:40 -07:00
strnlen_user.c lib: Fix generic strnlen_user for 32-bit big-endian machines 2012-05-27 20:59:46 -07:00
swiotlb.c Merge remote-tracking branch 'stefano/swiotlb-xen-9.1' into stable/for-linus-3.13 2013-11-08 16:10:48 -05:00
syscall.c lib: reduce the use of module.h wherever possible 2012-03-07 15:04:04 -05:00
test-kstrtox.c lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata 2012-05-29 16:22:32 -07:00
test-string_helpers.c lib/string_helpers: introduce generic string_unescape 2013-04-30 17:04:03 -07:00
textsearch.c textsearch: doc - fix spelling in lib/textsearch.c. 2011-01-24 23:33:30 -08:00
timerqueue.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
ts_bm.c textsearch: ts_bm: support case insensitive searching in Boyer-Moore algorithm 2008-07-08 02:37:54 -07:00
ts_fsm.c textsearch: ts_fsm: return error on request for case insensitive search 2008-07-08 02:38:27 -07:00
ts_kmp.c textsearch: ts_kmp: support case insensitive searching in Knuth-Morris-Pratt algorithm 2008-07-08 02:38:09 -07:00
ucs2_string.c Move utf16 functions to kernel core and rename 2013-04-15 21:23:03 +01:00
usercopy.c Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2013-04-30 17:04:09 -07:00
uuid.c uuid: use prandom_bytes() 2013-04-29 18:28:42 -07:00
vsprintf.c vsprintf: ignore %n again 2013-11-15 09:32:20 +09:00