linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-24 22:50:57 +07:00

Nick Piggin 8174c430e4 x86: lockless get_user_pages_fast()

Implement get_user_pages_fast without locking in the fastpath on x86.

Do an optimistic lockless pagetable walk, without taking any page table locks or even mmap_sem. Page table existence is guaranteed by turning interrupts off (combined with the fact that we're always looking up the current mm, this means we can do the lockless page table walk within the constraints of the TLB shootdown design). Basically we can do this lockless pagetable walk in a similar manner to the way the CPU's pagetable walker does not have to take any locks to find present ptes.

This patch (combined with the subsequent ones to convert direct IO to use it) was found to give about 10% performance improvement on a 2 socket 8 core Intel Xeon system running an OLTP workload on DB2 v9.5.

"To test the effects of the patch, an OLTP workload was run on an IBM x3850 M2 server with 2 processors (quad-core Intel Xeon processors at 2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel. Comparing runs with and without the patch resulted in an overall performance benefit of ~9.8%. Correspondingly, oprofiles showed that samples from __up_read and __down_read routines that is seen during thread contention for system resources was reduced from 2.8% down to .05%. Monitoring the /proc/vmstat output from the patched run showed that the counter for fast_gup contained a very high number while the fast_gup_slow value was zero."

(fast_gup is the old name for get_user_pages_fast; fast_gup_slow is a counter we had for the number of times the slowpath was invoked.)

The main reason for the improvement is that DB2 has multiple threads each issuing direct-IO. Direct-IO uses get_user_pages, and thus the threads contend on the mmap_sem cacheline, and can also contend on page table locks.

I would anticipate larger performance gains on larger systems; however, I think DB2 uses an adaptive mix of threads and processes, so it could be that thread contention remains pretty constant as machine size increases. In which case, we are stuck with "only" a 10% gain.

The downside of using get_user_pages_fast is that if there is not a pte with the correct permissions for the access, we end up falling back to get_user_pages, and so the get_user_pages_fast attempt is a bit of extra work. However, this should not be the common case in most performance-critical code.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2008-07-26 12:00:06 -07:00
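
The fast-path/slow-path split described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel implementation: every name in it (model_mm, model_pte, model_gup_fast and the counters) is hypothetical, the "lock" is just a counter standing in for mmap_sem, and the interrupt disabling that protects the real page table walk is not modelled. It shows the optimistic walk taking no lock while entries are present with the right permissions, and the fallback handling whatever remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define NR_PTES 8

struct model_pte {
    bool present;
    bool writable;
};

struct model_mm {
    struct model_pte ptes[NR_PTES];
    unsigned long lock_acquisitions; /* stands in for taking mmap_sem */
    unsigned long fast_pages;        /* pages handled by the lockless walk */
    unsigned long slow_calls;        /* times the slow path was entered */
};

/* Lockless walk: stop at the first entry that is missing or lacks permission. */
static int model_walk_lockless(const struct model_mm *mm, size_t start,
                               int nr, bool write)
{
    int pinned = 0;

    for (int i = 0; i < nr; i++) {
        const struct model_pte *pte = &mm->ptes[start + i];

        if (!pte->present || (write && !pte->writable))
            break;
        pinned++;
    }
    return pinned;
}

/* Slow path: "take the lock" and fault the remaining entries in. */
static int model_gup_slow(struct model_mm *mm, size_t start, int nr, bool write)
{
    mm->lock_acquisitions++;
    mm->slow_calls++;

    for (int i = 0; i < nr; i++) {
        struct model_pte *pte = &mm->ptes[start + i];

        pte->present = true;
        if (write)
            pte->writable = true;
    }
    return nr;
}

/* Try the lockless walk first; fall back only for what it could not handle. */
int model_gup_fast(struct model_mm *mm, size_t start, int nr, bool write)
{
    assert(nr >= 0 && start + (size_t)nr <= NR_PTES);

    int pinned = model_walk_lockless(mm, start, nr, write);

    mm->fast_pages += (unsigned long)pinned;
    if (pinned < nr)
        pinned += model_gup_slow(mm, start + (size_t)pinned, nr - pinned, write);
    return pinned;
}
```

In this model, a request that the lockless walk satisfies completely never touches lock_acquisitions, which mirrors the fast_gup_slow counter staying at zero in the quoted test run. A request that hits a missing or read-only entry pays for the failed walk and then for the locked fallback, which is the "extra work" the commit message mentions.
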
acpi	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6	2008-07-16 17:25:46 -07:00
asm-alpha	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-arm	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-avr32	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-blackfin	Merge git://git.infradead.org/~dwmw2/random-2.6	2008-07-25 12:01:37 -07:00
asm-cris	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-frv	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-generic	Better interface for hooking early initcalls	2008-07-26 12:00:04 -07:00
asm-h8300	Merge git://git.infradead.org/~dwmw2/random-2.6	2008-07-25 12:01:37 -07:00
asm-ia64	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-m32r	Merge git://git.infradead.org/~dwmw2/random-2.6	2008-07-25 12:01:37 -07:00
asm-m68k	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-m68knommu	Merge git://git.infradead.org/~dwmw2/random-2.6	2008-07-25 12:01:37 -07:00
asm-mips	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-mn10300	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-parisc	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-powerpc	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-s390	Merge git://git.infradead.org/~dwmw2/random-2.6	2008-07-25 12:01:37 -07:00
asm-sh	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-sparc	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
asm-sparc64	remove dummy asm/kvm.h files	2008-07-25 14:35:50 -04:00
asm-um	Merge git://git.infradead.org/~dwmw2/random-2.6	2008-07-25 12:01:37 -07:00
asm-v850	remove the v850 port	2008-07-24 10:47:24 -07:00
asm-x86	x86: lockless get_user_pages_fast()	2008-07-26 12:00:06 -07:00
asm-xtensa	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
crypto	crypto: hash - Move ahash functions into crypto/hash.h	2008-07-10 20:35:18 +08:00
drm	drm/radeon: fixup issue with radeon and PAT support.	2008-07-15 15:48:05 +10:00
keys
linux	mm: introduce get_user_pages_fast	2008-07-26 12:00:05 -07:00
math-emu
media	V4L/DVB (8395): saa7134: Fix Kbuild dependency of ir-kbd-i2c	2008-07-20 07:29:03 -03:00
mtd	UBI: fix checkpatch.pl errors and warnings	2008-07-24 13:36:09 +03:00
net	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-07-25 17:40:16 -07:00
pcmcia
rdma	dma-mapping: add the device argument to dma_mapping_error()	2008-07-26 12:00:03 -07:00
rxrpc
scsi	driver core: remove KOBJ_NAME_LEN define	2008-07-21 21:54:52 -07:00
sound	ALSA: Release v1.0.17	2008-07-14 09:54:43 +02:00
video	include/video/atmel_lcdc.h must #include <linux/workqueue.h>	2008-07-26 12:00:01 -07:00
xen	xen: implement Xen-specific spinlocks	2008-07-16 11:15:53 +02:00
Kbuild	drm: reorganise drm tree to be more future proof.	2008-07-14 10:45:01 +10:00