linux_dsm_epyc7002/include/trace/events
Alex Shi e7b52ffd45 x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.

But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.

So, on a 512 4KB TLB entries CPU, the balance points is at:
	(512 - X) * 100ns(assumed TLB refill cost) =
		X(TLB flush entries) * 100ns(assumed invlpg cost)

Here, X is 256, that is 1/2 of 512 entries.

But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.

As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.

My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.

Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':

multi thread testing, '-t' paramter is thread number:
	       	        with patch   unpatched 3.4-rc4
./mprotect -t 1           14ns		24ns
./mprotect -t 2           13ns		22ns
./mprotect -t 4           12ns		19ns
./mprotect -t 8           14ns		16ns
./mprotect -t 16          28ns		26ns
./mprotect -t 32          54ns		51ns
./mprotect -t 128         200ns		199ns

Single process with sequencial flushing and memory accessing:

		       	with patch   unpatched 3.4-rc4
./mprotect		    7ns			11ns
./mprotect -p 4096  -l 8 -n 10240
			    21ns		21ns

[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
  has additional performance numbers. ]

Signed-off-by: Alex Shi <alex.shi@intel.com>
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-06-27 19:29:07 -07:00
..
9p.h net/9p: Convert net/9p protocol dumps to tracepoints 2011-10-24 11:13:12 -05:00
asoc.h ASoC: dapm: Fix x86_64 build warning. 2012-04-23 13:15:35 +01:00
block.h blktrace: add FLUSH/FUA support 2011-08-11 10:36:05 +02:00
btrfs.h btrfs: return void in functions without error conditions 2012-03-22 01:45:34 +01:00
compaction.h mm: compaction: add trace events for memory compaction activity 2011-01-13 17:32:33 -08:00
ext3.h userns: Convert ext3 to use kuid/kgid where appropriate 2012-05-15 14:59:27 -07:00
ext4.h userns: Convert ext4 to user kuid/kgid where appropriate 2012-05-15 14:59:27 -07:00
gfpflags.h mm: tracing: add missing GFP flags to tracing 2011-05-11 18:50:45 -07:00
gpio.h gpio: add trace events for setting direction and value 2011-05-20 00:40:19 -06:00
irq.h rcu: Use softirq to address performance regression 2011-06-14 15:25:39 -07:00
jbd2.h jbd2: issue cache flush after checkpointing even with internal journal 2012-03-13 22:22:54 -04:00
jbd.h jbd: Write journal superblock with WRITE_FUA after checkpointing 2012-05-15 23:34:37 +02:00
kmem.h mm-tracepoint: rename page-free events 2012-01-10 16:30:41 -08:00
kvm.h KVM: cleanup async_pf tracepoints 2011-01-12 11:28:57 +02:00
lock.h tracing: Factorize lock events in a lock class 2010-05-09 13:45:35 +02:00
mce.h tracing: Fix event alignment: mce:mce_record 2011-03-10 10:34:28 -05:00
module.h include: replace linux/module.h with "struct module" wherever possible 2011-10-31 19:32:32 -04:00
napi.h napi: Convert trace_napi_poll to TRACE_EVENT 2010-09-07 17:51:01 +02:00
net.h net: tracepoint of net_dev_xmit sees freed skb and causes panic 2011-06-02 14:06:31 -07:00
oom.h tracepoint: add tracepoints for debugging oom_score_adj 2012-01-10 16:30:44 -08:00
power.h PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints 2012-05-01 21:25:25 +02:00
printk.h printk/tracing: Add console output tracing 2012-02-13 13:46:05 -05:00
rcu.h rcu: Make RCU_FAST_NO_HZ handle timer migration 2012-05-09 14:26:56 -07:00
regmap.h The following text was taken from the original review request: 2012-03-24 10:41:37 -07:00
regulator.h regulator: Add basic trace facilities 2011-01-12 14:33:00 +00:00
rpm.h device.h: audit and cleanup users in main include dir 2012-03-16 10:38:24 -04:00
sched.h tracing, sched, vfs: Fix 'old_pid' usage in trace_sched_process_exec() 2012-03-31 11:53:22 +02:00
scsi.h [SCSI] Include protection operation in SCSI command trace 2011-03-14 18:36:02 -05:00
signal.h tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info 2012-01-13 18:48:50 +01:00
skb.h tracing: Fix event alignment: skb:kfree_skb 2011-03-10 10:34:31 -05:00
sock.h core: add tracepoints for queueing skb to rcvbuf 2011-06-21 16:06:10 -07:00
sunrpc.h SUNRPC: Adding status trace points 2012-02-06 10:37:53 -05:00
syscalls.h tracing: Allow raw syscall trace events for non privileged users 2010-11-18 14:37:43 +01:00
task.h tracepoint: add tracepoints for debugging oom_score_adj 2012-01-10 16:30:44 -08:00
timer.h tracing: Fix timer tracing 2010-08-19 13:00:41 +02:00
udp.h udp: add tracepoints for queueing skb to rcvbuf 2011-06-21 16:06:10 -07:00
vmscan.h mm: vmscan: remove reclaim_mode_t 2012-05-29 16:22:19 -07:00
workqueue.h workqueue: Fix workqueue_execute_end() comment 2012-04-10 10:49:47 +02:00
writeback.h writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() 2012-05-06 13:43:38 +08:00
xen.h x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range 2012-06-27 19:29:07 -07:00