linux_dsm_epyc7002/arch/x86
Waiman Long f99fd22e4d x86/hpet: Reduce HPET counter read contention
On a large system with many CPUs, using HPET as the clock source can
have a significant impact on the overall system performance because
of the following reasons:
 1) There is a single HPET counter shared by all the CPUs.
 2) HPET counter reading is a very slow operation.

Using HPET as the default clock source may happen when, for example,
the TSC clock calibration exceeds the allowable tolerance. Something
the performance slowdown can be so severe that the system may crash
because of a NMI watchdog soft lockup, for example.

During the TSC clock calibration process, the default clock source
will be set temporarily to HPET. For systems with many CPUs, it is
possible that NMI watchdog soft lockup may occur occasionally during
that short time period where HPET clocking is active as is shown in
the kernel log below:

[   71.646504] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[   71.655313] Switching to clocksource hpet
[   95.679135] BUG: soft lockup - CPU#144 stuck for 23s! [swapper/144:0]
[   95.693363] BUG: soft lockup - CPU#145 stuck for 23s! [swapper/145:0]
[   95.695580] BUG: soft lockup - CPU#582 stuck for 23s! [swapper/582:0]
[   95.698128] BUG: soft lockup - CPU#357 stuck for 23s! [swapper/357:0]

This patch addresses the above issues by reducing HPET read contention
using the fact that if more than one CPUs are trying to access HPET at
the same time, it will be more efficient when only one CPU in the group
reads the HPET counter and shares it with the rest of the group instead
of each group member trying to read the HPET counter individually.

This is done by using a combination quadword that contains a 32-bit
stored HPET value and a 32-bit spinlock.  The CPU that gets the lock
will be responsible for reading the HPET counter and storing it in
the quadword. The others will monitor the change in HPET value and
lock status and grab the latest stored HPET value accordingly. This
change is only enabled on 64-bit SMP configuration.

On a 4-socket Haswell-EX box with 144 threads (HT on), running the
AIM7 compute workload (1500 users) on a 4.8-rc1 kernel (HZ=1000)
with and without the patch has the following performance numbers
(with HPET or TSC as clock source):

TSC		= 1042431 jobs/min
HPET w/o patch	=  798068 jobs/min
HPET with patch	= 1029445 jobs/min

The perf profile showed a reduction of the %CPU time consumed by
read_hpet from 11.19% without patch to 1.24% with patch.

[ tglx: It's really sad that we need to have such hacks just to deal with
  	the fact that cpu vendors have not managed to fix the TSC wreckage
  	within 15+ years. Were They Forgetting? ]

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Tested-by: Prarit Bhargava <prarit@redhat.com>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: Randy Wright <rwright@hpe.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1473182530-29175-1-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-09-09 15:16:19 +02:00
..
boot Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:37:12 -04:00
configs kconfig: tinyconfig: provide whole choice blocks to avoid warnings 2016-09-01 17:52:01 -07:00
crypto crypto: sha512-mb - fix ctx pointer 2016-08-16 17:09:43 +08:00
entry x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables 2016-08-10 16:05:16 +02:00
events perf/x86/intel/uncore: Add enable_box for client MSR uncore 2016-08-12 08:35:05 +02:00
ia32 mm: remove more IS_ERR_VALUE abuses 2016-05-27 15:57:31 -07:00
include mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2016-08-30 10:10:21 -07:00
kernel x86/hpet: Reduce HPET counter read contention 2016-09-09 15:16:19 +02:00
kvm kvm: nVMX: fix nested tsc scaling 2016-08-18 12:19:09 +02:00
lguest lguest: Read offset of device_cap later 2016-06-10 11:39:09 +02:00
lib x86/mm/kaslr: Fix -Wformat-security warning 2016-08-11 10:58:12 +02:00
math-emu
mm treewide: replace config_enabled() with IS_ENABLED() (2nd round) 2016-08-26 17:39:35 -07:00
net bpf, x86: add support for constant blinding 2016-05-16 13:49:32 -04:00
oprofile x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage 2016-04-13 11:37:41 +02:00
pci x86/PCI: VMD: Fix infinite loop executing irq's 2016-08-23 16:36:42 -05:00
platform Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-08-12 14:31:10 -07:00
power x86/power/64: Use __pa() for physical address computation 2016-08-16 00:39:37 +02:00
purgatory Add sancov plugin 2016-06-07 22:57:10 +02:00
ras x86/RAS/AMD: Reduce the number of IPIs when prepping error injection 2016-07-08 11:29:26 +02:00
realmode x86/boot: Rework reserve_real_mode() to allow multiple tries 2016-08-11 11:15:01 +02:00
tools x86/insn: Add AVX-512 support to the instruction decoder 2016-07-21 09:37:11 -03:00
um Merge branch 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2016-08-04 19:37:59 -04:00
video
xen xen: change the type of xen_vcpu_id to uint32_t 2016-08-24 18:17:27 +01:00
.gitignore
Kbuild
Kconfig mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS 2016-08-30 10:10:21 -07:00
Kconfig.cpu
Kconfig.debug
Makefile kbuild: abort build on bad stack protector flag 2016-07-26 16:19:19 -07:00
Makefile_32.cpu
Makefile.um