Commit Graph

4506 Commits

Author SHA1 Message Date
Reinette Chatre
49e00eee00 x86/intel_rdt: Fix out-of-bounds memory access in CBM tests
While the DOC at the beginning of lib/bitmap.c explicitly states that
"The number of valid bits in a given bitmap does _not_ need to be an
exact multiple of BITS_PER_LONG.", some of the bitmap operations do
indeed access BITS_PER_LONG portions of the provided bitmap no matter
the size of the provided bitmap. For example, if bitmap_intersects()
is provided with an 8 bit bitmap the operation will access
BITS_PER_LONG bits from the provided bitmap. While the operation
ensures that these extra bits do not affect the result, the memory
is still accessed.

The capacity bitmasks (CBMs) are typically stored in u32 since they
can never exceed 32 bits. A few instances exist where a bitmap_*
operation is performed on a CBM by simply pointing the bitmap operation
to the stored u32 value.

The consequence of this pattern is that some bitmap_* operations will
access out-of-bounds memory when interacting with the provided CBM. This
is confirmed with a KASAN test that reports:

 BUG: KASAN: stack-out-of-bounds in __bitmap_intersects+0xa2/0x100

and

 BUG: KASAN: stack-out-of-bounds in __bitmap_weight+0x58/0x90

Fix this by moving any CBM provided to a bitmap operation needing
BITS_PER_LONG to an 'unsigned long' variable.

[ tglx: Changed related function arguments to unsigned long and got rid
	of the _cbm extra step ]

Fixes: 72d5050566 ("x86/intel_rdt: Add utilities to test pseudo-locked region possibility")
Fixes: 49f7b4efa1 ("x86/intel_rdt: Enable setting of exclusive mode")
Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations' size in bytes")
Fixes: 95f0b77efa ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/69a428613a53f10e80594679ac726246020ff94f.1538686926.git.reinette.chatre@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-09 08:02:15 +02:00
Ingo Molnar
22245bdf0a x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick
We have a special segment descriptor entry in the GDT, whose sole purpose is to
encode the CPU and node numbers in its limit (size) field. There are user-space
instructions that allow the reading of the limit field, which gives us a really
fast way to read the CPU and node IDs from the vDSO for example.

But the naming of related functionality does not make this clear, at all:

	VDSO_CPU_SIZE
	VDSO_CPU_MASK
	__CPU_NUMBER_SEG
	GDT_ENTRY_CPU_NUMBER
	vdso_encode_cpu_node
	vdso_read_cpu_node

There's a number of problems:

 - The 'VDSO_CPU_SIZE' doesn't really make it clear that these are number
   of bits, nor does it make it clear which 'CPU' this refers to, i.e.
   that this is about a GDT entry whose limit encodes the CPU and node number.

 - Furthermore, the 'CPU_NUMBER' naming is actively misleading as well,
   because the segment limit encodes not just the CPU number but the
   node ID as well ...

So use a better nomenclature all around: name everything related to this trick
as 'CPUNODE', to make it clear that this is something special, and add
_BITS to make it clear that these are number of bits, and propagate this to
every affected name:

	VDSO_CPU_SIZE         =>  VDSO_CPUNODE_BITS
	VDSO_CPU_MASK         =>  VDSO_CPUNODE_MASK
	__CPU_NUMBER_SEG      =>  __CPUNODE_SEG
	GDT_ENTRY_CPU_NUMBER  =>  GDT_ENTRY_CPUNODE
	vdso_encode_cpu_node  =>  vdso_encode_cpunode
	vdso_read_cpu_node    =>  vdso_read_cpunode

This, beyond being less confusing, also makes it easier to grep for all related
functionality:

  $ git grep -i cpunode arch/x86

Also, while at it, fix "return is not a function" style sloppiness in vdso_encode_cpunode().

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-08 10:45:02 +02:00
Chang S. Bae
b2e2ba578e x86/vdso: Initialize the CPU/node NR segment descriptor earlier
Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
initialized relatively late during CPU init, from the vCPU code, which
has a number of disadvantages, such as hotplug CPU notifiers and SMP
cross-calls.

Instead just initialize it much earlier, directly in cpu_init().

This reduces complexity and increases robustness.

[ mingo: Wrote new changelog. ]

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1537312139-5580-9-git-send-email-chang.seok.bae@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-08 10:41:10 +02:00
Ingo Molnar
c0554d2d3d Merge branch 'linus' into x86/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-04 08:23:03 +02:00
Xiaochen Shen
2cc81c6992 x86/intel_rdt: Show missing resctrl mount options
In resctrl filesystem, mount options exist to enable L3/L2 CDP and MBA
Software Controller features if the platform supports them:

 mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl

But currently only "cdp" option is displayed in /proc/mounts. "cdpl2" and
"mba_MBps" options are not shown even when they are active.

Before:
 # mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
 # grep resctrl /proc/mounts
 /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp 0 0

After:
 # mount -t resctrl resctrl -o cdp,mba_MBps /sys/fs/resctrl
 # grep resctrl /proc/mounts
 /sys/fs/resctrl /sys/fs/resctrl resctrl rw,relatime,cdp,mba_MBps 0 0

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/1536796118-60135-1-git-send-email-fenghua.yu@intel.com
2018-10-03 21:53:49 +02:00
Andy Shevchenko
82159876d3 x86/intel_rdt: Switch to bitmap_zalloc()
Switch to bitmap_zalloc() to show clearly what is allocated. Besides that
it returns a pointer of bitmap type instead of opaque void *.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180830115039.63430-1-andriy.shevchenko@linux.intel.com
2018-10-03 21:53:49 +02:00
Reinette Chatre
53ed74af05 x86/intel_rdt: Re-enable pseudo-lock measurements
Commit 4a7a54a55e ("x86/intel_rdt: Disable PMU access") disabled
measurements of pseudo-locked regions because of incorrect usage
of the performance monitoring hardware.

Cache pseudo-locking measurements are now done correctly with the
in-kernel perf API and its use can be re-enabled at this time.

The adjustment to the in-kernel perf API also separated the L2 and L3
measurements that can be triggered separately from user space. The
re-enabling of the measurements is thus not a simple revert of the
original disable in order to accommodate the additional parameter
possible.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/bfb9fc31692e0c62d9ca39062e55eceb6a0635b5.1537377064.git.reinette.chatre@intel.com
2018-10-03 21:53:35 +02:00
Nathan Chancellor
88296bd42b x86/cpu/amd: Remove unnecessary parentheses
Clang warns when multiple pairs of parentheses are used for a single
conditional statement.

arch/x86/kernel/cpu/amd.c:925:14: warning: equality comparison with
extraneous parentheses [-Wparentheses-equality]
        if ((c->x86 == 6)) {
             ~~~~~~~^~~~
arch/x86/kernel/cpu/amd.c:925:14: note: remove extraneous parentheses
around the comparison to silence this warning
        if ((c->x86 == 6)) {
            ~       ^   ~
arch/x86/kernel/cpu/amd.c:925:14: note: use '=' to turn this equality
comparison into an assignment
        if ((c->x86 == 6)) {
                    ^~
                    =
1 warning generated.

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20181002224511.14929-1-natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/187
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-03 08:27:47 +02:00
Peter Zijlstra
f2c4db1bd8 x86/cpu: Sanitize FAM6_ATOM naming
Going primarily by:

  https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors

with additional information gleaned from other related pages; notably:

 - Bonnell shrink was called Saltwell
 - Moorefield is the Merriefield refresh which makes it Airmont

The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE

  for i in `git grep -l FAM6_ATOM` ; do
	sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'		\
		-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'		\
		-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'		\
		-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'	\
		-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'		\
		-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'	\
		-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'	\
		-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'	\
		-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'	\
		-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'		\
		-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
  done

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: dave.hansen@linux.intel.com
Cc: len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-02 10:14:32 +02:00
Reinette Chatre
dd45407c0b x86/intel_rdt: Use perf infrastructure for measurements
The success of a cache pseudo-locked region is measured using
performance monitoring events that are programmed directly at the time
the user requests a measurement.

Modifying the performance event registers directly is not appropriate
since it circumvents the in-kernel perf infrastructure that exists to
manage these resources and provide resource arbitration to the
performance monitoring hardware.

The cache pseudo-locking measurements are modified to use the in-kernel
perf infrastructure. Performance events are created and validated with
the appropriate perf API. The performance counters are still read as
directly as possible to avoid the additional cache hits. This is
done safely by first ensuring with the perf API that the counters have
been programmed correctly and only accessing the counters in an
interrupt disabled section where they are not able to be moved.

As part of the transition to the in-kernel perf infrastructure the L2
and L3 measurements are split into two separate measurements that can
be triggered independently. This separation prevents additional cache
misses incurred during the extra testing code used to decide if a
L2 and/or L3 measurement should be made.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: peterz@infradead.org
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/fc24e728b446404f42c78573c506e98cd0599873.1537468643.git.reinette.chatre@intel.com
2018-09-28 22:48:27 +02:00
Reinette Chatre
0a701c9dd5 x86/intel_rdt: Create required perf event attributes
A perf event has many attributes that are maintained in a separate
structure that should be provided when a new perf_event is created.

In preparation for the transition to perf_events the required attribute
structures are created for all the events that may be used in the
measurements. Most attributes for all the events are identical. The
actual configuration, what specifies what needs to be measured, is what
will be different between the events used. This configuration needs to
be done with X86_CONFIG that cannot be used as part of the designated
initializers used here, this will be introduced later.

Although they do look identical at this time the attribute structures
needs to be maintained separately since a perf_event will maintain a
pointer to its unique attributes.

In support of patch testing the new structs are given the unused attribute
until their use in later patches.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1822f6164e221a497648d108913d056ab675d5d0.1537377064.git.reinette.chatre@intel.com
2018-09-28 22:48:27 +02:00
Reinette Chatre
b5e4274ef7 x86/intel_rdt: Remove local register variables
Local register variables were used in an effort to improve the
accuracy of the measurement of cache residency of a pseudo-locked
region. This was done to ensure that only the cache residency of
the memory is measured and not the cache residency of the variables
used to perform the measurement.

While local register variables do accomplish the goal they do require
significant care since different architectures have different registers
available. Local register variables also cannot be used with valuable
developer tools like KASAN.

Significant testing has shown that similar accuracy in measurement
results can be obtained by replacing local register variables with
regular local variables.

Make use of local variables in the critical code but do so with
READ_ONCE() to prevent the compiler from merging or refetching reads.
Ensure these variables are initialized before the measurement starts,
and ensure it is only the local variables that are accessed during
the measurement.

With the removal of the local register variables and using READ_ONCE()
there is no longer a motivation for using a direct wrmsr call (that
avoids the additional tracing code that may clobber the local register
variables).

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: acme@kernel.org
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/f430f57347414e0691765d92b144758ab93d8407.1537377064.git.reinette.chatre@intel.com
2018-09-28 22:48:26 +02:00
Pu Wen
ac78bd7235 x86/mce: Add Hygon Dhyana support to the MCA infrastructure
The machine check architecture for Hygon Dhyana CPU is similar to the
AMD family 17h one. Add vendor checking for Hygon Dhyana to share the
code path of AMD family 17h.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: tony.luck@intel.com
Cc: thomas.lendacky@amd.com
Cc: linux-edac@vger.kernel.org
Link: https://lkml.kernel.org/r/87d8a4f16bdea0bfe0c0cf2e4a8d2c2a99b1055c.1537533369.git.puwen@hygon.cn
2018-09-27 18:28:59 +02:00
Pu Wen
1a576b23d6 x86/bugs: Add Hygon Dhyana to the respective mitigation machinery
The Hygon Dhyana CPU has the same speculative execution as AMD family
17h, so share AMD spectre mitigation code with Hygon Dhyana.

Also Hygon Dhyana is not affected by meltdown, so add exception for it.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/0861d39c8a103fc0deca15bafbc85d403666d9ef.1537533369.git.puwen@hygon.cn
2018-09-27 18:28:59 +02:00
Pu Wen
6d0ef316b9 x86/events: Add Hygon Dhyana support to PMU infrastructure
The PMU architecture for the Hygon Dhyana CPU is similar to the AMD
Family 17h one. To support it, call amd_pmu_init() to share the AMD PMU
initialization flow, and change the PMU name to "HYGON".

The Hygon Dhyana CPU supports both legacy and extension PMC MSRs (perf
counter registers and event selection registers), so add Hygon Dhyana
support in the similar way as AMD does.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/9d93ed54a975f33ef7247e0967960f4ce5d3d990.1537533369.git.puwen@hygon.cn
2018-09-27 18:28:57 +02:00
Pu Wen
39dc6f154d x86/cpu/mtrr: Support TOP_MEM2 and get MTRR number
The Hygon Dhyana CPU has a special MSR way to force WB for memory >4GB,
and support TOP_MEM2. Therefore, it is necessary to add Hygon Dhyana
support in amd_special_default_mtrr().

The number of variable MTRRs for Hygon is 2 as AMD's.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/8246f81648d014601de3812ade40e85d9c50d9b3.1537533369.git.puwen@hygon.cn
2018-09-27 18:28:57 +02:00
Pu Wen
d4f7423efd x86/cpu: Get cache info and setup cache cpumap for Hygon Dhyana
The Hygon Dhyana CPU has a topology extensions bit in CPUID. With
this bit, the kernel can get the cache information. So add support in
cpuid4_cache_lookup_regs() to get the correct cache size.

The Hygon Dhyana CPU also discovers num_cache_leaves via CPUID leaf
0x8000001d, so add support to it in find_num_cache_leaves().

Also add cacheinfo_hygon_init_llc_id() and init_hygon_cacheinfo()
functions to initialize Dhyana cache info. Setup cache cpumap in the
same way as AMD does.

Signed-off-by: Pu Wen <puwen@hygon.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: bp@alien8.de
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/2a686b2ac0e2f5a1f2f5f101124d9dd44f949731.1537533369.git.puwen@hygon.cn
2018-09-27 18:28:57 +02:00
Borislav Petkov
7eae653c80 Merge branch 'tip-x86-hygon' into tip-x86-cpu
... in order to share one commit with the EDAC tree.

Signed-off-by: Borislav Petkov <bp@suse.de>
2018-09-27 18:13:01 +02:00
Pu Wen
c9661c1e80 x86/cpu: Create Hygon Dhyana architecture support file
Add x86 architecture support for a new processor: Hygon Dhyana Family
18h. Carve out initialization code needed by Dhyana into a separate
compilation unit.

To identify Hygon Dhyana CPU, add a new vendor type X86_VENDOR_HYGON.

Since Dhyana uses AMD functionality to a large degree, select
CPU_SUP_AMD which provides that functionality.

 [ bp: drop explicit license statement as it has an SPDX tag already. ]

Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: tglx@linutronix.de
Cc: mingo@redhat.com
Cc: hpa@zytor.com
Cc: x86@kernel.org
Cc: thomas.lendacky@amd.com
Link: https://lkml.kernel.org/r/1a882065223bacbde5726f3beaa70cebd8dcd814.1537533369.git.puwen@hygon.cn
2018-09-27 16:14:05 +02:00
Jiri Kosina
bb4b3b7762 x86/speculation: Propagate information about RSB filling mitigation to sysfs
If spectrev2 mitigation has been enabled, RSB is filled on context switch
in order to protect from various classes of spectrev2 attacks.

If this mitigation is enabled, say so in sysfs for spectrev2.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pm
2018-09-26 14:26:52 +02:00
Jiri Kosina
53c613fe63 x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
(once enabled) prevents cross-hyperthread control of decisions made by
indirect branch predictors.

Enable this feature if

- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)

After some previous discussion, this leaves STIBP on all the time, as wrmsr
on crossing kernel boundary is a no-no. This could perhaps later be a bit
more optimized (like disabling it in NOHZ, experiment with disabling it in
idle, etc) if needed.

Note that the synchronization of the mask manipulation via newly added
spec_ctrl_mutex is currently not strictly needed, as the only updater is
already being serialized by cpu_add_remove_lock, but let's make this a
little bit more future-proof.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
2018-09-26 14:26:52 +02:00
Matthew Whitehead
2893cc8ff8 x86/CPU: Change query logic so CPUID is enabled before testing
Presently we check first if CPUID is enabled. If it is not already
enabled, then we next call identify_cpu_without_cpuid() and clear
X86_FEATURE_CPUID.

Unfortunately, identify_cpu_without_cpuid() is the function where CPUID
becomes _enabled_ on Cyrix 6x86/6x86L CPUs.

Reverse the calling sequence so that CPUID is first enabled, and then
check a second time to see if the feature has now been activated.

[ bp: Massage commit message and remove trailing whitespace. ]

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180921212041.13096-3-tedheadster@gmail.com
2018-09-22 11:47:39 +02:00
Matthew Whitehead
03b099bdcd x86/CPU: Use correct macros for Cyrix calls
There are comments in processor-cyrix.h advising you to _not_ make calls
using the deprecated macros in this style:

  setCx86_old(CX86_CCR4, getCx86_old(CX86_CCR4) | 0x80);

This is because it expands the macro into a non-functioning calling
sequence. The calling order must be:

  outb(CX86_CCR2, 0x22);
  inb(0x23);

From the comments:

 * When using the old macros a line like
 *   setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
 * gets expanded to:
 *  do {
 *    outb((CX86_CCR2), 0x22);
 *    outb((({
 *        outb((CX86_CCR2), 0x22);
 *        inb(0x23);
 *    }) | 0x88), 0x23);
 *  } while (0);

The new macros fix this problem, so use them instead.

Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jia Zhang <qianyue.zj@alibaba-inc.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180921212041.13096-2-tedheadster@gmail.com
2018-09-22 11:46:56 +02:00
Borislav Petkov
7401a633c3 x86/mce-inject: Reset injection struct after injection
Clear the MCE struct which is used for collecting the injection details
after injection.

Also, populate it with more details from the machine.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20180905081954.10391-1-bp@alien8.de
2018-09-21 14:28:37 +02:00
Reinette Chatre
ffb2315fd2 x86/intel_rdt: Fix incorrect loop end condition
In order to determine a sane default cache allocation for a new CAT/CDP
resource group, all resource groups are checked to determine which cache
portions are available to share. At this time all possible CLOSIDs
that can be supported by the resource is checked. This is problematic
if the resource supports more CLOSIDs than another CAT/CDP resource. In
this case, the number of CLOSIDs that could be allocated are fewer than
the number of CLOSIDs that can be supported by the resource.

Limit the check of closids to that what is supported by the system based
on the minimum across all resources.

Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-10-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:07 +02:00
Reinette Chatre
939b90b20b x86/intel_rdt: Fix exclusive mode handling of MBA resource
It is possible for a resource group to consist out of MBA as well as
CAT/CDP resources. The "exclusive" resource mode only applies to the
CAT/CDP resources since MBA allocations cannot be specified to overlap
or not. When a user requests a resource group to become "exclusive" then it
can only be successful if there are CAT/CDP resources in the group
and none of their CBMs associated with the group's CLOSID overlaps with
any other resource group.

Fix the "exclusive" mode setting by failing if there isn't any CAT/CDP
resource in the group and ensuring that the CBM checking is only done on
CAT/CDP resources.

Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-9-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:07 +02:00
Reinette Chatre
f0df4e1acf x86/intel_rdt: Fix incorrect loop end condition
A loop is used to check if a CAT resource's CBM of one CLOSID
overlaps with the CBM of another CLOSID of the same resource. The loop
is run over all CLOSIDs supported by the resource.

The problem with running the loop over all CLOSIDs supported by the
resource is that its number of supported CLOSIDs may be more than the
number of supported CLOSIDs on the system, which is the minimum number of
CLOSIDs supported across all resources.

Fix the loop to only consider the number of system supported CLOSIDs,
not all that are supported by the resource.

Fixes: 49f7b4efa ("x86/intel_rdt: Enable setting of exclusive mode")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-8-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:06 +02:00
Reinette Chatre
32d736abed x86/intel_rdt: Do not allow pseudo-locking of MBA resource
A system supporting pseudo-locking may have MBA as well as CAT
resources of which only the CAT resources could support cache
pseudo-locking. When the schemata to be pseudo-locked is provided it
should be checked that that schemata does not attempt to pseudo-lock a
MBA resource.

Fixes: e0bdfe8e3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-7-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:06 +02:00
Reinette Chatre
70479c012b x86/intel_rdt: Fix unchecked MSR access
When a new resource group is created, it is initialized with sane
defaults that currently assume the resource being initialized is a CAT
resource. This code path is also followed by a MBA resource that is not
allocated the same as a CAT resource and as a result we encounter the
following unchecked MSR access error:

unchecked MSR access error: WRMSR to 0xd51 (tried to write 0x0000
000000000064) at rIP: 0xffffffffae059994 (native_write_msr+0x4/0x20)
Call Trace:
mba_wrmsr+0x41/0x80
update_domains+0x125/0x130
rdtgroup_mkdir+0x270/0x500

Fix the above by ensuring the initial allocation is only attempted on a
CAT resource.

Fixes: 95f0b77ef ("x86/intel_rdt: Initialize new resource group with sane defaults")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-6-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:06 +02:00
Reinette Chatre
47d53b184a x86/intel_rdt: Fix invalid mode warning when multiple resources are managed
When multiple resources are managed by RDT, the number of CLOSIDs used
is the minimum of the CLOSIDs supported by each resource. In the function
rdt_bit_usage_show(), the annotated bitmask is created to depict how the
CAT supporting caches are being used. During this annotated bitmask
creation, each resource group is queried for its mode that is used as a
label in the annotated bitmask.

The maximum number of resource groups is currently assumed to be the
number of CLOSIDs supported by the resource for which the information is
being displayed. This is incorrect since the number of active CLOSIDs is
the minimum across all resources.

If information for a cache instance with more CLOSIDs than another is
being generated we thus encounter a warning like:

invalid mode for closid 8
WARNING: CPU: 88 PID: 1791 at [SNIP]/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c
:827 rdt_bit_usage_show+0x221/0x2b0

Fix this by ensuring that only the number of supported CLOSIDs are
considered.

Fixes: e651901187 ("x86/intel_rdt: Introduce "bit_usage" to display cache allocations details")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-5-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Reinette Chatre
c793da8e4c x86/intel_rdt: Global closid helper to support future fixes
The number of CLOSIDs supported by a system is the minimum number of
CLOSIDs supported by any of its resources. Care should be taken when
iterating over the CLOSIDs of a resource since it may be that the number
of CLOSIDs supported on the system is less than the number of CLOSIDs
supported by the resource.

Introduce a helper function that can be used to query the number of
CLOSIDs that is supported by all resources, irrespective of how many
CLOSIDs are supported by a particular resource.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-4-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Reinette Chatre
f968dc119a x86/intel_rdt: Fix size reporting of MBA resource
Chen Yu reported a divide-by-zero error when accessing the 'size'
resctrl file when a MBA resource is enabled.

divide error: 0000 [#1] SMP PTI
CPU: 93 PID: 1929 Comm: cat Not tainted 4.19.0-rc2-debug-rdt+ #25
RIP: 0010:rdtgroup_cbm_to_size+0x7e/0xa0
Call Trace:
rdtgroup_size_show+0x11a/0x1d0
seq_read+0xd8/0x3b0

Quoting Chen Yu's report: This is because for MB resource, the
r->cache.cbm_len is zero, thus calculating size in rdtgroup_cbm_to_size()
will trigger the exception.

Fix this issue in the 'size' file by getting correct memory bandwidth value
which is in MBps when MBA software controller is enabled or in percentage
when MBA software controller is disabled.

Fixes: d9b48c86eb ("x86/intel_rdt: Display resource groups' allocations in bytes")
Reported-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Xiaochen Shen" <xiaochen.shen@intel.com>
Link: https://lkml.kernel.org/r/20180904174614.26682-1-yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/1537048707-76280-3-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
Xiaochen Shen
753694a8df x86/intel_rdt: Fix data type in parsing callbacks
Each resource is associated with a parsing callback to parse the data
provided from user space when writing schemata file.

The 'data' parameter in the callbacks is defined as a void pointer which
is error prone due to lack of type check.

parse_bw() processes the 'data' parameter as a string while its caller
actually passes the parameter as a pointer to struct rdt_cbm_parse_data.
Thus, parse_bw() takes wrong data and causes failure of parsing MBA
throttle value.

To fix the issue, the 'data' parameter in all parsing callbacks is defined
and handled as a pointer to struct rdt_parse_data (renamed from struct
rdt_cbm_parse_data).

Fixes: 7604df6e16 ("x86/intel_rdt: Support flexible data to parsing callbacks")
Fixes: 9ab9aa15c3 ("x86/intel_rdt: Ensure requested schemata respects mode")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H Peter Anvin" <hpa@zytor.com>
Cc: "Tony Luck" <tony.luck@intel.com>
Cc: "Chen Yu" <yu.c.chen@intel.com>
Link: https://lkml.kernel.org/r/1537048707-76280-2-git-send-email-fenghua.yu@intel.com
2018-09-18 23:38:05 +02:00
zhong jiang
8e6b65a1b6 x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION
Get rid of local @cpu variable which is unused in the
!CONFIG_IA32_EMULATION case.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/1536806985-24197-1-git-send-email-zhongjiang@huawei.com
[ Clean up commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2018-09-15 14:57:05 +02:00
Andy Lutomirski
bf904d2762 x86/pti/64: Remove the SYSCALL64 entry trampoline
The SYSCALL64 trampoline has a couple of nice properties:

 - The usual sequence of SWAPGS followed by two GS-relative accesses to
   set up RSP is somewhat slow because the GS-relative accesses need
   to wait for SWAPGS to finish.  The trampoline approach allows
   RIP-relative accesses to set up RSP, which avoids the stall.

 - The trampoline avoids any percpu access before CR3 is set up,
   which means that no percpu memory needs to be mapped in the user
   page tables.  This prevents using Meltdown to read any percpu memory
   outside the cpu_entry_area and prevents using timing leaks
   to directly locate the percpu areas.

The downsides of using a trampoline may outweigh the upsides, however.
It adds an extra non-contiguous I$ cache line to system calls, and it
forces an indirect jump to transfer control back to the normal kernel
text after CR3 is set up.  The latter is because x86 lacks a 64-bit
direct jump instruction that could jump from the trampoline to the entry
text.  With retpolines enabled, the indirect jump is extremely slow.

Change the code to map the percpu TSS into the user page tables to allow
the non-trampoline SYSCALL64 path to work under PTI.  This does not add a
new direct information leak, since the TSS is readable by Meltdown from the
cpu_entry_area alias regardless.  It does allow a timing attack to locate
the percpu area, but KASLR is more or less a lost cause against local
attack on CPUs vulnerable to Meltdown regardless.  As far as I'm concerned,
on current hardware, KASLR is only useful to mitigate remote attacks that
try to attack the kernel without first gaining RCE against a vulnerable
user process.

On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
overhead from ~237ns to ~228ns.

There is a possible alternative approach: Move the trampoline within 2G of
the entry text and make a separate copy for each CPU.  This would allow a
direct jump to rejoin the normal entry path. There are pro's and con's for
this approach:

 + It avoids a pipeline stall

 - It executes from an extra page and read from another extra page during
   the syscall. The latter is because it needs to use a relative
   addressing mode to find sp1 -- it's the same *cacheline*, but accessed
   using an alias, so it's an extra TLB entry.

 - Slightly more memory. This would be one page per CPU for a simple
   implementation and 64-ish bytes per CPU or one page per node for a more
   complex implementation.

 - More code complexity.

The current approach is chosen for simplicity and because the alternative
does not provide a significant benefit, which makes it worth.

[ tglx: Added the alternative discussion to the changelog ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
2018-09-12 21:33:53 +02:00
Juergen Gross
9bad5658ea x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella
Most of the paravirt ops defined in pv_cpu_ops are for Xen PV guests
only. Define them only if CONFIG_PARAVIRT_XXL is set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: boris.ostrovsky@oracle.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180828074026.820-13-jgross@suse.com
2018-09-03 16:50:36 +02:00
Juergen Gross
5c83511bdb x86/paravirt: Use a single ops structure
Instead of using six globally visible paravirt ops structures combine
them in a single structure, keeping the original structures as
sub-structures.

This avoids the need to assemble struct paravirt_patch_template at
runtime on the stack each time apply_paravirt() is being called (i.e.
when loading a module).

[ tglx: Made the struct and the initializer tabular for readability sake ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Cc: virtualization@lists.linux-foundation.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: boris.ostrovsky@oracle.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180828074026.820-9-jgross@suse.com
2018-09-03 16:50:35 +02:00
Jann Horn
81fd9c1844 x86/fault: Plumb error code and fault address through to fault handlers
This is preparation for looking at trap number and fault address in the
handlers for uaccess errors. No functional change.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-kernel@vger.kernel.org
Cc: dvyukov@google.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-6-jannh@google.com
2018-09-03 15:12:09 +02:00
Filippo Sironi
8da38ebaad x86/microcode: Update the new microcode revision unconditionally
Handle the case where microcode gets loaded on the BSP's hyperthread
sibling first and the boot_cpu_data's microcode revision doesn't get
updated because of early exit due to the siblings sharing a microcode
engine.

For that, simply write the updated revision on all CPUs unconditionally.

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: prarit@redhat.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de
2018-09-02 14:10:54 +02:00
Prarit Bhargava
370a132bb2 x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
When preparing an MCE record for logging, boot_cpu_data.microcode is used
to read out the microcode revision on the box.

However, on systems where late microcode update has happened, the microcode
revision output in a MCE log record is wrong because
boot_cpu_data.microcode is not updated when the microcode gets updated.

But, the microcode revision saved in boot_cpu_data's microcode member
should be kept up-to-date, regardless, for consistency.

Make it so.

Fixes: fa94d0c6e0 ("x86/MCE: Save microcode revision in machine check records")
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: sironi@amazon.de
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com
2018-09-02 14:10:54 +02:00
Jacek Tomaka
f4661d293e x86/microcode: Make revision and processor flags world-readable
The microcode revision is already readable for non-root users via
/proc/cpuinfo. Thus, there's no reason to keep the same information
readable by root only in /sys/devices/system/cpu/cpuX/microcode/.

Make .../processor_flags world-readable too, while at it.

Reported-by: Tim Burgess <timb@dug.com>
Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180825035039.14409-1-jacekt@dugeo.com
2018-09-02 14:09:13 +02:00
Andi Kleen
cc51e5428e x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits
physical address space. The L1TF workaround is limited by this internal
cache address width, and needs to have one bit free there for the
mitigation to work.

Older client systems report only 36bit physical address space so the range
check decides that L1TF is not mitigated for a 36bit phys/32GB system with
some memory holes.

But since these actually have the larger internal cache width this warning
is bogus because it would only really be needed if the system had more than
43bits of memory.

Add a new internal x86_cache_bits field. Normally it is the same as the
physical bits field reported by CPUID, but for Nehalem and newerforce it to
be at least 44bits.

Change the L1TF memory size warning to use the new cache_bits field to
avoid bogus warnings and remove the bogus comment about memory size.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Reported-by: George Anchev <studio@anchev.net>
Reported-by: Christopher Snowhill <kode54@gmail.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Michael Hocko <mhocko@suse.com>
Cc: vbabka@suse.cz
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
2018-08-27 10:29:14 +02:00
Andi Kleen
1ab534e85c x86/spectre: Add missing family 6 check to microcode check
The check for Spectre microcodes does not check for family 6, only the
model numbers.

Add a family 6 check to avoid ambiguity with other families.

Fixes: a5b2966364 ("x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes")
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180824170351.34874-2-andi@firstfloor.org
2018-08-27 10:29:14 +02:00
Linus Torvalds
2a8a2b7c49 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:

 - Correct the L1TF fallout on 32bit and the off by one in the 'too much
   RAM for protection' calculation.

 - Add a helpful kernel message for the 'too much RAM' case

 - Unbreak the VDSO in case that the compiler desides to use indirect
   jumps/calls and emits retpolines which cannot be resolved because the
   kernel uses its own thunks, which does not work for the VDSO. Make it
   use the builtin thunks.

 - Re-export start_thread() which was unexported when the 32/64bit
   implementation was unified. start_thread() is required by modular
   binfmt handlers.

 - Trivial cleanups

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/speculation/l1tf: Suggest what to do on systems with too much RAM
  x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM
  x86/kvm/vmx: Remove duplicate l1d flush definitions
  x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
  x86/process: Re-export start_thread()
  x86/mce: Add notifier_block forward declaration
  x86/vdso: Fix vDSO build if a retpoline is emitted
2018-08-26 10:13:21 -07:00
Linus Torvalds
2923b27e54 libnvdimm-for-4.19_dax-memory-failure
* memory_failure() gets confused by dev_pagemap backed mappings. The
   recovery code has specific enabling for several possible page states
   that needs new enabling to handle poison in dax mappings. Teach
   memory_failure() about ZONE_DEVICE pages.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9ui8ACgkQYGjFFmlT
 OEpNRw//XGj9s7sezfJFeol4psJlRUd935yii/gmJRgi/yPf2VxxQG9qyM6SMBUc
 75jASfOL6FSsfxHz0kplyWzMDNdrTkNNAD+9rv80FmY7GqWgcas9DaJX7jZ994vI
 5SRO7pfvNZcXlo7IhqZippDw3yxkIU9Ufi0YQKaEUm7GFieptvCZ0p9x3VYfdvwM
 BExrxQe0X1XUF4xErp5P78+WUbKxP47DLcucRDig8Q7dmHELUdyNzo3E1SVoc7m+
 3CmvyTj6XuFQgOZw7ZKun1BJYfx/eD5ZlRJLZbx6wJHRtTXv/Uea8mZ8mJ31ykN9
 F7QVd0Pmlyxys8lcXfK+nvpL09QBE0/PhwWKjmZBoU8AdgP/ZvBXLDL/D6YuMTg6
 T4wwtPNJorfV4lVD06OliFkVI4qbKbmNsfRq43Ns7PCaLueu4U/eMaSwSH99UMaZ
 MGbO140XW2RZsHiU9yTRUmZq73AplePEjxtzR8oHmnjo45nPDPy8mucWPlkT9kXA
 oUFMhgiviK7dOo19H4eaPJGqLmHM93+x5tpYxGqTr0dUOXUadKWxMsTnkID+8Yi7
 /kzQWCFvySz3VhiEHGuWkW08GZT6aCcpkREDomnRh4MEnETlZI8bblcuXYOCLs6c
 nNf1SIMtLdlsl7U1fEX89PNeQQ2y237vEDhFQZftaalPeu/JJV0=
 =Ftop
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm memory-failure update from Dave Jiang:
 "As it stands, memory_failure() gets thoroughly confused by dev_pagemap
  backed mappings. The recovery code has specific enabling for several
  possible page states and needs new enabling to handle poison in dax
  mappings.

  In order to support reliable reverse mapping of user space addresses:

   1/ Add new locking in the memory_failure() rmap path to prevent races
      that would typically be handled by the page lock.

   2/ Since dev_pagemap pages are hidden from the page allocator and the
      "compound page" accounting machinery, add a mechanism to determine
      the size of the mapping that encompasses a given poisoned pfn.

   3/ Given pmem errors can be repaired, change the speculatively
      accessed poison protection, mce_unmap_kpfn(), to be reversible and
      otherwise allow ongoing access from the kernel.

  A side effect of this enabling is that MADV_HWPOISON becomes usable
  for dax mappings, however the primary motivation is to allow the
  system to survive userspace consumption of hardware-poison via dax.
  Specifically the current behavior is:

     mce: Uncorrected hardware memory error in user-access at af34214200
     {1}[Hardware Error]: It has been corrected by h/w and requires no further action
     mce: [Hardware Error]: Machine check events logged
     {1}[Hardware Error]: event severity: corrected
     Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users
     [..]
     Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed
     mce: Memory error not recovered
     <reboot>

  ...and with these changes:

     Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000
     Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption
     Memory failure: 0x20cb00: recovery action for dax page: Recovered

  Given all the cross dependencies I propose taking this through
  nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax
  folks"

* tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, pmem: Restore page attributes when clearing errors
  x86/memory_failure: Introduce {set, clear}_mce_nospec()
  x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses
  mm, memory_failure: Teach memory_failure() about dev_pagemap pages
  filesystem-dax: Introduce dax_lock_mapping_entry()
  mm, memory_failure: Collect mapping size in collect_procs()
  mm, madvise_inject_error: Let memory_failure() optionally take a page reference
  mm, dev_pagemap: Do not clear ->mapping on final put
  mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages
  filesystem-dax: Set page->index
  device-dax: Set page->index
  device-dax: Enable page_mapping()
  device-dax: Convert to vmf_insert_mixed and vm_fault_t
2018-08-25 18:43:59 -07:00
Vlastimil Babka
6a012288d6 x86/speculation/l1tf: Suggest what to do on systems with too much RAM
Two users have reported [1] that they have an "extremely unlikely" system
with more than MAX_PA/2 memory and L1TF mitigation is not effective.

Make the warning more helpful by suggesting the proper mem=X kernel boot
parameter to make it effective and a link to the L1TF document to help
decide if the mitigation is worth the unusable RAM.

[1] https://bugzilla.suse.com/show_bug.cgi?id=1105536

Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
2018-08-24 15:55:17 +02:00
Dan Williams
284ce4011b x86/memory_failure: Introduce {set, clear}_mce_nospec()
Currently memory_failure() returns zero if the error was handled. On
that result mce_unmap_kpfn() is called to zap the page out of the kernel
linear mapping to prevent speculative fetches of potentially poisoned
memory. However, in the case of dax mapped devmap pages the page may be
in active permanent use by the device driver, so it cannot be unmapped
from the kernel.

Instead of marking the page not present, marking the page UC should
be sufficient for preventing poison from being pre-fetched into the
cache. Convert mce_unmap_pfn() to set_mce_nospec() remapping the page as
UC, to hide it from speculative accesses.

Given that that persistent memory errors can be cleared by the driver,
include a facility to restore the page to cacheable operation,
clear_mce_nospec().

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <linux-edac@vger.kernel.org>
Cc: <x86@kernel.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2018-08-20 09:22:45 -07:00
Linus Torvalds
d5acba26bf Char/Misc driver patches for 4.19-rc1
Here is the bit set of char/misc drivers for 4.19-rc1
 
 There is a lot here, much more than normal, seems like everyone is
 writing new driver subsystems these days...  Anyway, major things here
 are:
 	- new FSI driver subsystem, yet-another-powerpc low-level
 	  hardware bus
 	- gnss, finally an in-kernel GPS subsystem to try to tame all of
 	  the crazy out-of-tree drivers that have been floating around
 	  for years, combined with some really hacky userspace
 	  implementations.  This is only for GNSS receivers, but you
 	  have to start somewhere, and this is great to see.
 Other than that, there are new slimbus drivers, new coresight drivers,
 new fpga drivers, and loads of DT bindings for all of these and existing
 drivers.
 
 Full details of everything is in the shortlog.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW3g7ew8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykfBgCeOG0RkSI92XVZe0hs/QYFW9kk8JYAnRBf3Qpm
 cvW7a+McOoKz/MGmEKsi
 =TNfn
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver updates from Greg KH:
 "Here is the bit set of char/misc drivers for 4.19-rc1

  There is a lot here, much more than normal, seems like everyone is
  writing new driver subsystems these days... Anyway, major things here
  are:

   - new FSI driver subsystem, yet-another-powerpc low-level hardware
     bus

   - gnss, finally an in-kernel GPS subsystem to try to tame all of the
     crazy out-of-tree drivers that have been floating around for years,
     combined with some really hacky userspace implementations. This is
     only for GNSS receivers, but you have to start somewhere, and this
     is great to see.

  Other than that, there are new slimbus drivers, new coresight drivers,
  new fpga drivers, and loads of DT bindings for all of these and
  existing drivers.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'char-misc-4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (255 commits)
  android: binder: Rate-limit debug and userspace triggered err msgs
  fsi: sbefifo: Bump max command length
  fsi: scom: Fix NULL dereference
  misc: mic: SCIF Fix scif_get_new_port() error handling
  misc: cxl: changed asterisk position
  genwqe: card_base: Use true and false for boolean values
  misc: eeprom: assignment outside the if statement
  uio: potential double frees if __uio_register_device() fails
  eeprom: idt_89hpesx: clean up an error pointer vs NULL inconsistency
  misc: ti-st: Fix memory leak in the error path of probe()
  android: binder: Show extra_buffers_size in trace
  firmware: vpd: Fix section enabled flag on vpd_section_destroy
  platform: goldfish: Retire pdev_bus
  goldfish: Use dedicated macros instead of manual bit shifting
  goldfish: Add missing includes to goldfish.h
  mux: adgs1408: new driver for Analog Devices ADGS1408/1409 mux
  dt-bindings: mux: add adi,adgs1408
  Drivers: hv: vmbus: Cleanup synic memory free path
  Drivers: hv: vmbus: Remove use of slow_virt_to_phys()
  Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()
  ...
2018-08-18 11:04:51 -07:00
Linus Torvalds
9a76aba02a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   - Gustavo A. R. Silva keeps working on the implicit switch fallthru
     changes.

   - Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
     Luca Coelho.

   - Re-enable ASPM in r8169, from Kai-Heng Feng.

   - Add virtual XFRM interfaces, which avoids all of the limitations of
     existing IPSEC tunnels. From Steffen Klassert.

   - Convert GRO over to use a hash table, so that when we have many
     flows active we don't traverse a long list during accumluation.

   - Many new self tests for routing, TC, tunnels, etc. Too many
     contributors to mention them all, but I'm really happy to keep
     seeing this stuff.

   - Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.

   - Lots of cleanups and fixes in L2TP code from Guillaume Nault.

   - Add IPSEC offload support to netdevsim, from Shannon Nelson.

   - Add support for slotting with non-uniform distribution to netem
     packet scheduler, from Yousuk Seung.

   - Add UDP GSO support to mlx5e, from Boris Pismenny.

   - Support offloading of Team LAG in NFP, from John Hurley.

   - Allow to configure TX queue selection based upon RX queue, from
     Amritha Nambiar.

   - Support ethtool ring size configuration in aquantia, from Anton
     Mikaev.

   - Support DSCP and flowlabel per-transport in SCTP, from Xin Long.

   - Support list based batching and stack traversal of SKBs, this is
     very exciting work. From Edward Cree.

   - Busyloop optimizations in vhost_net, from Toshiaki Makita.

   - Introduce the ETF qdisc, which allows time based transmissions. IGB
     can offload this in hardware. From Vinicius Costa Gomes.

   - Add parameter support to devlink, from Moshe Shemesh.

   - Several multiplication and division optimizations for BPF JIT in
     nfp driver, from Jiong Wang.

   - Lots of prepatory work to make more of the packet scheduler layer
     lockless, when possible, from Vlad Buslov.

   - Add ACK filter and NAT awareness to sch_cake packet scheduler, from
     Toke Høiland-Jørgensen.

   - Support regions and region snapshots in devlink, from Alex Vesker.

   - Allow to attach XDP programs to both HW and SW at the same time on
     a given device, with initial support in nfp. From Jakub Kicinski.

   - Add TLS RX offload and support in mlx5, from Ilya Lesokhin.

   - Use PHYLIB in r8169 driver, from Heiner Kallweit.

   - All sorts of changes to support Spectrum 2 in mlxsw driver, from
     Ido Schimmel.

   - PTP support in mv88e6xxx DSA driver, from Andrew Lunn.

   - Make TCP_USER_TIMEOUT socket option more accurate, from Jon
     Maxwell.

   - Support for templates in packet scheduler classifier, from Jiri
     Pirko.

   - IPV6 support in RDS, from Ka-Cheong Poon.

   - Native tproxy support in nf_tables, from Máté Eckl.

   - Maintain IP fragment queue in an rbtree, but optimize properly for
     in-order frags. From Peter Oskolkov.

   - Improvde handling of ACKs on hole repairs, from Yuchung Cheng"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
  bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
  hv/netvsc: Fix NULL dereference at single queue mode fallback
  net: filter: mark expected switch fall-through
  xen-netfront: fix warn message as irq device name has '/'
  cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
  net: dsa: mv88e6xxx: missing unlock on error path
  rds: fix building with IPV6=m
  inet/connection_sock: prefer _THIS_IP_ to current_text_addr
  net: dsa: mv88e6xxx: bitwise vs logical bug
  net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
  ieee802154: hwsim: using right kind of iteration
  net: hns3: Add vlan filter setting by ethtool command -K
  net: hns3: Set tx ring' tc info when netdev is up
  net: hns3: Remove tx ring BD len register in hns3_enet
  net: hns3: Fix desc num set to default when setting channel
  net: hns3: Fix for phy link issue when using marvell phy driver
  net: hns3: Fix for information of phydev lost problem when down/up
  net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
  net: hns3: Add support for serdes loopback selftest
  bnxt_en: take coredump_record structure off stack
  ...
2018-08-15 15:04:25 -07:00
Guenter Roeck
1eb46908b3 x86/l1tf: Fix build error seen if CONFIG_KVM_INTEL is disabled
allmodconfig+CONFIG_INTEL_KVM=n results in the following build error.

  ERROR: "l1tf_vmx_mitigation" [arch/x86/kvm/kvm.ko] undefined!

Fixes: 5b76a3cff0 ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-15 09:44:38 -07:00
Linus Torvalds
31130a16d4 xen: features and fixes for 4.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCW3LkCgAKCRCAXGG7T9hj
 vtyfAQDTMUqfBlpz9XqFyTBTFRkP3aVtnEeE7BijYec+RXPOxwEAsiXwZPsmW/AN
 up+NEHqPvMOcZC8zJZ9THCiBgOxligY=
 =F51X
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - add dma-buf functionality to Xen grant table handling

 - fix for booting the kernel as Xen PVH dom0

 - fix for booting the kernel as a Xen PV guest with
   CONFIG_DEBUG_VIRTUAL enabled

 - other minor performance and style fixes

* tag 'for-linus-4.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/balloon: fix balloon initialization for PVH Dom0
  xen: don't use privcmd_call() from xen_mc_flush()
  xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
  xen/biomerge: Use true and false for boolean values
  xen/gntdev: don't dereference a null gntdev_dmabuf on allocation failure
  xen/spinlock: Don't use pvqspinlock if only 1 vCPU
  xen/gntdev: Implement dma-buf import functionality
  xen/gntdev: Implement dma-buf export functionality
  xen/gntdev: Add initial support for dma-buf UAPI
  xen/gntdev: Make private routines/structures accessible
  xen/gntdev: Allow mappings for DMA buffers
  xen/grant-table: Allow allocating buffers suitable for DMA
  xen/balloon: Share common memory reservation routines
  xen/grant-table: Make set/clear page private code shared
2018-08-14 16:54:22 -07:00
Linus Torvalds
958f338e96 Merge branch 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Merge L1 Terminal Fault fixes from Thomas Gleixner:
 "L1TF, aka L1 Terminal Fault, is yet another speculative hardware
  engineering trainwreck. It's a hardware vulnerability which allows
  unprivileged speculative access to data which is available in the
  Level 1 Data Cache when the page table entry controlling the virtual
  address, which is used for the access, has the Present bit cleared or
  other reserved bits set.

  If an instruction accesses a virtual address for which the relevant
  page table entry (PTE) has the Present bit cleared or other reserved
  bits set, then speculative execution ignores the invalid PTE and loads
  the referenced data if it is present in the Level 1 Data Cache, as if
  the page referenced by the address bits in the PTE was still present
  and accessible.

  While this is a purely speculative mechanism and the instruction will
  raise a page fault when it is retired eventually, the pure act of
  loading the data and making it available to other speculative
  instructions opens up the opportunity for side channel attacks to
  unprivileged malicious code, similar to the Meltdown attack.

  While Meltdown breaks the user space to kernel space protection, L1TF
  allows to attack any physical memory address in the system and the
  attack works across all protection domains. It allows an attack of SGX
  and also works from inside virtual machines because the speculation
  bypasses the extended page table (EPT) protection mechanism.

  The assoicated CVEs are: CVE-2018-3615, CVE-2018-3620, CVE-2018-3646

  The mitigations provided by this pull request include:

   - Host side protection by inverting the upper address bits of a non
     present page table entry so the entry points to uncacheable memory.

   - Hypervisor protection by flushing L1 Data Cache on VMENTER.

   - SMT (HyperThreading) control knobs, which allow to 'turn off' SMT
     by offlining the sibling CPU threads. The knobs are available on
     the kernel command line and at runtime via sysfs

   - Control knobs for the hypervisor mitigation, related to L1D flush
     and SMT control. The knobs are available on the kernel command line
     and at runtime via sysfs

   - Extensive documentation about L1TF including various degrees of
     mitigations.

  Thanks to all people who have contributed to this in various ways -
  patches, review, testing, backporting - and the fruitful, sometimes
  heated, but at the end constructive discussions.

  There is work in progress to provide other forms of mitigations, which
  might be less horrible performance wise for a particular kind of
  workloads, but this is not yet ready for consumption due to their
  complexity and limitations"

* 'l1tf-final' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
  x86/microcode: Allow late microcode loading with SMT disabled
  tools headers: Synchronise x86 cpufeatures.h for L1TF additions
  x86/mm/kmmio: Make the tracer robust against L1TF
  x86/mm/pat: Make set_memory_np() L1TF safe
  x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
  x86/speculation/l1tf: Invert all not present mappings
  cpu/hotplug: Fix SMT supported evaluation
  KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
  x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
  x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
  Documentation/l1tf: Remove Yonah processors from not vulnerable list
  x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
  x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
  x86: Don't include linux/irq.h from asm/hardirq.h
  x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
  x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
  x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
  x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
  x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
  cpu/hotplug: detect SMT disabled by BIOS
  ...
2018-08-14 09:46:06 -07:00
Linus Torvalds
13e091b6dd Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
 "Early TSC based time stamping to allow better boot time analysis.

  This comes with a general cleanup of the TSC calibration code which
  grew warts and duct taping over the years and removes 250 lines of
  code. Initiated and mostly implemented by Pavel with help from various
  folks"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/kvmclock: Mark kvm_get_preset_lpj() as __init
  x86/tsc: Consolidate init code
  sched/clock: Disable interrupts when calling generic_sched_clock_init()
  timekeeping: Prevent false warning when persistent clock is not available
  sched/clock: Close a hole in sched_clock_init()
  x86/tsc: Make use of tsc_calibrate_cpu_early()
  x86/tsc: Split native_calibrate_cpu() into early and late parts
  sched/clock: Use static key for sched_clock_running
  sched/clock: Enable sched clock early
  sched/clock: Move sched clock initialization and merge with generic clock
  x86/tsc: Use TSC as sched clock early
  x86/tsc: Initialize cyc2ns when tsc frequency is determined
  x86/tsc: Calibrate tsc only once
  ARM/time: Remove read_boot_clock64()
  s390/time: Remove read_boot_clock64()
  timekeeping: Default boot time offset to local_clock()
  timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
  s390/time: Add read_persistent_wall_and_boot_offset()
  x86/xen/time: Output xen sched_clock time from 0
  x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
  ...
2018-08-13 18:28:19 -07:00
Linus Torvalds
eac3411944 Merge branch 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 PTI updates from Thomas Gleixner:
 "The Speck brigade sadly provides yet another large set of patches
  destroying the perfomance which we carefully built and preserved

   - PTI support for 32bit PAE. The missing counter part to the 64bit
     PTI code implemented by Joerg.

   - A set of fixes for the Global Bit mechanics for non PCID CPUs which
     were setting the Global Bit too widely and therefore possibly
     exposing interesting memory needlessly.

   - Protection against userspace-userspace SpectreRSB

   - Support for the upcoming Enhanced IBRS mode, which is preferred
     over IBRS. Unfortunately we dont know the performance impact of
     this, but it's expected to be less horrible than the IBRS
     hammering.

   - Cleanups and simplifications"

* 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  x86/mm/pti: Move user W+X check into pti_finalize()
  x86/relocs: Add __end_rodata_aligned to S_REL
  x86/mm/pti: Clone kernel-image on PTE level for 32 bit
  x86/mm/pti: Don't clear permissions in pti_clone_pmd()
  x86/mm/pti: Fix 32 bit PCID check
  x86/mm/init: Remove freed kernel image areas from alias mapping
  x86/mm/init: Add helper for freeing kernel image pages
  x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
  mm: Allow non-direct-map arguments to free_reserved_area()
  x86/mm/pti: Clear Global bit more aggressively
  x86/speculation: Support Enhanced IBRS on future CPUs
  x86/speculation: Protect against userspace-userspace spectreRSB
  x86/kexec: Allocate 8k PGDs for PTI
  Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
  x86/mm: Remove in_nmi() warning from vmalloc_fault()
  x86/entry/32: Check for VM86 mode in slow-path check
  perf/core: Make sure the ring-buffer is mapped in all page-tables
  x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
  x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
  x86/entry/32: Add debug code to check entry/exit CR3
  ...
2018-08-13 17:54:17 -07:00
Linus Torvalds
30de24c7dd Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache QoS (RDT/CAR) updates from Thomas Gleixner:
 "Add support for pseudo-locked cache regions.

  Cache Allocation Technology (CAT) allows on certain CPUs to isolate a
  region of cache and 'lock' it. Cache pseudo-locking builds on the fact
  that a CPU can still read and write data pre-allocated outside its
  current allocated area on cache hit. With cache pseudo-locking data
  can be preloaded into a reserved portion of cache that no application
  can fill, and from that point on will only serve cache hits. The cache
  pseudo-locked memory is made accessible to user space where an
  application can map it into its virtual address space and thus have a
  region of memory with reduced average read latency.

  The locking is not perfect and gets totally screwed by WBINDV and
  similar mechanisms, but it provides a reasonable enhancement for
  certain types of latency sensitive applications.

  The implementation extends the current CAT mechanism and provides a
  generally useful exclusive CAT mode on which it builds the extra
  pseude-locked regions"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  x86/intel_rdt: Disable PMU access
  x86/intel_rdt: Fix possible circular lock dependency
  x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
  x86/intel_rdt: Support restoration of subset of permissions
  x86/intel_rdt: Fix cleanup of plr structure on error
  x86/intel_rdt: Move pseudo_lock_region_clear()
  x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
  x86/intel_rdt: Support L3 cache performance event of Broadwell
  x86/intel_rdt: More precise L2 hit/miss measurements
  x86/intel_rdt: Create character device exposing pseudo-locked region
  x86/intel_rdt: Create debugfs files for pseudo-locking testing
  x86/intel_rdt: Create resctrl debug area
  x86/intel_rdt: Ensure RDT cleanup on exit
  x86/intel_rdt: Resctrl files reflect pseudo-locked information
  x86/intel_rdt: Support creation/removal of pseudo-locked region
  x86/intel_rdt: Pseudo-lock region creation/removal core
  x86/intel_rdt: Discover supported platforms via prefetch disable bits
  x86/intel_rdt: Add utilities to test pseudo-locked region possibility
  x86/intel_rdt: Split resource group removal in two
  x86/intel_rdt: Enable entering of pseudo-locksetup mode
  ...
2018-08-13 16:01:46 -07:00
Linus Torvalds
7796916146 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Thomas Gleixner:
 "Two small updates for the CPU code:

   - Improve NUMA emulation

   - Add the EPT_AD CPU feature bit"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpufeatures: Add EPT_AD feature bit
  x86/numa_emulation: Introduce uniform split capability
  x86/numa_emulation: Fix emulated-to-physical node mapping
2018-08-13 14:41:53 -07:00
Linus Torvalds
37a1604680 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Thomas Gleixner:
 "A small set of changes to the RAS core:

   - Rework of the MCE bank scanning code

   - Y2038 converion"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Cleanup __mc_scan_banks()
  x86/mce: Carve out bank scanning code
  x86/mce: Remove !banks check
  x86/mce: Carve out the crashing_cpu check
  x86/mce: Always use 64-bit timestamps
2018-08-13 11:19:25 -07:00
Josh Poimboeuf
07d981ad4c x86/microcode: Allow late microcode loading with SMT disabled
The kernel unnecessarily prevents late microcode loading when SMT is
disabled.  It should be safe to allow it if all the primary threads are
online.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2018-08-10 08:32:15 +01:00
Thomas Gleixner
bc2d8d262c cpu/hotplug: Fix SMT supported evaluation
Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
cpu_smt_control to CPU_SMT_NOT_SUPPORTED in case that 'nosmt' was supplied
on the kernel command line as it cannot differentiate between SMT disabled
by BIOS and SMT soft disable via 'nosmt'. That wreckages the state and
makes the sysfs interface unusable.

Rework this so that during bringup of the non boot CPUs the availability of
SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
'primary' thread then set the local cpu_smt_available marker and evaluate
this explicitely right after the initial SMP bringup has finished.

SMT evaulation on x86 is a trainwreck as the firmware has all the
information _before_ booting the kernel, but there is no interface to query
it.

Fixes: 73d5e2b472 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-07 12:25:30 +02:00
M. Vefa Bicakci
405c018a25 xen/pv: Call get_cpu_address_sizes to set x86_virt/phys_bits
Commit d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits
adjustment corruption") has moved the query and calculation of the
x86_virt_bits and x86_phys_bits fields of the cpuinfo_x86 struct
from the get_cpu_cap function to a new function named
get_cpu_address_sizes.

One of the call sites related to Xen PV VMs was unfortunately missed
in the aforementioned commit. This prevents successful boot-up of
kernel versions 4.17 and up in Xen PV VMs if CONFIG_DEBUG_VIRTUAL
is enabled, due to the following code path:

  enlighten_pv.c::xen_start_kernel
    mmu_pv.c::xen_reserve_special_pages
      page.h::__pa
        physaddr.c::__phys_addr
          physaddr.h::phys_addr_valid

phys_addr_valid uses boot_cpu_data.x86_phys_bits to validate physical
addresses. boot_cpu_data.x86_phys_bits is no longer populated before
the call to xen_reserve_special_pages due to the aforementioned commit
though, so the validation performed by phys_addr_valid fails, which
causes __phys_addr to trigger a BUG, preventing boot-up.

Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: xen-devel@lists.xenproject.org
Cc: x86@kernel.org
Cc: stable@vger.kernel.org # for v4.17 and up
Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-08-06 16:27:41 -04:00
Thomas Gleixner
315706049c Merge branch 'x86/pti-urgent' into x86/pti
Integrate the PTI Global bit fixes which conflict with the 32bit PTI
support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-06 20:56:34 +02:00
Paolo Bonzini
8e0b2b9166 x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
Bit 3 of ARCH_CAPABILITIES tells a hypervisor that L1D flush on vmentry is
not needed.  Add a new value to enum vmx_l1d_flush_state, which is used
either if there is no L1TF bug at all, or if bit 3 is set in ARCH_CAPABILITIES.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 17:10:19 +02:00
Paolo Bonzini
ea156d192f x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Three changes to the content of the sysfs file:

 - If EPT is disabled, L1TF cannot be exploited even across threads on the
   same core, and SMT is irrelevant.

 - If mitigation is completely disabled, and SMT is enabled, print "vulnerable"
   instead of "vulnerable, SMT vulnerable"

 - Reorder the two parts so that the main vulnerability state comes first
   and the detail on SMT is second.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 17:10:19 +02:00
Thomas Gleixner
f2701b77bb Merge 4.18-rc7 into master to pick up the KVM dependcy
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-08-05 16:39:29 +02:00
Thomas Gleixner
4a7a54a55e x86/intel_rdt: Disable PMU access
Peter is objecting to the direct PMU access in RDT. Right now the PMU usage
is broken anyway as it is not coordinated with perf.

Until this discussion settled, disable the PMU mechanics by simply
rejecting the type '2' measurement in the resctrl file.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
CC: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
2018-08-03 13:06:35 +02:00
Sai Praneeth
706d51681d x86/speculation: Support Enhanced IBRS on future CPUs
Future Intel processors will support "Enhanced IBRS" which is an "always
on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
disabled.

From the specification [1]:

 "With enhanced IBRS, the predicted targets of indirect branches
  executed cannot be controlled by software that was executed in a less
  privileged predictor mode or on another logical processor. As a
  result, software operating on a processor with enhanced IBRS need not
  use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
  privileged predictor mode. Software can isolate predictor modes
  effectively simply by setting the bit once. Software need not disable
  enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."

If Enhanced IBRS is supported by the processor then use it as the
preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
Retpoline white paper [2] states:

 "Retpoline is known to be an effective branch target injection (Spectre
  variant 2) mitigation on Intel processors belonging to family 6
  (enumerated by the CPUID instruction) that do not have support for
  enhanced IBRS. On processors that support enhanced IBRS, it should be
  used for mitigation instead of retpoline."

The reason why Enhanced IBRS is the recommended mitigation on processors
which support it is that these processors also support CET which
provides a defense against ROP attacks. Retpoline is very similar to ROP
techniques and might trigger false positives in the CET defense.

If Enhanced IBRS is selected as the mitigation technique for spectre v2,
the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
cleared. Kernel also has to make sure that IBRS bit remains set after
VMEXIT because the guest might have cleared the bit. This is already
covered by the existing x86_spec_ctrl_set_guest() and
x86_spec_ctrl_restore_host() speculation control functions.

Enhanced IBRS still requires IBPB for full mitigation.

[1] Speculative-Execution-Side-Channel-Mitigations.pdf
[2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
Both documents are available at:
https://bugzilla.kernel.org/show_bug.cgi?id=199511

Originally-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim C Chen <tim.c.chen@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
2018-08-03 12:50:34 +02:00
Peter Feiner
301d328a6f x86/cpufeatures: Add EPT_AD feature bit
Some Intel processors have an EPT feature whereby the accessed & dirty bits
in EPT entries can be updated by HW. MSR IA32_VMX_EPT_VPID_CAP exposes the
presence of this capability.

There is no point in trying to use that new feature bit in the VMX code as
VMX needs to read the MSR anyway to access other bits, but having the
feature bit for EPT_AD in place helps virtualization management as it
exposes "ept_ad" in /proc/cpuinfo/$proc/flags if the feature is present.

[ tglx: Amended changelog ]

Signed-off-by: Peter Feiner <pfeiner@google.com>
Signed-off-by: Peter Shier <pshier@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180801180657.138051-1-pshier@google.com
2018-08-03 12:36:23 +02:00
Jiri Kosina
fdf82a7856 x86/speculation: Protect against userspace-userspace spectreRSB
The article "Spectre Returns! Speculation Attacks using the Return Stack 
Buffer" [1] describes two new (sub-)variants of spectrev2-like attacks, 
making use solely of the RSB contents even on CPUs that don't fallback to 
BTB on RSB underflow (Skylake+).

Mitigate userspace-userspace attacks by always unconditionally filling RSB on
context switch when the generic spectrev2 mitigation has been enabled.

[1] https://arxiv.org/pdf/1807.07940.pdf

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1807261308190.997@cbobk.fhfr.pm
2018-07-31 00:45:15 +02:00
David S. Miller
19725496da Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-07-24 19:21:58 -07:00
Dmitry Torokhov
488dee96bb kernfs: allow creating kernfs objects with arbitrary uid/gid
This change allows creating kernfs files and directories with arbitrary
uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
and always create objects belonging to the global root.

When creating symlinks ownership (uid/gid) is taken from the target kernfs
object.

Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 23:44:35 -07:00
Joerg Roedel
45d7b25574 x86/entry/32: Enter the kernel via trampoline stack
Use the entry-stack as a trampoline to enter the kernel. The entry-stack is
already in the cpu_entry_area and will be mapped to userspace when PTI is
enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-8-git-send-email-joro@8bytes.org
2018-07-20 01:11:37 +02:00
Borislav Petkov
9b3661cd7e x86/CPU: Call detect_nopl() only on the BSP
Make it use the setup_* variants and have it be called only on the BSP and
drop the call in generic_identify() - X86_FEATURE_NOPL will be replicated
to the APs through the forced caps. Helps to keep the mess at a manageable
level.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: peterz@infradead.org
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-11-pasha.tatashin@oracle.com
2018-07-20 00:02:39 +02:00
Pavel Tatashin
8990cac6e5 x86/jump_label: Initialize static branching early
Static branching is useful to runtime patch branches that are used in hot
path, but are infrequently changed.

The x86 clock framework is one example that uses static branches to setup
the best clock during boot and never changes it again.

It is desired to enable the TSC based sched clock early to allow fine
grained boot time analysis early on. That requires the static branching
functionality to be functional early as well.

Static branching requires patching nop instructions, thus,
arch_init_ideal_nops() must be called prior to jump_label_init().

Do all the necessary steps to call arch_init_ideal_nops() right after
early_cpu_init(), which also allows to insert a call to jump_label_init()
right after that. jump_label_init() will be called again from the generic
init code, but the code is protected against reinitialization already.

[ tglx: Massaged changelog ]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: steven.sistare@oracle.com
Cc: daniel.m.jordan@oracle.com
Cc: linux@armlinux.org.uk
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
Cc: john.stultz@linaro.org
Cc: sboyd@codeaurora.org
Cc: hpa@zytor.com
Cc: douly.fnst@cn.fujitsu.com
Cc: prarit@redhat.com
Cc: feng.tang@intel.com
Cc: pmladek@suse.com
Cc: gnomes@lxorguk.ukuu.org.uk
Cc: linux-s390@vger.kernel.org
Cc: boris.ostrovsky@oracle.com
Cc: jgross@suse.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180719205545.16512-10-pasha.tatashin@oracle.com
2018-07-20 00:02:38 +02:00
Dewet Thibaut
fbdb328c6b x86/MCE: Remove min interval polling limitation
commit b3b7c4795c ("x86/MCE: Serialize sysfs changes") introduced a min
interval limitation when setting the check interval for polled MCEs.
However, the logic is that 0 disables polling for corrected MCEs, see
Documentation/x86/x86_64/machinecheck. The limitation prevents disabling.

Remove this limitation and allow the value 0 to disable polling again.

Fixes: b3b7c4795c ("x86/MCE: Serialize sysfs changes")
Signed-off-by: Dewet Thibaut <thibaut.dewet@nokia.com>
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180716084927.24869-1-alexander.sverdlin@nokia.com
2018-07-17 17:56:25 +02:00
Greg Kroah-Hartman
83cf9cd6d5 Merge 4.18-rc5 into char-misc-next
We want the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-16 09:04:54 +02:00
Jiri Kosina
d90a7a0ec8 x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
Introduce the 'l1tf=' kernel command line option to allow for boot-time
switching of mitigation that is used on processors affected by L1TF.

The possible values are:

  full
	Provides all available mitigations for the L1TF vulnerability. Disables
	SMT and enables all mitigations in the hypervisors. SMT control via
	/sys/devices/system/cpu/smt/control is still possible after boot.
	Hypervisors will issue a warning when the first VM is started in
	a potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  full,force
	Same as 'full', but disables SMT control. Implies the 'nosmt=force'
	command line option. sysfs control of SMT and the hypervisor flush
	control is disabled.

  flush
	Leaves SMT enabled and enables the conditional hypervisor mitigation.
	Hypervisors will issue a warning when the first VM is started in a
	potentially insecure configuration, i.e. SMT enabled or L1D flush
	disabled.

  flush,nosmt
	Disables SMT and enables the conditional hypervisor mitigation. SMT
	control via /sys/devices/system/cpu/smt/control is still possible
	after boot. If SMT is reenabled or flushing disabled at runtime
	hypervisors will issue a warning.

  flush,nowarn
	Same as 'flush', but hypervisors will not warn when
	a VM is started in a potentially insecure configuration.

  off
	Disables hypervisor mitigations and doesn't emit any warnings.

Default is 'flush'.

Let KVM adhere to these semantics, which means:

  - 'lt1f=full,force'	: Performe L1D flushes. No runtime control
    			  possible.

  - 'l1tf=full'
  - 'l1tf-flush'
  - 'l1tf=flush,nosmt'	: Perform L1D flushes and warn on VM start if
			  SMT has been runtime enabled or L1D flushing
			  has been run-time enabled
			  
  - 'l1tf=flush,nowarn'	: Perform L1D flushes and no warnings are emitted.
  
  - 'l1tf=off'		: L1D flushes are not performed and no warnings
			  are emitted.

KVM can always override the L1D flushing behavior using its 'vmentry_l1d_flush'
module parameter except when lt1f=full,force is set.

This makes KVM's private 'nosmt' option redundant, and as it is a bit
non-systematic anyway (this is something to control globally, not on
hypervisor level), remove that option.

Add the missing Documentation entry for the l1tf vulnerability sysfs file
while at it.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.202758176@linutronix.de
2018-07-13 16:29:56 +02:00
Thomas Gleixner
fee0aede6f cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
The CPU_SMT_NOT_SUPPORTED state is set (if the processor does not support
SMT) when the sysfs SMT control file is initialized.

That was fine so far as this was only required to make the output of the
control file correct and to prevent writes in that case.

With the upcoming l1tf command line parameter, this needs to be set up
before the L1TF mitigation selection and command line parsing happens.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142323.121795971@linutronix.de
2018-07-13 16:29:56 +02:00
Thomas Gleixner
895ae47f99 x86/kvm: Allow runtime control of L1D flush
All mitigation modes can be switched at run time with a static key now:

 - Use sysfs_streq() instead of strcmp() to handle the trailing new line
   from sysfs writes correctly.
 - Make the static key management handle multiple invocations properly.
 - Set the module parameter file to RW

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.954525119@linutronix.de
2018-07-13 16:29:55 +02:00
Thomas Gleixner
a7b9020b06 x86/l1tf: Handle EPT disabled state proper
If Extended Page Tables (EPT) are disabled or not supported, no L1D
flushing is required. The setup function can just avoid setting up the L1D
flush for the EPT=n case.

Invoke it after the hardware setup has be done and enable_ept has the
correct state and expose the EPT disabled state in the mitigation status as
well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.612160168@linutronix.de
2018-07-13 16:29:53 +02:00
Thomas Gleixner
72c6d2db64 x86/litf: Introduce vmx status variable
Store the effective mitigation of VMX in a status variable and use it to
report the VMX state in the l1tf sysfs file.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20180713142322.433098358@linutronix.de
2018-07-13 16:29:53 +02:00
Reinette Chatre
2989360d9c x86/intel_rdt: Fix possible circular lock dependency
Lockdep is reporting a possible circular locking dependency:

 ======================================================
 WARNING: possible circular locking dependency detected
 4.18.0-rc1-test-test+ #4 Not tainted
 ------------------------------------------------------
 user_example/766 is trying to acquire lock:
 0000000073479a0f (rdtgroup_mutex){+.+.}, at: pseudo_lock_dev_mmap

 but task is already holding lock:
 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (&mm->mmap_sem){++++}:
        _copy_to_user+0x1e/0x70
        filldir+0x91/0x100
        dcache_readdir+0x54/0x160
        iterate_dir+0x142/0x190
        __x64_sys_getdents+0xb9/0x170
        do_syscall_64+0x86/0x200
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 -> #1 (&sb->s_type->i_mutex_key#3){++++}:
        start_creating+0x60/0x100
        debugfs_create_dir+0xc/0xc0
        rdtgroup_pseudo_lock_create+0x217/0x4d0
        rdtgroup_schemata_write+0x313/0x3d0
        kernfs_fop_write+0xf0/0x1a0
        __vfs_write+0x36/0x190
        vfs_write+0xb7/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x86/0x200
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 -> #0 (rdtgroup_mutex){+.+.}:
        __mutex_lock+0x80/0x9b0
        pseudo_lock_dev_mmap+0x2f/0x170
        mmap_region+0x3d6/0x610
        do_mmap+0x387/0x580
        vm_mmap_pgoff+0xcf/0x110
        ksys_mmap_pgoff+0x170/0x1f0
        do_syscall_64+0x86/0x200
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 other info that might help us debug this:

 Chain exists of:
   rdtgroup_mutex --> &sb->s_type->i_mutex_key#3 --> &mm->mmap_sem

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&mm->mmap_sem);
                                lock(&sb->s_type->i_mutex_key#3);
                                lock(&mm->mmap_sem);
   lock(rdtgroup_mutex);

  *** DEADLOCK ***

 1 lock held by user_example/766:
  #0: 000000001ef7a35b (&mm->mmap_sem){++++}, at: vm_mmap_pgoff+0x9f/0x110

rdtgroup_mutex is already being released temporarily during pseudo-lock
region creation to prevent the potential deadlock between rdtgroup_mutex
and mm->mmap_sem that is obtained during device_create(). Move the
debugfs creation into this area to avoid the same circular dependency.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/fffb57f9c6b8285904c9a60cc91ce21591af17fe.1531332480.git.reinette.chatre@intel.com
2018-07-12 21:33:43 +02:00
Linus Torvalds
23adbe6fb5 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti updates from Thomas Gleixner:
 "Two small fixes correcting the handling of SSB mitigations on AMD
  processors"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
  x86/bugs: Update when to check for the LS_CFG SSBD mitigation
2018-07-08 13:56:25 -07:00
Jann Horn
15279df6f2 x86/mtrr: Don't copy out-of-bounds data in mtrr_write
Don't access the provided buffer out of bounds - this can cause a kernel
out-of-bounds read when invoked through sys_splice() or other things that
use kernel_write()/__kernel_write().

Fixes: 7f8ec5a4f0 ("x86/mtrr: Convert to use strncpy_from_user() helper")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180706215003.156702-1-jannh@google.com
2018-07-07 18:58:41 +02:00
Michael Kelley
7dc9b6b808 Drivers: hv: vmbus: Make TLFS #define names architecture neutral
The Hyper-V feature and hint flags in hyperv-tlfs.h are all defined
with the string "X64" in the name.  Some of these flags are indeed
x86/x64 specific, but others are not.  For the ones that are used
in architecture independent Hyper-V driver code, or will be used in
the upcoming support for Hyper-V for ARM64, this patch removes the
"X64" from the name.

This patch changes the flags that are currently known to be
used on multiple architectures. Hyper-V for ARM64 is still a
work-in-progress and the Top Level Functional Spec (TLFS) has not
been separated into x86/x64 and ARM64 areas.  So additional flags
may need to be updated later.

This patch only changes symbol names.  There are no functional
changes.

Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 13:09:15 +02:00
Michael Kelley
e9a7fda29a x86/hyperv: Add interrupt handler annotations
Add standard interrupt handler annotations to
hyperv_vector_handler(). This does not fix any observed
bug, but avoids potential removal of the code by link
time optimization and makes it consistent with
hv_stimer0_vector_handler in the same source file.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 13:02:28 +02:00
Tom Lendacky
612bc3b3d4 x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
On AMD, the presence of the MSR_SPEC_CTRL feature does not imply that the
SSBD mitigation support should use the SPEC_CTRL MSR. Other features could
have caused the MSR_SPEC_CTRL feature to be set, while a different SSBD
mitigation option is in place.

Update the SSBD support to check for the actual SSBD features that will
use the SPEC_CTRL MSR.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6ac2f49edb ("x86/bugs: Add AMD's SPEC_CTRL MSR usage")
Link: http://lkml.kernel.org/r/20180702213602.29202.33151.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:45:48 +02:00
Tom Lendacky
845d382bb1 x86/bugs: Update when to check for the LS_CFG SSBD mitigation
If either the X86_FEATURE_AMD_SSBD or X86_FEATURE_VIRT_SSBD features are
present, then there is no need to perform the check for the LS_CFG SSBD
mitigation support.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180702213553.29202.21089.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-07-03 09:45:48 +02:00
Reinette Chatre
33dc3e410a x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
When a resource group enters pseudo-locksetup mode it reflects that the
platform supports cache pseudo-locking and the resource group is unused,
ready to be used for a pseudo-locked region. Until it is set up as a
pseudo-locked region the resource group is "locked down" such that no new
tasks or cpus can be assigned to it. This is accomplished in a user visible
way by making the cpus, cpus_list, and tasks resctrl files inaccassible
(user cannot read from or write to these files).

When the resource group changes to pseudo-locked mode it represents a cache
pseudo-locked region. While not appropriate to make any changes to the cpus
assigned to this region it is useful to make it easy for the user to see
which cpus are associated with the pseudo-locked region.

Modify the permissions of the cpus/cpus_list file when the resource group
changes to pseudo-locked mode to support reading (not writing).  The
information presented to the user when reading the file are the cpus
associated with the pseudo-locked region.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/12756b7963b6abc1bffe8fb560b87b75da827bd1.1530421961.git.reinette.chatre@intel.com
2018-07-03 08:38:40 +02:00
Reinette Chatre
392487def4 x86/intel_rdt: Support restoration of subset of permissions
As the mode of a resource group changes, the operations it can support may
also change. One way in which the supported operations are managed is to
modify the permissions of the files within the resource group's resctrl
directory.

At the moment only two possible permissions are supported: the default
permissions or no permissions in support for when the operation is "locked
down". It is possible where an operation on a resource group may have more
possibilities. For example, if by default changes can be made to the
resource group by writing to a resctrl file while the current settings can
be obtained by reading from the file, then it may be possible that in
another mode it is only possible to read the current settings, and not
change them.

Make it possible to modify some of the permissions of a resctrl file in
support of a more flexible way to manage the operations on a resource
group. In this preparation work the original behavior is maintained where
all permissions are restored.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/8773aadfade7bcb2c48a45fa294a04d2c03bb0a1.1530421961.git.reinette.chatre@intel.com
2018-07-03 08:38:40 +02:00
Reinette Chatre
546d3c7427 x86/intel_rdt: Fix cleanup of plr structure on error
When a resource group enters pseudo-locksetup mode a pseudo_lock_region is
associated with it. When the user writes to the resource group's schemata
file the CBM of the requested pseudo-locked region is entered into the
pseudo_lock_region struct. If any part of pseudo-lock region creation fails
the resource group will remain in pseudo-locksetup mode with the
pseudo_lock_region associated with it.

In case of failure during pseudo-lock region creation care needs to be
taken to ensure that the pseudo_lock_region struct associated with the
resource group is cleared from any pseudo-locking data - especially the
CBM. This is because the existence of a pseudo_lock_region struct with a
CBM is significant in other areas of the code, for example, the display of
bit_usage and initialization of a new resource group.

Fix the error path of pseudo-lock region creation to ensure that the
pseudo_lock_region struct is cleared at each error exit.

Fixes: 018961ae55 ("x86/intel_rdt: Pseudo-lock region creation/removal core")
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/49b4782f6d204d122cee3499e642b2772a98d2b4.1530421026.git.reinette.chatre@intel.com
2018-07-03 08:38:39 +02:00
Reinette Chatre
ce730f1cc1 x86/intel_rdt: Move pseudo_lock_region_clear()
The pseudo_lock_region_clear() function is moved to earlier in the file in
preparation for its use in functions that currently appear before it. No
functional change.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/ef098ec2a45501e23792289bff80ae3152141e2f.1530421026.git.reinette.chatre@intel.com
2018-07-03 08:38:39 +02:00
Reinette Chatre
6fc0de37f6 x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
Deeper C-states impact cache content through shrinking of the cache or
flushing entire cache to memory before reducing power to the cache.
Deeper C-states will thus negatively impact the pseudo-locked regions.

To avoid impacting pseudo-locked regions C-states are limited on
pseudo-locked region creation so that cores associated with the
pseudo-locked region are prevented from entering deeper C-states.
This is accomplished by requesting a CPU latency target which will
prevent the core from entering C6 across all supported platforms.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1ef4f99dd6ba12fa6fb44c5a1141e75f952b9cd9.1529706536.git.reinette.chatre@intel.com
2018-06-24 15:35:48 +02:00
Reinette Chatre
f3be1e7b2c x86/intel_rdt: Support L3 cache performance event of Broadwell
Broadwell microarchitecture supports pseudo-locking. Add support for
the L3 cache related performance events of these systems so that
the success of pseudo-locking can be measured more accurately on these
platforms.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/36c1414e9bd17c3faf440f32b644b9c879bcbae2.1529706536.git.reinette.chatre@intel.com
2018-06-24 15:35:48 +02:00
Reinette Chatre
8a2fc0e1bc x86/intel_rdt: More precise L2 hit/miss measurements
Intel Goldmont processors supports non-architectural precise events that
can be used to give us more insight into the success of L2 cache
pseudo-locking on these platforms.

Introduce a new measurement trigger that will enable two precise events,
MEM_LOAD_UOPS_RETIRED.L2_HIT and MEM_LOAD_UOPS_RETIRED.L2_MISS, while
accessing pseudo-locked data. A new tracepoint, pseudo_lock_l2, is
created to make these results visible to the user.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/06b1456da65b543479dac8d9493e41f92f175d6c.1529706536.git.reinette.chatre@intel.com
2018-06-24 15:35:48 +02:00
Reinette Chatre
746e08590b x86/intel_rdt: Create character device exposing pseudo-locked region
After a pseudo-locked region is created it needs to be made
available to user space for usage.

A character device supporting mmap() is created for each pseudo-locked
region. A user space application can now use mmap() system call to map
pseudo-locked region into its virtual address space.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/fccbb9b20f07655ab0a4df9fa1c1babc0288aea0.1529706536.git.reinette.chatre@intel.com
2018-06-24 15:35:48 +02:00
Linus Torvalds
d4e860eaf0 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Make Xen PV guest deal with speculative store bypass correctly

   - Address more fallout from the 5-Level pagetable handling. Undo an
     __initdata annotation to avoid section mismatch and malfunction
     when post init code would touch the freed variable.

   - Handle exception fixup in math_error() before calling notify_die().
     The reverse call order incorrectly triggers notify_die() listeners
     for soemthing which is handled correctly at the site which issues
     the floating point instruction.

   - Fix an off by one in the LLC topology calculation on AMD

   - Handle non standard memory block sizes gracefully un UV platforms

   - Plug a memory leak in the microcode loader

   - Sanitize the purgatory build magic

   - Add the x86 specific device tree bindings directory to the x86
     MAINTAINER file patterns"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix 'no5lvl' handling
  Revert "x86/mm: Mark __pgtable_l5_enabled __initdata"
  x86/CPU/AMD: Fix LLC ID bit-shift calculation
  MAINTAINERS: Add file patterns for x86 device tree bindings
  x86/microcode/intel: Fix memleak in save_microcode_patch()
  x86/platform/UV: Add kernel parameter to set memory block size
  x86/platform/UV: Use new set memory block size function
  x86/platform/UV: Add adjustable set memory block size function
  x86/build: Remove unnecessary preparation for purgatory
  Revert "kexec/purgatory: Add clean-up for purgatory directory"
  x86/xen: Add call of speculative_store_bypass_ht_init() to PV paths
  x86: Call fixup_exception() before notify_die() in math_error()
2018-06-24 19:59:52 +08:00
Linus Torvalds
177d363e72 Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti fixes from Thomas Gleixner:
 "Two small updates for the speculative distractions:

   - Make it more clear to the compiler that array_index_mask_nospec()
     is not subject for optimizations. It's not perfect, but ...

   - Don't report XEN PV guests as vulnerable because their mitigation
     state depends on the hypervisor. Report unknown and refer to the
     hypervisor requirement"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()
  x86/pti: Don't report XenPV as vulnerable
2018-06-24 19:48:30 +08:00
Linus Torvalds
a43de48993 Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ras fixes from Thomas Gleixner:
 "A set of fixes for RAS/MCE:

   - Improve the error message when the kernel cannot recover from a MCE
     so the maximum amount of information gets provided.

   - Individually check MCE recovery features on SkyLake CPUs instead of
     assuming none when the CAPID0 register does not advertise the
     general ability for recovery.

   - Prevent MCE to output inconsistent messages which first show an
     error location and then claim that the source is unknown.

   - Prevent overwriting MCi_STATUS in the attempt to gather more
     information when a fatal MCE has alreay been detected. This leads
     to empty status values in the printout and failing to react
     promptly on the fatal event"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix incorrect "Machine check from unknown source" message
  x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
  x86/mce: Check for alternate indication of machine check recovery on Skylake
  x86/mce: Improve error message when kernel cannot recover
2018-06-24 19:22:19 +08:00
Kirill A. Shutemov
2458e53ff7 x86/mm: Fix 'no5lvl' handling
early_identify_cpu() has to use early version of pgtable_l5_enabled()
that doesn't rely on cpu_feature_enabled().

Defining USE_EARLY_PGTABLE_L5 before all includes does the trick.

I lost the define in one of reworks of the original patch.

Fixes: 372fddf709 ("x86/mm: Introduce the 'no5lvl' kernel parameter")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180622220841.54135-3-kirill.shutemov@linux.intel.com
2018-06-23 14:20:37 +02:00
Reinette Chatre
443810fe61 x86/intel_rdt: Create debugfs files for pseudo-locking testing
There is no simple yes/no test to determine if pseudo-locking was
successful. In order to test pseudo-locking we expose a debugfs file for
each pseudo-locked region that will record the latency of reading the
pseudo-locked memory at a stride of 32 bytes (hardcoded). These numbers
will give us an idea of locking was successful or not since they will
reflect cache hits and cache misses (hardware prefetching is disabled
during the test).

The new debugfs file "pseudo_lock_measure" will, when the
pseudo_lock_mem_latency tracepoint is enabled, record the latency of
accessing each cache line twice.

Kernel tracepoints offer us histograms (when CONFIG_HIST_TRIGGERS is
enabled) that is a simple way to visualize the memory access latency
and immediately see any cache misses. For example, the hist trigger
below before trigger of the measurement will display the memory access
latency and instances at each latency:
echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/\
                           pseudo_lock_mem_latency/trigger
echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
echo 1 > /sys/kernel/debug/resctrl/<newlock>/pseudo_lock_measure
echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/6b2ea76181099d1b79ccfa7d3be24497ab2d1a45.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:51 +02:00
Reinette Chatre
37707ec6cb x86/intel_rdt: Create resctrl debug area
In preparation for support of debugging of RDT sub features the user can
now enable a RDT debugfs region.

The debug area is always enabled when CONFIG_DEBUG_FS is set as advised in
http://lkml.kernel.org/r/20180523080501.GA6822@kroah.com

Also from same discussion in above linked email, no error checking on the
debugfs creation return value since code should not behave differently when
debugging passes or fails. Even on failure the returned value can be passed
safely to other debugfs calls.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/9f553faf30866a6317f1aaaa2fe9f92de66a10d2.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:51 +02:00
Reinette Chatre
0af6a48da4 x86/intel_rdt: Ensure RDT cleanup on exit
The RDT system's initialization does not have the corresponding exit
handling to ensure everything initialized on load is cleaned up also.

Introduce the cleanup routines that complement all initialization. This
includes the removal of a duplicate rdtgroup_init() declaration.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/a9e3a2bbd731d13915d2d7bf05d4f675b4fa109b.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:50 +02:00
Reinette Chatre
f4e80d67a5 x86/intel_rdt: Resctrl files reflect pseudo-locked information
Information about resources as well as resource groups are contained in a
variety of resctrl files. Now that pseudo-locked regions can be created the
files can be updated to present appropriate information to the user.

Update the resource group's schemata file to show only the information of
the pseudo-locked region.

Update the resource group's size file to show the size in bytes of only the
pseudo-locked region.

Update the bit_usage file to use the letter 'P' for all pseudo-locked
regions.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/5ece82869b651c2178b278e00bca959f7626b6e9.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:50 +02:00
Reinette Chatre
e0bdfe8e36 x86/intel_rdt: Support creation/removal of pseudo-locked region
The user triggers the creation of a pseudo-locked region when writing a
valid schemata to the schemata file of a resource group in the
pseudo-locksetup mode.

A valid schemata is one that: (1) does not overlap with any other resource
group, (2) does not involve a cache that already contains a pseudo-locked
region within its hierarchy.

After a valid schemata is parsed the system is programmed to associate the
to be pseudo-lock bitmask with the closid associated with the resource
group. With the system set up the pseudo-locked region can be created.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/8929c3a9e2ba600e79649abe584aa28b8d0ff639.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:50 +02:00
Reinette Chatre
018961ae55 x86/intel_rdt: Pseudo-lock region creation/removal core
The user requests a pseudo-locked region by providing a schemata to a
resource group that is in the pseudo-locksetup mode. This is the
functionality that consumes the parsed user data and creates the
pseudo-locked region.

First, required information is deduced from user provided data.
This includes, how much memory does the requested bitmask represent,
which CPU the requested region is associated with, and what is the
cache line size of that cache (to learn the stride needed for locking).
Second, a contiguous block of memory matching the requested bitmask is
allocated.

Finally, pseudo-locking is performed. The resource group already has the
allocation that reflects the requested bitmask. With this class of service
active and interference minimized, the allocated memory is loaded into the
cache.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/67391160bbf06143bc62d856d3d234eb152008b7.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:49 +02:00
Reinette Chatre
f2a177292b x86/intel_rdt: Discover supported platforms via prefetch disable bits
Knowing the model specific prefetch disable bits is required to support
cache pseudo-locking because the hardware prefetchers need to be disabled
when the kernel memory is pseudo-locked to cache. We add these bits only
for platforms known to support cache pseudo-locking.

When the user requests locksetup mode to be entered it will fail if the
prefetch disabling bits are not known for the platform.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/3eef559aa9fd693a104ff99ff909cfee450c1695.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:49 +02:00
Reinette Chatre
72d5050566 x86/intel_rdt: Add utilities to test pseudo-locked region possibility
A pseudo-locked region does not have a class of service associated with
it and thus not tracked in the array of control values maintained as
part of the domain. Even so, when the user provides a new bitmask for
another resource group it needs to be checked for interference with
existing pseudo-locked regions.

Additionally only one pseudo-locked region can be created in any cache
hierarchy.

Introduce two utilities in support of above scenarios: (1) a utility
that can be used to test if a given capacity bitmask overlaps with any
pseudo-locked regions associated with a particular cache instance, (2) a
utility that can be used to test if a pseudo-locked region exists within
a particular cache hierarchy.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b8e31dbdcf22ddf71df46072647b47e7558abb32.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:49 +02:00
Reinette Chatre
17eafd0762 x86/intel_rdt: Split resource group removal in two
Resource groups used for pseudo-locking do not require the same work on
removal as the other resource groups.

The resource group removal is split in two in preparation for support of
pseudo-locking resource groups. A single re-ordering occurs - the
setting of the rdtgrp flag is moved to later. This flag is not used by
any of the code between its original and new location.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/c8cbf7a7c72480b39bb946a929dbae96c0f9aca1.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:48 +02:00
Reinette Chatre
dfe9674b04 x86/intel_rdt: Enable entering of pseudo-locksetup mode
The user can request entering pseudo-locksetup mode by writing
"pseudo-locksetup" to the mode file. Act on this request as well as
support switching from a pseudo-locksetup mode (before pseudo-locked
mode was entered). It is not supported to modify the mode once
pseudo-locked mode has been entered.

The schemata reflects the new mode by adding "uninitialized" to all
resources. The size resctrl file reports zero for all cache domains in
support of the uninitialized nature. Since there are no users of this
class of service its allocations can be ignored when searching for
appropriate default allocations for new resource groups. For the same
reason resource groups in pseudo-locksetup mode are not considered when
testing if new resource groups may overlap.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/56f553334708022903c296284e62db3bbc1ff150.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:48 +02:00
Reinette Chatre
63657c1cdf x86/intel_rdt: Support enter/exit of locksetup mode
The locksetup mode is the way in which the user communicates that the
resource group will be used for a pseudo-locked region. Locksetup mode
should thus ensure that all restrictions on a resource group are met before
locksetup mode can be entered. The resource group should also be configured
to ensure that it cannot be modified in unsupported ways when a
pseudo-locked region.

Introduce the support where the request for entering locksetup mode can be
validated. This includes: CDP is not active, no cpus or tasks are assigned
to the resource group, monitoring is not in progress on the resource
group. Once the resource group is determined ready for a pseudo-locked
region it is configured to not allow future changes to these properties.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b120f71ced30116bcc6c6f651e8a7906ae6b903d.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:47 +02:00
Reinette Chatre
e8140a2d13 x86/intel_rdt: Introduce pseudo-locked region
A pseudo-locked region is introduced representing an instance of a
pseudo-locked cache region. Each cache instance (domain) can support one
pseudo-locked region. Similarly a resource group can be used for one
pseudo-locked region.

Include a pointer to a pseudo-locked region from the domain and resource
group structures.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/9f69eb159051067703bcbc714de62e69874d5dee.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:47 +02:00
Reinette Chatre
bbcee99b67 x86/intel_rdt: Add check to determine if monitoring in progress
When a resource group is pseudo-locked it is orphaned without a class of
service associated with it. We thus do not want any monitoring in progress
on a resource group that will be used for pseudo-locking.

Introduce a test that can be used to determine if pseudo-locking in
progress on a resource group. Temporarily mark it as unused to avoid
compile warnings until it is used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/14fd9494f87ca72a213b3a197d1172d4e66ae196.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:47 +02:00
Reinette Chatre
2a5d76a4fc x86/intel_rdt: Utilities to restrict/restore access to specific files
In support of Cache Pseudo-Locking we need to restrict access to specific
resctrl files to protect the state of a resource group used for
pseudo-locking from being changed in unsupported ways.

Introduce two utilities that can be used to either restrict or restore the
access to all files irrelevant to cache pseudo-locking when pseudo-locking
in progress for the resource group.

At this time introduce a new source file, intel_rdt_pseudo_lock.c, that
will contain most of the code related to cache pseudo-locking.

Temporarily mark these new functions as unused to silence compile warnings
until they are used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/ab6319d1244366be3f9b7f9fba1c3da4810a274b.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:46 +02:00
Reinette Chatre
c966dac8a5 x86/intel_rdt: Protect against resource group changes during locking
We intend to modify file permissions to make the "tasks", "cpus", and
"cpus_list" not accessible to the user when cache pseudo-locking in
progress. Even so, it is still possible for the user to force the file
permissions (using chmod) to make them writeable. Similarly, directory
permissions will be modified to prevent future monitor group creation but
the user can override these restrictions also.

Add additional checks to the files we intend to restrict to ensure that no
modifications from user space are attempted while setting up a
pseudo-locking or after a pseudo-locked region is set up.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/0c5cb006e81ead0b8bfff2df530c5d3017fd31d1.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:46 +02:00
Reinette Chatre
125db711e3 x86/intel_rdt: Add utility to restrict/restore access to resctrl files
When a resource group is used for Cache Pseudo-Locking then the region of
cache ends up being orphaned with no class of service referring to it. The
resctrl files intended to manage how the classes of services are utilized
thus become irrelevant.

The fact that a resctrl file is not relevant can be communicated to the
user by setting all of its permissions to zero. That is, its read, write,
and execute permissions are unset for all users.

Introduce two utilities, rdtgroup_kn_mode_restrict() and
rdtgroup_kn_mode_restore(), that can be used to restrict and restore the
permissions of a file or directory belonging to a resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/7afdbf5551b2f93cd45d61fbf5e01d87331f529a.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:46 +02:00
Reinette Chatre
f7a6e3f6f5 x86/intel_rdt: Add utility to test if tasks assigned to resource group
In considering changes to a resource group it becomes necessary to know
whether tasks have been assigned to the resource group in question.

Introduce a new utility that can be used to check if any tasks have been
assigned to a particular resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/be9ea3969ffd731dfd90c0ebcd5a0e0a2d135bb2.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:45 +02:00
Reinette Chatre
21220bb199 x86/intel_rdt: Respect read and write access
By default, if the opener has CAP_DAC_OVERRIDE, a kernfs file can be opened
regardless of RW permissions. Writing to a kernfs file will thus succeed
even if permissions are 0000.

It's required to restrict the actions that can be performed on a resource
group from userspace based on the mode of the resource group.  This
restriction will be done through a modification of the file
permissions. That is, for example, if a resource group is locked then the
user cannot add tasks to the resource group.

For this restriction through file permissions to work it has to be ensured
that the permissions are always respected. To do so the resctrl filesystem
is created with the KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK flag that will result
in open(2) failing with -EACCESS regardless of CAP_DAC_OVERRIDE if the
permission does not have the respective read or write access.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/26f4fc25f110bfc07c2d2c8b2c4ee904922fedf7.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:45 +02:00
Reinette Chatre
bb9fec69cb x86/intel_rdt: Introduce the Cache Pseudo-Locking modes
The two modes used to manage Cache Pseudo-Locked regions are introduced.  A
resource group is assigned "pseudo-locksetup" mode when the user indicates
that this resource group will be used for a Cache Pseudo-Locked
region. When the Cache Pseudo-Locked region has been set up successfully
after the user wrote the requested schemata to the "schemata" file, then
the mode will automatically changed to "pseudo-locked".  The user is not
able to modify the mode to "pseudo-locked" by writing "pseudo-locked" to
the "mode" file directly.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/98d6ca129bbe7dd0932d1fcfeb3cbb65f29a8d9d.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:45 +02:00
Reinette Chatre
d9b48c86eb x86/intel_rdt: Display resource groups' allocations' size in bytes
The schemata file displays the allocations associated with each domain of
each resource. The syntax of this file reflects the capacity bitmask (CBM)
of the actual allocation. In order to determine the actual size of an
allocation the user needs to dig through three different files to query the
variables needed to compute it (the cache size, the CBM length, and the
schemata).

Introduce a new file "size" associated with each resource group that will
mirror the schemata file syntax and display the size in bytes of each
allocation.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/cc0058014c30adb88ca7d1a5abfadacbfb5edd0d.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:44 +02:00
Reinette Chatre
e651901187 x86/intel_rdt: Introduce "bit_usage" to display cache allocations details
With cache regions now explicitly marked as "shareable" or "exclusive"
we would like to communicate to the user how portions of the cache
are used.

Introduce "bit_usage" that indicates for each resource
how portions of the cache are configured to be used.

To assist the user to distinguish whether the sharing is from software or
hardware we add the following annotation:

0 - currently unused
X - currently available for sharing and used by software and hardware
H - currently used by hardware only but available for software use
S - currently used and shareable by software only
E - currently used exclusively by one resource group

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/105d44c40e582c2b7e2dccf0ae247e5e61137d4b.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:44 +02:00
Reinette Chatre
9ab9aa15c3 x86/intel_rdt: Ensure requested schemata respects mode
When the administrator requests a change in a resource group's schemata
we have to ensure that the new schemata respects the current resource
group as well as the other active resource groups' schemata.

The new schemata is not allowed to overlap with the schemata of any
exclusive resource groups. Similarly, if the resource group being
changed is exclusive then its new schemata is not allowed to overlap
with any schemata of any other active resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/b0c05b21110d3040fff45f4c1d2cfda8dba3f207.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:43 +02:00
Reinette Chatre
7604df6e16 x86/intel_rdt: Support flexible data to parsing callbacks
Each resource is associated with a configurable callback that should be
used to parse the information provided for the particular resource from
user space. In addition to the resource and domain pointers this callback
is provided with just the character buffer being parsed.

In support of flexible parsing the callback is modified to support a void
pointer as argument. This enables resources that need more data than just
the user provided data to pass its required data to the callback without
affecting the signatures for the callbacks of all the other resources.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/34baacfced4d787d994ec7015e249e6c7e619053.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:43 +02:00
Reinette Chatre
9af4c0a6dc x86/intel_rdt: Making CBM name and type more explicit
cbm_validate() receives a pointer to the variable that will be initialized
with a validated capacity bitmask. The pointer points to a variable of type
unsigned long that is immediately assigned to a variable of type u32 by the
caller on return from cbm_validate().

Let cbm_validate() initialize a variable of type u32 directly.

At this time also change tha variable name "data" within parse_cbm() to a
name more reflective of the content: "cbm_val". This frees up the generic
"data" to be used later when it is indeed used for a collection of input.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/5e29cf0209ea2deac9beacd35cbe5239a50959fb.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:43 +02:00
Reinette Chatre
49f7b4efa1 x86/intel_rdt: Enable setting of exclusive mode
The new "mode" file now accepts "exclusive" that means that the
allocations of this resource group cannot be shared.

Enable users to modify a resource group's mode to "exclusive". To
succeed it is required that there is no overlap between resource group's
current schemata and that of all the other active resource groups as
well as cache regions potentially used by other hardware entities.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/83642cbba3c8c21db7fa6bb36fe7d385d3b275f2.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
414dd2b473 x86/intel_rdt: Introduce new "exclusive" mode
At the moment all allocations are shareable. There is no way for a user to
designate that an allocation associated with a resource group cannot be
shared by another.

Introduce the new mode "exclusive". When a resource group is marked as such
it implies that no overlap is allowed between its allocation and that of
another resource group.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/f6d24672a4280fe3b24cd2da9b5f50214439c1af.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
95f0b77efa x86/intel_rdt: Initialize new resource group with sane defaults
Currently when a new resource group is created its allocations would be
those that belonged to the resource group to which its closid belonged
previously.

That is, we can encounter a case like:
mkdir newgroup
cat newgroup/schemata
L2:0=ff;1=ff
echo 'L2:0=0xf0;1=0xf0' > newgroup/schemata
cat newgroup/schemata
L2:0=0xf0;1=0xf0
rmdir newgroup
mkdir newnewgroup
cat newnewgroup/schemata
L2:0=0xf0;1=0xf0

When the new group is created it would be reasonable to expect its
allocations to be initialized with all regions that it can possibly use.
At this time these regions would be all that are shareable by other
resource groups as well as regions that are not currently used.
If the available cache region is found to be non-contiguous the
available region is adjusted to enforce validity.

When a new resource group is created the hardware is initialized with
these new default allocations.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/c468ed79340b63024111978e01430bb9589d85c0.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
024d15be38 x86/intel_rdt: Make useful functions available internally
In support of the work done to enable resource groups to have different
modes some static functions need to be available for sharing amongst
all RDT components.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/2af8fd6e937ae4fbdaa52dee1123823cb4993176.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:42 +02:00
Reinette Chatre
0b9aa65626 x86/intel_rdt: Introduce test to determine if closid is in use
During CAT feature discovery the capacity bitmasks (CBMs) associated
with all the classes of service are initialized to all ones, even if the
class of service is not in use. Introduce a test that can be used to
determine if a class of service is in use. This test enables code
interested in parsing the CBMs to know if its values are meaningful or
can be ignored.

Temporarily mark the function as unused to silence compile warnings
until it is used.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/798f8d89cd9b12df492d48c14bdc8ee3b39b1c6f.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:41 +02:00
Reinette Chatre
d48d7a57f7 x86/intel_rdt: Introduce resource group's mode resctrl file
A new resctrl file "mode" associated with each resource group is
introduced. This file will display the resource group's current mode and an
administrator can also use it to modify the resource group's mode.

Only shareable mode is currently supported.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20ab78fda26a8c8d98e18ec555f6a1f728948972.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:41 +02:00
Reinette Chatre
472ef09b40 x86/intel_rdt: Associate mode with each RDT resource group
Each RDT resource group is associated with a mode that will reflect
the level of sharing of its allocations. The default, shareable, will be
associated with each resource group on creation since it is zero and
resource groups are created with kzalloc. The managing of the mode of a
resource group will follow. The default resource group always remain
though so ensure that it is reset to the default mode when the resctrl
filesystem is unmounted.

Also introduce a utility that can be used to determine the mode of a
resource group when it is searched for based on its class of service.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/797e4e1de4e4fcdf5b5e0039354d6a28079e2015.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:41 +02:00
Reinette Chatre
eb956a636f x86/intel_rdt: Introduce RDT resource group mode
At this time there are no constraints on how bitmasks represented by
schemata can be associated with closids represented by resource groups.  A
bitmask of one class of service can without any objections overlap with the
bitmask of another class of service.

The concept of "mode" is introduced in preparation for support of control
over whether cache regions can be shared between classes of service. At
this time the only mode reflects the current cache allocations where all
can potentially be shared.

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/87e88275597fbfa03ea9d41c1186bf012c831c01.1529706536.git.reinette.chatre@intel.com
2018-06-23 13:03:40 +02:00
Reinette Chatre
32206ab365 x86/intel_rdt: Provide pseudo-locking hooks within rdt_mount
Stephen Rothwell reported that the Cache Pseudo-Locking enabling and the
kernfs support for mounting with fs_context are conflicting.

In preparation for a conflict-free merge between the two repos some no-op
hooks are created within the RDT mount function being changed by the two
features. The goal is for this commit to be placed on a minimal no-rebase
branch to be consumed by both features.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: fenghua.yu@intel.com
Cc: tony.luck@intel.com
Cc: vikas.shivappa@linux.intel.com
Cc: gavin.hindman@intel.com
Cc: jithu.joseph@intel.com
Cc: dave.hansen@intel.com
Cc: hpa@zytor.com
Cc: David Howells <dhowells@redhat.com>
Link: https://lkml.kernel.org/r/410697ead08978bd12111c0afc4ce9e7bd71a5fe.1529706536.git.reinette.chatre@intel.com
2018-06-23 12:53:19 +02:00
Suravee Suthikulpanit
964d978433 x86/CPU/AMD: Fix LLC ID bit-shift calculation
The current logic incorrectly calculates the LLC ID from the APIC ID.

Unless specified otherwise, the LLC ID should be calculated by removing
the Core and Thread ID bits from the least significant end of the APIC
ID. For more info, see "ApicId Enumeration Requirements" in any Fam17h
PPR document.

[ bp: Improve commit message. ]

Fixes: 68091ee7ac ("Calculate last level cache ID from number of sharing threads")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1528915390-30533-1-git-send-email-suravee.suthikulpanit@amd.com
2018-06-22 21:21:49 +02:00
Thomas Gleixner
7731b8bc94 Merge branch 'linus' into x86/urgent
Required to queue a dependent fix.
2018-06-22 21:20:35 +02:00
Borislav Petkov
7ce2f0393e x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
The TOPOEXT reenablement is a workaround for broken BIOSen which didn't
enable the CPUID bit. amd_get_topology_early(), however, relies on
that bit being set so that it can read out the CPUID leaf and set
smp_num_siblings properly.

Move the reenablement up to early_init_amd(). While at it, simplify
amd_get_topology_early().

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-06-22 14:52:13 +02:00
Zhenzhong Duan
0218c76626 x86/microcode/intel: Fix memleak in save_microcode_patch()
Free useless ucode_patch entry when it's replaced.

[ bp: Drop the memfree_patch() two-liner. ]

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Srinivas REDDY Eeda <srinivas.eeda@oracle.com>
Link: http://lkml.kernel.org/r/888102f0-fd22-459d-b090-a1bd8a00cb2b@default
2018-06-22 14:42:59 +02:00
Borislav Petkov
d5c84ef202 x86/mce: Cleanup __mc_scan_banks()
Correct comments, improve readability, simplify.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-7-bp@alien8.de
2018-06-22 14:37:23 +02:00
Borislav Petkov
f35565e398 x86/mce: Carve out bank scanning code
Carve out the scan loop into a separate function.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-6-bp@alien8.de
2018-06-22 14:37:23 +02:00
Borislav Petkov
45deca7d96 x86/mce: Remove !banks check
If we don't have MCA banks, we won't see machine checks anyway. Drop the
check.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-5-bp@alien8.de
2018-06-22 14:37:22 +02:00
Borislav Petkov
d3d6923cd1 x86/mce: Carve out the crashing_cpu check
Carve out the rendezvous handler timeout avoidance check into a separate
function in order to simplify the #MC handler.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180622095428.626-4-bp@alien8.de
2018-06-22 14:37:22 +02:00
Arnd Bergmann
bc39f01020 x86/mce: Always use 64-bit timestamps
The machine check timestamp uses get_seconds(), which returns an
'unsigned long' number that might overflow on 32-bit architectures (in
the distant future) and is therefore deprecated.

The normal replacement would be ktime_get_real_seconds(), but that needs
to use a sequence lock that might cause a deadlock if the MCE happens at
just the wrong moment. The __ktime_get_real_seconds() skips that lock
and is safer here, but has a miniscule risk of returning the wrong time
when we read it on a 32-bit architecture at the same time as updating
the epoch, i.e. from before y2106 overflow time to after, or vice versa.

This seems to be an acceptable risk in this particular case, and is the
same thing we do in kdb.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: y2038@lists.linaro.org
Link: http://lkml.kernel.org/r/20180618100759.1921750-1-arnd@arndb.de
2018-06-22 14:37:22 +02:00
Tony Luck
40c36e2741 x86/mce: Fix incorrect "Machine check from unknown source" message
Some injection testing resulted in the following console log:

  mce: [Hardware Error]: CPU 22: Machine Check Exception: f Bank 1: bd80000000100134
  mce: [Hardware Error]: RIP 10:<ffffffffc05292dd> {pmem_do_bvec+0x11d/0x330 [nd_pmem]}
  mce: [Hardware Error]: TSC c51a63035d52 ADDR 3234bc4000 MISC 88
  mce: [Hardware Error]: PROCESSOR 0:50654 TIME 1526502199 SOCKET 0 APIC 38 microcode 2000043
  mce: [Hardware Error]: Run the above through 'mcelog --ascii'
  Kernel panic - not syncing: Machine check from unknown source

This confused everybody because the first line quite clearly shows
that we found a logged error in "Bank 1", while the last line says
"unknown source".

The problem is that the Linux code doesn't do the right thing
for a local machine check that results in a fatal error.

It turns out that we know very early in the handler whether the
machine check is fatal. The call to mce_no_way_out() has checked
all the banks for the CPU that took the local machine check. If
it says we must crash, we can do so right away with the right
messages.

We do scan all the banks again. This means that we might initially
not see a problem, but during the second scan find something fatal.
If this happens we print a slightly different message (so I can
see if it actually every happens).

[ bp: Remove unneeded severity assignment. ]

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: stable@vger.kernel.org # 4.2
Link: http://lkml.kernel.org/r/52e049a497e86fd0b71c529651def8871c804df0.1527283897.git.tony.luck@intel.com
2018-06-22 14:35:50 +02:00
Borislav Petkov
1f74c8a647 x86/mce: Do not overwrite MCi_STATUS in mce_no_way_out()
mce_no_way_out() does a quick check during #MC to see whether some of
the MCEs logged would require the kernel to panic immediately. And it
passes a struct mce where MCi_STATUS gets written.

However, after having saved a valid status value, the next iteration
of the loop which goes over the MCA banks on the CPU, overwrites the
valid status value because we're using struct mce as storage instead of
a temporary variable.

Which leads to MCE records with an empty status value:

  mce: [Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 0: 0000000000000000
  mce: [Hardware Error]: RIP 10:<ffffffffbd42fbd7> {trigger_mce+0x7/0x10}

In order to prevent the loss of the status register value, return
immediately when severity is a panic one so that we can panic
immediately with the first fatal MCE logged. This is also the intention
of this function and not to noodle over the banks while a fatal MCE is
already logged.

Tony: read the rest of the MCA bank to populate the struct mce fully.

Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20180622095428.626-8-bp@alien8.de
2018-06-22 14:35:50 +02:00
Thomas Gleixner
1e1d7e25fd x86/cpu/AMD: Evaluate smp_num_siblings early
To support force disabling of SMT it's required to know the number of
thread siblings early. amd_get_topology() cannot be called before the APIC
driver is selected, so split out the part which initializes
smp_num_siblings and invoke it from amd_early_init().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:21:00 +02:00
Borislav Petkov
119bff8a9c x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
Old code used to check whether CPUID ext max level is >= 0x80000008 because
that last leaf contains the number of cores of the physical CPU.  The three
functions called there now do not depend on that leaf anymore so the check
can go.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
1910ad5624 x86/cpu/intel: Evaluate smp_num_siblings early
Make use of the new early detection function to initialize smp_num_siblings
on the boot cpu before the MP-Table or ACPI/MADT scan happens. That's
required for force disabling SMT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
95f3d39ccf x86/cpu/topology: Provide detect_extended_topology_early()
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_extended_topology() cannot be called before
the APIC driver is selected, so split out the part which initializes
smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
545401f444 x86/cpu/common: Provide detect_ht_early()
To support force disabling of SMT it's required to know the number of
thread siblings early. detect_ht() cannot be called before the APIC driver
is selected, so split out the part which initializes smp_num_siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:59 +02:00
Thomas Gleixner
44ca36de56 x86/cpu/AMD: Remove the pointless detect_ht() call
Real 32bit AMD CPUs do not have SMT and the only value of the call was to
reach the magic printout which got removed.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:58 +02:00
Thomas Gleixner
55e6d279ab x86/cpu: Remove the pointless CPU printout
The value of this printout is dubious at best and there is no point in
having it in two different places along with convoluted ways to reach it.

Remove it completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:20:58 +02:00
Jiri Kosina
6cb2b08ff9 x86/pti: Don't report XenPV as vulnerable
Xen PV domain kernel is not by design affected by meltdown as it's
enforcing split CR3 itself. Let's not report such systems as "Vulnerable"
in sysfs (we're also already forcing PTI to off in X86_HYPER_XEN_PV cases);
the security of the system ultimately depends on presence of mitigation in
the Hypervisor, which can't be easily detected from DomU; let's report
that.

Reported-and-tested-by: Mike Latimer <mlatimer@suse.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1806180959080.6203@cbobk.fhfr.pm
[ Merge the user-visible string into a single line. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-21 14:14:52 +02:00
Konrad Rzeszutek Wilk
56563f53d3 x86/bugs: Move the l1tf function and define pr_fmt properly
The pr_warn in l1tf_select_mitigation would have used the prior pr_fmt
which was defined as "Spectre V2 : ".

Move the function to be past SSBD and also define the pr_fmt.

Fixes: 17dbca1193 ("x86/speculation/l1tf: Add sysfs reporting for l1tf")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-06-21 08:38:34 +02:00
Andi Kleen
17dbca1193 x86/speculation/l1tf: Add sysfs reporting for l1tf
L1TF core kernel workarounds are cheap and normally always enabled, However
they still should be reported in sysfs if the system is vulnerable or
mitigated. Add the necessary CPU feature/bug bits.

- Extend the existing checks for Meltdowns to determine if the system is
  vulnerable. All CPUs which are not vulnerable to Meltdown are also not
  vulnerable to L1TF

- Check for 32bit non PAE and emit a warning as there is no practical way
  for mitigation due to the limited physical address bits

- If the system has more than MAX_PA/2 physical memory the invert page
  workarounds don't protect the system against the L1TF attack anymore,
  because an inverted physical address will also point to valid
  memory. Print a warning in this case and report that the system is
  vulnerable.

Add a function which returns the PFN limit for the L1TF mitigation, which
will be used in follow up patches for sanity and range checks.

[ tglx: Renamed the CPU feature bit to L1TF_PTEINV ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
2018-06-20 19:10:00 +02:00
Linus Torvalds
050e9baa9d Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables
The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.

That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.

HOWEVER.

It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like

  CONFIG_HAVE_CC_STACKPROTECTOR=y
  # CONFIG_CC_STACKPROTECTOR_NONE is not set
  # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
  # CONFIG_CC_STACKPROTECTOR_STRONG is not set
  CONFIG_CC_STACKPROTECTOR_AUTO=y

and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:

  CONFIG_HAVE_CC_STACKPROTECTOR=y
  CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
  CONFIG_CC_STACKPROTECTOR=y
  # CONFIG_CC_STACKPROTECTOR_STRONG is not set
  CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.

The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one.  This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).

This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:

  CONFIG_HAVE_CC_STACKPROTECTOR=y
  CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
  CONFIG_STACKPROTECTOR=y
  CONFIG_STACKPROTECTOR_STRONG=y
  CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.

Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-14 12:21:18 +09:00
Kees Cook
6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds
f4e5b30d80 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates and fixes from Thomas Gleixner:

 - Fix the (late) fallout from the vector management rework causing
   hlist corruption and irq descriptor reference leaks caused by a
   missing sanity check.

   The straight forward fix triggered another long standing issue to
   surface. The pre rework code hid the issue due to being way slower,
   but now the chance that user space sees an EBUSY error return when
   updating irq affinities is way higher, though quite a bunch of
   userspace tools do not handle it properly despite the fact that EBUSY
   could be returned for at least 10 years.

   It turned out that the EBUSY return can be avoided completely by
   utilizing the existing delayed affinity update mechanism for irq
   remapped scenarios as well. That's a bit more error handling in the
   kernel, but avoids fruitless fingerpointing discussions with tool
   developers.

 - Decouple PHYSICAL_MASK from AMD SME as its going to be required for
   the upcoming Intel memory encryption support as well.

 - Handle legacy device ACPI detection properly for newer platforms

 - Fix the wrong argument ordering in the vector allocation tracepoint

 - Simplify the IDT setup code for the APIC=n case

 - Use the proper string helpers in the MTRR code

 - Remove a stale unused VDSO source file

 - Convert the microcode update lock to a raw spinlock as its used in
   atomic context.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
  x86/apic/vector: Print APIC control bits in debugfs
  genirq/affinity: Defer affinity setting if irq chip is busy
  x86/platform/uv: Use apic_ack_irq()
  x86/ioapic: Use apic_ack_irq()
  irq_remapping: Use apic_ack_irq()
  x86/apic: Provide apic_ack_irq()
  genirq/migration: Avoid out of line call if pending is not set
  genirq/generic_pending: Do not lose pending affinity update
  x86/apic/vector: Prevent hlist corruption and leaks
  x86/vector: Fix the args of vector_alloc tracepoint
  x86/idt: Simplify the idt_setup_apic_and_irq_gates()
  x86/platform/uv: Remove extra parentheses
  x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
  x86: Mark native_set_p4d() as __always_inline
  x86/microcode: Make the late update update_lock a raw lock for RT
  x86/mtrr: Convert to use strncpy_from_user() helper
  x86/mtrr: Convert to use match_string() helper
  x86/vdso: Remove unused file
  x86/i8237: Register device based on FADT legacy boot flag
2018-06-10 09:44:53 -07:00
Tony Luck
1d9f3e20a5 x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
New stepping of Skylake has fixes for cache occupancy and memory
bandwidth monitoring.

Update the code to enable these by default on newer steppings.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: stable@vger.kernel.org # v4.14
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: https://lkml.kernel.org/r/20180608160732.9842-1-tony.luck@intel.com
2018-06-09 16:04:34 +02:00
Tony Luck
c7d606f560 x86/mce: Improve error message when kernel cannot recover
Since we added support to add recovery from some errors inside the kernel in:

commit b2f9d678e2 ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")

we have done a less than stellar job at reporting the cause of recoverable
machine checks that occur in other parts of the kernel. The user just gets
the unhelpful message:

	mce: [Hardware Error]: Machine check: Action required: unknown MCACOD

doubly unhelpful when they check the manual for the reported IA32_MSR_STATUS.MCACOD
and see that it is listed as one of the standard recoverable values.

Add an extra rule to the MCE severity table to catch this case and report it
as:

	mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel

Fixes: b2f9d678e2 ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org # 4.6+
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/4cc7c465150a9a48b8b9f45d0b840278e77eb9b5.1527283897.git.tony.luck@intel.com
2018-06-07 22:22:12 +02:00
Konrad Rzeszutek Wilk
108fab4b5c x86/bugs: Switch the selection of mitigation from CPU vendor to CPU features
Both AMD and Intel can have SPEC_CTRL_MSR for SSBD.

However AMD also has two more other ways of doing it - which
are !SPEC_CTRL MSR ways.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: kvm@vger.kernel.org
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: andrew.cooper3@citrix.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180601145921.9500-4-konrad.wilk@oracle.com
2018-06-06 14:13:17 +02:00
Konrad Rzeszutek Wilk
6ac2f49edb x86/bugs: Add AMD's SPEC_CTRL MSR usage
The AMD document outlining the SSBD handling
124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
mentions that if CPUID 8000_0008.EBX[24] is set we should be using
the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f)
for speculative store bypass disable.

This in effect means we should clear the X86_FEATURE_VIRT_SSBD
flag so that we would prefer the SPEC_CTRL MSR.

See the document titled:
   124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf

A copy of this document is available at
   https://bugzilla.kernel.org/show_bug.cgi?id=199889

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: kvm@vger.kernel.org
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: andrew.cooper3@citrix.com
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com
2018-06-06 14:13:16 +02:00
Konrad Rzeszutek Wilk
2480986001 x86/bugs: Add AMD's variant of SSB_NO
The AMD document outlining the SSBD handling
124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
mentions that the CPUID 8000_0008.EBX[26] will mean that the
speculative store bypass disable is no longer needed.

A copy of this document is available at:
    https://bugzilla.kernel.org/show_bug.cgi?id=199889

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Cc: kvm@vger.kernel.org
Cc: andrew.cooper3@citrix.com
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lkml.kernel.org/r/20180601145921.9500-2-konrad.wilk@oracle.com
2018-06-06 14:13:16 +02:00
Linus Torvalds
ab20fd0013 Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource controller updates from Thomas Gleixner:
 "An update for the Intel Resource Director Technolgy (RDT) which adds a
  feedback driven software controller to runtime adjust the bandwidth
  allocation MSRs.

  This makes the allocations more accurate and allows to use bandwidth
  values in understandable units (MB/s) instead of using percentage
  based allocations as the original, still available, interface.

  The software controller can be enabled with a new mount option for the
  resctrl filesystem"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
  x86/intel_rdt/mba_sc: Prepare for feedback loop
  x86/intel_rdt/mba_sc: Add schemata support
  x86/intel_rdt/mba_sc: Add initialization support
  x86/intel_rdt/mba_sc: Enable/disable MBA software controller
  x86/intel_rdt/mba_sc: Documentation for MBA software controller(mba_sc)
2018-06-04 21:34:39 -07:00
Linus Torvalds
0ef283d4c7 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Thomas Gleixner:

 - Fix a stack out of bounds write in the MCE error injection code.

 - Avoid IPIs during CPU hotplug to read the MCx_MISC block address from
   a remote CPU. That's fragile and pointless because the block
   addresses are the same on all CPUs. So they can be read once and
   local.

 - Add support for MCE broadcasting on newer VIA Centaur CPUs.

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE/AMD: Read MCx_MISC block addresses on any CPU
  x86/MCE: Fix stack out-of-bounds write in mce-inject.c: Flags_read()
  x86/MCE: Enable MCE broadcasting on new Centaur CPUs
2018-06-04 20:26:07 -07:00
Linus Torvalds
0afe832e55 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apm: Fix spelling mistake: "caculate" -> "calculate"
  x86/mtrr: Rename main.c to mtrr.c and remove duplicate prefixes
  x86: Remove pr_fmt duplicate logging prefixes
  x86/early-quirks: Rename duplicate define of dev_err
  x86/bpf: Clean up non-standard comments, to make the code more readable
2018-06-04 19:17:47 -07:00
Linus Torvalds
5cef8c2a22 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:

 - Centaur CPU updates (David Wang)

 - AMD and other CPU topology enumeration improvements and fixes
   (Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit)

 - Continued 5-level paging work (Kirill A. Shutemov)

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Mark __pgtable_l5_enabled __initdata
  x86/mm: Mark p4d_offset() __always_inline
  x86/mm: Introduce the 'no5lvl' kernel parameter
  x86/mm: Stop pretending pgtable_l5_enabled is a variable
  x86/mm: Unify pgtable_l5_enabled usage in early boot code
  x86/boot/compressed/64: Fix trampoline page table address calculation
  x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()
  x86/Centaur: Report correct CPU/cache topology
  x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
  x86/CPU: Make intel_num_cpu_cores() generic
  x86/CPU: Move cpu local function declarations to local header
  x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available
  x86/CPU: Modify detect_extended_topology() to return result
  x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
  x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c
  perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
  x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
  x86/Centaur: Initialize supported CPU features properly
2018-06-04 18:19:18 -07:00
Linus Torvalds
4057adafb3 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:

 - updates to the handling of expedited grace periods

 - updates to reduce lock contention in the rcu_node combining tree

   [ These are in preparation for the consolidation of RCU-bh,
     RCU-preempt, and RCU-sched into a single flavor, which was
     requested by Linus in response to a security flaw whose root cause
     included confusion between the multiple flavors of RCU ]

 - torture-test updates that save their users some time and effort

 - miscellaneous fixes

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  rcu/x86: Provide early rcu_cpu_starting() callback
  torture: Make kvm-find-errors.sh find build warnings
  rcutorture: Abbreviate kvm.sh summary lines
  rcutorture: Print end-of-test state in kvm.sh summary
  rcutorture: Print end-of-test state
  torture: Fold parse-torture.sh into parse-console.sh
  torture: Add a script to edit output from failed runs
  rcu: Update list of rcu_future_grace_period() trace events
  rcu: Drop early GP request check from rcu_gp_kthread()
  rcu: Simplify and inline cpu_needs_another_gp()
  rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
  rcu: Make rcu_start_this_gp() check for out-of-range requests
  rcu: Add funnel locking to rcu_start_this_gp()
  rcu: Make rcu_start_future_gp() caller select grace period
  rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp()
  rcu: Clear request other than RCU_GP_FLAG_INIT at GP end
  rcu: Cleanup, don't put ->completed into an int
  rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs()
  rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition
  rcu: Make rcu_migrate_callbacks wake GP kthread when needed
  ...
2018-06-04 15:54:04 -07:00
Ingo Molnar
24dd064d5b Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into x86/urgent
Merge these small and simple 1-2 commit branches into the urgent branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-04 18:50:32 +02:00
Ingo Molnar
52f2b34f46 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU fix from Paul E. McKenney:

 "This additional v4.18 pull request contains a single commit that fell
  through the cracks:

      Provide early rcu_cpu_starting() callback for the benefit of the
      x86/mtrr code, which needs RCU to be available on incoming CPUs
      earlier than has been the case in the past."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-30 07:55:39 +02:00
Scott Wood
ff987fcf01 x86/microcode: Make the late update update_lock a raw lock for RT
__reload_late() is called from stop_machine context and thus cannot
acquire a non-raw spinlock on PREEMPT_RT.

Signed-off-by: Scott Wood <swood@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Clark Williams <williams@redhat.com>
Cc: Pei Zhang <pezhang@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20180524154420.24455-1-swood@redhat.com
2018-05-27 21:50:09 +02:00
Dominik Brodowski
8ecc4979b1 x86/speculation: Simplify the CPU bug detection logic
Only CPUs which speculate can speculate. Therefore, it seems prudent
to test for cpu_no_speculation first and only then determine whether
a specific speculating CPU is susceptible to store bypass speculation.
This is underlined by all CPUs currently listed in cpu_no_speculation
were present in cpu_no_spec_store_bypass as well.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: konrad.wilk@oracle.com
Link: https://lkml.kernel.org/r/20180522090539.GA24668@light.dominikbrodowski.net
2018-05-23 10:55:52 +02:00
Peter Zijlstra
f64c6013a2 rcu/x86: Provide early rcu_cpu_starting() callback
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.

RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.

Fix this by initializing RCU way early in the MTRR case.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
2018-05-22 16:12:26 -07:00
Linus Torvalds
3b78ce4a34 Merge branch 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Merge speculative store buffer bypass fixes from Thomas Gleixner:

 - rework of the SPEC_CTRL MSR management to accomodate the new fancy
   SSBD (Speculative Store Bypass Disable) bit handling.

 - the CPU bug and sysfs infrastructure for the exciting new Speculative
   Store Bypass 'feature'.

 - support for disabling SSB via LS_CFG MSR on AMD CPUs including
   Hyperthread synchronization on ZEN.

 - PRCTL support for dynamic runtime control of SSB

 - SECCOMP integration to automatically disable SSB for sandboxed
   processes with a filter flag for opt-out.

 - KVM integration to allow guests fiddling with SSBD including the new
   software MSR VIRT_SPEC_CTRL to handle the LS_CFG based oddities on
   AMD.

 - BPF protection against SSB

.. this is just the core and x86 side, other architecture support will
come separately.

* 'speck-v20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
  bpf: Prevent memory disambiguation attack
  x86/bugs: Rename SSBD_NO to SSB_NO
  KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
  x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
  x86/bugs: Rework spec_ctrl base and mask logic
  x86/bugs: Remove x86_spec_ctrl_set()
  x86/bugs: Expose x86_spec_ctrl_base directly
  x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
  x86/speculation: Rework speculative_store_bypass_update()
  x86/speculation: Add virtualized speculative store bypass disable support
  x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
  x86/speculation: Handle HT correctly on AMD
  x86/cpufeatures: Add FEATURE_ZEN
  x86/cpufeatures: Disentangle SSBD enumeration
  x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
  x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
  KVM: SVM: Move spec control call after restore of GS
  x86/cpu: Make alternative_msr_write work for 32-bit code
  x86/bugs: Fix the parameters alignment and missing void
  x86/bugs: Make cpu_show_common() static
  ...
2018-05-21 11:23:26 -07:00
Borislav Petkov
fbf96cf904 x86/MCE/AMD: Read MCx_MISC block addresses on any CPU
We used rdmsr_safe_on_cpu() to make sure we're reading the proper CPU's
MISC block addresses. However, that caused trouble with CPU hotplug due to
the _on_cpu() helper issuing an IPI while IRQs are disabled.

But we don't have to do that: the block addresses are the same on any CPU
so we can read them on any CPU. (What practically happens is, we read them
on the BSP and cache them, and for later reads, we service them from the
cache).

Suggested-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-19 15:21:46 +02:00
Thomas Gleixner
95b5c0a592 Merge branch 'ras/urgent' into ras/core
Pick up urgent fix as pending patch depends on it.
2018-05-19 15:20:49 +02:00
Borislav Petkov
78ce241099 x86/MCE/AMD: Cache SMCA MISC block addresses
... into a global, two-dimensional array and service subsequent reads from
that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs
disabled).

In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage
of the bank->blocks pointer.

Fixes: 27bd595027 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
2018-05-19 15:19:30 +02:00
Vikas Shivappa
de73f38f76 x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth
mba_sc is a feedback loop where we periodically read MBM counters and
try to restrict the bandwidth below a max value so the below is always
true:

  "current bandwidth(cur_bw) < user specified bandwidth(user_bw)"

The frequency of these checks is currently 1s and we just tag along the
MBM overflow timer to do the updates. Doing it once in a second also
makes the calculation of bandwidth easy. The steps of increase or
decrease of bandwidth is the minimum granularity specified by the
hardware.

Although the MBA's goal is to restrict the bandwidth below a maximum,
there may be a need to even increase the bandwidth. Since MBA controls
the L2 external bandwidth where as MBM measures the L3 external
bandwidth, we may end up restricting some rdtgroups unnecessarily. This
may happen in the sequence where rdtgroup (set of jobs) had high
"L3 <-> memory traffic" in initial phases -> mba_sc kicks in and reduced
bandwidth percentage values -> but after some it has mostly "L2 <-> L3"
traffic. In this scenario mba_sc increases the bandwidth percentage when
there is lesser memory traffic.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-7-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:44 +02:00
Vikas Shivappa
ba0f26d852 x86/intel_rdt/mba_sc: Prepare for feedback loop
This is a preparatory patch for the mba feedback loop. Add support to
measure the "bandwidth in MBps" and the "delta bandwidth". Measure it by
reading the MBM IA32_QM_CTR MSRs and calculating the amount of "bytes"
moved. There is no user space interface for this and will only be used by
the feedback loop patch.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-6-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:44 +02:00
Vikas Shivappa
8205a078ba x86/intel_rdt/mba_sc: Add schemata support
Currently when user updates the "schemata" with new MBA percentage
values, kernel writes the corresponding bandwidth percentage values to
the IA32_MBA_THRTL_MSR.

When MBA is expressed in MBps, the schemata format is changed to have the
per package memory bandwidth in MBps instead of being specified in
percentage. Do not write the IA32_MBA_THRTL_MSRs when the schemata is
updated as that is handled separately.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-5-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:44 +02:00
Vikas Shivappa
1bd2a63b4f x86/intel_rdt/mba_sc: Add initialization support
When MBA software controller is enabled, a per domain storage is required
for user specified bandwidth in "MBps" and the "percentage" values which
are programmed into the IA32_MBA_THRTL_MSR. Add support for these data
structures and initialization.

The MBA percentage values have a default max value of 100 but however the
max value in MBps is not available from the hardware so it's set to
U32_MAX.

This simply says that the control group can use all bandwidth by default
but does not say what is the actual max bandwidth available. The actual
bandwidth that is available may depend on lot of factors like QPI link,
number of memory channels, memory channel frequency, its width and memory
speed, how many channels are configured and also if memory interleaving is
enabled. So there is no way to determine the maximum at runtime reliably.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-4-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:43 +02:00
Vikas Shivappa
19c635ab24 x86/intel_rdt/mba_sc: Enable/disable MBA software controller
Currently user does memory bandwidth allocation(MBA) by specifying the
bandwidth in percentage via the resctrl schemata file:
	"/sys/fs/resctrl/schemata"

Add a new mount option "mba_MBps" to enable the user to specify MBA
in MBps:

$mount -t resctrl resctrl [-o cdp[,cdpl2][mba_MBps]] /sys/fs/resctrl

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ravi.v.shankar@intel.com
Cc: tony.luck@intel.com
Cc: fenghua.yu@intel.com
Cc: vikas.shivappa@intel.com
Cc: ak@linux.intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1524263781-14267-3-git-send-email-vikas.shivappa@linux.intel.com
2018-05-19 13:16:43 +02:00
Kirill A. Shutemov
372fddf709 x86/mm: Introduce the 'no5lvl' kernel parameter
This kernel parameter allows to force kernel to use 4-level paging even
if hardware and kernel support 5-level paging.

The option may be useful to work around regressions related to 5-level
paging.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-5-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Ingo Molnar
177bfd725b Merge branches 'x86/urgent' and 'core/urgent' into x86/boot, to pick up fixes and avoid conflicts
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 08:18:56 +02:00
Konrad Rzeszutek Wilk
240da953fc x86/bugs: Rename SSBD_NO to SSB_NO
The "336996 Speculative Execution Side Channel Mitigations" from
May defines this as SSB_NO, hence lets sync-up.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-18 11:17:30 +02:00
Tom Lendacky
bc226f07dc KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using
speculative store bypass disable (SSBD) under SVM.  This will allow guests
to use SSBD on hardware that uses non-architectural mechanisms for enabling
SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17 17:09:21 +02:00
Thomas Gleixner
47c61b3955 x86/speculation, KVM: Implement support for VIRT_SPEC_CTRL/LS_CFG
Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to
x86_virt_spec_ctrl().  If either X86_FEATURE_LS_CFG_SSBD or
X86_FEATURE_VIRT_SPEC_CTRL is set then use the new guest_virt_spec_ctrl
argument to check whether the state must be modified on the host. The
update reuses speculative_store_bypass_update() so the ZEN-specific sibling
coordination can be reused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-17 17:09:21 +02:00
Thomas Gleixner
be6fcb5478 x86/bugs: Rework spec_ctrl base and mask logic
x86_spec_ctrL_mask is intended to mask out bits from a MSR_SPEC_CTRL value
which are not to be modified. However the implementation is not really used
and the bitmask was inverted to make a check easier, which was removed in
"x86/bugs: Remove x86_spec_ctrl_set()"

Aside of that it is missing the STIBP bit if it is supported by the
platform, so if the mask would be used in x86_virt_spec_ctrl() then it
would prevent a guest from setting STIBP.

Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to
sanitize the value which is supplied by the guest.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
2018-05-17 17:09:20 +02:00
Thomas Gleixner
4b59bdb569 x86/bugs: Remove x86_spec_ctrl_set()
x86_spec_ctrl_set() is only used in bugs.c and the extra mask checks there
provide no real value as both call sites can just write x86_spec_ctrl_base
to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra
masking or checking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:20 +02:00
Thomas Gleixner
fa8ac49882 x86/bugs: Expose x86_spec_ctrl_base directly
x86_spec_ctrl_base is the system wide default value for the SPEC_CTRL MSR.
x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to
prevent modification to that variable. Though the variable is read only
after init and globaly visible already.

Remove the function and export the variable instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:19 +02:00
Borislav Petkov
cc69b34989 x86/bugs: Unify x86_spec_ctrl_{set_guest,restore_host}
Function bodies are very similar and are going to grow more almost
identical code. Add a bool arg to determine whether SPEC_CTRL is being set
for the guest or restored to the host.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:19 +02:00
Thomas Gleixner
0270be3e34 x86/speculation: Rework speculative_store_bypass_update()
The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse
speculative_store_bypass_update() to avoid code duplication. Add an
argument for supplying a thread info (TIF) value and create a wrapper
speculative_store_bypass_update_current() which is used at the existing
call site.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:19 +02:00
Tom Lendacky
11fb068349 x86/speculation: Add virtualized speculative store bypass disable support
Some AMD processors only support a non-architectural means of enabling
speculative store bypass disable (SSBD).  To allow a simplified view of
this to a guest, an architectural definition has been created through a new
CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f.  With this, a
hypervisor can virtualize the existence of this definition and provide an
architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR and update the existing SSBD
support to use this MSR when present.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
ccbcd26744 x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store
Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care
about the bit position of the SSBD bit and thus facilitate migration.
Also, the sibling coordination on Family 17H CPUs can only be done on
the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an
extra argument for the VIRT_SPEC_CTRL MSR.

Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU
data structure which is going to be used in later patches for the actual
implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
d1035d9718 x86/cpufeatures: Add FEATURE_ZEN
Add a ZEN feature bit so family-dependent static_cpu_has() optimizations
can be built for ZEN.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:18 +02:00
Thomas Gleixner
52817587e7 x86/cpufeatures: Disentangle SSBD enumeration
The SSBD enumeration is similarly to the other bits magically shared
between Intel and AMD though the mechanisms are different.

Make X86_FEATURE_SSBD synthetic and set it depending on the vendor specific
features or family dependent setup.

Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is
controlled via MSR_SPEC_CTRL and fix up the usage sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:17 +02:00
Thomas Gleixner
7eb8956a7f x86/cpufeatures: Disentangle MSR_SPEC_CTRL enumeration from IBRS
The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on
Intel and implied by IBRS or STIBP support on AMD. That's just confusing
and in case an AMD CPU has IBRS not supported because the underlying
problem has been fixed but has another bit valid in the SPEC_CTRL MSR,
the thing falls apart.

Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the
availability on both Intel and AMD.

While at it replace the boot_cpu_has() checks with static_cpu_has() where
possible. This prevents late microcode loading from exposing SPEC_CTRL, but
late loading is already very limited as it does not reevaluate the
mitigation options and other bits and pieces. Having static_cpu_has() is
the simplest and least fragile solution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-17 17:09:17 +02:00
Borislav Petkov
e7c587da12 x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
Intel and AMD have different CPUID bits hence for those use synthetic bits
which get set on the respective vendor's in init_speculation_control(). So
that debacles like what the commit message of

  c65732e4f7 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")

talks about don't happen anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
2018-05-17 17:09:16 +02:00
Andy Shevchenko
7f8ec5a4f0 x86/mtrr: Convert to use strncpy_from_user() helper
Replace the open coded string fetch from user-space with strncpy_from_user().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20180515180535.89703-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-16 09:47:23 +02:00
Andy Shevchenko
13a4db9d75 x86/mtrr: Convert to use match_string() helper
The helper returns index of the matching string in an array.
Replace the open coded array lookup with match_string().

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20180515175759.89315-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-16 09:47:22 +02:00
Joe Perches
e6d8c84a58 x86/mtrr: Rename main.c to mtrr.c and remove duplicate prefixes
Kbuild uses the first file as the name for KBUILD_MODNAME.
mtrr uses main.c as its first file, so rename that file to mtrr.c
and fixup the Makefile.

Remove the now duplicate "mtrr: " prefixes from the logging calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/ae1fa81a0d1fad87571967b91ea90f70f486e853.1525964384.git.joe@perches.com
2018-05-13 21:25:18 +02:00
Thomas Gleixner
9305bd6ca7 x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()
No point to have it at the call sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-13 16:14:24 +02:00
David Wang
a2aa578fec x86/Centaur: Report correct CPU/cache topology
Centaur CPUs enumerate the cache topology in the same way as Intel CPUs,
but the function is unused so for. The Centaur init code also misses to
initialize x86_info::max_cores, so the CPU topology can't be described
correctly.

Initialize x86_info::max_cores and invoke init_cacheinfo() to make
CPU and cache topology information available and correct.

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lukelin@viacpu.com
Cc: qiyuanwang@zhaoxin.com
Cc: gregkh@linuxfoundation.org
Cc: brucechang@via-alliance.com
Cc: timguo@zhaoxin.com
Cc: cooperyan@zhaoxin.com
Cc: hpa@zytor.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1525314766-18910-4-git-send-email-davidwang@zhaoxin.com
2018-05-13 16:14:24 +02:00
David Wang
807e9bc8e2 x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
There is no point in having the conditional cpu_detect_cache_sizes() call
at the callsite of init_intel_cacheinfo().

Move it into init_intel_cacheinfo() and make init_intel_cacheinfo() void.

[ tglx: Made the init_intel_cacheinfo() void as the return value was
  	pointless. Adjust changelog accordingly ]

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lukelin@viacpu.com
Cc: qiyuanwang@zhaoxin.com
Cc: gregkh@linuxfoundation.org
Cc: brucechang@via-alliance.com
Cc: timguo@zhaoxin.com
Cc: cooperyan@zhaoxin.com
Cc: hpa@zytor.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1525314766-18910-3-git-send-email-davidwang@zhaoxin.com
2018-05-13 16:14:24 +02:00
David Wang
2cc61be60e x86/CPU: Make intel_num_cpu_cores() generic
intel_num_cpu_cores() is a static function in intel.c which can't be used
by other files. Define another function called detect_num_cpu_cores() in
common.c to replace this function so it can be reused.

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lukelin@viacpu.com
Cc: qiyuanwang@zhaoxin.com
Cc: gregkh@linuxfoundation.org
Cc: brucechang@via-alliance.com
Cc: timguo@zhaoxin.com
Cc: cooperyan@zhaoxin.com
Cc: hpa@zytor.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1525314766-18910-2-git-send-email-davidwang@zhaoxin.com
2018-05-13 12:06:12 +02:00
Thomas Gleixner
b5cf8707e6 x86/CPU: Move cpu local function declarations to local header
No point in exposing all these functions globaly as they are strict local
to the cpu management code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-13 12:06:12 +02:00
Konrad Rzeszutek Wilk
ffed645e3b x86/bugs: Fix the parameters alignment and missing void
Fixes: 7bb4d366c ("x86/bugs: Make cpu_show_common() static")
Fixes: 24f7fc83b ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-12 11:33:35 +02:00
Jiri Kosina
7bb4d366cb x86/bugs: Make cpu_show_common() static
cpu_show_common() is not used outside of arch/x86/kernel/cpu/bugs.c, so
make it static.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-10 22:58:11 +02:00
Jiri Kosina
d66d8ff3d2 x86/bugs: Fix __ssb_select_mitigation() return type
__ssb_select_mitigation() returns one of the members of enum ssb_mitigation,
not ssb_mitigation_cmd; fix the prototype to reflect that.

Fixes: 24f7fc83b9 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-10 22:58:11 +02:00
Konrad Rzeszutek Wilk
9f65fb2937 x86/bugs: Rename _RDS to _SSBD
Intel collateral will reference the SSB mitigation bit in IA32_SPEC_CTL[2]
as SSBD (Speculative Store Bypass Disable).

Hence changing it.

It is unclear yet what the MSR_IA32_ARCH_CAPABILITIES (0x10a) Bit(4) name
is going to be. Following the rename it would be SSBD_NO but that rolls out
to Speculative Store Bypass Disable No.

Also fixed the missing space in X86_FEATURE_AMD_SSBD.

[ tglx: Fixup x86_amd_rds_enable() and rds_tif_to_amd_ls_cfg() as well ]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-09 21:41:38 +02:00
Suravee Suthikulpanit
3986a0a805 x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available
Derive topology information from Extended Topology Enumeration (CPUID
function 0xB) when the information is available.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1524865681-112110-3-git-send-email-suravee.suthikulpanit@amd.com
2018-05-06 12:49:16 +02:00
Suravee Suthikulpanit
6c4f5abaf3 x86/CPU: Modify detect_extended_topology() to return result
Current implementation does not communicate whether it can successfully
detect CPUID function 0xB information. Therefore, modify the function to
return success or error codes. This will be used by subsequent patches.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1524865681-112110-2-git-send-email-suravee.suthikulpanit@amd.com
2018-05-06 12:49:16 +02:00
Suravee Suthikulpanit
68091ee7ac x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
Last Level Cache ID can be calculated from the number of threads sharing
the cache, which is available from CPUID Fn0x8000001D (Cache Properties).
This is used to left-shift the APIC ID to derive LLC ID.

Therefore, default to this method unless the APIC ID enumeration does not
follow the scheme.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1524864877-111962-5-git-send-email-suravee.suthikulpanit@amd.com
2018-05-06 12:49:15 +02:00
Borislav Petkov
1d200c078d x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c
Since this file contains general cache-related information for x86,
rename the file to a more generic name.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1524864877-111962-4-git-send-email-suravee.suthikulpanit@amd.com
2018-05-06 12:49:15 +02:00
Borislav Petkov
f8b64d08dd x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
Move smp_num_siblings and cpu_llc_id to cpu/common.c so that they're
always present as symbols and not only in the CONFIG_SMP case. Then,
other code using them doesn't need ugly ifdeffery anymore. Get rid of
some ifdeffery.

Signed-off-by: Borislav Petkov <bpetkov@suse.de>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1524864877-111962-2-git-send-email-suravee.suthikulpanit@amd.com
2018-05-06 12:49:14 +02:00
Luck, Tony
985c78d3ff x86/MCE: Fix stack out-of-bounds write in mce-inject.c: Flags_read()
Each of the strings that we want to put into the buf[MAX_FLAG_OPT_SIZE]
in flags_read() is two characters long. But the sprintf() adds
a trailing newline and will add a terminating NUL byte. So
MAX_FLAG_OPT_SIZE needs to be 4.

sprintf() calls vsnprintf() and *that* does return:

" * The return value is the number of characters which would
 * be generated for the given input, excluding the trailing
 * '\0', as per ISO C99."

Note the "excluding".

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/20180427163707.ktaiysvbk3yhk4wm@agluck-desk
2018-05-06 12:46:39 +02:00
David Wang
13e8582245 x86/MCE: Enable MCE broadcasting on new Centaur CPUs
Newer Centaur multi-core CPUs also support MCE broadcasting to all
cores. Add a Centaur-specific init function setting that up.

 [ bp:
   - make mce_centaur_feature_init() static
   - flip check to do the f/m/s first for better readability
   - touch up text
  ]

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lukelin@viacpu.com
Cc: qiyuanwang@zhaoxin.com
Cc: Greg KH <greg@kroah.com>
Cc: brucechang@via-alliance.com
Cc: timguo@zhaoxin.com
Cc: cooperyan@zhaoxin.com
Cc: Tony Luck <tony.luck@intel.com>
Cc: benjaminpan@viatech.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1524652420-17330-2-git-send-email-davidwang@zhaoxin.com
2018-05-06 12:46:25 +02:00
Kees Cook
f21b53b20c x86/speculation: Make "seccomp" the default mode for Speculative Store Bypass
Unless explicitly opted out of, anything running under seccomp will have
SSB mitigations enabled. Choosing the "prctl" mode will disable this.

[ tglx: Adjusted it to the new arch_seccomp_spec_mitigate() mechanism ]

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:45 +02:00
Thomas Gleixner
8bf37d8c06 seccomp: Move speculation migitation control to arch code
The migitation control is simpler to implement in architecture code as it
avoids the extra function call to check the mode. Aside of that having an
explicit seccomp enabled mode in the architecture mitigations would require
even more workarounds.

Move it into architecture code and provide a weak function in the seccomp
code. Remove the 'which' argument as this allows the architecture to decide
which mitigations are relevant for seccomp.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:44 +02:00
Thomas Gleixner
356e4bfff2 prctl: Add force disable speculation
For certain use cases it is desired to enforce mitigations so they cannot
be undone afterwards. That's important for loader stubs which want to
prevent a child from disabling the mitigation again. Will also be used for
seccomp(). The extra state preserving of the prctl state for SSB is a
preparatory step for EBPF dymanic speculation control.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:43 +02:00
Kees Cook
f9544b2b07 x86/bugs: Make boot modes __ro_after_init
There's no reason for these to be changed after boot.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-05 00:51:43 +02:00
Kees Cook
7bbf1373e2 nospec: Allow getting/setting on non-current task
Adjust arch_prctl_get/set_spec_ctrl() to operate on tasks other than
current.

This is needed both for /proc/$pid/status queries and for seccomp (since
thread-syncing can trigger seccomp in non-current threads).

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2018-05-03 13:55:51 +02:00
Thomas Gleixner
a73ec77ee1 x86/speculation: Add prctl for Speculative Store Bypass mitigation
Add prctl based control for Speculative Store Bypass mitigation and make it
the default mitigation for Intel and AMD.

Andi Kleen provided the following rationale (slightly redacted):

 There are multiple levels of impact of Speculative Store Bypass:

 1) JITed sandbox.
    It cannot invoke system calls, but can do PRIME+PROBE and may have call
    interfaces to other code

 2) Native code process.
    No protection inside the process at this level.

 3) Kernel.

 4) Between processes. 

 The prctl tries to protect against case (1) doing attacks.

 If the untrusted code can do random system calls then control is already
 lost in a much worse way. So there needs to be system call protection in
 some way (using a JIT not allowing them or seccomp). Or rather if the
 process can subvert its environment somehow to do the prctl it can already
 execute arbitrary code, which is much worse than SSB.

 To put it differently, the point of the prctl is to not allow JITed code
 to read data it shouldn't read from its JITed sandbox. If it already has
 escaped its sandbox then it can already read everything it wants in its
 address space, and do much worse.

 The ability to control Speculative Store Bypass allows to enable the
 protection selectively without affecting overall system performance.

Based on an initial patch from Tim Chen. Completely rewritten.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-03 13:55:51 +02:00
Thomas Gleixner
885f82bfbc x86/process: Allow runtime control of Speculative Store Bypass
The Speculative Store Bypass vulnerability can be mitigated with the
Reduced Data Speculation (RDS) feature. To allow finer grained control of
this eventually expensive mitigation a per task mitigation control is
required.

Add a new TIF_RDS flag and put it into the group of TIF flags which are
evaluated for mismatch in switch_to(). If these bits differ in the previous
and the next task, then the slow path function __switch_to_xtra() is
invoked. Implement the TIF_RDS dependent mitigation control in the slow
path.

If the prctl for controlling Speculative Store Bypass is disabled or no
task uses the prctl then there is no overhead in the switch_to() fast
path.

Update the KVM related speculation control functions to take TID_RDS into
account as well.

Based on a patch from Tim Chen. Completely rewritten.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-05-03 13:55:50 +02:00
Thomas Gleixner
28a2775217 x86/speculation: Create spec-ctrl.h to avoid include hell
Having everything in nospec-branch.h creates a hell of dependencies when
adding the prctl based switching mechanism. Move everything which is not
required in nospec-branch.h to spec-ctrl.h and fix up the includes in the
relevant files.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:50 +02:00
Konrad Rzeszutek Wilk
764f3c2158 x86/bugs/AMD: Add support to disable RDS on Fam[15,16,17]h if requested
AMD does not need the Speculative Store Bypass mitigation to be enabled.

The parameters for this are already available and can be done via MSR
C001_1020. Each family uses a different bit in that MSR for this.

[ tglx: Expose the bit mask via a variable and move the actual MSR fiddling
  	into the bugs code as that's the right thing to do and also required
	to prepare for dynamic enable/disable ]

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:49 +02:00
Konrad Rzeszutek Wilk
1115a859f3 x86/bugs: Whitelist allowed SPEC_CTRL MSR values
Intel and AMD SPEC_CTRL (0x48) MSR semantics may differ in the
future (or in fact use different MSRs for the same functionality).

As such a run-time mechanism is required to whitelist the appropriate MSR
values.

[ tglx: Made the variable __ro_after_init ]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:49 +02:00
Konrad Rzeszutek Wilk
772439717d x86/bugs/intel: Set proper CPU features and setup RDS
Intel CPUs expose methods to:

 - Detect whether RDS capability is available via CPUID.7.0.EDX[31],

 - The SPEC_CTRL MSR(0x48), bit 2 set to enable RDS.

 - MSR_IA32_ARCH_CAPABILITIES, Bit(4) no need to enable RRS.

With that in mind if spec_store_bypass_disable=[auto,on] is selected set at
boot-time the SPEC_CTRL MSR to enable RDS if the platform requires it.

Note that this does not fix the KVM case where the SPEC_CTRL is exposed to
guests which can muck with it, see patch titled :
 KVM/SVM/VMX/x86/spectre_v2: Support the combination of guest and host IBRS.

And for the firmware (IBRS to be set), see patch titled:
 x86/spectre_v2: Read SPEC_CTRL MSR during boot and re-use reserved bits

[ tglx: Distangled it from the intel implementation and kept the call order ]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:48 +02:00
Konrad Rzeszutek Wilk
24f7fc83b9 x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation
Contemporary high performance processors use a common industry-wide
optimization known as "Speculative Store Bypass" in which loads from
addresses to which a recent store has occurred may (speculatively) see an
older value. Intel refers to this feature as "Memory Disambiguation" which
is part of their "Smart Memory Access" capability.

Memory Disambiguation can expose a cache side-channel attack against such
speculatively read values. An attacker can create exploit code that allows
them to read memory outside of a sandbox environment (for example,
malicious JavaScript in a web page), or to perform more complex attacks
against code running within the same privilege level, e.g. via the stack.

As a first step to mitigate against such attacks, provide two boot command
line control knobs:

 nospec_store_bypass_disable
 spec_store_bypass_disable=[off,auto,on]

By default affected x86 processors will power on with Speculative
Store Bypass enabled. Hence the provided kernel parameters are written
from the point of view of whether to enable a mitigation or not.
The parameters are as follows:

 - auto - Kernel detects whether your CPU model contains an implementation
	  of Speculative Store Bypass and picks the most appropriate
	  mitigation.

 - on   - disable Speculative Store Bypass
 - off  - enable Speculative Store Bypass

[ tglx: Reordered the checks so that the whole evaluation is not done
  	when the CPU does not support RDS ]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:48 +02:00
Konrad Rzeszutek Wilk
c456442cd3 x86/bugs: Expose /sys/../spec_store_bypass
Add the sysfs file for the new vulerability. It does not do much except
show the words 'Vulnerable' for recent x86 cores.

Intel cores prior to family 6 are known not to be vulnerable, and so are
some Atoms and some Xeon Phi.

It assumes that older Cyrix, Centaur, etc. cores are immune.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:47 +02:00
Konrad Rzeszutek Wilk
5cf6875487 x86/bugs, KVM: Support the combination of guest and host IBRS
A guest may modify the SPEC_CTRL MSR from the value used by the
kernel. Since the kernel doesn't use IBRS, this means a value of zero is
what is needed in the host.

But the 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to
the other bits as reserved so the kernel should respect the boot time
SPEC_CTRL value and use that.

This allows to deal with future extensions to the SPEC_CTRL interface if
any at all.

Note: This uses wrmsrl() instead of native_wrmsl(). I does not make any
difference as paravirt will over-write the callq *0xfff.. with the wrmsrl
assembler code.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:47 +02:00
Konrad Rzeszutek Wilk
1b86883ccb x86/bugs: Read SPEC_CTRL MSR during boot and re-use reserved bits
The 336996-Speculative-Execution-Side-Channel-Mitigations.pdf refers to all
the other bits as reserved. The Intel SDM glossary defines reserved as
implementation specific - aka unknown.

As such at bootup this must be taken it into account and proper masking for
the bits in use applied.

A copy of this document is available at
https://bugzilla.kernel.org/show_bug.cgi?id=199511

[ tglx: Made x86_spec_ctrl_base __ro_after_init ]

Suggested-by: Jon Masters <jcm@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:47 +02:00
Konrad Rzeszutek Wilk
d1059518b4 x86/bugs: Concentrate bug reporting into a separate function
Those SysFS functions have a similar preamble, as such make common
code to handle them.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:46 +02:00
Konrad Rzeszutek Wilk
4a28bfe326 x86/bugs: Concentrate bug detection into a separate function
Combine the various logic which goes through all those
x86_cpu_id matching structures in one function.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
2018-05-03 13:55:46 +02:00
Thomas Gleixner
c65732e4f7 x86/cpu: Restore CPUID_8000_0008_EBX reload
The recent commt which addresses the x86_phys_bits corruption with
encrypted memory on CPUID reload after a microcode update lost the reload
of CPUID_8000_0008_EBX as well.

As a consequence IBRS and IBRS_FW are not longer detected

Restore the behaviour by bringing the reload of CPUID_8000_0008_EBX
back. This restore has a twist due to the convoluted way the cpuid analysis
works:

CPUID_8000_0008_EBX is used by AMD to enumerate IBRB, IBRS, STIBP. On Intel
EBX is not used. But the speculation control code sets the AMD bits when
running on Intel depending on the Intel specific speculation control
bits. This was done to use the same bits for alternatives.

The change which moved the 8000_0008 evaluation out of get_cpu_cap() broke
this nasty scheme due to ordering. So that on Intel the store to
CPUID_8000_0008_EBX clears the IBRB, IBRS, STIBP bits which had been set
before by software.

So the actual CPUID_8000_0008_EBX needs to go back to the place where it
was and the phys/virt address space calculation cannot touch it.

In hindsight this should have used completely synthetic bits for IBRB,
IBRS, STIBP instead of reusing the AMD bits, but that's for 4.18.

/me needs to find time to cleanup that steaming pile of ...

Fixes: d94a155c59 ("x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption")
Reported-by: Jörg Otte <jrg.otte@gmail.com>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: kirill.shutemov@linux.intel.com
Cc: Borislav Petkov <bp@alien8.de
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805021043510.1668@nanos.tec.linutronix.de
2018-05-02 16:44:38 +02:00
jacek.tomaka@poczta.fm
b837913fc2 x86/cpu/intel: Add missing TLB cpuid values
Make kernel print the correct number of TLB entries on Intel Xeon Phi 7210
(and others)

Before:
[ 0.320005] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
After:
[ 0.320005] Last level dTLB entries: 4KB 256, 2MB 128, 4MB 128, 1GB 16

The entries do exist in the official Intel SMD but the type column there is
incorrect (states "Cache" where it should read "TLB"), but the entries for
the values 0x6B, 0x6C and 0x6D are correctly described as 'Data TLB'.

Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20180423161425.24366-1-jacekt@dugeo.com
2018-04-26 21:42:44 +02:00
Borislav Petkov
09e182d17e x86/microcode: Do not exit early from __reload_late()
Vitezslav reported a case where the

  "Timeout during microcode update!"

panic would hit. After a deeper look, it turned out that his .config had
CONFIG_HOTPLUG_CPU disabled which practically made save_mc_for_early() a
no-op.

When that happened, the discovered microcode patch wasn't saved into the
cache and the late loading path wouldn't find any.

This, then, lead to early exit from __reload_late() and thus CPUs waiting
until the timeout is reached, leading to the panic.

In hindsight, that function should have been written so it does not return
before the post-synchronization. Oh well, I know better now...

Fixes: bb8c13d61a ("x86/microcode: Fix CPU synchronization routine")
Reported-by: Vitezslav Samel <vitezslav@samel.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitezslav Samel <vitezslav@samel.cz>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz
Link: https://lkml.kernel.org/r/20180421081930.15741-2-bp@alien8.de
2018-04-24 09:48:22 +02:00
Borislav Petkov
84749d8375 x86/microcode/intel: Save microcode patch unconditionally
save_mc_for_early() was a no-op on !CONFIG_HOTPLUG_CPU but the
generic_load_microcode() path saves the microcode patches it has found into
the cache of patches which is used for late loading too. Regardless of
whether CPU hotplug is used or not.

Make the saving unconditional so that late loading can find the proper
patch.

Reported-by: Vitezslav Samel <vitezslav@samel.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitezslav Samel <vitezslav@samel.cz>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180418081140.GA2439@pc11.op.pod.cz
Link: https://lkml.kernel.org/r/20180421081930.15741-1-bp@alien8.de
2018-04-24 09:48:22 +02:00
David Wang
60882cc159 x86/Centaur: Initialize supported CPU features properly
Centaur CPUs have some Intel compatible capabilities,including Permformance
Monitoring Counters and CPU virtualization capabilities. Initialize them in
the Centaur specific init code.

Signed-off-by: David Wang <davidwang@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: lukelin@viacpu.com
Cc: qiyuanwang@zhaoxin.com
Cc: gregkh@linuxfoundation.org
Cc: brucechang@via-alliance.com
Cc: timguo@zhaoxin.com
Cc: cooperyan@zhaoxin.com
Cc: hpa@zytor.com
Cc: benjaminpan@viatech.com
Link: https://lkml.kernel.org/r/1524212968-28998-1-git-send-email-davidwang@zhaoxin.com
2018-04-20 12:08:17 +02:00
Linus Torvalds
9fb71c2f23 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes and updates for x86:

   - Address a swiotlb regression which was caused by the recent DMA
     rework and made driver fail because dma_direct_supported() returned
     false

   - Fix a signedness bug in the APIC ID validation which caused invalid
     APIC IDs to be detected as valid thereby bloating the CPU possible
     space.

   - Fix inconsisten config dependcy/select magic for the MFD_CS5535
     driver.

   - Fix a corruption of the physical address space bits when encryption
     has reduced the address space and late cpuinfo updates overwrite
     the reduced bit information with the original value.

   - Dominiks syscall rework which consolidates the architecture
     specific syscall functions so all syscalls can be wrapped with the
     same macros. This allows to switch x86/64 to struct pt_regs based
     syscalls. Extend the clearing of user space controlled registers in
     the entry patch to the lower registers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix signedness bug in APIC ID validity checks
  x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
  x86/olpc: Fix inconsistent MFD_CS5535 configuration
  swiotlb: Use dma_direct_supported() for swiotlb_ops
  syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
  syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
  syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
  syscalls/core, syscalls/x86: Clean up syscall stub naming convention
  syscalls/x86: Extend register clearing on syscall entry to lower registers
  syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
  syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
  syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
  syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
  syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
  x86/syscalls: Don't pointlessly reload the system call number
  x86/mm: Fix documentation of module mapping range with 4-level paging
  x86/cpuid: Switch to 'static const' specifier
2018-04-15 16:12:35 -07:00
Ingo Molnar
ef389b7346 Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:42:34 +02:00
Kirill A. Shutemov
d94a155c59 x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
Some features (Intel MKTME, AMD SME) reduce the number of effectively
available physical address bits. cpuinfo_x86::x86_phys_bits is adjusted
accordingly during the early cpu feature detection.

Though if get_cpu_cap() is called later again then this adjustement is
overwritten. That happens in setup_pku(), which is called after
detect_tme().

To address this, extract the address sizes enumeration into a separate
function, which is only called only from early_identify_cpu() and from
generic_identify().

This makes get_cpu_cap() safe to be called later during boot proccess
without overwriting cpuinfo_x86::x86_phys_bits.

[ tglx: Massaged changelog ]

Fixes: cb06d8e3d0 ("x86/tme: Detect if TME and MKTME is activated by BIOS")
Reported-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180410092704.41106-1-kirill.shutemov@linux.intel.com
2018-04-10 16:33:21 +02:00
Linus Torvalds
d8312a3f61 ARM:
- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
2018-04-09 11:42:31 -07:00
Linus Torvalds
672a9c1069 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  kfifo: fix inaccurate comment
  tools/thermal: tmon: fix for segfault
  net: Spelling s/stucture/structure/
  edd: don't spam log if no EDD information is present
  Documentation: Fix early-microcode.txt references after file rename
  tracing: Block comments should align the * on each line
  treewide: Fix typos in printk
  GenWQE: Fix a typo in two comments
  treewide: Align function definition open/close braces
2018-04-05 11:56:35 -07:00
Linus Torvalds
06dd3dfeea Char/Misc patches for 4.17-rc1
Here is the big set of char/misc driver patches for 4.17-rc1.
 
 There are a lot of little things in here, nothing huge, but all
 important to the different hardware types involved:
 	- thunderbolt driver updates
 	- parport updates (people still care...)
 	- nvmem driver updates
 	- mei updates (as always)
 	- hwtracing driver updates
 	- hyperv driver updates
 	- extcon driver updates
 	- and a handfull of even smaller driver subsystem and individual
 	  driver updates
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWsShSQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykNqwCfUbfvopswb1PesHCLABDBsFQChgoAniDa6pS9
 kI8TN5MdLN85UU27Mkb6
 =BzFR
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc updates from Greg KH:
 "Here is the big set of char/misc driver patches for 4.17-rc1.

  There are a lot of little things in here, nothing huge, but all
  important to the different hardware types involved:

   -  thunderbolt driver updates

   -  parport updates (people still care...)

   -  nvmem driver updates

   -  mei updates (as always)

   -  hwtracing driver updates

   -  hyperv driver updates

   -  extcon driver updates

   -  ... and a handful of even smaller driver subsystem and individual
      driver updates

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
  hwtracing: Add HW tracing support menu
  intel_th: Add ACPI glue layer
  intel_th: Allow forcing host mode through drvdata
  intel_th: Pick up irq number from resources
  intel_th: Don't touch switch routing in host mode
  intel_th: Use correct method of finding hub
  intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
  stm class: Make dummy's master/channel ranges configurable
  stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
  MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
  hv: add SPDX license id to Kconfig
  hv: add SPDX license to trace
  Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
  Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
  /dev/mem: Avoid overwriting "err" in read_mem()
  eeprom: at24: use SPDX identifier instead of GPL boiler-plate
  eeprom: at24: simplify the i2c functionality checking
  eeprom: at24: fix a line break
  eeprom: at24: tweak newlines
  eeprom: at24: refactor at24_probe()
  ...
2018-04-04 20:07:20 -07:00
Linus Torvalds
a5532439eb Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Ingo Molnar:
 "Two changes: add the new convert_art_ns_to_tsc() API for upcoming
  Intel Goldmont+ drivers, and remove the obsolete rdtscll() API"

* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Get rid of rdtscll()
  x86/tsc: Convert ART in nanoseconds to TSC
2018-04-02 16:18:31 -07:00
Linus Torvalds
cea061e455 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Add "Jailhouse" hypervisor support (Jan Kiszka)

   - Update DeviceTree support (Ivan Gorinov)

   - Improve DMI date handling (Andy Shevchenko)"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/PCI: Fix a potential regression when using dmi_get_bios_year()
  firmware/dmi_scan: Uninline dmi_get_bios_year() helper
  x86/devicetree: Use CPU description from Device Tree
  of/Documentation: Specify local APIC ID in "reg"
  MAINTAINERS: Add entry for Jailhouse
  x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI
  x86: Consolidate PCI_MMCONFIG configs
  x86: Align x86_64 PCI_MMCONFIG with 32-bit variant
  x86/jailhouse: Enable PCI mmconfig access in inmates
  PCI: Scan all functions when running over Jailhouse
  jailhouse: Provide detection for non-x86 systems
  x86/devicetree: Fix device IRQ settings in DT
  x86/devicetree: Initialize device tree before using it
  pci: Simplify code by using the new dmi_get_bios_year() helper
  ACPI/sleep: Simplify code by using the new dmi_get_bios_year() helper
  x86/pci: Simplify code by using the new dmi_get_bios_year() helper
  dmi: Introduce the dmi_get_bios_year() helper function
  x86/platform/quark: Re-use DEFINE_SHOW_ATTRIBUTE() macro
  x86/platform/atom: Re-use DEFINE_SHOW_ATTRIBUTE() macro
2018-04-02 16:15:32 -07:00
Linus Torvalds
d22fff8141 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:

 - Extend the memmap= boot parameter syntax to allow the redeclaration
   and dropping of existing ranges, and to support all e820 range types
   (Jan H. Schönherr)

 - Improve the W+X boot time security checks to remove false positive
   warnings on Xen (Jan Beulich)

 - Support booting as Xen PVH guest (Juergen Gross)

 - Improved 5-level paging (LA57) support, in particular it's possible
   now to have a single kernel image for both 4-level and 5-level
   hardware (Kirill A. Shutemov)

 - AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)

 - Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
   (Kirill A. Shutemov)

 - Improved Intel-MID support (Andy Shevchenko)

 - Show EFI page tables in page_tables debug files (Andy Lutomirski)

 - ... plus misc fixes and smaller cleanups

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
  x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
  x86/mm: Update comment in detect_tme() regarding x86_phys_bits
  x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
  x86/mm: Remove pointless checks in vmalloc_fault
  x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
  ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
  ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
  x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
  x86/pconfig: Detect PCONFIG targets
  x86/tme: Detect if TME and MKTME is activated by BIOS
  x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
  x86/boot/compressed/64: Use page table in trampoline memory
  x86/boot/compressed/64: Use stack from trampoline memory
  x86/boot/compressed/64: Make sure we have a 32-bit code segment
  x86/mm: Do not use paravirtualized calls in native_set_p4d()
  kdump, vmcoreinfo: Export pgtable_l5_enabled value
  x86/boot/compressed/64: Prepare new top-level page table for trampoline
  x86/boot/compressed/64: Set up trampoline memory
  x86/boot/compressed/64: Save and restore trampoline memory
  ...
2018-04-02 15:45:30 -07:00
Linus Torvalds
86bbbebac1 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Ingo Molnar:
 "The main changes in this cycle were:

   - AMD MCE support/decoding improvements (Yazen Ghannam)

   - general MCE header cleanups and reorganization (Borislav Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "x86/mce/AMD: Collect error info even if valid bits are not set"
  x86/MCE: Cleanup and complete struct mce fields definitions
  x86/mce/AMD: Carve out SMCA get_block_address() code
  x86/mce/AMD: Get address from already initialized block
  x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
  x86/mce/AMD: Pass the bank number to smca_get_bank_type()
  x86/mce/AMD: Collect error info even if valid bits are not set
  x86/mce: Issue the 'mcelog --ascii' message only on !AMD
  x86/mce: Convert 'struct mca_config' bools to a bitfield
  x86/mce: Put private structures and definitions into the internal header
2018-04-02 11:47:07 -07:00
Colin Ian King
eaeb8e76cd x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
Trivial fix to spelling mistake in the pr_err_once() error message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20180313154709.1015-1-colin.king@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-31 10:46:40 +02:00
Vitaly Kuznetsov
5431390b30 x86/hyper-v: detect nested features
TLFS 5.0 says: "Support for an enlightened VMCS interface is reported with
CPUID leaf 0x40000004. If an enlightened VMCS interface is supported,
 additional nested enlightenments may be discovered by reading the CPUID
leaf 0x4000000A (see 2.4.11)."

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov
415bd1cd3a x86/hyper-v: move definitions from TLFS to hyperv-tlfs.h
mshyperv.h now only contains fucntions/variables we define in kernel, all
definitions from TLFS should go to hyperv-tlfs.h.

'enum hv_cpuid_function' is removed as we already have this info in
hyperv-tlfs.h, code in mshyperv.c is adjusted accordingly.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-28 22:47:06 +02:00