Commit Graph

15809 Commits

Linus Torvalds
6a1000bd27 ioremap changes for 5.6
- remove ioremap_nocache given that it is equivalent to
    ioremap everywhere
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl4vKHwLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYMPGBAAuVNUZaZfWYHpiVP2oRcUQUguFiD3NTbknsyzV2oH
 J9P0GfeENSKwE9OOhZ7XIjnCZAJwQgTK/ppQY5yiQ/KAtYyyXjXEJ6jqqjiTDInr
 +3+I3t/LhkgrK7tMrb7ylTGa/d7KhaciljnOXC8+b75iddvM9I1z2pbHDbppZMS9
 wT4RXL/cFtRb85AfOyPLybcka3f5P2gGvQz38qyimhJYEzHDXZu9VO1Bd20f8+Xf
 eLBKX0o6yWMhcaPLma8tm0M0zaXHEfLHUKLSOkiOk+eHTWBZ3b/w5nsOQZYZ7uQp
 25yaClbameAn7k5dHajduLGEJv//ZjLRWcN3HJWJ5vzO111aHhswpE7JgTZJSVWI
 ggCVkytD3ESXapvswmACSeCIDMmiJMzvn6JvwuSMVB7a6e5mcqTuGo/FN+DrBF/R
 IP+/gY/T7zIIOaljhQVkiEIIwiD/akYo0V9fheHTBnqcKEDTHV4WjKbeF6aCwcO+
 b8inHyXZSKSMG//UlDuN84/KH/o1l62oKaB1uDIYrrL8JVyjAxctWt3GOt5KgSFq
 wVz1lMw4kIvWtC/Sy2H4oB+RtODLp6yJDqmvmPkeJwKDUcd/1JKf0KsZ8j3FpGei
 /rEkBEss0KBKyFAgBSRO2jIpdj2epgcBcsdB/r5mlhcn8L77AS6mHbA173kY4pQ/
 Kdg=
 =TUCJ
 -----END PGP SIGNATURE-----

Merge tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap

Pull ioremap updates from Christoph Hellwig:
 "Remove the ioremap_nocache API (plus wrappers) that are always
  identical to ioremap"

* tag 'ioremap-5.6' of git://git.infradead.org/users/hch/ioremap:
  remove ioremap_nocache and devm_ioremap_nocache
  MIPS: define ioremap_nocache to ioremap
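
For driver code the conversion is mechanical: every ioremap_nocache()
(or devm_ioremap_nocache()) call site becomes the plain ioremap()
variant. A minimal sketch with a made-up device (the DEMO_* names and
addresses are illustrative only):

  #include <linux/errno.h>
  #include <linux/io.h>

  #define DEMO_MMIO_BASE	0xfed00000UL	/* hypothetical device */
  #define DEMO_MMIO_SIZE	0x1000

  static void __iomem *demo_regs;

  static int demo_map_regs(void)
  {
  	/* Pre-5.6: ioremap_nocache(DEMO_MMIO_BASE, DEMO_MMIO_SIZE); */
  	demo_regs = ioremap(DEMO_MMIO_BASE, DEMO_MMIO_SIZE);
  	return demo_regs ? 0 : -ENOMEM;
  }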
2020-01-27 13:03:00 -08:00
Linus Torvalds
30f5a75640 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - Misc fixes to the MCE code all over the place, by Jan H. Schönherr.

 - Initial support for AMD F19h and other cleanups to amd64_edac, by
   Yazen Ghannam.

 - Other small cleanups.

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/mce_amd: Make fam_ops static global
  EDAC/amd64: Drop some family checks for newer systems
  EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh
  x86/amd_nb: Add Family 19h PCI IDs
  EDAC/mce_amd: Always load on SMCA systems
  x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
  x86/mce: Fix use of uninitialized MCE message string
  x86/mce: Fix mce=nobootlog
  x86/mce: Take action on UCNA/Deferred errors again
  x86/mce: Remove mce_inject_log() in favor of mce_log()
  x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
  x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused
2020-01-27 09:19:35 -08:00
Borislav Petkov
3c749b81ee x86/CPU/AMD: Remove amd_get_topology_early()
... and fold its function body into its single call site.

No functional changes:

  # arch/x86/kernel/cpu/amd.o:

   text    data     bss     dec     hex filename
   5994     385       1    6380    18ec amd.o.before
   5994     385       1    6380    18ec amd.o.after

md5:
   99ec6daa095b502297884e949c520f90  amd.o.before.asm
   99ec6daa095b502297884e949c520f90  amd.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200123165811.5288-1-bp@alien8.de
2020-01-25 12:59:46 +01:00
Dave Hansen
45fc24e89b x86/mpx: remove MPX from arch/x86
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

This removes all the (dead at this point) MPX handling code
remaining in the tree.  The only code left is the XSAVE support
for MPX state, which is currently needed for KVM to handle VMs
which might use MPX.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:20 -08:00
Dave Hansen
aa9ccb7b47 x86/mpx: remove bounds exception code
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).

Remove the other user-visible ABI: signal handling.  This code
should basically have been inactive after the prctl()s were
removed, but there may be some small ABI remnants from this code.
Remove it.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:15 -08:00
Dave Hansen
3a1255396b x86/alternatives: add missing insn.h include
From: Dave Hansen <dave.hansen@linux.intel.com>

While testing my MPX removal series, Borislav noted a compilation
failure with an allnoconfig build.

It turned out to be a missing include of insn.h in alternative.c.
With MPX, it was pulled in implicitly via:

	asm/mmu_context.h -> asm/mpx.h -> asm/insn.h

Fixes: c3d6324f84 ("x86/alternatives: Teach text_poke_bp() to emulate instructions")
Reported-by: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
2020-01-23 10:41:13 -08:00
Mateusz Nosek
4144fddbd3 x86/tsc: Remove redundant assignment
Previously, the assignment to the local variable 'now' took place
before the for loop. The loop is unconditional so it will be entered
at least once. The variable 'now' is reassigned in the loop and is not
used before reassigning. Therefore, the assignment before the loop is
unnecessary and can be removed.
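
Sketched with illustrative names (not the exact tsc_sync.c code), the
pattern is:

  #include <linux/types.h>
  #include <asm/timex.h>

  static u64 demo_poll(unsigned int loops)	/* loops >= 1, always */
  {
  	u64 now = 0;	/* redundant: overwritten on the first iteration */
  	unsigned int i;

  	for (i = 0; i < loops; i++)
  		now = get_cycles();	/* reassigned before any use */

  	return now;
  }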

No code changed:

  # arch/x86/kernel/tsc_sync.o:

   text    data     bss     dec     hex filename
   3569     198      44    3811     ee3 tsc_sync.o.before
   3569     198      44    3811     ee3 tsc_sync.o.after

md5:
   36216de29b208edbcd34fed9fe7f7b69  tsc_sync.o.before.asm
   36216de29b208edbcd34fed9fe7f7b69  tsc_sync.o.after.asm

 [ bp: Massage commit message. ]

Signed-off-by: Mateusz Nosek <mateusznosek0@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200118171143.25178-1-mateusznosek0@gmail.com
2020-01-22 13:52:42 +01:00
Xiaochen Shen
32ada3b9e0 x86/resctrl: Clean up unused function parameter in mkdir path
Commit

  334b0f4e9b ("x86/resctrl: Fix a deadlock due to inaccurate reference")

changed the argument to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
within mkdir_rdt_prepare(). That change resulted in an unused function
parameter to mkdir_rdt_prepare().

Clean up the unused function parameter in mkdir_rdt_prepare() and its
callers rdtgroup_mkdir_mon() and rdtgroup_mkdir_ctrl_mon().
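
The signature change has roughly this shape (a sketch based on the
description above, not the exact hunk):

  /* Before (sketch): the parent control group's kernfs node was
   * still passed in, although it is no longer used. */
  static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
  			     struct kernfs_node *prgrp_kn,
  			     const char *name, umode_t mode,
  			     enum rdt_group_type rtype,
  			     struct rdtgroup **r);

  /* After: callers pass only the parent node, whose kn->priv now
   * identifies the resource group (see the preceding fix). */
  static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
  			     const char *name, umode_t mode,
  			     enum rdt_group_type rtype,
  			     struct rdtgroup **r);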

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1578500886-21771-5-git-send-email-xiaochen.shen@intel.com
2020-01-20 17:00:41 +01:00
Xiaochen Shen
334b0f4e9b x86/resctrl: Fix a deadlock due to inaccurate reference
There is a race condition which results in a deadlock when rmdir and
mkdir execute concurrently:

$ ls /sys/fs/resctrl/c1/mon_groups/m1/
cpus  cpus_list  mon_data  tasks

Thread 1: rmdir /sys/fs/resctrl/c1
Thread 2: mkdir /sys/fs/resctrl/c1/mon_groups/m1

3 locks held by mkdir/48649:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c13b>] filename_create+0x7b/0x170
 #2:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

4 locks held by rmdir/48652:
 #0:  (sb_writers#17){.+.+}, at: [<ffffffffb4ca2aa0>] mnt_want_write+0x20/0x50
 #1:  (&type->i_mutex_dir_key#8/1){+.+.}, at: [<ffffffffb4c8c3cf>] do_rmdir+0x13f/0x1e0
 #2:  (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffb4c86d5d>] vfs_rmdir+0x4d/0x120
 #3:  (rdtgroup_mutex){+.+.}, at: [<ffffffffb4a4389d>] rdtgroup_kn_lock_live+0x3d/0x70

Thread 1 is deleting control group "c1". Holding rdtgroup_mutex,
kernfs_remove() removes all kernfs nodes under directory "c1"
recursively, then waits for sub kernfs node "mon_groups" to drop active
reference.

Thread 2 is trying to create a subdirectory "m1" in the "mon_groups"
directory. The wrapper kernfs_iop_mkdir() takes an active reference to
the "mon_groups" directory but the code drops the active reference to
the parent directory "c1" instead.

As a result, Thread 1 blocks waiting for the active reference to drop
and never releases rdtgroup_mutex, while Thread 2 blocks trying to
acquire rdtgroup_mutex.

Thread 1 (rdtgroup_rmdir)   Thread 2 (rdtgroup_mkdir)
(rmdir /sys/fs/resctrl/c1)  (mkdir /sys/fs/resctrl/c1/mon_groups/m1)
-------------------------   -------------------------
                            kernfs_iop_mkdir
                              /*
                               * kn: "m1", parent_kn: "mon_groups",
                               * prgrp_kn: parent_kn->parent: "c1",
                               *
                               * "mon_groups", parent_kn->active++: 1
                               */
                              kernfs_get_active(parent_kn)
kernfs_iop_rmdir
  /* "c1", kn->active++ */
  kernfs_get_active(kn)

  rdtgroup_kn_lock_live
    atomic_inc(&rdtgrp->waitcount)
    /* "c1", kn->active-- */
    kernfs_break_active_protection(kn)
    mutex_lock

  rdtgroup_rmdir_ctrl
    free_all_child_rdtgrp
      sentry->flags = RDT_DELETED

    rdtgroup_ctrl_remove
      rdtgrp->flags = RDT_DELETED
      kernfs_get(kn)
      kernfs_remove(rdtgrp->kn)
        __kernfs_remove
          /* "mon_groups", sub_kn */
          atomic_add(KN_DEACTIVATED_BIAS, &sub_kn->active)
          kernfs_drain(sub_kn)
            /*
             * sub_kn->active == KN_DEACTIVATED_BIAS + 1,
             * waiting on sub_kn->active to drop, but it
             * never drops in Thread 2 which is blocked
             * on getting rdtgroup_mutex.
             */
Thread 1 hangs here ---->
            wait_event(sub_kn->active == KN_DEACTIVATED_BIAS)
            ...
                              rdtgroup_mkdir
                                rdtgroup_mkdir_mon(parent_kn, prgrp_kn)
                                  mkdir_rdt_prepare(parent_kn, prgrp_kn)
                                    rdtgroup_kn_lock_live(prgrp_kn)
                                      atomic_inc(&rdtgrp->waitcount)
                                      /*
                                       * "c1", prgrp_kn->active--
                                       *
                                       * The active reference on "c1" is
                                       * dropped, but not matching the
                                       * actual active reference taken
                                       * on "mon_groups", thus causing
                                       * Thread 1 to wait forever while
                                       * holding rdtgroup_mutex.
                                       */
                                      kernfs_break_active_protection(
                                                               prgrp_kn)
                                      /*
                                       * Trying to get rdtgroup_mutex
                                       * which is held by Thread 1.
                                       */
Thread 2 hangs here ---->             mutex_lock
                                      ...

The problem is that the creation of a subdirectory in the "mon_groups"
directory incorrectly releases the active protection of its parent
directory instead of itself before it starts waiting for rdtgroup_mutex.
This is triggered by the rdtgroup_mkdir() flow calling
rdtgroup_kn_lock_live()/rdtgroup_kn_unlock() with the kernfs node of the
parent control group ("c1") as argument. It should be called with the
kernfs node of "mon_groups" instead, but that node's kn->priv is
currently NULL rather than pointing to its rdtgrp.

Fix it by pointing kn->priv to the rdtgrp when "mon_groups" is created,
and pass that node to rdtgroup_kn_lock_live()/rdtgroup_kn_unlock()
instead. The code then operates on the same rdtgroup structure but
manages the active reference of the "mon_groups" kernfs node, which
prevents the deadlock. The same change is made for the "mon_data"
directories.

This results in some unused function parameters that will be cleaned up
in a follow-up patch; the focus here is on the minimal fix in support of
backporting efforts.
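
A sketch of the shape of the fix (illustrative, not the exact hunk): the
owning rdtgroup is recorded as the private data of the "mon_groups" node
when it is created, so the locking helpers can later be handed that node
directly:

  static int mongroup_create_dir(struct kernfs_node *parent_kn,
  			       struct rdtgroup *prgrp,	/* new argument */
  			       const char *name,
  			       struct kernfs_node **dest_kn)
  {
  	struct kernfs_node *kn;

  	/* Pass prgrp as kn->priv instead of NULL. */
  	kn = kernfs_create_dir(parent_kn, name, parent_kn->mode, prgrp);
  	if (IS_ERR(kn))
  		return PTR_ERR(kn);

  	if (dest_kn)
  		*dest_kn = kn;
  	return 0;	/* ugid setup and activation elided */
  }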

Fixes: c7d9aac613 ("x86/intel_rdt/cqm: Add mkdir support for RDT monitoring")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-4-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:57:53 +01:00
Xiaochen Shen
074fadee59 x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup
There is a race condition in the following scenario which results in a
use-after-free issue when reading a monitoring file and deleting the
parent ctrl_mon group concurrently:

Thread 1 calls atomic_inc() to take refcount of rdtgrp and then calls
kernfs_break_active_protection() to drop the active reference of kernfs
node in rdtgroup_kn_lock_live().

In Thread 2, kernfs_remove() is a blocking routine. It waits on all sub
kernfs nodes to drop the active reference when removing all subtree
kernfs nodes recursively. Thread 2 could block on kernfs_remove() until
Thread 1 calls kernfs_break_active_protection(). Only after
kernfs_remove() completes can the refcount of rdtgrp be trusted.

Before Thread 1 calls atomic_inc() and kernfs_break_active_protection(),
Thread 2 could call kfree() when the refcount of rdtgrp (sentry) is 0
instead of 1 due to the race.

In Thread 1, in rdtgroup_kn_unlock(), referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
a use-after-free issue.

Thread 1 (rdtgroup_mondata_show)  Thread 2 (rdtgroup_rmdir)
--------------------------------  -------------------------
rdtgroup_kn_lock_live
  /*
   * kn active protection until
   * kernfs_break_active_protection(kn)
   */
  rdtgrp = kernfs_to_rdtgroup(kn)
                                  rdtgroup_kn_lock_live
                                    atomic_inc(&rdtgrp->waitcount)
                                    mutex_lock
                                  rdtgroup_rmdir_ctrl
                                    free_all_child_rdtgrp
                                      /*
                                       * sentry->waitcount should be 1
                                       * but is 0 now due to the race.
                                       */
                                      kfree(sentry)*[1]
  /*
   * Only after kernfs_remove()
   * completes, the refcount of
   * rdtgrp could be trusted.
   */
  atomic_inc(&rdtgrp->waitcount)
  /* kn->active-- */
  kernfs_break_active_protection(kn)
                                    rdtgroup_ctrl_remove
                                      rdtgrp->flags = RDT_DELETED
                                      /*
                                       * Blocking routine, wait for
                                       * all sub kernfs nodes to drop
                                       * active reference in
                                       * kernfs_break_active_protection.
                                       */
                                      kernfs_remove(rdtgrp->kn)
                                  rdtgroup_kn_unlock
                                    mutex_unlock
                                    atomic_dec_and_test(
                                                &rdtgrp->waitcount)
                                    && (flags & RDT_DELETED)
                                      kernfs_unbreak_active_protection(kn)
                                      kfree(rdtgrp)
  mutex_lock
mon_event_read
rdtgroup_kn_unlock
  mutex_unlock
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    /* kn->active++ */
    kernfs_unbreak_active_protection(kn)
    kfree(rdtgrp)

Fix it by moving free_all_child_rdtgrp() to after kernfs_remove() in
rdtgroup_rmdir_ctrl() to ensure it has the accurate refcount of rdtgrp.
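
The reordering, sketched (rdtgroup_ctrl_remove() is where
kernfs_remove() runs, see the diagram above):

  /* Before (sketch): children freed while a reader may still hold
   * a reference, so their waitcount cannot be trusted yet. */
  free_all_child_rdtgrp(rdtgrp);
  rdtgroup_ctrl_remove(kn, rdtgrp);

  /* After: once kernfs_remove() has completed, no reader can still
   * be entering a monitoring file, so the refcounts are accurate. */
  rdtgroup_ctrl_remove(kn, rdtgrp);
  free_all_child_rdtgrp(rdtgrp);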

Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-3-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:56:11 +01:00
Xiaochen Shen
b8511ccc75 x86/resctrl: Fix use-after-free when deleting resource groups
A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount)
that indicates how many waiters expect this rdtgrp to exist. Waiters
could be waiting on rdtgroup_mutex or some work sitting on a task's
workqueue for when the task returns from kernel mode or exits.

The deletion of a rdtgrp is intended to have two phases:

  (1) while holding rdtgroup_mutex the necessary cleanup is done and
  rdtgrp->flags is set to RDT_DELETED,

  (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed
  only if there are no waiters and its flag is set to RDT_DELETED. Upon
  gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check
  for the RDT_DELETED flag.

When unmounting the resctrl file system or deleting ctrl_mon groups,
all of the subdirectories are removed and the data structure of rdtgrp
is forcibly freed without checking rdtgrp->waitcount. If at this point
there was a waiter on rdtgrp then a use-after-free issue occurs when the
waiter starts running and accesses the rdtgrp structure it was waiting
on.

See the kfree() calls at [1], [2] and [3] in these two call paths in the
following scenarios:
(1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp()
(2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()

The following scenarios result in a use-after-free issue:

Scenario 1:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdt_kill_sb() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
a use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdt_kill_sb)
-------------------------------        ----------------------
rdtgroup_kn_lock_live
  atomic_inc(&rdtgrp->waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&rdtgrp->waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
                                       mutex_lock
                                       rmdir_all_sub
                                         /*
                                          * sentry and rdtgrp are freed
                                          * without checking refcount
                                          */
                                         free_all_child_rdtgrp
                                           kfree(sentry)*[1]
                                         kfree(rdtgrp)*[2]
                                       mutex_unlock
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1] or [2].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    kfree(rdtgrp)

Scenario 2:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
a use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdtgroup_rmdir)
-------------------------------        -------------------------
rdtgroup_kn_lock_live
  atomic_inc(&rdtgrp->waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&rdtgrp->waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
                                       rdtgroup_kn_lock_live
                                         atomic_inc(&rdtgrp->waitcount)
                                         mutex_lock
                                       rdtgroup_rmdir_ctrl
                                         free_all_child_rdtgrp
                                           /*
                                            * sentry is freed without
                                            * checking refcount
                                            */
                                           kfree(sentry)*[3]
                                         rdtgroup_ctrl_remove
                                           rdtgrp->flags = RDT_DELETED
                                       rdtgroup_kn_unlock
                                         mutex_unlock
                                         atomic_dec_and_test(
                                                     &rdtgrp->waitcount)
                                         && (flags & RDT_DELETED)
                                           kfree(rdtgrp)
/*
 * Callback is scheduled to execute
 * after rdtgroup_rmdir is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [3].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    kfree(rdtgrp)

If CONFIG_DEBUG_SLAB=y, slab corruption on kmalloc-2k can be observed
as follows. Note that "0x6b" is POISON_FREE after kfree(). The
corrupted bytes "0x6a" and "0x64" at offset 0x424 correspond to the
waitcount member of struct rdtgroup which was freed:

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048
  420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
  Single bit error detected. Probably bad RAM.
  Run memtest86+ or a similar memory test tool.
  Next obj: start=ffff9504c5b0d800, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048
  420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkdkkkkkkkkkkk
  Prev obj: start=ffff9504c58ab000, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

Fix this by taking the reference count (waitcount) of rdtgrp into
account in the two call paths that currently do not do so:

(1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp()
(2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()

Instead of always freeing the resource group, free it only if there are
no waiters on it. If there are waiters, set the resource group's flags
to RDT_DELETED and leave it to the last waiter to free the group when it
starts running and finds that the group has been removed
(rdtgrp->flags & RDT_DELETED) in the meantime.
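
Sketched on free_all_child_rdtgrp() (modeled on the description above,
still a sketch rather than the exact hunk):

  static void free_all_child_rdtgrp(struct rdtgroup *rdtgrp)
  {
  	struct rdtgroup *sentry, *stmp;
  	struct list_head *head;

  	head = &rdtgrp->mon.crdtgrp_list;
  	list_for_each_entry_safe(sentry, stmp, head, mon.crdtgrp_list) {
  		free_rmid(sentry->mon.rmid);
  		list_del(&sentry->mon.crdtgrp_list);

  		/* Free only when nobody waits on the group; otherwise
  		 * mark it deleted and let the last waiter free it. */
  		if (atomic_read(&sentry->waitcount) == 0)
  			kfree(sentry);
  		else
  			sentry->flags = RDT_DELETED;
  	}
  }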

Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:45:43 +01:00
Tony W Wang-oc
283bab9809 x86/cpu: Remove redundant cpu_detect_cache_sizes() call
Both functions call init_intel_cacheinfo() which computes L2 and L3 cache
sizes from CPUID(4). But then they also call cpu_detect_cache_sizes() a
bit later, which computes ->x86_tlbsize and the L2 size from CPUID(80000006).

However, the latter call is not needed because

 - on these CPUs, CPUID(80000006).EBX for ->x86_tlbsize is reserved

 - CPUID(80000006).ECX for the L2 size has the same result as CPUID(4)

Therefore, remove the latter call to simplify the code.

 [ bp: Rewrite commit message. ]

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1579075257-6985-1-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-20 16:32:35 +01:00
Chen Yu
e79f15a459 x86/resctrl: Add task resctrl information display
Monitoring tools that want to find out which resctrl control and monitor
groups a task belongs to must currently read the "tasks" file in every
group until they locate the process ID.

Add an additional file /proc/{pid}/cpu_resctrl_groups to provide this
information:

1)   res:
     mon:

resctrl is not available.

2)   res:/
     mon:

Task is part of the root resctrl control group, and it is not associated
with any monitor group.

3)  res:/
    mon:mon0

Task is part of the root resctrl control group and monitor group mon0.

4)  res:group0
    mon:

Task is part of resctrl control group group0, and it is not associated
with any monitor group.

5) res:group0
   mon:mon1

Task is part of resctrl control group group0 and monitor group mon1.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Jinshi Chen <jinshi.chen@intel.com>
Link: https://lkml.kernel.org/r/20200115092851.14761-1-yu.c.chen@intel.com
2020-01-20 16:19:10 +01:00
Arvind Sankar
dacc909233 x86/sysfb: Fix check for bad VRAM size
When checking whether the reported lfb_size makes sense, the
height * stride result is page-aligned before it is compared against the
reported size.

This doesn't work if height * stride is not an exact number of pages.
For example, as reported in the kernel bugzilla below, an 800x600x32 EFI
framebuffer gets skipped because of this.

Move the PAGE_ALIGN() to after the size check.
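
In code terms the check changes shape roughly like this (a sketch; for
800x600x32, height * stride is 1,920,000 bytes, which PAGE_ALIGN()
rounds up to 1,921,024 and thus past an lfb_size of exactly 1,920,000):

  u64 len = mode->height * mode->stride;

  /* Before (sketch): aligned first, so an exact-sized VRAM report
   * looked too small and the framebuffer was skipped. */
  if (PAGE_ALIGN(len) > size)
  	return false;

  /* After: compare the exact footprint, align only afterwards. */
  if (len > size)
  	return false;
  len = PAGE_ALIGN(len);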

Reported-by: Christopher Head <chead@chead.ca>
Tested-by: Christopher Head <chead@chead.ca>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206051
Link: https://lkml.kernel.org/r/20200107230410.2291947-1-nivedita@alum.mit.edu
2020-01-20 10:57:53 +01:00
Ingo Molnar
cb6c82df68 Linux 5.5-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4k7i8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvk0IAKRenVOdiudY77SQ
 VZjsteyrYTTQtPPv494ToIRjR0XQ+gYp8vyWzXTUC5Nm9Y9U3VzDqUPUjWszrSXE
 6mU+tzcMc9qwuUxnIFn8zfg64ygw+37sn/w3xqeH4QmF9Z5Wl3EX3SdXTs7jp3RS
 VxiztkUNI5ZBV2GDtla5K/9qLPqCQnUYXIiyi5lAtBtiitZDVXFp7dy7hMgEiaEO
 +78K5Kh3xlt5ndDsBFOlwIb2Oof3KL7bBXntdbSBc/bjol6IRvAgln48HWCv59G2
 jzAp2tj2KobX9GRAEPj+v4TQZEW0SXDNDi8MgQsM+3DYVCTmANsv57CBKRuf01+F
 nB1kAys=
 =zSnJ
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc7' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-20 08:43:44 +01:00
Ard Biesheuvel
1f299fad1e efi/x86: Limit EFI old memory map to SGI UV machines
We carry a quirk in the x86 EFI code to switch back to an older
method of mapping the EFI runtime services memory regions, because
it was deemed risky at the time to implement a new method without
providing a fallback to the old method in case problems arose.

Such problems did arise, but they appear to be limited to SGI UV1
machines, and so these are the only ones for which the fallback gets
enabled automatically (via a DMI quirk). The fallback can be enabled
manually as well, by passing efi=old_map, but there is very little
evidence that suggests that this is something that is being relied
upon in the field.

Given that UV1 support is not enabled by default by the distros
(Ubuntu, Fedora), there is no point in carrying this fallback code
all the time if there are no other users. So let's move it into the
UV support code, and document that efi=old_map now requires this
support code to be enabled.

Note that efi=old_map has been used in the past on other SGI UV
machines to work around kernel regressions in production, so we
keep the option to enable it by hand, but only if the kernel was
built with UV support.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20200113172245.27925-8-ardb@kernel.org
2020-01-20 08:13:01 +01:00
Ingo Molnar
a786810cc8 Linux 5.5-rc7
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4k7i8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGvk0IAKRenVOdiudY77SQ
 VZjsteyrYTTQtPPv494ToIRjR0XQ+gYp8vyWzXTUC5Nm9Y9U3VzDqUPUjWszrSXE
 6mU+tzcMc9qwuUxnIFn8zfg64ygw+37sn/w3xqeH4QmF9Z5Wl3EX3SdXTs7jp3RS
 VxiztkUNI5ZBV2GDtla5K/9qLPqCQnUYXIiyi5lAtBtiitZDVXFp7dy7hMgEiaEO
 +78K5Kh3xlt5ndDsBFOlwIb2Oof3KL7bBXntdbSBc/bjol6IRvAgln48HWCv59G2
 jzAp2tj2KobX9GRAEPj+v4TQZEW0SXDNDi8MgQsM+3DYVCTmANsv57CBKRuf01+F
 nB1kAys=
 =zSnJ
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc7' into efi/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-20 08:05:16 +01:00
Linus Torvalds
0cc2682d8b Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - a resctrl fix for uninitialized objects found by debugobjects

   - a resctrl memory leak fix

   - fix the unintended re-enabling of the SME and SEV CPU flags if
     memory encryption was disabled at bootup via the MSR space"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
  x86/resctrl: Fix potential memory leak
  x86/resctrl: Fix an imbalance in domain_remove_cpu()
2020-01-18 13:02:12 -08:00
Xiaochen Shen
536a0d8e79 x86/resctrl: Check monitoring static key in the MBM overflow handler
Currently, there are three static keys in the resctrl file system:
rdt_mon_enable_key and rdt_alloc_enable_key indicate if the monitoring
feature and the allocation feature are enabled, respectively. The
rdt_enable_key is enabled when either the monitoring feature or the
allocation feature is enabled.

If no monitoring feature is present (either hardware doesn't support a
monitoring feature or the feature is disabled by the kernel command line
option "rdt="), rdt_enable_key is still enabled but rdt_mon_enable_key
is disabled.

MBM is a monitoring feature. The MBM overflow handler intends to check
whether the monitoring feature is enabled, so that it can return quickly
if it is not.

Therefore, check rdt_mon_enable_key in it instead of rdt_enable_key, as
the former is the more accurate check.
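
A sketch of the change in the overflow handler:

  void mbm_handle_overflow(struct work_struct *work)
  {
  	mutex_lock(&rdtgroup_mutex);

  	/* Was: if (!static_branch_likely(&rdt_enable_key)) */
  	if (!static_branch_likely(&rdt_mon_enable_key))
  		goto out_unlock;

  	/* ... walk the domains and update the MBM counters ... */

  out_unlock:
  	mutex_unlock(&rdtgroup_mutex);
  }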

 [ bp: Massage commit message. ]

Fixes: e33026831b ("x86/intel_rdt/mbm: Handle counter overflow")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1576094705-13660-1-git-send-email-xiaochen.shen@intel.com
2020-01-17 19:32:32 +01:00
Tony W Wang-oc
a84de2fa96 x86/speculation/swapgs: Exclude Zhaoxin CPUs from SWAPGS vulnerability
New Zhaoxin family 7 CPUs are not affected by the SWAPGS vulnerability. So
mark these CPUs in the cpu vulnerability whitelist accordingly.

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579227872-26972-3-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-17 19:13:47 +01:00
Tony W Wang-oc
1e41a766c9 x86/speculation/spectre_v2: Exclude Zhaoxin CPUs from SPECTRE_V2
New Zhaoxin family 7 CPUs are not affected by SPECTRE_V2. So define a
separate cpu_vuln_whitelist bit NO_SPECTRE_V2 and add these CPUs to the cpu
vulnerability whitelist.
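
The additions to cpu_vuln_whitelist[] have this shape (a sketch; the
Centaur entry is an assumption here, included because new Zhaoxin
family 7 parts ship under both vendor IDs):

  VULNWL(CENTAUR,	7, X86_MODEL_ANY,	NO_SPECTRE_V2),
  VULNWL(ZHAOXIN,	7, X86_MODEL_ANY,	NO_SPECTRE_V2),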

Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1579227872-26972-2-git-send-email-TonyWWang-oc@zhaoxin.com
2020-01-17 19:13:47 +01:00
Pawan Gupta
5efc6fa904 x86/cpu: Update cached HLE state on write to TSX_CTRL_CPUID_CLEAR
/proc/cpuinfo currently reports the Hardware Lock Elision (HLE) feature
as present on the boot CPU even if it was disabled during bootup. This
is because the cpuinfo_x86->x86_capability HLE bit is not updated after
the TSX state is changed via the new MSR IA32_TSX_CTRL.

Update the cached HLE bit also since it is expected to change after an
update to CPUID_CLEAR bit in MSR IA32_TSX_CTRL.
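
The disable path then ends up clearing the cached bits as well (a
sketch; per the message above, the HLE bit is the one that was missing):

  tsx_disable();

  /* CPUID_CLEAR in MSR IA32_TSX_CTRL hides the HLE/RTM CPUID bits;
   * drop the cached copies too so /proc/cpuinfo stays accurate. */
  setup_clear_cpu_cap(X86_FEATURE_HLE);
  setup_clear_cpu_cap(X86_FEATURE_RTM);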

Fixes: 95c5824f75 ("x86/cpu: Add a "tsx=" cmdline option with TSX disabled by default")
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/2529b99546294c893dfa1c89e2b3e46da3369a59.1578685425.git.pawan.kumar.gupta@linux.intel.com
2020-01-17 19:13:46 +01:00
Arnd Bergmann
d0b7788804 x86/apic/uv: Avoid unused variable warning
When CONFIG_PROC_FS is disabled, the compiler warns about an unused
variable:

arch/x86/kernel/apic/x2apic_uv_x.c: In function 'uv_setup_proc_files':
arch/x86/kernel/apic/x2apic_uv_x.c:1546:8: error: unused variable 'name' [-Werror=unused-variable]
  char *name = hubless ? "hubless" : "hubbed";

Simplify the code so this variable is no longer needed.

Fixes: 8785968bce ("x86/platform/uv: Add UV Hubbed/Hubless Proc FS Files")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191212140419.315264-1-arnd@arndb.de
2020-01-17 14:34:41 +01:00
Tom Lendacky
a006483b2f x86/CPU/AMD: Ensure clearing of SME/SEV features is maintained
If the SME and SEV features are present via CPUID, but memory encryption
support is not enabled (MSR 0xC001_0010[23]), the feature flags are cleared
using clear_cpu_cap(). However, if get_cpu_cap() is later called, these
feature flags will be reset back to present, which is not desired.

Change from using clear_cpu_cap() to setup_clear_cpu_cap() so that the
clearing of the flags is maintained.
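
A sketch of the difference:

  /* Before: clears the bit only in this cpuinfo_x86; a later
   * get_cpu_cap() pass can silently set it again. */
  clear_cpu_cap(c, X86_FEATURE_SME);

  /* After: additionally records the bit in cpu_caps_cleared[], so
   * it stays cleared across re-reads of CPUID. */
  setup_clear_cpu_cap(X86_FEATURE_SME);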

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.16.x-
Link: https://lkml.kernel.org/r/226de90a703c3c0be5a49565047905ac4e94e8f3.1579125915.git.thomas.lendacky@amd.com
2020-01-16 20:23:20 +01:00
Yazen Ghannam
b3f79ae459 x86/amd_nb: Add Family 19h PCI IDs
Add the new PCI Device 18h IDs for AMD Family 19h systems. Note that
Family 19h systems will not have a new PCI root device ID.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200110015651.14887-4-Yazen.Ghannam@amd.com
2020-01-16 17:09:18 +01:00
Yazen Ghannam
89a76171bf x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
Add support for a new version of the Load Store unit bank type as
indicated by its McaType value, which will be present in future SMCA
systems.

Add the new (HWID, MCATYPE) tuple. Reuse the same name, since this is
logically the same to the user.

Also, add the new error descriptions to edac_mce_amd.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200110015651.14887-2-Yazen.Ghannam@amd.com
2020-01-16 17:09:02 +01:00
Sean Christopherson
bb02e2cb71 x86/cpu: Print "VMX disabled" error message iff KVM is enabled
Don't print an error message about VMX being disabled by BIOS if KVM,
the sole user of VMX, is disabled. E.g. if KVM is disabled and the MSR
is unlocked, the kernel will intentionally disable VMX when locking
feature control and then complain that "BIOS" disabled VMX.
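
A sketch of the intended logic (names follow the IA32_FEAT_CTL series
below; simplified, with TXT/SMX handling elided):

  bool enable_vmx = cpu_has(c, X86_FEATURE_VMX) &&
  		  IS_ENABLED(CONFIG_KVM_INTEL);

  if (enable_vmx && (msr & FEAT_CTL_LOCKED) &&
      !(msr & FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX))
  	pr_err_once("VMX disabled by BIOS\n");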

Fixes: ef4d3bf198 ("x86/cpu: Clear VMX feature flag if VMX is not fully enabled")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200114202545.20296-1-sean.j.christopherson@intel.com
2020-01-15 13:26:50 +01:00
Chuansheng Liu
978370956d x86/mce/therm_throt: Do not access uninitialized therm_work
It is relatively easy to trigger the following boot splat on an Ice Lake
client platform. The call stack looks like this:

  kernel BUG at kernel/timer/timer.c:1152!

  Call Trace:
  __queue_delayed_work
  queue_delayed_work_on
  therm_throt_process
  intel_thermal_interrupt
  ...

The reason is that a CPU's thermal interrupt is enabled prior to
executing its hotplug onlining callback which will initialize the
throttling workqueues.

Such a race can lead to therm_throt_process() accessing an uninitialized
therm_work, leading to the above BUG at a very early bootup stage.

Therefore, unmask the thermal interrupt vector only after the
workqueues have been set up completely.

 [ bp: Heavily massage commit message and correct comment formatting. ]

Fixes: f6656208f0 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200107004116.59353-1-chuansheng.liu@intel.com
2020-01-15 11:31:33 +01:00
Arvind Sankar
2f1e1d8ba4 arch/x86/setup: Drop dummy_con initialization
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset.
Drop it from arch setup code.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20191218214506.49252-24-nivedita@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14 15:29:19 +01:00
Dmitry Safonov
64b302ab66 x86/vdso: Provide vdso_data offset on vvar_page
VDSO support for time namespaces needs to set up a page with the same
layout as VVAR. That timens page will be placed at the position of the
VVAR page inside the namespace. That page has vdso_data->seq set to 1 to
enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to
enforce the time namespace handling path.

To prepare the time namespace page, the kernel needs to know the
vdso_data offset.  Provide the arch_get_vdso_data() helper for locating
vdso_data on the VVAR page.
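
On x86 the helper can be little more than pointer arithmetic (a
plausible sketch; it assumes a symbol, here called _vdso_data_offset,
recording where vdso_data sits relative to the VVAR page):

  struct vdso_data *arch_get_vdso_data(void *vvar_page)
  {
  	return (struct vdso_data *)(vvar_page + _vdso_data_offset);
  }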

Co-developed-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-22-dima@arista.com
2020-01-14 12:20:57 +01:00
Sean Christopherson
85c17291e2 x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether
IA32_FEAT_CTL has been initialized.  This will allow KVM, and any future
subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures
to query platform support, e.g. allows a future patch to remove KVM's
manual IA32_FEAT_CTL MSR checks.

Various features (on platforms that support IA32_FEAT_CTL) are dependent
on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE.  The
MSR is always configured during boot, but only if the CPU vendor is
recognized by the kernel.  Because CPUID doesn't incorporate the current
IA32_FEAT_CTL value in its reporting of relevant features, it's possible
for a feature to be reported as supported in cpufeatures but not truly
enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize
the CPU.

As a result, without the flag, KVM would see VMX as supported even if
IA32_FEAT_CTL hasn't been initialized, and so would need to manually
read the MSR and check the various enabling bits to avoid taking an
unexpected #GP on VMXON.
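
With the flag in place, a dependent subsystem's probe collapses to a
cpufeature check (sketch):

  /* Trust the VMX bit only if IA32_FEAT_CTL was actually
   * configured and locked during boot. */
  if (!boot_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
      !boot_cpu_has(X86_FEATURE_VMX))
  	return -EOPNOTSUPP;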

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-14-sean.j.christopherson@intel.com
2020-01-13 18:49:00 +01:00
Sean Christopherson
167a4894c1 x86/cpu: Set synthetic VMX cpufeatures during init_ia32_feat_ctl()
Set the synthetic VMX cpufeatures, which need to be kept to preserve
/proc/cpuinfo's ABI, in the common IA32_FEAT_CTL initialization code.
Remove the vendor code that manually sets the synthetic flags.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-13-sean.j.christopherson@intel.com
2020-01-13 18:43:19 +01:00
Sean Christopherson
14442a159c x86/cpu: Print VMX flags in /proc/cpuinfo using VMX_FEATURES_*
Add support for generating VMX feature names in capflags.c and use the
resulting x86_vmx_flags to print the VMX flags in /proc/cpuinfo.  Don't
print VMX flags if no bits are set in word 0, which holds Pin Controls.
Pin Control's INTR and NMI exiting are fundamental pillars of VMX; if
they are not supported, then the CPU is broken, does not actually
support VMX, or the kernel wasn't built with support for the target CPU.

Print the features in a dedicated "vmx flags" line to avoid polluting
the common "flags" and to avoid having to prefix all flags with "vmx_",
which results in horrendously long names.

Keep synthetic VMX flags in cpufeatures to preserve /proc/cpuinfo's ABI
for those flags.  This means that "flags" and "vmx flags" will have
duplicate entries for tpr_shadow (virtual_tpr), vnmi, ept, flexpriority,
vpid and ept_ad, but caps the pollution of "flags" at those six VMX
features.  The vendor-specific code that populates the synthetic flags
will be consolidated in a future patch to further minimize the lasting
damage.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-12-sean.j.christopherson@intel.com
2020-01-13 18:36:02 +01:00
Sean Christopherson
b47ce1fed4 x86/cpu: Detect VMX features on Intel, Centaur and Zhaoxin CPUs
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill
the capabilities during IA32_FEAT_CTL MSR initialization.

Make the VMX capabilities dependent on IA32_FEAT_CTL and
X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't
possibly support VMX, or when /proc/cpuinfo is not available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
2020-01-13 18:02:53 +01:00
Sean Christopherson
ef4d3bf198 x86/cpu: Clear VMX feature flag if VMX is not fully enabled
Now that IA32_FEAT_CTL is always configured and locked for CPUs that are
known to support VMX[*], clear the VMX capability flag if the MSR is
unsupported or BIOS disabled VMX, i.e. locked IA32_FEAT_CTL and didn't
set the appropriate VMX enable bit.

[*] Because init_ia32_feat_ctl() is called from vendors ->c_init(), it's
    still possible for IA32_FEAT_CTL to be left unlocked when VMX is
    supported by the CPU.  This is not fatal, and will be addressed in a
    future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-9-sean.j.christopherson@intel.com
2020-01-13 17:53:57 +01:00
Sean Christopherson
7d37953ba8 x86/zhaoxin: Use common IA32_FEAT_CTL MSR initialization
Use the recently added IA32_FEAT_CTL MSR initialization sequence to
opportunistically enable VMX support when running on a Zhaoxin CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-8-sean.j.christopherson@intel.com
2020-01-13 17:50:40 +01:00
Sean Christopherson
501444905f x86/centaur: Use common IA32_FEAT_CTL MSR initialization
Use the recently added IA32_FEAT_CTL MSR initialization sequence to
opportunistically enable VMX support when running on a Centaur CPU.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-7-sean.j.christopherson@intel.com
2020-01-13 17:48:36 +01:00
Sean Christopherson
6d527cebfa x86/mce: WARN once if IA32_FEAT_CTL MSR is left unlocked
WARN if the IA32_FEAT_CTL MSR is somehow left unlocked now that CPU
initialization unconditionally locks the MSR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-6-sean.j.christopherson@intel.com
2020-01-13 17:47:18 +01:00
Sean Christopherson
1db2a6e1e2 x86/intel: Initialize IA32_FEAT_CTL MSR at boot
Opportunistically initialize IA32_FEAT_CTL to enable VMX when the MSR is
left unlocked by BIOS.  Configuring feature control at boot time paves
the way for similar enabling of other features, e.g. Software Guard
Extensions (SGX).

Temporarily leave equivalent KVM code in place in order to avoid
introducing a regression on Centaur and Zhaoxin CPUs, e.g. removing
KVM's code would leave the MSR unlocked on those CPUs and would break
existing functionality if people are loading kvm_intel on Centaur and/or
Zhaoxin.  Defer enablement of the boot-time configuration on Centaur and
Zhaoxin to future patches to aid bisection.

Note, Local Machine Check Exceptions (LMCE) are also supported by the
kernel and enabled via feature control, but the kernel currently uses
LMCE if and only if the feature is explicitly enabled by BIOS.  Keep
the current behavior to avoid introducing bugs, future patches can opt
in to opportunistic enabling if it's deemed desirable to do so.

Always lock IA32_FEAT_CTL if it exists, even if the CPU doesn't support
VMX, so that other existing and future kernel code that queries the MSR
can assume it's locked.

Start from a clean slate when constructing the value to write to
IA32_FEAT_CTL, i.e. ignore whatever value BIOS left in the MSR so as not
to enable random features or fault on the WRMSR.
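
Condensed, the flow described above looks like this (a simplified sketch
of init_ia32_feat_ctl(); c is the cpuinfo_x86 being set up, and the
SMX/tboot and LMCE cases are elided):

  u64 msr;

  if (rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr))
  	return;		/* the MSR doesn't exist on this CPU */

  if (msr & FEAT_CTL_LOCKED)
  	return;		/* BIOS already configured and locked it */

  /* Clean slate: ignore whatever BIOS left behind. */
  msr = FEAT_CTL_LOCKED;
  if (cpu_has(c, X86_FEATURE_VMX))
  	msr |= FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;

  wrmsrl(MSR_IA32_FEAT_CTL, msr);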

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-5-sean.j.christopherson@intel.com
2020-01-13 17:45:45 +01:00
Sean Christopherson
32ad73db7f x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation.  Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

  - Relocate the bit defines so that they immediately follow the MSR
    define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
  - Add whitespace around the block of feature control defines to make
    it clear they're all related.
  - Use BIT() instead of manually encoding the bit shift.
  - Use "VMX" instead of "VMXON" to match the SDM.
  - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
    be consistent with the kernel's verbiage used for all other feature
    control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
    likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
    the (literal) one-off usage of _ON, the SDM is simply "wrong".
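
The renamed defines then read (values per the SDM):

  #define MSR_IA32_FEAT_CTL			0x0000003a
  #define FEAT_CTL_LOCKED				BIT(0)
  #define FEAT_CTL_VMX_ENABLED_INSIDE_SMX		BIT(1)
  #define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX	BIT(2)
  #define FEAT_CTL_LMCE_ENABLED			BIT(20)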

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2020-01-13 17:23:08 +01:00
Xiaochen Shen
dc433797c6 x86/resctrl: Do not reconfigure exiting tasks
When writing a pid to file "tasks", a callback function move_myself() is
queued to this task to be called when the task returns from kernel mode
or exits. The purpose of move_myself() is to activate the newly assigned
closid and/or rmid associated with this task. This activation is done by
calling resctrl_sched_in() from move_myself(), the same function that is
called when switching to this task.

If this work is successfully queued but then the task enters PF_EXITING
status (e.g., receiving signal SIGKILL, SIGTERM) prior to the
execution of the callback move_myself(), move_myself() still calls
resctrl_sched_in() since the task status is not currently considered.

When a task is exiting, the data structure of the task itself will
be freed soon. Calling resctrl_sched_in() to write the register that
controls the task's resources is unnecessary and adds extra
performance overhead.

Add check on task status in move_myself() and return immediately if the
task is PF_EXITING.
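
A sketch of the added check:

  static void move_myself(struct callback_head *head)
  {
  	/* ... resolve head -> rdtgrp, adopt closid/rmid ... */

  	/* Skip the MSR write for a dying task: its data structures
  	 * are about to be freed and it will not run again. */
  	if (!(current->flags & PF_EXITING))
  		resctrl_sched_in();

  	/* ... drop the rdtgrp reference taken at queue time ... */
  }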

 [ bp: Massage. ]

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lkml.kernel.org/r/1578500026-21152-1-git-send-email-xiaochen.shen@intel.com
2020-01-13 14:10:21 +01:00
Jan H. Schönherr
7a8bc2b046 x86/mce: Fix use of uninitialized MCE message string
The function mce_severity() is not required to update its msg argument.
In fact, mce_severity_amd() does not, which makes mce_no_way_out()
return uninitialized data, which may be used later for printing.

Assuming that implementations of mce_severity() either always or never
update the msg argument (which is currently the case), it is sufficient
to initialize the temporary variable in mce_no_way_out().

While at it, avoid printing a useless "Unknown".
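
The sufficient fix is a one-line initialization in mce_no_way_out()
(sketch):

  char *tmp = *msg;	/* was "char *tmp;": mce_severity_amd() never
  			 * writes it, so stack garbage could end up
  			 * in the printed message */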

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200103150722.20313-4-jschoenh@amazon.de
2020-01-13 10:07:56 +01:00
Jan H. Schönherr
90454e4959 x86/mce: Fix mce=nobootlog
Since commit

  8b38937b7a ("x86/mce: Do not enter deferred errors into the generic
		 pool twice")

the mce=nobootlog option has become mostly ineffective (after being only
slightly ineffective before), as the code is taking actions on MCEs left
over from boot when they have a usable address.

Move the check for MCP_DONTLOG a bit outward to make it effective again.

Also, since commit

  011d826111 ("RAS: Add a Corrected Errors Collector")

the two branches of the remaining "if" at the bottom of machine_check_poll()
do the same. Unify them.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200103150722.20313-3-jschoenh@amazon.de
2020-01-13 10:07:56 +01:00
Jan H. Schönherr
8438b84ab4 x86/mce: Take action on UCNA/Deferred errors again
Commit

  fa92c58694 ("x86, mce: Support memory error recovery for both UCNA
		and Deferred error in machine_check_poll")

added handling of UCNA and Deferred errors by adding them to the ring
for SRAO errors.

Later, commit

  fd4cf79fcc ("x86/mce: Remove the MCE ring for Action Optional errors")

switched storage from the SRAO ring to the unified pool that is still
in use today. In order to only act on the intended errors, a filter
for MCE_AO_SEVERITY is used -- effectively removing handling of
UCNA/Deferred errors again.

Extend the severity filter to include UCNA/Deferred errors again.
Also, generalize the naming of the notifier from SRAO to UC to capture
the extended scope.

Note that this change may cause a message like the following to appear,
as the same address may be reported as SRAO and as UCNA:

 Memory failure: 0x5fe3284: already hardware poisoned

Technically, this is a return to previous behavior.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200103150722.20313-2-jschoenh@amazon.de
2020-01-13 10:07:23 +01:00
Changbin Du
248ed51048 x86/nmi: Remove irq_work from the long duration NMI handler
First, printk() is NMI-context safe now that the safe printk() has been
implemented; it already uses an irq_work to make itself NMI-context
safe.

Second, this NMI irq_work actually does not work if an NMI handler
causes a panic via watchdog timeout. It has no chance to run in that
case, while the safe printk() will flush its per-cpu buffers before
panicking.

While at it, repurpose the irq_work callback into a function which
concentrates the NMI duration checking and makes the code easier to
follow.

 [ bp: Massage. ]

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200111125427.15662-1-changbin.du@gmail.com
2020-01-11 15:55:39 +01:00
Ingo Molnar
57ad87ddce Merge branch 'x86/mm' into efi/core, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-10 18:53:14 +01:00
Julia Lawall
1429b568ad x86/crash: Use resource_size()
Use resource_size() rather than a verbose computation on
the end and start fields.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

<smpl>
@@ struct resource ptr; @@
- (ptr.end - ptr.start + 1)
+ resource_size(&ptr)
</smpl>
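
Applied to the x86 crash code this reads, for example (sketch; crashk_res
is the crash kernel's reserved resource):

  /* before */ crash_size = crashk_res.end - crashk_res.start + 1;
  /* after  */ crash_size = resource_size(&crashk_res);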

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/1577900990-8588-10-git-send-email-Julia.Lawall@inria.fr
2020-01-09 14:40:03 +01:00
Benjamin Thiel
b47a36982d x86/cpu: Add a missing prototype for arch_smt_update()
.. in order to fix a -Wmissing-prototypes warning.

No functional change.

Signed-off-by: Benjamin Thiel <b.thiel@posteo.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200109121723.8151-1-b.thiel@posteo.de
2020-01-09 14:31:53 +01:00
Brian Gerst
2b10906f2d x86: Remove force_iret()
force_iret() was originally intended to prevent the return to user mode with
the SYSRET or SYSEXIT instructions, in cases where the register state could
have been changed to be incompatible with those instructions.  The entry code
has been significantly reworked since then, and register state is validated
before SYSRET or SYSEXIT are used.  force_iret() no longer serves its original
purpose and can be eliminated.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20191219115812.102620-1-brgerst@gmail.com
2020-01-08 19:40:51 +01:00
Sebastian Andrzej Siewior
bbc55341b9 x86/fpu: Deactivate FPU state after failure during state load
In __fpu__restore_sig(), fpu_fpregs_owner_ctx needs to be reset if the
FPU state was not fully restored. Otherwise the following may happen (on
the same CPU):

  Task A                     Task B               fpu_fpregs_owner_ctx
  *active*                                        A.fpu
  __fpu__restore_sig()
                             ctx switch           load B.fpu
                             *active*             B.fpu
  fpregs_lock()
  copy_user_to_fpregs_zeroing()
    copy_kernel_to_xregs() *modify*
    copy_user_to_xregs() *fails*
  fpregs_unlock()
                            ctx switch            skip loading B.fpu,
                            *active*              B.fpu

In the success case, fpu_fpregs_owner_ctx is set to the current task.

In the failure case, the FPU state might have been modified by loading
the init state.

In this case, fpu_fpregs_owner_ctx needs to be reset in order to ensure
that the FPU state of the following task is loaded from saved state (and
not skipped because it was the previous state).

Reset fpu_fpregs_owner_ctx after a failure during restore, in order to
ensure that the FPU state for the next task is always loaded.
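
A minimal sketch of the idea (hedged; the actual error path in
__fpu__restore_sig() may differ in detail):

  /*
   * The registers no longer match fpu_fpregs_owner_ctx after a failed
   * restore, so drop ownership and force the next context switch to
   * load the FPU state from memory.
   */
  if (ret)
          __cpu_invalidate_fpregs_state();  /* fpu_fpregs_owner_ctx = NULL */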

The problem was debugged by Yu-cheng Yu <yu-cheng.yu@intel.com>.

 [ bp: Massage commit message. ]

Fixes: 5f409e20b7 ("x86/fpu: Defer FPU state load until return to userspace")
Reported-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191220195906.plk6kpmsrikvbcfn@linutronix.de
2020-01-07 13:44:42 +01:00
Shile Zhang
22a7fa8848 x86/unwind/orc: Fix !CONFIG_MODULES build warning
Fix the following warnings, which appeared after the ORC sort was moved
to build time:

  arch/x86/kernel/unwind_orc.c:210:12: warning: ‘orc_sort_cmp’ defined but not used [-Wunused-function]
  arch/x86/kernel/unwind_orc.c:190:13: warning: ‘orc_sort_swap’ defined but not used [-Wunused-function]
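
A sketch of the shape of the fix, assuming the two helpers are now
needed only for module ORC tables, which are still sorted at load time:

  /* Built-in ORC tables are sorted at build time; only modules still
   * need the sort callbacks. */
  #ifdef CONFIG_MODULES
  /* orc_sort_cmp() and orc_sort_swap() definitions live here */
  #endif /* CONFIG_MODULES */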

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/c9c81536-2afc-c8aa-c5f8-c7618ecd4f54@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-07 08:15:11 +01:00
Frederic Weisbecker
50cc02e599 x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault
This is a leftover. Page faults, just like most other exceptions,
are protected inside user_exit() / user_enter() calls in x86 entry code
when we fault from userspace. So this pair of calls is now superfluous.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191227163612.10039-3-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-07 08:11:23 +01:00
Yu-cheng Yu
158e2ee61f x86/fpu/xstate: Make xfeature_is_supervisor()/xfeature_is_user() return bool
Have both xfeature_is_supervisor()/xfeature_is_user() return bool
because they are used only in boolean context.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191212210855.19260-3-yu-cheng.yu@intel.com
2020-01-06 19:08:40 +01:00
Yu-cheng Yu
8c9e607376 x86/fpu/xstate: Fix small issues
In response to earlier comments, fix small issues before introducing
XSAVES supervisor states:

- Fix comments of xfeature_is_supervisor().
- Replace ((u64)1 << 63) with XCOMP_BV_COMPACTED_FORMAT.

No functional changes.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191212210855.19260-2-yu-cheng.yu@intel.com
2020-01-06 17:31:11 +01:00
Christoph Hellwig
4bdc0d676a remove ioremap_nocache and devm_ioremap_nocache
ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
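
For driver code, the conversion is mechanical, e.g.:

  /* before */
  regs = ioremap_nocache(res->start, resource_size(res));

  /* after: ioremap() has the same uncached semantics everywhere */
  regs = ioremap(res->start, resource_size(res));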

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2020-01-06 09:45:59 +01:00
Shakeel Butt
ab6a211443 x86/resctrl: Fix potential memory leak
set_cache_qos_cfg() is leaking memory when the given level is not
RDT_RESOURCE_L3 or RDT_RESOURCE_L2. At the moment, this function is
called only with valid levels, but move the allocation after the valid
level checks anyway in order to make it more robust and future-proof.
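
A sketch of the reordering (hedged; helper names per the resctrl code):

  if (level == RDT_RESOURCE_L3)
          update = l3_qos_cfg_update;
  else if (level == RDT_RESOURCE_L2)
          update = l2_qos_cfg_update;
  else
          return -EINVAL;         /* nothing allocated, nothing leaks */

  if (!zalloc_cpumask_var(&cpu_mask, GFP_KERNEL))
          return -ENOMEM;         /* allocate only for valid levels */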

 [ bp: Massage commit message. ]

Fixes: 99adde9b37 ("x86/intel_rdt: Enable L2 CDP in MSR IA32_L2_QOS_CFG")
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20200102165844.133133-1-shakeelb@google.com
2020-01-02 18:26:27 +01:00
Borislav Petkov
36209766ce x86/traps: Cleanup do_general_protection()
Hoist the user_mode() case up because it is less code and can be dealt
with up-front like the other special cases UMIP and vm86.

This saves an indentation level for the kernel-mode #GP case and allows
the code to be "unfolded" more so that it is more readable.

No functional changes.
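
Schematically (a sketch; handle_user_gp() is a hypothetical stand-in
for the hoisted code):

  if (user_mode(regs)) {
          handle_user_gp(regs, error_code);  /* deal with it up-front */
          return;
  }

  /* kernel-mode #GP handling continues at one less indentation level */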

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: x86@kernel.org
2019-12-31 17:29:29 +01:00
Jann Horn
2f004eea0f x86/kasan: Print original address on #GP
Make #GP exceptions caused by out-of-bounds KASAN shadow accesses easier
to understand by computing the address of the original access and
printing that. More details are in the comments in the patch.

This turns an error like this:

  kasan: CONFIG_KASAN_INLINE enabled
  kasan: GPF could be caused by NULL-ptr deref or user memory access
  general protection fault, probably for non-canonical address
      0xe017577ddf75b7dd: 0000 [#1] PREEMPT SMP KASAN PTI

into this:

  general protection fault, probably for non-canonical address
      0xe017577ddf75b7dd: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: maybe wild-memory-access in range
      [0x00badbeefbadbee8-0x00badbeefbadbeef]

The hook is placed in architecture-independent code, but is currently
only wired up to the X86 exception handler because I'm not sufficiently
familiar with the address space layout and exception handling mechanisms
on other architectures.
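
The printed range comes from inverting the shadow mapping; a sketch of
the computation (constants from the kernel's KASAN implementation):

  /* One shadow byte covers KASAN_SHADOW_SCALE_SIZE (8) original bytes. */
  orig_start = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
  orig_end   = orig_start + KASAN_SHADOW_SCALE_SIZE - 1;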

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm <linux-mm@kvack.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191218231150.12139-4-jannh@google.com
2019-12-31 13:15:38 +01:00
Jann Horn
aa49f20462 x86/dumpstack: Introduce die_addr() for die() with #GP fault address
Split __die() into __die_header() and __die_body(). This allows inserting
extra information below the header line that initiates the bug report.

Introduce a new function die_addr() that behaves like die(), but is for
faults only and uses __die_header() and __die_body() so that a future
commit can print extra information after the header line.
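
After the split, __die() itself becomes a trivial composition (sketch):

  void __die(const char *str, struct pt_regs *regs, long err)
  {
          __die_header(str, regs, err);
          __die_body(str, regs, err);
  }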

 [ bp: Comment the KASAN-specific usage of gp_addr. ]

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191218231150.12139-3-jannh@google.com
2019-12-31 13:11:35 +01:00
Jann Horn
59c1dcbed5 x86/traps: Print address on #GP
A frequent cause of #GP exceptions are memory accesses to non-canonical
addresses. Unlike #PF, #GP doesn't report a fault address in CR2, so the
kernel doesn't currently print the fault address for a #GP.

Luckily, the necessary infrastructure for decoding x86 instructions and
computing the memory address being accessed is already present. Hook
it up to the #GP handler so that the address operand of the faulting
instruction can be figured out and printed.

Distinguish two cases:

  a) (Part of) the memory range being accessed lies in the non-canonical
     address range; in this case, it is likely that the decoded address
     is actually the one that caused the #GP.

  b) The entire memory range of the decoded operand lies in canonical
     address space; the #GP may or may not be related in some way to the
     computed address. Print it, but with hedging language in the message.

While it is already possible to compute the faulting address manually by
disassembling the opcode dump and evaluating the instruction against the
register dump, this should make it slightly easier to identify crashes
at a glance.

Note that the operand length which comes from the instruction decoder
and is used to determine whether the access straddles into non-canonical
address space, is currently somewhat unreliable; but it should be good
enough, considering that Linux on x86-64 never maps the page directly
before the start of the non-canonical range anyway, and therefore the
case where a memory range begins in that page and potentially straddles
into the non-canonical range should be fairly uncommon.

In the case the address is still computed wrongly, it only influences
whether the error message claims that the access is canonical.
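
For reference, the canonicality test itself is simple (a sketch for
4-level paging with 48-bit virtual addresses; the helper name is
illustrative):

  /* Canonical iff bits 63:47 are a sign-extension of bit 47. */
  static bool is_canonical_48(unsigned long addr)
  {
          return ((long)(addr << 16) >> 16) == (long)addr;
  }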

 [ bp: Remove ambiguous "we", massage, reflow comments and spacing. ]

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Tested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: kasan-dev@googlegroups.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191218231150.12139-2-jannh@google.com
2019-12-31 12:31:13 +01:00
Qian Cai
e278af89f1 x86/resctrl: Fix an imbalance in domain_remove_cpu()
A system that supports resource monitoring may have multiple resources,
not all of which are capable of monitoring. Monitoring-related state is
initialized only for resources that are capable of monitoring and,
correspondingly, this state should subsequently only be removed from
those resources.

domain_add_cpu() calls domain_setup_mon_state() only when r->mon_capable
is true where it will initialize d->mbm_over. However,
domain_remove_cpu() calls cancel_delayed_work(&d->mbm_over) without
checking r->mon_capable resulting in an attempt to cancel d->mbm_over on
all resources, even those that never initialized d->mbm_over because
they are not capable of monitoring. Hence, it triggers a debugobjects
warning when offlining CPUs because those timer debugobjects are never
initialized:

  ODEBUG: assert_init not available (active state 0) object type:
  timer_list hint: 0x0
  WARNING: CPU: 143 PID: 789 at lib/debugobjects.c:484
  debug_print_object
  Hardware name: HP Synergy 680 Gen9/Synergy 680 Gen9 Compute Module, BIOS I40 05/23/2018
  RIP: 0010:debug_print_object
  Call Trace:
  debug_object_assert_init
  del_timer
  try_to_grab_pending
  cancel_delayed_work
  resctrl_offline_cpu
  cpuhp_invoke_callback
  cpuhp_thread_fun
  smpboot_thread_fn
  kthread
  ret_from_fork
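
A sketch of the missing check in domain_remove_cpu() (hedged):

  /* Only cancel workers on monitoring-capable resources; the others
   * never initialized d->mbm_over in the first place. */
  if (r->mon_capable && is_mbm_enabled())
          cancel_delayed_work(&d->mbm_over);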

Fixes: e33026831b ("x86/intel_rdt/mbm: Handle counter overflow")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: john.stultz@linaro.org
Cc: sboyd@kernel.org
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: tj@kernel.org
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191211033042.2188-1-cai@lca.pw
2019-12-30 19:25:59 +01:00
Peter Zijlstra
1f676247f3 x86/alternatives: Implement a better poke_int3_handler() completion scheme
Commit:

  285a54efe3 ("x86/alternatives: Sync bp_patching update for avoiding NULL pointer exception")

added an additional text_poke_sync() IPI to text_poke_bp_batch() to
handle the rare case where another CPU is still inside an INT3 handler
while we clear the global state.

Instead of spraying IPIs around, count the active INT3 handlers and
wait for them to go away before proceeding to clear/reuse the data.
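
Conceptually, a much-simplified sketch of the scheme (bp_refs is a
hypothetical name):

  atomic_inc(&bp_refs);           /* entering poke_int3_handler() */
  /* ... handle the breakpoint ... */
  atomic_dec(&bp_refs);           /* leaving */

  /* patching side, text_poke_bp_batch(): wait instead of IPI-ing */
  while (atomic_read(&bp_refs))
          cpu_relax();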

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:43:29 +01:00
Ingo Molnar
46f5cfc13d Merge branch 'core/kprobes' into perf/core, to pick up a completed branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:43:08 +01:00
Omar Sandoval
8757dc970f x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y
On x86 kernels configured with CONFIG_PROC_KCORE=y and
CONFIG_KEXEC_CORE=n, the vmcoreinfo note in /proc/kcore is incomplete.

Specifically, it is missing arch-specific information like the KASLR
offset and whether 5-level page tables are enabled. This breaks
applications like drgn [1] and crash [2], which need this information
for live debugging via /proc/kcore.

This happens because:

1. CONFIG_PROC_KCORE selects CONFIG_CRASH_CORE.
2. kernel/crash_core.c (compiled if CONFIG_CRASH_CORE=y) calls
   arch_crash_save_vmcoreinfo() to get the arch-specific parts of
   vmcoreinfo. If it is not defined, then it uses a no-op fallback.
3. x86 defines arch_crash_save_vmcoreinfo() in
   arch/x86/kernel/machine_kexec_*.c, which is only compiled if
   CONFIG_KEXEC_CORE=y.

Therefore, an x86 kernel with CONFIG_CRASH_CORE=y and
CONFIG_KEXEC_CORE=n uses the no-op fallback and gets incomplete
vmcoreinfo data. This isn't relevant to kdump, which requires
CONFIG_KEXEC_CORE. It only affects applications which read vmcoreinfo at
runtime, like the ones mentioned above.

Fix it by moving arch_crash_save_vmcoreinfo() into two new
arch/x86/kernel/crash_core_*.c files, which are gated behind
CONFIG_CRASH_CORE.

1: 73dd7def12/libdrgn/program.c (L385)
2: 60a42d7092
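
A sketch of the relocated function (hedged; the two entries match the
fields called out above):

  /* arch/x86/kernel/crash_core_64.c: now built whenever
   * CONFIG_CRASH_CORE=y, with or without CONFIG_KEXEC_CORE. */
  void arch_crash_save_vmcoreinfo(void)
  {
          vmcoreinfo_append_str("KERNELOFFSET=%lx\n", kaslr_offset());
          vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
                                pgtable_l5_enabled());
  }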

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/0589961254102cca23e3618b96541b89f2b249e2.1576858905.git.osandov@fb.com
2019-12-23 12:58:41 +01:00
Linus Torvalds
5c741e2583 Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS fixes from Borislav Petkov:
 "Three urgent RAS fixes for the AMD side of things:

   - initialize struct mce.bank so that calculated error severity on AMD
     SMCA machines is correct

   - do not send IPIs early during bank initialization, when interrupts
     are disabled

   - a fix for when only a subset of MCA banks are enabled, which led to
     boot hangs on some new AMD CPUs"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix possibly incorrect severity calculation on AMD
  x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
  x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
2019-12-21 06:04:12 -08:00
Linus Torvalds
2abf193275 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
 "Add HPET quirks for the Intel 'Coffee Lake H' and 'Ice Lake' platforms"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel: Disable HPET on Intel Ice Lake platforms
  x86/intel: Disable HPET on Intel Coffee Lake H platforms
2019-12-17 11:11:08 -08:00
Jan H. Schönherr
81736abd55 x86/mce: Remove mce_inject_log() in favor of mce_log()
The mutex in mce_inject_log() became unnecessary with commit

  5de97c9f6d ("x86/mce: Factor out and deprecate the /dev/mcelog driver"),

though the original reason for its presence only vanished with commit

  7298f08ea8 ("x86/mcelog: Get rid of RCU remnants").

Drop the mutex. And as that makes mce_inject_log() identical to mce_log(),
get rid of the former in favor of the latter.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191210000733.17979-7-jschoenh@amazon.de
2019-12-17 10:26:41 +01:00
Jan H. Schönherr
2d806d0723 x86/mce: Pass MCE message to mce_panic() on failed kernel recovery
In commit

  b2f9d678e2 ("x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries")

another call to mce_panic() was introduced. Pass the message of the
handled MCE to that instance of mce_panic() as well, as there doesn't
seem to be a reason not to.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191210000733.17979-6-jschoenh@amazon.de
2019-12-17 10:26:35 +01:00
Arnd Bergmann
db1ae0314f x86/mce/therm_throt: Mark throttle_active_work() as __maybe_unused
throttle_active_work() is only called if CONFIG_SYSFS is set, otherwise
we get a harmless warning:

  arch/x86/kernel/cpu/mce/therm_throt.c:238:13: error: 'throttle_active_work' \
	  defined but not used [-Werror=unused-function]

Mark the function as __maybe_unused to avoid the warning.

Fixes: f6656208f0 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bberg@redhat.com
Cc: ckellner@redhat.com
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: hdegoede@redhat.com
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191210203925.3119091-1-arnd@arndb.de
2019-12-17 10:26:28 +01:00
Jan H. Schönherr
a3a57ddad0 x86/mce: Fix possibly incorrect severity calculation on AMD
The function mce_severity_amd_smca() requires m->bank to be initialized
for correct operation. Fix the one case, where mce_severity() is called
without doing so.

Fixes: 6bda529ec4 ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
Fixes: d28af26faa ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
Link: https://lkml.kernel.org/r/20191210000733.17979-4-jschoenh@amazon.de
2019-12-17 09:39:53 +01:00
Yazen Ghannam
966af20929 x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[]
Each logical CPU in Scalable MCA systems controls a unique set of MCA
banks in the system. These banks are not shared between CPUs. The bank
types and ordering will be the same across CPUs on currently available
systems.

However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
other CPUs do not. In this case, the bank seen as Reserved on one CPU is
assumed to be the same type as the bank seen as a known type on another
CPU.

In general, this occurs when the hardware represented by the MCA bank
is disabled, e.g. disabled memory controllers on certain models, etc.
The MCA bank is disabled in the hardware, so there is no possibility of
getting an MCA/MCE from it even if it is assumed to have a known type.

For example:

Full system:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          UMC
	 2    |         CS          |          CS

System with hardware disabled:
	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         UMC         |          RAZ
	 2    |         CS          |          CS

For this reason, there is a single, global struct smca_banks[] that is
initialized at boot time. This array is initialized on each CPU as it
comes online. However, the array will not be updated if an entry already
exists.

This works as expected when the first CPU (usually CPU0) has all
possible MCA banks enabled. But if the first CPU has a subset, then it
will save a "Reserved" type in smca_banks[]. Successive CPUs will then
not be able to update smca_banks[] even if they encounter a known bank
type.

This may result in unexpected behavior. Depending on the system
configuration, a user may observe issues enumerating the MCA
thresholding sysfs interface. The issues may be as trivial as sysfs
entries not being available, or as severe as system hangs.

For example:

	Bank  |  Type seen on CPU0  |  Type seen on CPU1
	------------------------------------------------
	 0    |         LS          |          LS
	 1    |         RAZ         |          UMC
	 2    |         CS          |          CS

Extend the smca_banks[] entry check to return if the entry is a
non-reserved type. Otherwise, continue so that CPUs that encounter a
known bank type can update smca_banks[].
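
As a sketch, the extended check:

  /* Keep an existing entry only if it is a real (non-reserved) type;
   * a Reserved placeholder may be overwritten by a later CPU that
   * sees the bank's true type. */
  if (smca_banks[bank].hwid &&
      smca_banks[bank].hwid->bank_type != SMCA_RESERVED)
          return;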

Fixes: 68627a697c ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
2019-12-17 09:39:53 +01:00
Konstantin Khlebnikov
246ff09f89 x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure()
... because interrupts are disabled that early and sending IPIs can
deadlock:

  BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
  no locks held by swapper/1/0.
  irq event stamp: 0
  hardirqs last  enabled at (0): [<0000000000000000>] 0x0
  hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
  softirqs last  enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  Preemption disabled at:
  [<ffffffff8104703b>] start_secondary+0x3b/0x190
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
  Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
  Call Trace:
   dump_stack
   ___might_sleep.cold.92
   wait_for_completion
   ? generic_exec_single
   rdmsr_safe_on_cpu
   ? wrmsr_on_cpus
   mce_amd_feature_init
   mcheck_cpu_init
   identify_cpu
   identify_secondary_cpu
   smp_store_cpu_info
   start_secondary
   secondary_startup_64

The function smca_configure() is called only on the current CPU anyway,
therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid
the IPI.
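
The conversion itself, for illustration:

  u32 low, high;

  /* before: cross-CPU read, i.e. an IPI with interrupts disabled */
  rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high);

  /* after: plain local read; smca_configure() runs on this CPU anyway */
  rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high);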

 [ bp: Update commit message. ]

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
2019-12-17 09:39:33 +01:00
Borislav Petkov
d157aa0fb2 x86/cpu/tsx: Define pr_fmt()
... so that all current and future pr_* statements in this file have the
proper prefix.
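
I.e., a sketch at the top of the file (the exact prefix string is the
file's choice):

  #undef pr_fmt
  #define pr_fmt(fmt) "tsx: " fmt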

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20191112221823.19677-2-bp@alien8.de
2019-12-15 10:58:54 +01:00
Borislav Petkov
72c2ce9867 x86/bugs: Move enum taa_mitigations to bugs.c
... because it is used only there.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20191112221823.19677-1-bp@alien8.de
2019-12-14 16:06:33 +01:00
yu kuai
27353d5785 x86/process: Remove set but not used variables prev and next
Remove two unused variables:

  arch/x86/kernel/process.c: In function ‘__switch_to_xtra’:
  arch/x86/kernel/process.c:618:31: warning: variable ‘next’ set but not used [-Wunused-but-set-variable]
    618 |  struct thread_struct *prev, *next;
        |                               ^~~~
  arch/x86/kernel/process.c:618:24: warning: variable ‘prev’ set but not used [-Wunused-but-set-variable]
    618 |  struct thread_struct *prev, *next;
        |

They are never used and so can be removed.

Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Cc: yi.zhang@huawei.com
Cc: zhengbin13@huawei.com
Link: https://lkml.kernel.org/r/20191213121253.10072-1-yukuai3@huawei.com
2019-12-14 08:26:00 +01:00
Linus Torvalds
22ff311af9 treewide conversion from FIELD_SIZEOF() to sizeof_field()
-----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl3umDgWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlvsD/49R12HK7UzTxNTrcpvbadJ4t7j
 j/qJvjMerW7iVNAPOoNAOePUa21+y3rI1AZPvoPyzIqp1Bf2eOICf5SdisG2cG+O
 X0A8EKWvS0SSQWSKaT6udUKJ3nBJItwvOvQ5B58KQzcOj3S4X7B9iVBWgieMHrzz
 urkZm7pqowrZB3wuF8keRtli5IZaoiCwzApy48Qrn70G3OeXymknFbpHTDwIAiGw
 RiE5Xh0R4EzQdsYyCgjR8U56gBchadAmj8BUJU0ppMnOFMyIAG670hNLrs0L3roP
 8TOIeyb993ZC5GZaMlnR8mz0jfibfkPa3Z85VAsVyQSPaOQldwc9j8TGBqD5Gfat
 1PjOU5RVwma0pH5xTPOeevWPQpIK9KovQpQYqMMN9GMxOEx96IOUjwTrnNK2xWoN
 UGyOVlESFGoniClhCiKYzPSrYOjlIBk5ovf15PdTe+bwyUDMfyfy5CZV88OS2DHz
 ZBZvpLrH/EMW9zJ+FqMTp0C4s4wa2Ioid3bSh6XuNUTtltKSjp71eUja8ZEz+2sd
 5AGstCC+hYqxaEk+6/851pfkQ9sbBjwuGtNrtX+pqreiLUvWLhQ0yUj6cLXlEQNH
 aucjCukCjI+4lMzofeaQ2LbNhtff4YsfO4b1Ye8maoDdHjzUVL57n3bTOxKhdzbt
 y6FM3lApOjk3OyaTJQ==
 =YU4A
 -----END PGP SIGNATURE-----

Merge tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull FIELD_SIZEOF conversion from Kees Cook:
 "A mostly mechanical treewide conversion from FIELD_SIZEOF() to
  sizeof_field(). This avoids the redundancy of having 2 macros
  (actually 3) doing the same thing, and consolidates on sizeof_field().
  While "field" is not an accurate name, it is the common name used in
  the kernel, and doesn't result in any unintended innuendo.

  As there are still users of FIELD_SIZEOF() in -next, I will clean up
  those during this coming development cycle and send the final old
  macro removal patch at that time"

* tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use sizeof_field() macro
  MIPS: OCTEON: Replace SIZEOF_FIELD() macro
2019-12-13 14:02:12 -08:00
Shile Zhang
f14bf6a350 x86/unwind/orc: Remove boot-time ORC unwind tables sorting
Now that the orc_unwind and orc_unwind_ip tables are sorted at build time,
remove the boot time sorting pass.

No change in functionality.

[ mingo: Rewrote the changelog and code comments. ]

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/20191204004633.88660-8-shile.zhang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-13 10:47:58 +01:00
Linus Torvalds
6674fdb25a This contains 3 changes:
- Removal of code I accidentally applied when doing a minor fix up
    to a patch, and then using "git commit -a --amend", which pulled
    in some other changes I was playing with.
 
  - Remove an unused variable in trace_events_inject code
 
  - Fix for the function graph tracer when it traces an ftrace direct function.
    It will now ignore tracing a function that has an ftrace direct
    trampoline attached. This is needed for eBPF to use the ftrace direct
    code.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXfD/thQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qoo2AP4j7ONw7BTmMyo+GdYqPPntBeDnClHK
 vfMKrgK1j5BxYgEA7LgkwuUT9bcyLjfJVcyfeW67rB2PtmovKTWnKihFOwI=
 =DZ6N
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:

 - Remove code I accidentally applied when doing a minor fix up to a
   patch, and then using "git commit -a --amend", which pulled in some
   other changes I was playing with.

 - Remove an unused variable in trace_events_inject code

 - Fix the function graph tracer when it traces an ftrace direct function.
   It will now ignore tracing a function that has an ftrace direct
   trampoline attached. This is needed for eBPF to use the ftrace direct
   code.

* tag 'trace-v5.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix function_graph tracer interaction with BPF trampoline
  tracing: remove set but not used variable 'buffer'
  module: Remove accidental change of module_enable_x()
2019-12-11 12:22:38 -08:00
Alexei Starovoitov
ff205766db ftrace: Fix function_graph tracer interaction with BPF trampoline
Depending on the type of BPF program served by the BPF trampoline, it can call
the original function. In such a case the trampoline will skip one stack frame
while returning. That will confuse the function_graph tracer and cause crashes
with a bad RIP. Teach the graph tracer to skip functions that have a BPF
trampoline attached.
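
The guard, as a sketch (ip_has_direct_trampoline() is a hypothetical
stand-in for the actual lookup):

  /* Bail out of graph tracing when the traced location is served by a
   * direct (BPF) trampoline: the skipped frame breaks the return stack. */
  if (ftrace_direct_func_count && ip_has_direct_trampoline(ip))
          return -EBUSY;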

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-10 13:53:59 -05:00
Sean Christopherson
960786422f x86/ACPI/sleep: Move acpi_get_wakeup_address() into sleep.c, remove <asm/realmode.h> from <asm/acpi.h>
Move the definition of acpi_get_wakeup_address() into sleep.c to break
linux/acpi.h's dependency (by way of asm/acpi.h) on asm/realmode.h.
Everyone and their mother includes linux/acpi.h, i.e. modifying
realmode.h results in a full kernel rebuild, which makes the already
inscrutable real mode boot code even more difficult to understand and is
positively rage inducing when trying to make changes to x86's boot flow.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20191126165417.22423-13-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:48 +01:00
Sean Christopherson
cb28909525 x86/ACPI/sleep: Remove an unnecessary include of asm/realmode.h
None of the declarations in x86's acpi/sleep.h are in any way dependent
on the real mode boot code.  Remove sleep.h's include of asm/realmode.h
to limit the dependencies on realmode.h to code that actually interacts
with the boot code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20191126165417.22423-11-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:48 +01:00
Sean Christopherson
6315ec9286 x86/kprobes: Explicitly include vmalloc.h for set_vm_flush_reset_perms()
The inclusion of linux/vmalloc.h, which is required for its definition
of set_vm_flush_reset_perms(), is somehow dependent on asm/realmode.h
being included by asm/acpi.h.  Explicitly include linux/vmalloc.h so
that a future patch can drop the realmode.h include from asm/acpi.h
without breaking the build.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20191126165417.22423-5-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:48 +01:00
Sean Christopherson
ac0b14dc16 x86/ftrace: Explicitly include vmalloc.h for set_vm_flush_reset_perms()
The inclusion of linux/vmalloc.h, which is required for its definition
of set_vm_flush_reset_perms(), is somehow dependent on asm/realmode.h
being included by asm/acpi.h.  Explicitly include linux/vmalloc.h so
that a future patch can drop the realmode.h include from asm/acpi.h
without breaking the build.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20191126165417.22423-4-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:47 +01:00
Sean Christopherson
ca947b72e1 x86/boot: Explicitly include realmode.h to handle RM reservations
Explicitly include asm/realmode.h, which provides reserve_real_mode(),
instead of picking it up by an indirect include of asm/acpi.h.  acpi.h
will soon stop including realmode.h so that changing realmode.h doesn't
require a full kernel rebuild.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Link: https://lkml.kernel.org/r/20191126165417.22423-3-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:15:47 +01:00
Ingo Molnar
186525bd6b mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions
- Untangle the somewhat incestuous way in which VMALLOC_START is used all across the
  kernel, but is, on x86, defined deep inside one of the lowest level page table headers.
  It doesn't help that vmalloc.h only includes a single asm header:

     #include <asm/page.h>           /* pgprot_t */

  So there was no existing cross-arch way to decouple address layout
  definitions from page.h details. I used this:

   #ifndef VMALLOC_START
   # include <asm/vmalloc.h>
   #endif

  This way every architecture that wants to simplify page.h can do so.

- Also on x86 we had a couple of LDT related inline functions that used
  the late-stage address space layout positions - but these could be
  uninlined without real trouble - the end result is cleaner this way as
  well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
eb243d1d28 x86/mm/pat: Rename <asm/pat.h> => <asm/memtype.h>
pat.h is a file whose main purpose is to provide the memtype_*() APIs.

PAT is the low level hardware mechanism - but the high level abstraction
is memtype.

So name the header <memtype.h> as well - this goes hand in hand with memtype.c
and memtype_interval.c.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Ingo Molnar
2040cf9f59 Linux 5.5-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl3tf/0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGlKwH/3fTToujuJfTx5E5
 mrARAP65J1L/DxpEKvKRt2bNZo6w13mNd8g7ZPmYChz90bYGvXQSG8hYTU9iAw3O
 yimSTJlNXDhVAluB53XnDdUxIWC4HUZsNxWJNCeXMuiMcGNsTGX+v3f+x7oHCT0P
 jI1RSIsFGjgr0RWqZ8U5aJckQo2xABC1TfYw53K66Oc/JLZpSFJFwMgjf1fD5diU
 HGDA8E2p0u1TQIyNzr86iqMvnlSRYBQwBQn6OgEKCG4Z0NLtXfDF4mqnxsXgLmIH
 oQoFfxaMKXyGWds7ZxwcGWntALCF41ThfpiJWDIyxjWxFEty4bqTCbDPwwyp7ip0
 iuASmTI=
 =YqO2
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc1' into core/kprobes, to resolve conflicts

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:11:00 +01:00
Ingo Molnar
360db4ace3 x86/setup: Enhance the comments
Update various comments, fix outright mistakes and meaningless descriptions.

Also harmonize the style across the file, both in form and in language.

Cc: linux-kernel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 09:59:38 +01:00
Ingo Molnar
12609013c4 x86/setup: Clean up the header portion of setup.c
In 20 years we accumulated 89 #include lines in setup.c,
but we only need 30 of them (!) ...

Get rid of the excessive ones, and while at it, sort the
remaining ones alphabetically.

Also get rid of the incomplete changelogs at the header of the file,
and explain better what this file does.

Cc: linux-kernel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 09:59:37 +01:00
Pankaj Bharadiya
c593642c8b treewide: Use sizeof_field() macro
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
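
The two macros are drop-in replacements for each other, e.g.:

  /* before */
  len = FIELD_SIZEOF(struct task_struct, comm);

  /* after */
  len = sizeof_field(struct task_struct, comm);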

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
2019-12-09 10:36:44 -08:00
Kees Cook
4fc265a9c9 x86/mtrr: Require CAP_SYS_ADMIN for all access
Zhang Xiaoxu noted that physical address locations for MTRR were visible
to non-root users, which could be considered an information leak.
In discussing[1] the options for solving this, it sounded like just
moving the capable check into open() was the first step.

If this breaks userspace, then we will have a test case for the more
conservative approaches discussed in the thread. In summary:

- MTRR should check capabilities at open time (or retain the
  checks on the opener's permissions for later checks).

- changing the DAC permissions might break something that expects to
  open mtrr when not uid 0.

- if we leave the DAC permissions alone and just move the capable check
  to the opener, we should get the desired protection. (i.e. check
  against CAP_SYS_ADMIN not just the wider uid 0.)

- if that still breaks things, as in userspace expects to be able to
  read other parts of the file as non-uid-0 and non-CAP_SYS_ADMIN, then
  we need to censor the contents using the opener's permissions. For
  example, as done in other /proc cases, like commit

  51d7b12041 ("/proc/iomem: only expose physical resource addresses to privileged users").

[1] https://lore.kernel.org/lkml/201911110934.AC5BA313@keescook/
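
A sketch of that first step, in the /proc/mtrr open handler (hedged;
the surrounding details are illustrative):

  static int mtrr_open(struct inode *inode, struct file *file)
  {
          /* Gate all access on the opener's capabilities. */
          if (!capable(CAP_SYS_ADMIN))
                  return -EPERM;

          return single_open(file, mtrr_seq_show, NULL);
  }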

Reported-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-security-module@vger.kernel.org
Cc: Matthew Garrett <mjg59@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: x86-ml <x86@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/201911181308.63F06502A1@keescook
2019-12-09 09:24:24 +01:00
Borislav Petkov
2e30dd9e06 x86/mtrr: Get rid of mtrr_seq_show() forward declaration
... by moving the function up in the file.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20191108200815.24589-1-bp@alien8.de
2019-12-09 09:23:44 +01:00
Linus Torvalds
e5b3fc125d Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Various fixes:

   - Fix the PAT performance regression that downgraded write-combining
     device memory regions to uncached.

   - There's been a number of bugs in 32-bit double fault handling -
     hopefully all fixed now.

   - Fix an LDT crash

   - Fix an FPU over-optimization that broke with GCC9 code
     optimizations.

   - Misc cleanups"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/pat: Fix off-by-one bugs in interval tree search
  x86/ioperm: Save an indentation level in tss_update_io_bitmap()
  x86/fpu: Don't cache access to fpu_fpregs_owner_ctx
  x86/entry/32: Remove unused 'restore_all_notrace' local label
  x86/ptrace: Document FSBASE and GSBASE ABI oddities
  x86/ptrace: Remove set_segment_reg() implementations for current
  x86/traps: die() instead of panicking on a double fault
  x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit
  x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area
  x86/doublefault/32: Rename doublefault.c to doublefault_32.c
  x86/traps: Disentangle the 32-bit and 64-bit doublefault code
  lkdtm: Add a DOUBLE_FAULT crash type on x86
  selftests/x86/single_step_syscall: Check SYSENTER directly
  x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
2019-12-01 19:05:07 -08:00
Linus Torvalds
8fa91bfa9b Merge branch 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fix from Borislav Petkov:
 "One urgent fix for the thermal throttling machinery: the recent change
  reworking the thermal notifications forgot to mask out read-only and
  reserved bits in the thermal status MSRs, leading to exceptions while
  writing those MSRs.

  The fix takes care of masking out those bits first"

* 'ras-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce/therm_throt: Mask out read-only and reserved MSR bits
2019-11-30 14:49:08 -08:00
Borislav Petkov
7b0b8cfd26 x86/ioperm: Save an indentation level in tss_update_io_bitmap()
... for better readability.

No functional changes.

[ Minor edit. ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-30 18:06:56 +01:00
Kai-Heng Feng
e0748539e3 x86/intel: Disable HPET on Intel Ice Lake platforms
Like CFL and CFL-H, the ICL SoC has a skewed HPET timer once it hits PC10.
So let's disable HPET on ICL as well.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: feng.tang@intel.com
Cc: harry.pan@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20191129062303.18982-2-kai.heng.feng@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-29 12:17:58 +01:00
Kai-Heng Feng
f8edbde885 x86/intel: Disable HPET on Intel Coffee Lake H platforms
The Coffee Lake H SoC has similar behavior to Coffee Lake: a skewed HPET
timer once the SoC enters PC10.

So let's disable HPET on CFL-H platforms.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@alien8.de
Cc: feng.tang@intel.com
Cc: harry.pan@intel.com
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20191129062303.18982-1-kai.heng.feng@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-29 12:17:58 +01:00
Srinivas Pandruvada
5a43b87b3c x86/mce/therm_throt: Mask out read-only and reserved MSR bits
While writing to the MSRs IA32_THERM_STATUS/IA32_PKG_THERM_STATUS, avoid
writing 1 to read-only and reserved fields, because updating some fields
generates an exception.
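
Schematically (a sketch; THERM_STATUS_CLEAR_MASK is a hypothetical name
for the mask of writable log bits):

  rdmsrl(MSR_IA32_THERM_STATUS, msr_val);
  msr_val &= THERM_STATUS_CLEAR_MASK;   /* keep only bits we may clear */
  wrmsrl(MSR_IA32_THERM_STATUS, msr_val);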

 [ bp: Vertically align for better readability. ]

Fixes: f6656208f0 ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Tested-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191128150824.22413-1-srinivas.pandruvada@linux.intel.com
2019-11-29 09:17:52 +01:00
Linus Torvalds
81b6b96475 dma-mapping updates for 5.5-rc1
- improve dma-debug scalability (Eric Dumazet)
  - tiny dma-debug cleanup (Dan Carpenter)
  - check for vmap memory in dma_map_single (Kees Cook)
  - check for dma_addr_t overflows in dma-direct when using
    DMA offsets (Nicolas Saenz Julienne)
  - switch the x86 sta2x11 SOC to use more generic DMA code
    (Nicolas Saenz Julienne)
  - fix arm-nommu dma-ranges handling (Vladimir Murzin)
  - use __initdata in CMA (Shyam Saini)
  - replace the bus dma mask with a limit (Nicolas Saenz Julienne)
  - merge the remapping helpers into the main dma-direct flow (me)
  - switch xtensa to the generic dma remap handling (me)
  - various cleanups around dma_capable (me)
  - remove unused dev arguments to various dma-noncoherent helpers (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAl3f+eULHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPyPg/+PVHCrhmepudQQFHu6wfurE5U77iNnoUifvG+b5z5
 5mHmTMkQwyox6rKDe8NuFApAhz1VJDSUgSelPmvTSOIEIGXCvX1p+GqRSVS5YQON
 aLzGvbWKE8hCpaPdDHKYDauD1FZGMM8L2P5oOMF9X9fQ94xxRqfqJM6c8iD16Sgg
 +aOgPNzTnxQHJFF/Dbt/mjJrKXWI+XF+bgUbH+l9yKa7Dd7ibmJR8yl9hs1jmp0H
 1CZ+CizwnAs57rCd1a6Ybc6gj59tySc03NMnnbTko+KDxrcbD3Ee2tpqHVkkCjYz
 Yl0m4FIpbotrpokL/FIS727bVvkjbWgoeM+kiVPoYzmZea3pq/tFDr6tp/BxDhFj
 TZXSFfgQljlYMD3ppSoklFlfjGriVWV0tPO3arPXwuuMF5EX/IMQmvxei05jpc8n
 iELNXOP9iZZkY4tLHy2hn2uWrxBRrS1WQwlLg9hahlNRzyfFSyHeP0zWlVDt+RgF
 5CCbEI+HQcUqg1FApB30lQNWTn1+dJftrpKVBlgNBIyIa/z2rFbt8GdSnItxjfQX
 /XX8EZbFvF6AcXkgURkYFIoKM/EbYShOSLcYA3PTUtcuTnF6Kk5eimySiGWZTVCS
 prruSFDZJOvL3SnOIMIiYVmBdB7lEbDyLI/VYuhoECXEDCJpVmRktNkJNg4q6/E+
 fjQ=
 =e5wO
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - improve dma-debug scalability (Eric Dumazet)

 - tiny dma-debug cleanup (Dan Carpenter)

 - check for vmap memory in dma_map_single (Kees Cook)

 - check for dma_addr_t overflows in dma-direct when using DMA offsets
   (Nicolas Saenz Julienne)

 - switch the x86 sta2x11 SOC to use more generic DMA code (Nicolas
   Saenz Julienne)

 - fix arm-nommu dma-ranges handling (Vladimir Murzin)

 - use __initdata in CMA (Shyam Saini)

 - replace the bus dma mask with a limit (Nicolas Saenz Julienne)

 - merge the remapping helpers into the main dma-direct flow (me)

 - switch xtensa to the generic dma remap handling (me)

 - various cleanups around dma_capable (me)

 - remove unused dev arguments to various dma-noncoherent helpers (me)

* tag 'dma-mapping-5.5' of git://git.infradead.org/users/hch/dma-mapping: (22 commits)
  dma-mapping: treat dev->bus_dma_mask as a DMA limit
  dma-direct: exclude dma_direct_map_resource from the min_low_pfn check
  dma-direct: don't check swiotlb=force in dma_direct_map_resource
  dma-debug: clean up put_hash_bucket()
  powerpc: remove support for NULL dev in __phys_to_dma / __dma_to_phys
  dma-direct: avoid a forward declaration for phys_to_dma
  dma-direct: unify the dma_capable definitions
  dma-mapping: drop the dev argument to arch_sync_dma_for_*
  x86/PCI: sta2x11: use default DMA address translation
  dma-direct: check for overflows on 32 bit DMA addresses
  dma-debug: increase HASH_SIZE
  dma-debug: reorder struct dma_debug_entry fields
  xtensa: use the generic uncached segment support
  dma-mapping: merge the generic remapping helpers into dma-direct
  dma-direct: provide mmap and get_sgtable method overrides
  dma-direct: remove the dma_handle argument to __dma_direct_alloc_pages
  dma-direct: remove __dma_direct_free_pages
  usb: core: Remove redundant vmap checks
  kernel: dma-contiguous: mark CMA parameters __initdata/__initconst
  dma-debug: add a schedule point in debug_dma_dump_mappings()
  ...
2019-11-28 11:16:43 -08:00
Linus Torvalds
95f1fa9e34 New tracing features:
- PERMANENT flag to ftrace_ops when attaching a callback to a function
    As /proc/sys/kernel/ftrace_enabled when set to zero will disable all
    attached callbacks in ftrace, this has a detrimental impact on live
    kernel tracing, as it disables all that it patched. If a ftrace_ops
    is registered to ftrace with the PERMANENT flag set, it will prevent
    ftrace_enabled from being disabled, and if ftrace_enabled is already
    disabled, it will prevent a ftrace_ops with PERMANENT flag set from
    being registered.
 
  - New register_ftrace_direct(). As eBPF would like to register its own
    trampolines to be called by the ftrace nop locations directly,
    without going through the ftrace trampoline, this function has been
    added. This allows for eBPF trampolines to live along side of
    ftrace, perf, kprobe and live patching. It also utilizes the ftrace
    enabled_functions file that keeps track of functions that have been
    modified in the kernel, to allow for security auditing.
 
  - Allow for kernel internal use of ftrace instances. Subsystems in
    the kernel can now create and destroy their own tracing instances
    which allows them to have their own tracing buffer, and be able
    to record events without worrying about other users from writing over
    their data.
 
  - New seq_buf_hex_dump() that lets users use the hex_dump() in their
    seq_buf usage.
 
  - Notifications now added to tracing_max_latency to allow user space
    to know when a new max latency is hit by one of the latency tracers.
 
  - Wider spread use of generic compare operations for use of bsearch and
    friends.
 
  - More synthetic event fields may be defined (32 up from 16)
 
  - Use of xarray for architectures with sparse system calls, for the
    system call trace events.
 
 This along with small clean ups and fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXdwv4BQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnB5AP91vsdHQjwE1+/UWG/cO+qFtKvn2QJK
 QmBRIJNH/s+1TAD/fAOhgw+ojSK3o/qc+NpvPTEW9AEwcJL1wacJUn+XbQc=
 =ztql
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "New tracing features:

   - New PERMANENT flag to ftrace_ops when attaching a callback to a
     function.

     As /proc/sys/kernel/ftrace_enabled when set to zero will disable
     all attached callbacks in ftrace, this has a detrimental impact on
     live kernel tracing, as it disables all that it patched. If a
     ftrace_ops is registered to ftrace with the PERMANENT flag set, it
     will prevent ftrace_enabled from being disabled, and if
     ftrace_enabled is already disabled, it will prevent a ftrace_ops
      with PERMANENT flag set from being registered.

   - New register_ftrace_direct().

     As eBPF would like to register its own trampolines to be called by
     the ftrace nop locations directly, without going through the ftrace
     trampoline, this function has been added. This allows for eBPF
     trampolines to live along side of ftrace, perf, kprobe and live
     patching. It also utilizes the ftrace enabled_functions file that
     keeps track of functions that have been modified in the kernel, to
     allow for security auditing.

   - Allow for kernel internal use of ftrace instances.

     Subsystems in the kernel can now create and destroy their own
     tracing instances which allows them to have their own tracing
     buffer, and be able to record events without worrying about other
     users from writing over their data.

   - New seq_buf_hex_dump() that lets users use the hex_dump() in their
     seq_buf usage.

   - Notifications now added to tracing_max_latency to allow user space
     to know when a new max latency is hit by one of the latency
     tracers.

   - Wider spread use of generic compare operations for use of bsearch
     and friends.

   - More synthetic event fields may be defined (32 up from 16)

   - Use of xarray for architectures with sparse system calls, for the
     system call trace events.

  This along with small clean ups and fixes"
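
For reference, minimal usage sketches of the two new facilities
(my_callback/my_tramp are illustrative):

  /* A callback that keeps working even when ftrace_enabled is 0: */
  static struct ftrace_ops my_ops = {
          .func  = my_callback,
          .flags = FTRACE_OPS_FL_PERMANENT,
  };
  register_ftrace_function(&my_ops);

  /* Attach a custom trampoline directly at a function's nop site: */
  register_ftrace_direct((unsigned long)wake_up_process,
                         (unsigned long)my_tramp);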

* tag 'trace-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (51 commits)
  tracing: Enable syscall optimization for MIPS
  tracing: Use xarray for syscall trace events
  tracing: Sample module to demonstrate kernel access to Ftrace instances.
  tracing: Adding new functions for kernel access to Ftrace instances
  tracing: Fix Kconfig indentation
  ring-buffer: Fix typos in function ring_buffer_producer
  ftrace: Use BIT() macro
  ftrace: Return ENOTSUPP when DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not configured
  ftrace: Rename ftrace_graph_stub to ftrace_stub_graph
  ftrace: Add a helper function to modify_ftrace_direct() to allow arch optimization
  ftrace: Add helper find_direct_entry() to consolidate code
  ftrace: Add another check for match in register_ftrace_direct()
  ftrace: Fix accounting bug with direct->count in register_ftrace_direct()
  ftrace/selftests: Fix spelling mistake "wakeing" -> "waking"
  tracing: Increase SYNTH_FIELDS_MAX for synthetic_events
  ftrace/samples: Add a sample module that implements modify_ftrace_direct()
  ftrace: Add modify_ftrace_direct()
  tracing: Add missing "inline" in stub function of latency_fsnotify()
  tracing: Remove stray tab in TRACE_EVAL_MAP_FILE's help text
  tracing: Use seq_buf_hex_dump() to dump buffers
  ...
2019-11-27 11:42:01 -08:00