Commit Graph

35042 Commits

Author SHA1 Message Date
Joerg Roedel
86cf69f1d8 x86/mm/32: implement arch_sync_kernel_mappings()
Implement the function to sync changes in vmalloc and ioremap ranges to
all page-tables.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-6-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Joerg Roedel
8e19843c36 x86/mm/64: implement arch_sync_kernel_mappings()
Implement the function to sync changes in vmalloc and ioremap ranges to
all page-tables.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-5-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
88dca4ca5a mm: remove the pgprot argument to __vmalloc
The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv]
Acked-by: Gao Xiang <xiang@kernel.org> [erofs]
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
cca98e9f8b mm: enforce that vmap can't map pages executable
To help enforcing the W^X protection don't allow remapping existing pages
as executable.

x86 bits from Peter Zijlstra, arm64 bits from Mark Rutland.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-20-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Christoph Hellwig
0348801151 x86: fix vmap arguments in map_irq_stack
vmap does not take a gfp_t, the flags argument is for VM_* flags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200414131348.444715-3-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:10 -07:00
Christoph Hellwig
78bb17f76e x86/hyperv: use vmalloc_exec for the hypercall page
Patch series "decruft the vmalloc API", v2.

Peter noticed that with some dumb luck you can toast the kernel address
space with exported vmalloc symbols.

I used this as an opportunity to decruft the vmalloc.c API and make it
much more systematic.  This also removes any chance to create vmalloc
mappings outside the designated areas or using executable permissions
from modules.  Besides that it removes more than 300 lines of code.

This patch (of 29):

Use the designated helper for allocating executable kernel memory, and
remove the now unused PAGE_KERNEL_RX define.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lkml.kernel.org/r/20200414131348.444715-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200414131348.444715-2-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:10 -07:00
Steven Price
99395ee3f7 mm: ptdump: expand type of 'val' in note_page()
The page table entry is passed in the 'val' argument to note_page(),
however this was previously an "unsigned long" which is fine on 64-bit
platforms.  But for 32 bit x86 it is not always big enough to contain a
page table entry which may be 64 bits.

Change the type to u64 to ensure that it is always big enough.

[akpm@linux-foundation.org: fix riscv]
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200521152308.33096-3-steven.price@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:10 -07:00
Steven Price
1494e0c38e x86: mm: ptdump: calculate effective permissions correctly
Patch series "Fix W+X debug feature on x86"

Jan alerted me[1] that the W+X detection debug feature was broken in x86
by my change[2] to switch x86 to use the generic ptdump infrastructure.

Fundamentally the approach of trying to move the calculation of
effective permissions into note_page() was broken because note_page() is
only called for 'leaf' entries and the effective permissions are passed
down via the internal nodes of the page tree.  The solution I've taken
here is to create a new (optional) callback which is called for all
nodes of the page tree and therefore can calculate the effective
permissions.

Secondly on some configurations (32 bit with PAE) "unsigned long" is not
large enough to store the table entries.  The fix here is simple - let's
just use a u64.

[1] https://lore.kernel.org/lkml/d573dc7e-e742-84de-473d-f971142fa319@suse.com/
[2] 2ae27137b2 ("x86: mm: convert dump_pagetables to use walk_page_range")

This patch (of 2):

By switching the x86 page table dump code to use the generic code the
effective permissions are no longer calculated correctly because the
note_page() function is only called for *leaf* entries.  To calculate
the actual effective permissions it is necessary to observe the full
hierarchy of the page tree.

Introduce a new callback for ptdump which is called for every entry and
can therefore update the prot_levels array correctly.  note_page() can
then simply access the appropriate element in the array.

[steven.price@arm.com: make the assignment conditional on val != 0]
  Link: http://lkml.kernel.org/r/430c8ab4-e7cd-6933-dde6-087fac6db872@arm.com
Fixes: 2ae27137b2 ("x86: mm: convert dump_pagetables to use walk_page_range")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200521152308.33096-1-steven.price@arm.com
Link: http://lkml.kernel.org/r/20200521152308.33096-2-steven.price@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:09 -07:00
Linus Torvalds
9bf9511e3d Add support for wider Memory Bandwidth Monitoring counters by querying
their width from CPUID. As a prerequsite, streamline and unify the CPUID
 detection of the respective resource control attributes. By Reinette
 Chatre.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7VNQYACgkQEsHwGGHe
 VUqFgw//Qr6Ot21wtZAgVSCOPJ7rWH4aR6Bbq9YN0/kUGk2PbAjfFK3aMekHd4u6
 +cTc1eroOiWc5mXurbJ6gfmDgTYHj09DII/qXIi9wxWHPyAZ3cBltGXG9A86lIHZ
 hyA6l3Z+RzhnIKoATbXzaEy1nEoGMgCsezuGmN2d1KM2Fl8dmYHT+88iMDSazNb3
 D51+HHvUF03+h41A11mw7Omknycpk3wOmNX0A+t55EExW8xrYjjenY+suWvhDoGj
 +uqDVhYjxvVziI0lsAfcDfN3R3MuPMBXJXtwf9qaZODGs4ttTg/lyHRcTqGBVwrf
 xGuDw0qRLvDKy9UC09H5F7ArScx9X8bOu9NFjNA2AZyLLdhVpTa5AuX1T7VroRkD
 rcDx++EEjVEX36wBoGbwrfZTcaL3er8sPVZiXG1dDc5/GqWfP+abx4G1JPG9xRYH
 V4oghEsBAJXRfGt07PTNSTqDWu/dQWmMUIVfGWX97l+5ED+HlL24MVoThkkH6f2M
 7uAj8IRsbgzn207nZD8DNorARcQVsK2VvzgZzbqa0VsArt57Xk8DSZpfsqw8B7OJ
 OwuA/0S6nFQX9g6C0eLR68WPwv6YrLjKMEO7KJukNbaN/pcVGKXaJ/dSTwpsNQbC
 JZeAuc7DSYJ3Iv+xh64DvpL3hvb46J1UMRCWoVFw3ZQcnGV6t0E=
 =JeAm
 -----END PGP SIGNATURE-----

Merge tag 'x86_cache_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache resource control updates from Borislav Petkov:
 "Add support for wider Memory Bandwidth Monitoring counters by querying
  their width from CPUID.

  As a prerequsite for that, streamline and unify the CPUID detection of
  the respective resource control attributes.

  By Reinette Chatre"

* tag 'x86_cache_updates_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Support wider MBM counters
  x86/resctrl: Support CPUID enumeration of MBM counter width
  x86/resctrl: Maintain MBM counter width per resource
  x86/resctrl: Query LLC monitoring properties once during boot
  x86/resctrl: Remove unnecessary RMID checks
  x86/cpu: Move resctrl CPUID code to resctrl/
  x86/resctrl: Rename asm/resctrl_sched.h to asm/resctrl.h
2020-06-01 12:24:14 -07:00
Linus Torvalds
ef34ba6d36 A single fix for late microcode loading to handle the correct return
value from stop_machine(), from Mihai Carabas.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7UxoEACgkQEsHwGGHe
 VUrQdxAArdz2s/EtyhFeacKGp6nFG8wHyMVFbwJ/6ZzKcaX6zVc4PI/5O1837ls7
 rFMZt5r21kxfLB3/wMAOZGI+ZP7i6IXzcwBI5/BmS+YK+t3PqWeT+iTNo8hr9tI/
 d8Xly4sE/CIrPZduZPnNVsrRdzqKDs/KMnnPTxZWVNDWMVOKHJZtJ2Ty8eHZsgwl
 b4yBL1JiZHELSb9SrMhZfortogB2eSUaFABWYJMhGJ8XHQ6AZ+A3EB+he/9Zu3Wu
 Giz4LvnhCGJyhTLaDHRUhMfLHo1knl6LNS6QNqVSP82TKRlX3AVeDnHST968BeTr
 ronLTvOVkkZcpvk5ukeSqcBFhxiio9R1rUbkfZlYPt63m/6uWCiMzzOGXB+JTtYc
 5of95CXehYj41XlQVkQtJJmoysYdt7JJw0w5+Cr3Uuov/RKOEiCdrgemOxOmIcM+
 YJ8m+lTn95+8PXFjg/kvweZA7rXr1HcPhfmd9tCMha2k6b1MbdaMT3xb+m1vGXD/
 BRojkuqf7OK19T/Owcum6A/oBmjuNjPZPL5HapQ9ZbMz6AZ3InRmaU+8EvQLIer7
 iimQYWzTTdlZsresJh2+itPMf1EVyHVRnzFlx/N1BMhAxpR2aYXwGA5WKxk10p7U
 80iejJntiNwXJmCHXXiQ55Dyii0vZykJv2FbGjLF4xUUGB/zxW8=
 =g4rq
 -----END PGP SIGNATURE-----

Merge tag 'x86_microcode_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 microcode update from Borislav Petkov:
 "A single fix for late microcode loading to handle the correct return
  value from stop_machine(), from Mihai Carabas"

* tag 'x86_microcode_for_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode: Fix return value for microcode late loading
2020-06-01 12:22:53 -07:00
Linus Torvalds
81e8c10dac Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Introduce crypto_shash_tfm_digest() and use it wherever possible.
   - Fix use-after-free and race in crypto_spawn_alg.
   - Add support for parallel and batch requests to crypto_engine.

  Algorithms:
   - Update jitter RNG for SP800-90B compliance.
   - Always use jitter RNG as seed in drbg.

  Drivers:
   - Add Arm CryptoCell driver cctrng.
   - Add support for SEV-ES to the PSP driver in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (114 commits)
  crypto: hisilicon - fix driver compatibility issue with different versions of devices
  crypto: engine - do not requeue in case of fatal error
  crypto: cavium/nitrox - Fix a typo in a comment
  crypto: hisilicon/qm - change debugfs file name from qm_regs to regs
  crypto: hisilicon/qm - add DebugFS for xQC and xQE dump
  crypto: hisilicon/zip - add debugfs for Hisilicon ZIP
  crypto: hisilicon/hpre - add debugfs for Hisilicon HPRE
  crypto: hisilicon/sec2 - add debugfs for Hisilicon SEC
  crypto: hisilicon/qm - add debugfs to the QM state machine
  crypto: hisilicon/qm - add debugfs for QM
  crypto: stm32/crc32 - protect from concurrent accesses
  crypto: stm32/crc32 - don't sleep in runtime pm
  crypto: stm32/crc32 - fix multi-instance
  crypto: stm32/crc32 - fix run-time self test issue.
  crypto: stm32/crc32 - fix ext4 chksum BUG_ON()
  crypto: hisilicon/zip - Use temporary sqe when doing work
  crypto: hisilicon - add device error report through abnormal irq
  crypto: hisilicon - remove codes of directly report device errors through MSI
  crypto: hisilicon - QM memory management optimization
  crypto: hisilicon - unify initial value assignment into QM
  ...
2020-06-01 12:00:10 -07:00
Linus Torvalds
8fc984aedc A pile of x86 fixes:
- Prevent a memory leak in ioperm which was caused by the stupid
     assumption that the exit cleanup is always called for current, which is
     not the case when fork fails after taking a reference on the ioperm
     bitmap.
 
   - Fix an arithmething overflow in the DMA code on 32bit systems
 
   - Fill gaps in the xstate copy with defaults instead of leaving them
     uninitialized
 
   - Revert: o"Make __X32_SYSCALL_BIT be unsigned long" as it turned out
     that existing user space fails to build.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7Tt8YTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYobtTEACukhGsuivgiTwltWuHcATqrcNbgHSu
 nnhuQrjJ8KJiF5O60nDztPAVzxD+Ww2tzuDnD1BLFDI9cEA5oPhzXf7kUuJvrYUK
 INY+OALPPpw2iWjmygIsEyw3Pzmnm6peRA4h5UZSZdFxdROGGwBeGYNxowuVWFiH
 X7Fa1J4QxTI7e2X3psDVz94bOnVTPRPAR2bNpX8K8Qs+Wn1FFO92LFU04EvJTCHe
 JdN73VAS+0o0qPlPMewiuyfxaHexc8eJySMdOiysPnGRy+vagyyMPOV2Kg0DD6bp
 caDxCXNjIxXlRExV6F75s8hnl42DwXzLSzY/G7L/HVJ5r3voqcREYtXHgfenl7Jg
 8o6tEi+qFduPJ6SuRjfjPBDBF4wJvcjgmCwJaPJbMkrg8p5jH9Xg35egmEMo9cF8
 JQa2RzWJTR9XUjuPAuHJZR6f9jnle01PCznmw7Mavoed82udW1Lo32+QnvWsx6Qq
 4uuV38FqK3lsVCfFjyZir9OB9DGeuT/NETs3WJuGW5QUnC1mqfvIYipL3BkxNMKP
 IBB7n5X2iCJ545JkydepXF2I+b/i8XhNcIwYMVoSbZzBKccwCZ7zxHFNj6YAWG+M
 TN77x/+lw5zbnxhL3YzK+fgPNLio/By4Zcpmq6uppaf9Ip67SJGVq22Ef3S0w8vG
 X1inh1zqLX9hsQ==
 =DmSb
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A pile of x86 fixes:

   - Prevent a memory leak in ioperm which was caused by the stupid
     assumption that the exit cleanup is always called for current,
     which is not the case when fork fails after taking a reference on
     the ioperm bitmap.

   - Fix an arithmething overflow in the DMA code on 32bit systems

   - Fill gaps in the xstate copy with defaults instead of leaving them
     uninitialized

   - Revert: "Make __X32_SYSCALL_BIT be unsigned long" as it turned out
     that existing user space fails to build"

* tag 'x86-urgent-2020-05-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ioperm: Prevent a memory leak when fork fails
  x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
  copy_xstate_to_kernel(): don't leave parts of destination uninitialized
  x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
2020-05-31 10:45:11 -07:00
Ingo Molnar
aa61b7bb00 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into x86/urgent
Pick up FPU register dump fixes from Al Viro.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-05-29 11:37:11 +02:00
Jay Lang
4bfe6cce13 x86/ioperm: Prevent a memory leak when fork fails
In the copy_process() routine called by _do_fork(), failure to allocate
a PID (or further along in the function) will trigger an invocation to
exit_thread(). This is done to clean up from an earlier call to
copy_thread_tls(). Naturally, the child task is passed into exit_thread(),
however during the process, io_bitmap_exit() nullifies the parent's
io_bitmap rather than the child's.

As copy_thread_tls() has been called ahead of the failure, the reference
count on the calling thread's io_bitmap is incremented as we would expect.
However, io_bitmap_exit() doesn't accept any arguments, and thus assumes
it should trash the current thread's io_bitmap reference rather than the
child's. This is pretty sneaky in practice, because in all instances but
this one, exit_thread() is called with respect to the current task and
everything works out.

A determined attacker can issue an appropriate ioctl (i.e. KDENABIO) to
get a bitmap allocated, and force a clone3() syscall to fail by passing
in a zeroed clone_args structure. The kernel handles the erroneous struct
and the buggy code path is followed, and even though the parent's reference
to the io_bitmap is trashed, the child still holds a reference and thus
the structure will never be freed.

Fix this by tweaking io_bitmap_exit() and its subroutines to accept a
task_struct argument which to operate on.

Fixes: ea5f1cd7ab ("x86/ioperm: Remove bitmap if all permissions dropped")
Signed-off-by: Jay Lang <jaytlang@mit.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable#@vger.kernel.org
Link: https://lkml.kernel.org/r/20200524162742.253727-1-jaytlang@mit.edu
2020-05-28 21:36:20 +02:00
Alexander Dahl
8874347066 x86/dma: Fix max PFN arithmetic overflow on 32 bit systems
The intermediate result of the old term (4UL * 1024 * 1024 * 1024) is
4 294 967 296 or 0x100000000 which is no problem on 64 bit systems.
The patch does not change the later overall result of 0x100000 for
MAX_DMA32_PFN (after it has been shifted by PAGE_SHIFT). The new
calculation yields the same result, but does not require 64 bit
arithmetic.

On 32 bit systems the old calculation suffers from an arithmetic
overflow in that intermediate term in braces: 4UL aka unsigned long int
is 4 byte wide and an arithmetic overflow happens (the 0x100000000 does
not fit in 4 bytes), the in braces result is truncated to zero, the
following right shift does not alter that, so MAX_DMA32_PFN evaluates to
0 on 32 bit systems.

That wrong value is a problem in a comparision against MAX_DMA32_PFN in
the init code for swiotlb in pci_swiotlb_detect_4gb() to decide if
swiotlb should be active.  That comparison yields the opposite result,
when compiling on 32 bit systems.

This was not possible before

  1b7e03ef75 ("x86, NUMA: Enable emulation on 32bit too")

when that MAX_DMA32_PFN was first made visible to x86_32 (and which
landed in v3.0).

In practice this wasn't a problem, unless CONFIG_SWIOTLB is active on
x86-32.

However if one has set CONFIG_IOMMU_INTEL, since

  c5a5dc4cbb ("iommu/vt-d: Don't switch off swiotlb if bounce page is used")

there's a dependency on CONFIG_SWIOTLB, which was not necessarily
active before. That landed in v5.4, where we noticed it in the fli4l
Linux distribution. We have CONFIG_IOMMU_INTEL active on both 32 and 64
bit kernel configs there (I could not find out why, so let's just say
historical reasons).

The effect is at boot time 64 MiB (default size) were allocated for
bounce buffers now, which is a noticeable amount of memory on small
systems like pcengines ALIX 2D3 with 256 MiB memory, which are still
frequently used as home routers.

We noticed this effect when migrating from kernel v4.19 (LTS) to v5.4
(LTS) in fli4l and got that kernel messages for example:

  Linux version 5.4.22 (buildroot@buildroot) (gcc version 7.3.0 (Buildroot 2018.02.8))  SMP Mon Nov 26 23:40:00 CET 2018
  …
  Memory: 183484K/261756K available (4594K kernel code, 393K rwdata, 1660K rodata, 536K init, 456K bss , 78272K reserved, 0K cma-reserved, 0K highmem)
  …
  PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
  software IO TLB: mapped [mem 0x0bb78000-0x0fb78000] (64MB)

The initial analysis and the suggested fix was done by user 'sourcejedi'
at stackoverflow and explicitly marked as GPLv2 for inclusion in the
Linux kernel:

  https://unix.stackexchange.com/a/520525/50007

The new calculation, which does not suffer from that overflow, is the
same as for arch/mips now as suggested by Robin Murphy.

The fix was tested by fli4l users on round about two dozen different
systems, including both 32 and 64 bit archs, bare metal and virtualized
machines.

 [ bp: Massage commit message. ]

Fixes: 1b7e03ef75 ("x86, NUMA: Enable emulation on 32bit too")
Reported-by: Alan Jenkins <alan.christopher.jenkins@gmail.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alexander Dahl <post@lespocky.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org
Link: https://unix.stackexchange.com/q/520065/50007
Link: https://web.nettworks.org/bugs/browse/FFL-2560
Link: https://lkml.kernel.org/r/20200526175749.20742-1-post@lespocky.de
2020-05-28 20:21:32 +02:00
Al Viro
9e46365459 copy_xstate_to_kernel(): don't leave parts of destination uninitialized
copy the corresponding pieces of init_fpstate into the gaps instead.

Cc: stable@kernel.org
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-05-27 17:06:31 -04:00
Krzysztof Kozlowski
ed3119e455 x86: Hide the archdata.iommu field behind generic IOMMU_API
There is a generic, kernel wide configuration symbol for enabling the
IOMMU specific bits: CONFIG_IOMMU_API.  Implementations (including
INTEL_IOMMU and AMD_IOMMU driver) select it so use it here as well.

This makes the conditional archdata.iommu field consistent with other
platforms and also fixes any compile test builds of other IOMMU drivers,
when INTEL_IOMMU or AMD_IOMMU are not selected).

For the case when INTEL_IOMMU/AMD_IOMMU and COMPILE_TEST are not
selected, this should create functionally equivalent code/choice.  With
COMPILE_TEST this field could appear if other IOMMU drivers are chosen
but neither INTEL_IOMMU nor AMD_IOMMU are not.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: e93a1695d7 ("iommu: Enable compile testing for some of drivers")
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20200518120855.27822-2-krzk@kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2020-05-27 16:44:05 +02:00
Andy Lutomirski
700d3a5a66 x86/syscalls: Revert "x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long"
Revert

  45e29d119e ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")

and add a comment to discourage someone else from making the same
mistake again.

It turns out that some user code fails to compile if __X32_SYSCALL_BIT
is unsigned long. See, for example [1] below.

 [ bp: Massage and do the same thing in the respective tools/ header. ]

Fixes: 45e29d119e ("x86/syscalls: Make __X32_SYSCALL_BIT be unsigned long")
Reported-by: Thorsten Glaser <t.glaser@tarent.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: stable@kernel.org
Link: [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954294
Link: https://lkml.kernel.org/r/92e55442b744a5951fdc9cfee10badd0a5f7f828.1588983892.git.luto@kernel.org
2020-05-26 16:42:43 +02:00
Linus Torvalds
98790bbac4 A set of EFI fixes:
- Don't return a garbage screen info when EFI framebuffer is not available
 
  - Make the early EFI console work proper with wider fonts instead of drawing
    garbage
 
  - Prevent a memory buffer leak in allocate_e820()
 
  - Print the firmware error record proper so it can be decoded by users
 
  - Fix a symbol clash in the host tool build which only happens with newer
    compilers.
 
  - Add a missing check for the event log version of TPM which caused boot
    fails on several Dell systems due to an attempt to decode SHA-1 format
    with the crypto agile algorithm
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7KiA4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoS+WD/93Pd1AyO2wX8EBp7hKMFIof2fUlFGd
 yErHZibCZzTbvxsN+g1WNJBI7/9OjCcU6rG383ky4hZzZZ/pjtLhOSO08Q9mGsIg
 8lTQWozBWTjz73uHi4XItF9UkXy3QG7CwWFa3wgpD/VxTEgcEblczDJ+YU9TNX/L
 cA2TnSmjey+Kh6BbBMp1mKIpFN+hryAv70d2qaJJJkEolNDAkDpRfefaVdlbm/2E
 8gqNXaETn7nhaJpyHEqjipOAKYsTHbASfB3NLXjh+R88em5v0LGg/EZ9UTgFAq2x
 kQZr/O3wgOZ5ahzhtCcTx8VyTVE7AFxJqi8MkaTEYzQeFwK5xBJgf/tpJ/2CH7aj
 S0/dDZ4U3hyG29DI6BKDAQOxX5H315Q/7FBLLVHXGXO4VEz18qowZN4WkemPB27m
 5jt07TCHp+tf3TAaunjmISrUrh8ZD4SB6gnMHKXM2x7t80hxoqm7gI6Cf4tdGB3S
 DK6uYMydG5ecdmtrgZzWWDVX42D6vfKkVdYuAZlrBaZ6gFGs4WM2vmDDozmx5MRk
 znFr5hjjVoXyBwTs0UavBeSCOlB0/ifXICzg0Ba5/wG1Li9DUX3KwG7mlWVJnyfo
 r/CryLmeIEZ7JPl60+gXT3Nnd6dTgiA4EcR53HhPEbSoJ+58ITcuxPm4lCRdesJK
 QLlF4Yye/nn14Q==
 =BiIm
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:
 "A set of EFI fixes:

   - Don't return a garbage screen info when EFI framebuffer is not
     available

   - Make the early EFI console work properly with wider fonts instead
     of drawing garbage

   - Prevent a memory buffer leak in allocate_e820()

   - Print the firmware error record properly so it can be decoded by
     users

   - Fix a symbol clash in the host tool build which only happens with
     newer compilers.

   - Add a missing check for the event log version of TPM which caused
     boot failures on several Dell systems due to an attempt to decode
     SHA-1 format with the crypto agile algorithm"

* tag 'efi-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tpm: check event log version before reading final events
  efi: Pull up arch-specific prototype efi_systab_show_arch()
  x86/boot: Mark global variables as static
  efi: cper: Add support for printing Firmware Error Record Reference
  efi/libstub/x86: Avoid EFI map buffer alloc in allocate_e820()
  efi/earlycon: Fix early printk for wider fonts
  efi/libstub: Avoid returning uninitialized data from setup_graphics()
2020-05-24 10:24:10 -07:00
Linus Torvalds
667b6249b7 Two fixes for x86:
- Unbreak stack dumps for inactive tasks by interpreting the special
     first frame left by __switch_to_asm() correctly. The recent change not
     to skip the first frame so ORC and frame unwinder behave in the same
     way caused all entries to be unreliable, i.e. prepended with '?'.
 
   - Use cpumask_available() instead of an implicit NULL check of a
     cpumask_var_t in mmio trace to prevent a Clang build warning
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7KjZITHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYofGED/9q61QWzA7WULpps2UA1JDa8JGvxwIl
 Z/juNskVAXWZRlBOACPD7mZNz3tkfHnl62igHIwUNlEddSANQ22c/7Yt74w+lCcD
 KtPVVx/zdnO2nt5HVekRT6D9pKfD8cfSF4X2k2/HF6u8hQGoqWGv2BVBuarNurWE
 3CIFtLbNvBhjI4WdzK7Y0IfcINSkcyABQn1+9Id8mwH8XOStl1aaIMY7hxlpj9e4
 mXoQtkbRXnTbv6Asw6Obb1F/7AtCdaDrNqfBCA0Juv4fJzPQOMgZSWEX6OtZny6E
 8vsDsSCYOY4wcGYH5CBJd3n48UOsrWbT+7yNLeAnE7ZaZzc0pdi0g2NVWUGuhYa3
 EbPzvj+kPgcVsfpfasts9KRAR57GNysKD8MLZGqaST9MAB7EKLbPWEDnhNnsAnU5
 3KNFEbfB16CyJmztlE2YCT6nNJ3rzaOtcDiRmJduf0Ib9PEEkPaaX85DfO0Yabnn
 QilGsYbkdux+UTQUtZg6+HPsikcKiN46hOLrSXXu1O+iMDxhL/mq/79hNrO9hffI
 idV+js2nxv9tC30MMczMdPuUX4nOHs26IMZObdV88gDMV9n9TGkW+XinoJBi/+er
 3xuDQw6aRqpolMmUVFhBLV0gYTB2+J0zc3eawa5c6U6B9avc4j4KxkNVIfrLiRkK
 3brABHq+di44MA==
 =COcb
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two fixes for x86:

   - Unbreak stack dumps for inactive tasks by interpreting the special
     first frame left by __switch_to_asm() correctly.

     The recent change not to skip the first frame so ORC and frame
     unwinder behave in the same way caused all entries to be
     unreliable, i.e. prepended with '?'.

   - Use cpumask_available() instead of an implicit NULL check of a
     cpumask_var_t in mmio trace to prevent a Clang build warning"

* tag 'x86-urgent-2020-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
  x86/mmiotrace: Use cpumask_available() for cpumask_var_t variables
2020-05-24 10:21:02 -07:00
Nick Desaulniers
c071b0f11e x86: bitops: fix build regression
This is easily reproducible via CC=clang + CONFIG_STAGING=y +
CONFIG_VT6656=m.

It turns out that if your config tickles __builtin_constant_p via
differences in choices to inline or not, these statements produce
invalid assembly:

    $ cat foo.c
    long a(long b, long c) {
      asm("orb	%1, %0" : "+q"(c): "r"(b));
      return c;
    }
    $ gcc foo.c
    foo.c: Assembler messages:
    foo.c:2: Error: `%rax' not allowed with `orb'

Use the `%b` "x86 Operand Modifier" to instead force register allocation
to select a lower-8-bit GPR operand.

The "q" constraint only has meaning on -m32 otherwise is treated as
"r".  Not all GPRs have low-8-bit aliases for -m32.

Fixes: 1651e70066 ("x86: Fix bitops.h warning with a moved cast")
Reported-by: kernelci.org bot <bot@kernelci.org>
Suggested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Suggested-by: Brian Gerst <brgerst@gmail.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Suggested-by: Ilie Halip <ilie.halip@gmail.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>	[build, clang-11]
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-By: Brian Gerst <brgerst@gmail.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Marco Elver <elver@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20200508183230.229464-1-ndesaulniers@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/961
Link: https://lore.kernel.org/lkml/20200504193524.GA221287@google.com/
Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-23 10:26:31 -07:00
Borislav Petkov
9bb4cbf486 EFI fixes for v5.7-rc6:
- fix EFI framebuffer earlycon for wide fonts
 - avoid filling screen_info with garbage if the EFI framebuffer is not
   available
 - fix a potential host tool build error due to a symbol clash on x86
 - work around a EFI firmware bug regarding the binary format of the TPM
   final events table
 - fix a missing memory free by reworking the E820 table sizing routine to
   not do the allocation in the first place
 - add CPER parsing for firmware errors
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEnNKg2mrY9zMBdeK7wjcgfpV0+n0FAl7H3HIACgkQwjcgfpV0
 +n1pEAgAjJfwDJmBcYhJzjX8WLnXPJiUmUH9d9tF1t3TlhF6c1G8auXU+Fyia4uI
 ejRNw/N4+SXzM9yL+Z19PKBpQsPzQXgm2r9WTPVN5jTelUUI+jFZCH+pKC+TKRp1
 /Tx/XIMifCw18gNXsjj6WJEeAyLoh4tb+6bwn7DlPO5cPrxX49LvPuQNMXybk2yi
 KimdNKUry1wYpo/WpHqEdFq5//CLAWNkrL9UXlkANvQ6BJNIMI0kRIUC0MVsTMnE
 BoCkBO93PdvqxOcnV3WTRvSFetb7qA59Jay62jLc26Myqc4t4pgVWojVm6RHLfZg
 17btYACxICgF2mNTZYlKemEEqKPpzQ==
 =mY5f
 -----END PGP SIGNATURE-----

Merge tag 'efi-fixes-for-v5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi into efi/urgent

Pull EFI fixes from Ard Biesheuvel:

"- fix EFI framebuffer earlycon for wide fonts
 - avoid filling screen_info with garbage if the EFI framebuffer is not
   available
 - fix a potential host tool build error due to a symbol clash on x86
 - work around a EFI firmware bug regarding the binary format of the TPM
   final events table
 - fix a missing memory free by reworking the E820 table sizing routine to
   not do the allocation in the first place
 - add CPER parsing for firmware errors"
2020-05-22 20:06:25 +02:00
Josh Poimboeuf
187b96db5c x86/unwind/orc: Fix unwind_get_return_address_ptr() for inactive tasks
Normally, show_trace_log_lvl() scans the stack, looking for text
addresses to print.  In parallel, it unwinds the stack with
unwind_next_frame().  If the stack address matches the pointer returned
by unwind_get_return_address_ptr() for the current frame, the text
address is printed normally without a question mark.  Otherwise it's
considered a breadcrumb (potentially from a previous call path) and it's
printed with a question mark to indicate that the address is unreliable
and typically can be ignored.

Since the following commit:

  f1d9a2abff ("x86/unwind/orc: Don't skip the first frame for inactive tasks")

... for inactive tasks, show_trace_log_lvl() prints *only* unreliable
addresses (prepended with '?').

That happens because, for the first frame of an inactive task,
unwind_get_return_address_ptr() returns the wrong return address
pointer: one word *below* the task stack pointer.  show_trace_log_lvl()
starts scanning at the stack pointer itself, so it never finds the first
'reliable' address, causing only guesses to being printed.

The first frame of an inactive task isn't a normal stack frame.  It's
actually just an instance of 'struct inactive_task_frame' which is left
behind by __switch_to_asm().  Now that this inactive frame is actually
exposed to callers, fix unwind_get_return_address_ptr() to interpret it
properly.

Fixes: f1d9a2abff ("x86/unwind/orc: Don't skip the first frame for inactive tasks")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200522135435.vbxs7umku5pyrdbk@treble
2020-05-22 19:55:17 +02:00
Linus Torvalds
97076ea41a hyperv-fixes for 5.7-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl7Dri8THHdlaS5saXVA
 a2VybmVsLm9yZwAKCRB2FHBfkEGgXtWvB/wIE86Nni/FpiRVGSaDYQDduGgMvfxY
 yBRkKw1NdQMIJCkl/63XUcpT1U1lhGolk18CMIBo3ZSLv5xLrZDfDHaD2oTZG6lu
 WfL3lbKcYTsF+cpBm1DkBx7p32cXGDXQ/c5UZOXQEZVPtMI9U+HGg8iRfMPnPzTQ
 eb6o4T7HLNlx9WWHJzx/QbB7MZ+qOyb78EFO60FEJXA/lqbabzaAgQaz8inRKu8d
 70ed5Sl4mUt12GZ2a9KlvdliWBFKf/sv/Rs6VBeBpTByrGJazzlGKBQHMO1oUrme
 Mg3+OoCTZlFwGgkjb/0TCrR0EkVkkxTrU9EYCXg5dQkrLmvgzkPbrCIk
 =CnNw
 -----END PGP SIGNATURE-----

Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux

Pull hyperv fix from Wei Liu:
 "One patch from Vitaly to fix reenlightenment notifications"

* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  x86/hyperv: Properly suspend/resume reenlightenment notifications
2020-05-19 11:48:21 -07:00
Nathan Chancellor
d7110a26e5 x86/mmiotrace: Use cpumask_available() for cpumask_var_t variables
When building with Clang + -Wtautological-compare and
CONFIG_CPUMASK_OFFSTACK unset:

  arch/x86/mm/mmio-mod.c:375:6: warning: comparison of array 'downed_cpus'
  equal to a null pointer is always false [-Wtautological-pointer-compare]
          if (downed_cpus == NULL &&
              ^~~~~~~~~~~    ~~~~
  arch/x86/mm/mmio-mod.c:405:6: warning: comparison of array 'downed_cpus'
  equal to a null pointer is always false [-Wtautological-pointer-compare]
          if (downed_cpus == NULL || cpumask_weight(downed_cpus) == 0)
              ^~~~~~~~~~~    ~~~~
  2 warnings generated.

Commit

  f7e30f01a9 ("cpumask: Add helper cpumask_available()")

added cpumask_available() to fix warnings of this nature. Use that here
so that clang does not warn regardless of CONFIG_CPUMASK_OFFSTACK's
value.

Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/982
Link: https://lkml.kernel.org/r/20200408205323.44490-1-natechancellor@gmail.com
2020-05-19 19:30:28 +02:00
Linus Torvalds
ef0d5b9102 A single bugfix for the ORC unwinder to ensure that the error flag which
tells the unwinding code whether a stack trace can be trusted or not is
 always set correctly. This was messed up by a couple of changes in the
 recent past.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7BC+gTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoWFBEACR8MiO0VM2XXNsejd7rttgs/eoC4/M
 IKM5K1hq4eRCTodwVnkWwLk6p0asAMKhzpWQ3MS5RJBNAYxLbbxnsYSGtd8zIsdV
 wk6jbNYeT2MUZq2tYkjn3b9B6+91FFMZq6q+KDOfNPqcKZyP4n5o5QSewznBvQwt
 dHvjGgegJDjrrtuhLSQKG/uvSSi2hN9S5ibSMCa004GnH6P+uk/eICpvUXwNCyjV
 ygogYTmQQqAEqnlqVNdQxo+DFYbaxKCw12VSoBeOsEySljPdc136hP/j7Tzbf2em
 rkqtyXwng1+yG0vozMCAkyP5l3uA+HUculQLdmO8/55eia5Dl/zgsp3SvW7/2ONS
 0DRfGo0ghoZgId1oDu6DGPsX80wKKskerJpTN/tHWTXQWeUXCNXrX//lhrFiwd7P
 mHiyuk+INw3LQBkTlf7XhAf28w/9/+gCm3prEGnUCmLaJOeZ8HtL0mwDzudgc9Ca
 NW/b3tdt4JU3oXKyyqywr4XAYfxlfmyf3DrBMnuHdTgccaB9PAAzugjmDnFJOuzk
 jQw/Qfd6w7ZgVcVoaNQjjeogMTryGthCOPe9DzPUgkr+jCDsMwXopCvxbhbWI9e5
 L1/U5ilka/VC2ZP7qZUvwsltCgp6RamhDb3yLZbn/2PKf0sFKVoI/j/g1qMnLNZt
 TBNjzYuWAC8Hlw==
 =4kDr
 -----END PGP SIGNATURE-----

Merge tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 stack unwinding fix from Thomas Gleixner:
 "A single bugfix for the ORC unwinder to ensure that the error flag
  which tells the unwinding code whether a stack trace can be trusted or
  not is always set correctly.

  This was messed up by a couple of changes in the recent past"

* tag 'objtool-urgent-2020-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Fix error handling in __unwind_start()
2020-05-17 12:20:14 -07:00
Linus Torvalds
43567139f5 A single fix for early boot crashes of kernels built with gcc10 and
stack protector enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl7A+q4ACgkQEsHwGGHe
 VUpvtA/+NNPKVGSKZPdDlUm64JEPy7XrbzFJ+zigWGQjUPtZsDkAT4U33eQIvV5f
 ea7vB2u+e7iRZBExgTI1JfyjTenGpBffhubR/ueawtxeTgvZSopFajHQir/VGPlJ
 KQdtqe2wZek3Wux8BsKl8vcbqhgNH/LKgQzoG2y5P1LuA77MpFkMVkAoxKqbTDbt
 Nx7j147ffZBJHfmUHz2/nWD9r0Exu+abeSPJeO4T52ImhVkr+Pd1nFS8S+mRCHMj
 uJjxL/nB/sZmDDX+EX/zA7Du3ibaVa2po9cuhMTwNIPZIpak8Yyopl64fVm/N7jH
 w0DIc1CgEaA1IkG7lwyKSgB/T6Fsg4SQp8gM4V3BkcTgVDuhTH0J/kGrOk2+YFSc
 akk3420XBS4Q54BQ547woOImabxgQXDBvqBq+DhJFwP1qSllUXbZX7rlwZ3VQ160
 sfmItVM0c4J9bgaXqZuwqHxJdgakaIECkXWZwpksQAzVxaOKpZo7drLq6SDhX9HH
 BZdm/5AhIJ5rIGaiMXsZj5cC+H341N5TlaXA+I2b0r/vVOLtbe3it1rbSsvMoZJQ
 7WOesyqFSjSObDUpXZ0riLl1X+rdrCAfzHsm5IMwLAoxmv80973johZKNZIgqIoh
 CbPdyvaJoNK8FK6gT7bw3HNJ1ILGqk53jpWH1Gr1MlfzSzErOdQ=
 =5Xi5
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Borislav Petkov:
 "A single fix for early boot crashes of kernels built with gcc10 and
  stack protector enabled"

* tag 'x86_urgent_for_v5.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Fix early boot crash on gcc-10, third try
2020-05-17 11:08:29 -07:00
Linus Torvalds
5d438e071f A new testcase for guest debugging (gdbstub) that exposed a bunch of
bugs, mostly for AMD processors.  And a few other x86 fixes.
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6/0xcUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOZuwf/bQZw/SP9awLjOOVsRaSWUmwRGD4q
 6KVq9+JYsPU4CyJ7P+vdsFF39a0ixoAnKWqRe/vsXdXZrdYCDUuQxh+7X+lmjKAb
 dCQBnoqxI0w3yuxrm9Kn6Xs1AGIWibaRlZnXUKbuyn4ecFrh08OfYKGkYsEovhxK
 G4ftY4/xyM7Qvm0fq7ZmzxPrkzd74HDZBvB83R6uiyPiX3w4O9qumqkUogcVXIJX
 l3mnvSPClDDX4FOr8uhnU93varuR7Bek4Fh+Abj4uNks/F3z9ooJO9Hy9E+V5fhY
 g6Oj2IrxDwJ2G6hqyucr1kujukJC1bX2nMZ1O4gNayXsxZEU/JtI0Y26SA==
 =EzBt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A new testcase for guest debugging (gdbstub) that exposed a bunch of
  bugs, mostly for AMD processors. And a few other x86 fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
  KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
  KVM: SVM: Disable AVIC before setting V_IRQ
  KVM: Introduce kvm_make_all_cpus_request_except()
  KVM: VMX: pass correct DR6 for GD userspace exit
  KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
  KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
  KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on
  KVM: selftests: Add KVM_SET_GUEST_DEBUG test
  KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG
  KVM: X86: Set RTM for DB_VECTOR too for KVM_EXIT_DEBUG
  KVM: x86: fix DR6 delivery for various cases of #DB injection
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
2020-05-16 13:39:22 -07:00
Linus Torvalds
f85c1598dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix sk_psock reference count leak on receive, from Xiyu Yang.

 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.

 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
    from Maciej Żenczykowski.

 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
    Abeni.

 5) Fix netprio cgroup v2 leak, from Zefan Li.

 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.

 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.

 8) Various bpf probe handling fixes, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
  selftests: mptcp: pm: rm the right tmp file
  dpaa2-eth: properly handle buffer size restrictions
  bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
  bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
  bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  MAINTAINERS: Mark networking drivers as Maintained.
  ipmr: Add lockdep expression to ipmr_for_each_table macro
  ipmr: Fix RCU list debugging warning
  drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
  net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
  tcp: fix error recovery in tcp_zerocopy_receive()
  MAINTAINERS: Add Jakub to networking drivers.
  MAINTAINERS: another add of Karsten Graul for S390 networking
  drivers: ipa: fix typos for ipa_smp2p structure doc
  pppoe: only process PADT targeted at local interfaces
  selftests/bpf: Enforce returning 0 for fentry/fexit programs
  bpf: Enforce returning 0 for fentry/fexit progs
  net: stmmac: fix num_por initialization
  security: Fix the default value of secid_to_secctx hook
  libbpf: Fix register naming in PT_REGS s390 macros
  ...
2020-05-15 13:10:06 -07:00
Jim Mattson
c4e0e4ab4c KVM: x86: Fix off-by-one error in kvm_vcpu_ioctl_x86_setup_mce
Bank_num is a one-based count of banks, not a zero-based index. It
overflows the allocated space only when strictly greater than
KVM_MAX_MCE_BANKS.

Fixes: a9e38c3e01 ("KVM: x86: Catch potential overrun in MCE setup")
Signed-off-by: Jue Wang <juew@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200511225616.19557-1-jmattson@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-15 13:48:56 -04:00
Daniel Borkmann
0ebeea8ca8 bpf: Restrict bpf_probe_read{, str}() only to archs where they work
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.

To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3de ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").

Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.

However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).

For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
2020-05-15 08:10:36 -07:00
Borislav Petkov
a9a3ed1eff x86: Fix early boot crash on gcc-10, third try
... or the odyssey of trying to disable the stack protector for the
function which generates the stack canary value.

The whole story started with Sergei reporting a boot crash with a kernel
built with gcc-10:

  Kernel panic — not syncing: stack-protector: Kernel stack is corrupted in: start_secondary
  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.6.0-rc5—00235—gfffb08b37df9 
  Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M—D3H, BIOS F12 11/14/2013
  Call Trace:
    dump_stack
    panic
    ? start_secondary
    __stack_chk_fail
    start_secondary
    secondary_startup_64
  -—-[ end Kernel panic — not syncing: stack—protector: Kernel stack is corrupted in: start_secondary

This happens because gcc-10 tail-call optimizes the last function call
in start_secondary() - cpu_startup_entry() - and thus emits a stack
canary check which fails because the canary value changes after the
boot_init_stack_canary() call.

To fix that, the initial attempt was to mark the one function which
generates the stack canary with:

  __attribute__((optimize("-fno-stack-protector"))) ... start_secondary(void *unused)

however, using the optimize attribute doesn't work cumulatively
as the attribute does not add to but rather replaces previously
supplied optimization options - roughly all -fxxx options.

The key one among them being -fno-omit-frame-pointer and thus leading to
not present frame pointer - frame pointer which the kernel needs.

The next attempt to prevent compilers from tail-call optimizing
the last function call cpu_startup_entry(), shy of carving out
start_secondary() into a separate compilation unit and building it with
-fno-stack-protector, was to add an empty asm("").

This current solution was short and sweet, and reportedly, is supported
by both compilers but we didn't get very far this time: future (LTO?)
optimization passes could potentially eliminate this, which leads us
to the third attempt: having an actual memory barrier there which the
compiler cannot ignore or move around etc.

That should hold for a long time, but hey we said that about the other
two solutions too so...

Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kalle Valo <kvalo@codeaurora.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200314164451.346497-1-slyfox@gentoo.org
2020-05-15 11:48:01 +02:00
Josh Poimboeuf
71c9582528 x86/unwind/orc: Fix error handling in __unwind_start()
The unwind_state 'error' field is used to inform the reliable unwinding
code that the stack trace can't be trusted.  Set this field for all
errors in __unwind_start().

Also, move the zeroing out of the unwind_state struct to before the ORC
table initialization check, to prevent the caller from reading
uninitialized data if the ORC table is corrupted.

Fixes: af085d9084 ("stacktrace/x86: add function for detecting reliable stack traces")
Fixes: d3a0910401 ("x86/unwinder/orc: Dont bail on stack overflow")
Fixes: 98d0c8ebf7 ("x86/unwind/orc: Prevent unwinding before ORC initialization")
Reported-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/d6ac7215a84ca92b895fdd2e1aa546729417e6e6.1589487277.git.jpoimboe@redhat.com
2020-05-15 10:35:08 +02:00
Linus Torvalds
f44d5c4890 Various tracing fixes:
- Fix a crash when having function tracing and function stack tracing on
    the command line. The ftrace trampolines are created as executable and
    read only. But the stack tracer tries to modify them with text_poke()
    which expects all kernel text to still be writable at boot.
    Keep the trampolines writable at boot, and convert them to read-only
    with the rest of the kernel.
 
  - A selftest was triggering in the ring buffer iterator code, that
    is no longer valid with the update of keeping the ring buffer
    writable while a iterator is reading. Just bail after three failed
    attempts to get an event and remove the warning and disabling of the
    ring buffer.
 
  - While modifying the ring buffer code, decided to remove all the
    unnecessary BUG() calls.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXr1CDhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qsXcAQCoL229SBrtHsn4DUO7eAQRppUT3hNw
 RuKzvQ56+1GccQEAh8VGCeg89uMSK6imrTujEl6VmOUdbgrD5R96yiKoGQw=
 =vi+k
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull more tracing fixes from Steven Rostedt:
 "Various tracing fixes:

   - Fix a crash when having function tracing and function stack tracing
     on the command line.

     The ftrace trampolines are created as executable and read only. But
     the stack tracer tries to modify them with text_poke() which
     expects all kernel text to still be writable at boot. Keep the
     trampolines writable at boot, and convert them to read-only with
     the rest of the kernel.

   - A selftest was triggering in the ring buffer iterator code, that is
     no longer valid with the update of keeping the ring buffer writable
     while a iterator is reading.

     Just bail after three failed attempts to get an event and remove
     the warning and disabling of the ring buffer.

   - While modifying the ring buffer code, decided to remove all the
     unnecessary BUG() calls"

* tag 'trace-v5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Remove all BUG() calls
  ring-buffer: Don't deactivate the ring buffer on failed iterator reads
  x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
2020-05-14 11:46:52 -07:00
Arvind Sankar
e78d334a54 x86/boot: Mark global variables as static
Mike Lothian reports that after commit
  964124a97b ("efi/x86: Remove extra headroom for setup block")
gcc 10.1.0 fails with

  HOSTCC  arch/x86/boot/tools/build
  /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld:
  error: linker defined: multiple definition of '_end'
  /usr/lib/gcc/x86_64-pc-linux-gnu/10.1.0/../../../../x86_64-pc-linux-gnu/bin/ld:
  /tmp/ccEkW0jM.o: previous definition here
  collect2: error: ld returned 1 exit status
  make[1]: *** [scripts/Makefile.host:103: arch/x86/boot/tools/build] Error 1
  make: *** [arch/x86/Makefile:303: bzImage] Error 2

The issue is with the _end variable that was added, to hold the end of
the compressed kernel from zoffsets.h (ZO__end). The name clashes with
the linker-defined _end symbol that indicates the end of the build
program itself.

Even when there is no compile-time error, this causes build to use
memory past the end of its .bss section.

To solve this, mark _end as static, and for symmetry, mark the rest of
the variables that keep track of symbols from the compressed kernel as
static as well.

Fixes: 964124a97b ("efi/x86: Remove extra headroom for setup block")
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20200511225849.1311869-1-nivedita@alum.mit.edu
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2020-05-14 11:11:20 +02:00
Babu Moger
37486135d3 KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c
Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
resource isn't. It can be read with XSAVE and written with XRSTOR.
So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state),
the guest can read the host value.

In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could
potentially use XRSTOR to change the host PKRU value.

While at it, move pkru state save/restore to common code and the
host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.

Cc: stable@vger.kernel.org
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-13 11:27:41 -04:00
Vitaly Kuznetsov
38dce4195f x86/hyperv: Properly suspend/resume reenlightenment notifications
Errors during hibernation with reenlightenment notifications enabled were
reported:

 [   51.730435] PM: hibernation entry
 [   51.737435] PM: Syncing filesystems ...
 ...
 [   54.102216] Disabling non-boot CPUs ...
 [   54.106633] smpboot: CPU 1 is now offline
 [   54.110006] unchecked MSR access error: WRMSR to 0x40000106 (tried to
     write 0x47c72780000100ee) at rIP: 0xffffffff90062f24
     native_write_msr+0x4/0x20)
 [   54.110006] Call Trace:
 [   54.110006]  hv_cpu_die+0xd9/0xf0
 ...

Normally, hv_cpu_die() just reassigns reenlightenment notifications to some
other CPU when the CPU receiving them goes offline. Upon hibernation, there
is no other CPU which is still online so cpumask_any_but(cpu_online_mask)
returns >= nr_cpu_ids and using it as hv_vp_index index is incorrect.
Disable the feature when cpumask_any_but() fails.

Also, as we now disable reenlightenment notifications upon hibernation we
need to restore them on resume. Check if hv_reenlightenment_cb was
previously set and restore from hv_resume().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20200512160153.134467-1-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
2020-05-13 15:02:03 +00:00
Steven Rostedt (VMware)
59566b0b62 x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
Booting one of my machines, it triggered the following crash:

 Kernel/User page tables isolation: enabled
 ftrace: allocating 36577 entries in 143 pages
 Starting tracer 'function'
 BUG: unable to handle page fault for address: ffffffffa000005c
 #PF: supervisor write access in kernel mode
 #PF: error_code(0x0003) - permissions violation
 PGD 2014067 P4D 2014067 PUD 2015063 PMD 7b253067 PTE 7b252061
 Oops: 0003 [] PREEMPT SMP PTI
 CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0-test+ 
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
 RIP: 0010:text_poke_early+0x4a/0x58
 Code: 34 24 48 89 54 24 08 e8 bf 72 0b 00 48 8b 34 24 48 8b 4c 24 08 84 c0 74 0b 48 89 df f3 a4 48 83 c4 10 5b c3 9c 58 fa 48 89 df <f3> a4 50 9d 48 83 c4 10 5b e9 d6 f9 ff ff
0 41 57 49
 RSP: 0000:ffffffff82003d38 EFLAGS: 00010046
 RAX: 0000000000000046 RBX: ffffffffa000005c RCX: 0000000000000005
 RDX: 0000000000000005 RSI: ffffffff825b9a90 RDI: ffffffffa000005c
 RBP: ffffffffa000005c R08: 0000000000000000 R09: ffffffff8206e6e0
 R10: ffff88807b01f4c0 R11: ffffffff8176c106 R12: ffffffff8206e6e0
 R13: ffffffff824f2440 R14: 0000000000000000 R15: ffffffff8206eac0
 FS:  0000000000000000(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffa000005c CR3: 0000000002012000 CR4: 00000000000006b0
 Call Trace:
  text_poke_bp+0x27/0x64
  ? mutex_lock+0x36/0x5d
  arch_ftrace_update_trampoline+0x287/0x2d5
  ? ftrace_replace_code+0x14b/0x160
  ? ftrace_update_ftrace_func+0x65/0x6c
  __register_ftrace_function+0x6d/0x81
  ftrace_startup+0x23/0xc1
  register_ftrace_function+0x20/0x37
  func_set_flag+0x59/0x77
  __set_tracer_option.isra.19+0x20/0x3e
  trace_set_options+0xd6/0x13e
  apply_trace_boot_options+0x44/0x6d
  register_tracer+0x19e/0x1ac
  early_trace_init+0x21b/0x2c9
  start_kernel+0x241/0x518
  ? load_ucode_intel_bsp+0x21/0x52
  secondary_startup_64+0xa4/0xb0

I was able to trigger it on other machines, when I added to the kernel
command line of both "ftrace=function" and "trace_options=func_stack_trace".

The cause is the "ftrace=function" would register the function tracer
and create a trampoline, and it will set it as executable and
read-only. Then the "trace_options=func_stack_trace" would then update
the same trampoline to include the stack tracer version of the function
tracer. But since the trampoline already exists, it updates it with
text_poke_bp(). The problem is that text_poke_bp() called while
system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
the page mapping, as it would think that the text is still read-write.
But in this case it is not, and we take a fault and crash.

Instead, lets keep the ftrace trampolines read-write during boot up,
and then when the kernel executable text is set to read-only, the
ftrace trampolines get set to read-only as well.

Link: https://lkml.kernel.org/r/20200430202147.4dc6e2de@oasis.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable@vger.kernel.org
Fixes: 768ae4406a ("x86/ftrace: Use text_poke()")
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-05-12 18:24:34 -04:00
Linus Torvalds
c14cab2688 A set of fixes for x86:
- Ensure that direct mapping alias is always flushed when changing page
    attributes. The optimization for small ranges failed to do so when
    the virtual address was in the vmalloc or module space.
 
  - Unbreak the trace event registration for syscalls without arguments
    caused by the refactoring of the SYSCALL_DEFINE0() macro.
 
  - Move the printk in the TSC deadline timer code to a place where it is
    guaranteed to only be called once during boot and cannot be rearmed by
    clearing warn_once after boot. If it's invoked post boot then lockdep
    rightfully complains about a potential deadlock as the calling context
    is different.
 
  - A series of fixes for objtool and the ORC unwinder addressing variety
    of small issues:
 
      Stack offset tracking for indirect CFAs in objtool ignored subsequent
      pushs and pops
 
      Repair the unwind hints in the register clearing entry ASM code
 
      Make the unwinding in the low level exit to usermode code stop after
      switching to the trampoline stack. The unwind hint is not longer valid
      and the ORC unwinder emits a warning as it can't find the registers
      anymore.
 
      Fix the unwind hints in switch_to_asm() and rewind_stack_do_exit()
      which caused objtool to generate bogus ORC data.
 
      Prevent unwinder warnings when dumping the stack of a non-current
      task as there is no way to be sure about the validity because the
      dumped stack can be a moving target.
 
      Make the ORC unwinder behave the same way as the frame pointer
      unwinder when dumping an inactive tasks stack and do not skip the
      first frame.
 
      Prevent ORC unwinding before ORC data has been initialized
 
      Immediately terminate unwinding when a unknown ORC entry type is
      found.
 
      Prevent premature stop of the unwinder caused by IRET frames.
 
      Fix another infinite loop in objtool caused by a negative offset which
      was not catched.
 
      Address a few build warnings in the ORC unwinder and add missing
      static/ro_after_init annotations
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6363QTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoRJHD/4hWjzJLsUZ9xq2NrzhevoeJtxj+wVM
 66x9NM3mlFQ30BN4Aye4EnNEhR0iIvNPWWdfEmaJYfPHPwnUjjcOa426HYxP/WXA
 DWd5F20wGaaPOJ65LJpy/+pfcxAeQynt4I2cDEWHAplswfOWV/Hv8mSeKAKuq400
 lCWaTMkWcO/toexSNn8PVyWi9rHlm+76E1bHkVwuoekGBGt1VloKGlK6OPyElzL2
 w9VtrjSLlYQ0MdfCJKQeg44XQPMbf4hZRfc88x9SwDWB01q7aSvb0pWNl9AJKNXA
 7fFu5T4F4PABPgRM7eJ5yNk0De9jM1y+6eCp66f9UXoNOeSr7Boz9Xc4xWqAraIi
 9Dtx3WliO9CAxwUiD+Cj2iJO5o83AdRK/xhCth2VRnYMS6imfSidEqTC+LhEtkzw
 Yplu7sbrWQDa5JTh8vk60clDvbkU+pfdxJisY+KClRguWfQfR6MJNuQnE0NHr7cH
 H4VXFFHEE6tDdJneQ9RxA4iF20RTgSlJGK0YlsH6QsxPsRgoHVkGUao8fQhrNvRc
 MIdpm9YasWStjJ7ZXbDeStmnLFN3DCj1RC8wmvJ4i/R1sPnBvPvRUt4Lm988a951
 Vyr23VIcVrE7zykiqQZVH7bvIv6ULORqTJbIOF1rO/aIut4W8z0ojoVXC0Z7CiwF
 S5SGj+hlWciIew==
 =0rCi
 -----END PGP SIGNATURE-----

Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes for x86:

   - Ensure that direct mapping alias is always flushed when changing
     page attributes. The optimization for small ranges failed to do so
     when the virtual address was in the vmalloc or module space.

   - Unbreak the trace event registration for syscalls without arguments
     caused by the refactoring of the SYSCALL_DEFINE0() macro.

   - Move the printk in the TSC deadline timer code to a place where it
     is guaranteed to only be called once during boot and cannot be
     rearmed by clearing warn_once after boot. If it's invoked post boot
     then lockdep rightfully complains about a potential deadlock as the
     calling context is different.

   - A series of fixes for objtool and the ORC unwinder addressing
     variety of small issues:

       - Stack offset tracking for indirect CFAs in objtool ignored
         subsequent pushs and pops

       - Repair the unwind hints in the register clearing entry ASM code

       - Make the unwinding in the low level exit to usermode code stop
         after switching to the trampoline stack. The unwind hint is no
         longer valid and the ORC unwinder emits a warning as it can't
         find the registers anymore.

       - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
         which caused objtool to generate bogus ORC data.

       - Prevent unwinder warnings when dumping the stack of a
         non-current task as there is no way to be sure about the
         validity because the dumped stack can be a moving target.

       - Make the ORC unwinder behave the same way as the frame pointer
         unwinder when dumping an inactive tasks stack and do not skip
         the first frame.

       - Prevent ORC unwinding before ORC data has been initialized

       - Immediately terminate unwinding when a unknown ORC entry type
         is found.

       - Prevent premature stop of the unwinder caused by IRET frames.

       - Fix another infinite loop in objtool caused by a negative
         offset which was not catched.

       - Address a few build warnings in the ORC unwinder and add
         missing static/ro_after_init annotations"

* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
  x86/apic: Move TSC deadline timer debug printk
  ftrace/x86: Fix trace event registration for syscalls without arguments
  x86/mm/cpa: Flush direct map alias during cpa
  objtool: Fix infinite loop in for_offset_range()
  x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
  x86/unwind/orc: Fix error path for bad ORC entry type
  x86/unwind/orc: Prevent unwinding before ORC initialization
  x86/unwind/orc: Don't skip the first frame for inactive tasks
  x86/unwind: Prevent false warnings for non-current tasks
  x86/unwind/orc: Convert global variables to static
  x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
  x86/entry/64: Fix unwind hints in __switch_to_asm()
  x86/entry/64: Fix unwind hints in kernel exit path
  x86/entry/64: Fix unwind hints in register clearing code
  objtool: Fix stack offset tracking for indirect CFAs
2020-05-10 11:59:53 -07:00
Linus Torvalds
af38553c66 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "14 fixes and one selftest to verify the ipc fixes herein"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: limit boost_watermark on small zones
  ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
  mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
  epoll: atomically remove wait entry on wake up
  kselftests: introduce new epoll60 testcase for catching lost wakeups
  percpu: make pcpu_alloc() aware of current gfp context
  mm/slub: fix incorrect interpretation of s->offset
  scripts/gdb: repair rb_first() and rb_last()
  eventpoll: fix missing wakeup for ovflist in ep_poll_callback
  arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
  scripts/decodecode: fix trapping instruction formatting
  kernel/kcov.c: fix typos in kcov_remote_start documentation
  mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
  mm, memcg: fix error return value of mem_cgroup_css_alloc()
  ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
2020-05-08 08:41:09 -07:00
Suravee Suthikulpanit
7d611233b0 KVM: SVM: Disable AVIC before setting V_IRQ
The commit 64b5bd2704 ("KVM: nSVM: ignore L1 interrupt window
while running L2 with V_INTR_MASKING=1") introduced a WARN_ON,
which checks if AVIC is enabled when trying to set V_IRQ
in the VMCB for enabling irq window.

The following warning is triggered because the requesting vcpu
(to deactivate AVIC) does not get to process APICv update request
for itself until the next #vmexit.

WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd]
 RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd]
 Call Trace:
  kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm]
  ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm]
  ? _copy_to_user+0x26/0x30
  ? kvm_vm_ioctl+0xb3e/0xd90 [kvm]
  ? set_next_entity+0x78/0xc0
  kvm_vcpu_ioctl+0x236/0x610 [kvm]
  ksys_ioctl+0x8a/0xc0
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x58/0x210
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes by sending APICV update request to all other vcpus, and
immediately update APIC for itself.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Link: https://lkml.org/lkml/2020/5/2/167
Fixes: 64b5bd2704 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1")
Message-Id: <1588818939-54264-1-git-send-email-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:32 -04:00
Suravee Suthikulpanit
54163a346d KVM: Introduce kvm_make_all_cpus_request_except()
This allows making request to all other vcpus except the one
specified in the parameter.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:32 -04:00
Paolo Bonzini
45981dedf5 KVM: VMX: pass correct DR6 for GD userspace exit
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
DR6 was incorrectly copied from the value in the VM.  Instead,
DR6.BD should be set in order to catch this case.

On AMD this does not need any special code because the processor triggers
a #DB exception that is intercepted.  However, the testcase would fail
without the previous patch because both DR6.BS and DR6.BD would be set.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:31 -04:00
Paolo Bonzini
d67668e9dd KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
different handling of DR6 on intercepted #DB exceptions on Intel and AMD.

On Intel, #DB exceptions transmit the DR6 value via the exit qualification
field of the VMCS, and the exit qualification only contains the description
of the precise event that caused a vmexit.

On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception
was to be injected into the guest.  This has two effects when guest debugging
is in use:

* the guest DR6 is clobbered

* the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
than just the last one that happened (the testcase in the next patch covers
this issue).

This patch fixes both issues by emulating, so to speak, the Intel behavior
on AMD processors.  The important observation is that (after the previous
patches) the VMCB value of DR6 is only ever observable from the guest is
KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually set vmcb->save.dr6
to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it
will be if guest debugging is enabled.

Therefore it is possible to enter the guest with an all-zero DR6,
reconstruct the #DB payload from the DR6 we get at exit time, and let
kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
is set, but this is harmless.

This may not be the most optimized way to deal with this, but it is
simple and, being confined within SVM code, it gets rid of the set_dr6
callback and kvm_update_dr6.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:44:31 -04:00
Paolo Bonzini
5679b803e4 KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the
second argument.  Ensure that the VMCB value is synchronized to
vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so
that the current value of DR6 is always available in vcpu->arch.dr6.
The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-08 07:43:47 -04:00
Eric Biggers
2aaba014b5 crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h
<linux/cryptohash.h> sounds very generic and important, like it's the
header to include if you're doing cryptographic hashing in the kernel.
But actually it only includes the library implementation of the SHA-1
compression function (not even the full SHA-1).  This should basically
never be used anymore; SHA-1 is no longer considered secure, and there
are much better ways to do cryptographic hashing in the kernel.

Most files that include this header don't actually need it.  So in
preparation for removing it, remove all these unneeded includes of it.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-05-08 15:32:17 +10:00
Janakarajan Natarajan
996ed22c7a arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
When trying to lock read-only pages, sev_pin_memory() fails because
FOLL_WRITE is used as the flag for get_user_pages_fast().

Commit 73b0140bf0 ("mm/gup: change GUP fast to use flags rather than a
write 'bool'") updated the get_user_pages_fast() call sites to use
flags, but incorrectly updated the call in sev_pin_memory().  As the
original coding of this call was correct, revert the change made by that
commit.

Fixes: 73b0140bf0 ("mm/gup: change GUP fast to use flags rather than a write 'bool'")
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-05-07 19:27:20 -07:00
Linus Torvalds
8c16ec94dc Bugfixes, mostly for ARM and AMD, and more documentation.
-----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl6yqbIUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroObBQf+NH9DCs6X92YggAoNpJl6uSIOX35X
 ErdWqYj80Xx95QU73aMukjs3Zqxe6WfYI9jPEOD8SDUZzZlVfIA35D8BYlqt1c5R
 A2K2ebTQbZ+j487QTUPbEvEivyxyVSozwvOdKBfL5kv0D9Cn2STyjVjmguUoCp9n
 VztmwbwpSZdOnexRSolwAWuyOriYbvpV12cIZpcMGrjL67yZPv8UyCxxJplDCLlB
 1c8tvGI2Md8apE/YZDqlCFh3H4YBQsact8uOoyY8cXKO/xIAsZOI+Dhm/cQAhGDk
 QIQqv/hkM4HPvOXQluwIau4Cx+Fl05xY/ggtQt4z/8yml2pOw8PKmwziZA==
 =60QX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bugfixes, mostly for ARM and AMD, and more documentation.

  Slightly bigger than usual because I couldn't send out what was
  pending for rc4, but there is nothing worrisome going on. I have more
  fixes pending for guest debugging support (gdbstub) but I will send
  them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
  KVM: selftests: Fix build for evmcs.h
  kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
  KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
  docs/virt/kvm: Document configuring and running nested guests
  KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
  kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
  KVM: x86: Fixes posted interrupt check for IRQs delivery modes
  KVM: SVM: fill in kvm_run->debug.arch.dr[67]
  KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
  KVM: arm64: Fix 32bit PC wrap-around
  KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
  KVM: arm64: Save/restore sp_el0 as part of __guest_enter
  KVM: arm64: Delete duplicated label in invalid_vector
  KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
  KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
  KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
  KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
  KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
  KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
  ...
2020-05-07 09:50:59 -07:00
Paolo Bonzini
2c19dba680 KVM: nSVM: trap #DB and #BP to userspace if guest debugging is on
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07 07:45:16 -04:00
Peter Xu
d5d260c5ff KVM: X86: Fix single-step with KVM_SET_GUEST_DEBUG
When single-step triggered with KVM_SET_GUEST_DEBUG, we should fill in the pc
value with current linear RIP rather than the cached singlestep address.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200505205000.188252-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-05-07 06:13:41 -04:00