- Add user space specific memory reading for kprobes
- Allow kprobes to be executed earlier in boot
The rest are mostly just various clean ups and small fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXS88txQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhaPAQDHaAmu6wXtZjZE6GU4ZP61UNgDECmZ
4wlGrNc1AAlqAQD/QC8339p37aDCp9n27VY1wmJwF3nca+jAHfQLqWkkYgw=
=n/tz
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The main changes in this release include:
- Add user space specific memory reading for kprobes
- Allow kprobes to be executed earlier in boot
The rest are mostly just various clean ups and small fixes"
* tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Make trace_get_fields() global
tracing: Let filter_assign_type() detect FILTER_PTR_STRING
tracing: Pass type into tracing_generic_entry_update()
ftrace/selftest: Test if set_event/ftrace_pid exists before writing
ftrace/selftests: Return the skip code when tracing directory not configured in kernel
tracing/kprobe: Check registered state using kprobe
tracing/probe: Add trace_event_call accesses APIs
tracing/probe: Add probe event name and group name accesses APIs
tracing/probe: Add trace flag access APIs for trace_probe
tracing/probe: Add trace_event_file access APIs for trace_probe
tracing/probe: Add trace_event_call register API for trace_probe
tracing/probe: Add trace_probe init and free functions
tracing/uprobe: Set print format when parsing command
tracing/kprobe: Set print format right after parsed command
kprobes: Fix to init kprobes in subsys_initcall
tracepoint: Use struct_size() in kmalloc()
ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS
ftrace: Enable trampoline when rec count returns back to one
tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline
tracing: Make a separate config for trace event self tests
...
Merge more updates from Andrew Morton:
"VM:
- z3fold fixes and enhancements by Henry Burns and Vitaly Wool
- more accurate reclaimed slab caches calculations by Yafang Shao
- fix MAP_UNINITIALIZED UAPI symbol to not depend on config, by
Christoph Hellwig
- !CONFIG_MMU fixes by Christoph Hellwig
- new novmcoredd parameter to omit device dumps from vmcore, by
Kairui Song
- new test_meminit module for testing heap and pagealloc
initialization, by Alexander Potapenko
- ioremap improvements for huge mappings, by Anshuman Khandual
- generalize kprobe page fault handling, by Anshuman Khandual
- device-dax hotplug fixes and improvements, by Pavel Tatashin
- enable synchronous DAX fault on powerpc, by Aneesh Kumar K.V
- add pte_devmap() support for arm64, by Robin Murphy
- unify locked_vm accounting with a helper, by Daniel Jordan
- several misc fixes
core/lib:
- new typeof_member() macro including some users, by Alexey Dobriyan
- make BIT() and GENMASK() available in asm, by Masahiro Yamada
- changed LIST_POISON2 on x86_64 to 0xdead000000000122 for better
code generation, by Alexey Dobriyan
- rbtree code size optimizations, by Michel Lespinasse
- convert struct pid count to refcount_t, by Joel Fernandes
get_maintainer.pl:
- add --no-moderated switch to skip moderated ML's, by Joe Perches
misc:
- ptrace PTRACE_GET_SYSCALL_INFO interface
- coda updates
- gdb scripts, various"
[ Using merge message suggestion from Vlastimil Babka, with some editing - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (100 commits)
fs/select.c: use struct_size() in kmalloc()
mm: add account_locked_vm utility function
arm64: mm: implement pte_devmap support
mm: introduce ARCH_HAS_PTE_DEVMAP
mm: clean up is_device_*_page() definitions
mm/mmap: move common defines to mman-common.h
mm: move MAP_SYNC to asm-generic/mman-common.h
device-dax: "Hotremove" persistent memory that is used like normal RAM
mm/hotplug: make remove_memory() interface usable
device-dax: fix memory and resource leak if hotplug fails
include/linux/lz4.h: fix spelling and copy-paste errors in documentation
ipc/mqueue.c: only perform resource calculation if user valid
include/asm-generic/bug.h: fix "cut here" for WARN_ON for __WARN_TAINT architectures
scripts/gdb: add helpers to find and list devices
scripts/gdb: add lx-genpd-summary command
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
kernel/pid.c: convert struct pid count to refcount_t
drivers/rapidio/devices/rio_mport_cdev.c: NUL terminate some strings
select: shift restore_saved_sigmask_unless() into poll_select_copy_remaining()
select: change do_poll() to return -ERESTARTNOHAND rather than -EINTR
...
Since commit 2724273e8f ("vmcore: add API to collect hardware dump in
second kernel"), drivers are allowed to add device related dump data to
vmcore as they want by using the device dump API. This has a potential
issue, the data is stored in memory, drivers may append too much data
and use too much memory. The vmcore is typically used in a kdump kernel
which runs in a pre-reserved small chunk of memory. So as a result it
will make kdump unusable at all due to OOM issues.
So introduce new 'novmcoredd' command line option. User can disable
device dump to reduce memory usage. This is helpful if device dump is
using too much memory, disabling device dump could make sure a regular
vmcore without device dump data is still available.
[akpm@linux-foundation.org: tweak documentation]
[akpm@linux-foundation.org: vmcore.c needs moduleparam.h]
Link: http://lkml.kernel.org/r/20190528111856.7276-1-kasong@redhat.com
Signed-off-by: Kairui Song <kasong@redhat.com>
Acked-by: Dave Young <dyoung@redhat.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: "David S . Miller" <davem@davemloft.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+QmuaPwR3wnBdVwACF8+vY7k4RUFAl0tpocACgkQCF8+vY7k
4RWoxA//b/fmDXP3WPzrjjSmpyB9ml0/epKzPbT5S2j0lftqKBmet29k+PCjVrTx
Nq2QauehY9ug5h8UMVUCmzPr95F0tSIGRoqk1vrn7z0K3q6k1SHrtvqbY1Bgb2Uk
Qvh2YFU4fQLJg8WAbExCjxCdbdmBKQVGKTwCtM+tP5OMxwAFOmQrjGaUaKCKIIA2
7Wzrx8CpSji+bJ3uK/d36c+4M9oDly5eaxBhoboL3BI0y+GqwiSASGwTO7BxrPOg
0wq5IZHnqS8+bprT9xQdDOqf+UOY9U1cxE/+sqsHxblfUEx9gfLy/R+FLmJn+SS9
Z3yLy4SqVHQMpWBjEAGodohikF60PAuTdymSC11jqFaKCUxWrIZg5xO+0blMrxPF
7vYIexutCkaBMHBlNaNsHIqB7B/2FGGKoN7QW64hwvwJCGvF7OmJcV+R4bROGvh4
nFuis9/Nm66Fq7I3aw37ThyZ0aWZdaQ0QJTH9ksxU/ZCz2hhMNYu/rXggrDvkS4U
nr77ZT5Gd7nj4b110zf8+99uiGiinY6hTfzPAuTCLBhaxwrv4/xDHAhpwdEB5T4j
8gOkxV8c0XWtL7sKqhGJvs/RRe2za0Y9XH6fyxsYfWcfuLjEvug8ouXMad9gxFWH
DL3WnKJEMGLScei2wux4kGOwEbkR1bUf2cHJfh3GpCB/y8vgLOc=
=smxY
-----END PGP SIGNATURE-----
Merge tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull rst conversion of docs from Mauro Carvalho Chehab:
"As agreed with Jon, I'm sending this big series directly to you, c/c
him, as this series required a special care, in order to avoid
conflicts with other trees"
* tag 'docs/v5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (77 commits)
docs: kbuild: fix build with pdf and fix some minor issues
docs: block: fix pdf output
docs: arm: fix a breakage with pdf output
docs: don't use nested tables
docs: gpio: add sysfs interface to the admin-guide
docs: locking: add it to the main index
docs: add some directories to the main documentation index
docs: add SPDX tags to new index files
docs: add a memory-devices subdir to driver-api
docs: phy: place documentation under driver-api
docs: serial: move it to the driver-api
docs: driver-api: add remaining converted dirs to it
docs: driver-api: add xilinx driver API documentation
docs: driver-api: add a series of orphaned documents
docs: admin-guide: add a series of orphaned documents
docs: cgroup-v1: add it to the admin-guide book
docs: aoe: add it to the driver-api book
docs: add some documentation dirs to the driver-api book
docs: driver-model: move it to the driver-api book
docs: lp855x-driver.rst: add it to the driver-api book
...
There are lots of documents under Documentation/*.txt and a few other
orphan documents elsehwere that belong to the driver-API book.
Move them to their right place.
Reviewed-by: Cornelia Huck <cohuck@redhat.com> # vfio-related parts
Acked-by: Logan Gunthorpe <logang@deltatee.com> # switchtec
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
There are lots of documents that belong to the admin-guide but
are on random places (most under Documentation root dir).
Move them to the admin guide.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
The Kdump documentation describes procedures with admins use
in order to solve issues on their systems.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
The docs under Documentation/laptops contain users specific
information.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
The stuff under sysctl describes /sys interface from userspace
point of view. So, add it to the admin-guide and remove the
:orphan: from its index file.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Rename the block documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Rename the /proc/sys/ documentation files to ReST, using the
README file as a template for an index.rst, adding the other
files there via TOC markup.
Despite being written on different times with different
styles, try to make them somewhat coherent with a similar
look and feel, ensuring that they'll look nice as both
raw text file and as via the html output produced by the
Sphinx build system.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Rename the blockdev documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.
The drbd sub-directory contains some graphs and data flows.
Add those too to the documentation.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Rename the laptops documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Notable changes:
- Removal of the NPU DMA code, used by the out-of-tree Nvidia driver, as well
as some other functions only used by drivers that haven't (yet?) made it
upstream.
- A fix for a bug in our handling of hardware watchpoints (eg. perf record -e
mem: ...) which could lead to register corruption and kernel crashes.
- Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for vmalloc
when using the Radix MMU.
- A large but incremental rewrite of our exception handling code to use gas
macros rather than multiple levels of nested CPP macros.
And the usual small fixes, cleanups and improvements.
Thanks to:
Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Cédric Le Goater,
Christian Lamparter, Christophe Leroy, Christophe Lombard, Christoph Hellwig,
Daniel Axtens, Denis Efremov, Enrico Weigelt, Frederic Barrat, Gautham R.
Shenoy, Geert Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg
Kurz, Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi Bangoria,
Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher Boessenkool, Shaokun
Zhang, Shawn Anastasio, Stewart Smith, Suraj Jitindar Singh, Thiago Jung
Bauermann, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdKVoLAAoJEFHr6jzI4aWA0kIP/A6shIbbE7H5W2hFrqt/PPPK
3+VrvPKbOFF+W6hcE/RgSZmEnUo0svdNjHUd/eMfFS1vb/uRt2QDdrsHUNNwURQL
M2mcLXFwYpnjSjb/XMgDbHpAQxjeGfTdYLonUIejN7Rk8KQUeLyKQ3SBn6kfMc46
DnUUcPcjuRGaETUmVuZZ4e40ZWbJp8PKDrSJOuUrTPXMaK5ciNbZk5mCWXGbYl6G
BMQAyv4ld/417rNTjBEP/T2foMJtioAt4W6mtlgdkOTdIEZnFU67nNxDBthNSu2c
95+I+/sML4KOp1R4yhqLSLIDDbc3bg3c99hLGij0d948z3bkSZ8bwnPaUuy70C4v
U8rvl/+N6C6H3DgSsPE/Gnkd8DnudqWY8nULc+8p3fXljGwww6/Qgt+6yCUn8BdW
WgixkSjKgjDmzTw8trIUNEqORrTVle7cM2hIyIK2Q5T4kWzNQxrLZ/x/3wgoYjUa
1KwIzaRo5JKZ9D3pJnJ5U+knE2/90rJIyfcp0W6ygyJsWKi2GNmq1eN3sKOw0IxH
Tg86RENIA/rEMErNOfP45sLteMuTR7of7peCG3yumIOZqsDVYAzerpvtSgip2cvK
aG+9HcYlBFOOOF9Dabi8GXsTBLXLfwiyjjLSpA9eXPwW8KObgiNfTZa7ujjTPvis
4mk9oukFTFUpfhsMmI3T
=3dBZ
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Removal of the NPU DMA code, used by the out-of-tree Nvidia driver,
as well as some other functions only used by drivers that haven't
(yet?) made it upstream.
- A fix for a bug in our handling of hardware watchpoints (eg. perf
record -e mem: ...) which could lead to register corruption and
kernel crashes.
- Enable HAVE_ARCH_HUGE_VMAP, which allows us to use large pages for
vmalloc when using the Radix MMU.
- A large but incremental rewrite of our exception handling code to
use gas macros rather than multiple levels of nested CPP macros.
And the usual small fixes, cleanups and improvements.
Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Andreas Schwab,
Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann,
Athira Rajeev, Cédric Le Goater, Christian Lamparter, Christophe
Leroy, Christophe Lombard, Christoph Hellwig, Daniel Axtens, Denis
Efremov, Enrico Weigelt, Frederic Barrat, Gautham R. Shenoy, Geert
Uytterhoeven, Geliang Tang, Gen Zhang, Greg Kroah-Hartman, Greg Kurz,
Gustavo Romero, Krzysztof Kozlowski, Madhavan Srinivasan, Masahiro
Yamada, Mathieu Malaterre, Michael Neuling, Nathan Lynch, Naveen N.
Rao, Nicholas Piggin, Nishad Kamdar, Oliver O'Halloran, Qian Cai, Ravi
Bangoria, Sachin Sant, Sam Bobroff, Satheesh Rajendran, Segher
Boessenkool, Shaokun Zhang, Shawn Anastasio, Stewart Smith, Suraj
Jitindar Singh, Thiago Jung Bauermann, YueHaibing"
* tag 'powerpc-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (163 commits)
powerpc/powernv/idle: Fix restore of SPRN_LDBAR for POWER9 stop state.
powerpc/eeh: Handle hugepages in ioremap space
ocxl: Update for AFU descriptor template version 1.1
powerpc/boot: pass CONFIG options in a simpler and more robust way
powerpc/boot: add {get, put}_unaligned_be32 to xz_config.h
powerpc/irq: Don't WARN continuously in arch_local_irq_restore()
powerpc/module64: Use symbolic instructions names.
powerpc/module32: Use symbolic instructions names.
powerpc: Move PPC_HA() PPC_HI() and PPC_LO() to ppc-opcode.h
powerpc/module64: Fix comment in R_PPC64_ENTRY handling
powerpc/boot: Add lzo support for uImage
powerpc/boot: Add lzma support for uImage
powerpc/boot: don't force gzipped uImage
powerpc/8xx: Add microcode patch to move SMC parameter RAM.
powerpc/8xx: Use IO accessors in microcode programming.
powerpc/8xx: replace #ifdefs by IS_ENABLED() in microcode.c
powerpc/8xx: refactor programming of microcode CPM params.
powerpc/8xx: refactor printing of microcode patch name.
powerpc/8xx: Refactor microcode write
powerpc/8xx: refactor writing of CPM microcode arrays
...
Patch series "add init_on_alloc/init_on_free boot options", v10.
Provide init_on_alloc and init_on_free boot options.
These are aimed at preventing possible information leaks and making the
control-flow bugs that depend on uninitialized values more deterministic.
Enabling either of the options guarantees that the memory returned by the
page allocator and SL[AU]B is initialized with zeroes. SLOB allocator
isn't supported at the moment, as its emulation of kmem caches complicates
handling of SLAB_TYPESAFE_BY_RCU caches correctly.
Enabling init_on_free also guarantees that pages and heap objects are
initialized right after they're freed, so it won't be possible to access
stale data by using a dangling pointer.
As suggested by Michal Hocko, right now we don't let the heap users to
disable initialization for certain allocations. There's not enough
evidence that doing so can speed up real-life cases, and introducing ways
to opt-out may result in things going out of control.
This patch (of 2):
The new options are needed to prevent possible information leaks and make
control-flow bugs that depend on uninitialized values more deterministic.
This is expected to be on-by-default on Android and Chrome OS. And it
gives the opportunity for anyone else to use it under distros too via the
boot args. (The init_on_free feature is regularly requested by folks
where memory forensics is included in their threat models.)
init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
objects with zeroes. Initialization is done at allocation time at the
places where checks for __GFP_ZERO are performed.
init_on_free=1 makes the kernel initialize freed pages and heap objects
with zeroes upon their deletion. This helps to ensure sensitive data
doesn't leak via use-after-free accesses.
Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
returns zeroed memory. The two exceptions are slab caches with
constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never
zero-initialized to preserve their semantics.
Both init_on_alloc and init_on_free default to zero, but those defaults
can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
CONFIG_INIT_ON_FREE_DEFAULT_ON.
If either SLUB poisoning or page poisoning is enabled, those options take
precedence over init_on_alloc and init_on_free: initialization is only
applied to unpoisoned allocations.
Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:
hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%)
hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%)
Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%)
Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
is within the standard error.
The new features are also going to pave the way for hardware memory
tagging (e.g. arm64's MTE), which will require both on_alloc and on_free
hooks to set the tags for heap objects. With MTE, tagging will have the
same cost as memory initialization.
Although init_on_free is rather costly, there are paranoid use-cases where
in-memory data lifetime is desired to be minimized. There are various
arguments for/against the realism of the associated threat models, but
given that we'll need the infrastructure for MTE anyway, and there are
people who want wipe-on-free behavior no matter what the performance cost,
it seems reasonable to include it in this series.
[glider@google.com: v8]
Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
[glider@google.com: v9]
Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
[glider@google.com: v10]
Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.cz> [page and dmapool parts
Acked-by: James Morris <jamorris@linux.microsoft.com>]
Cc: Christoph Lameter <cl@linux.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When debug_pagealloc is enabled, we currently allocate the page_ext
array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag. Now that
we have the page_type field in struct page, we can use that instead, as
guard pages are neither PageSlab nor mapped to userspace. This reduces
memory overhead when debug_pagealloc is enabled and there are no other
features requiring the page_ext array.
Link: http://lkml.kernel.org/r/20190603143451.27353-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with other
trees, unfortunately. He has a lot more of these waiting on the wings
that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos, and one
on Spectre vulnerabilities.
- Various improvements to the build system, including automatic markup of
function() references because some people, for reasons I will never
understand, were of the opinion that :c:func:``function()`` is
unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
=D0eO
-----END PGP SIGNATURE-----
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Pull cgroup updates from Tejun Heo:
"Documentation updates and the addition of cgroup_parse_float() which
will be used by new controllers including blk-iocost"
* 'for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1: convert docs to ReST and rename to *.rst
cgroup: Move cgroup_parse_float() implementation out of CONFIG_SYSFS
cgroup: add cgroup_parse_float()
Pull RCU updates from Ingo Molnar:
"The changes in this cycle are:
- RCU flavor consolidation cleanups and optmizations
- Documentation updates
- Miscellaneous fixes
- SRCU updates
- RCU-sync flavor consolidation
- Torture-test updates
- Linux-kernel memory-consistency-model updates, most notably the
addition of plain C-language accesses"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
tools/memory-model: Improve data-race detection
tools/memory-model: Change definition of rcu-fence
tools/memory-model: Expand definition of barrier
tools/memory-model: Do not use "herd" to refer to "herd7"
tools/memory-model: Fix comment in MP+poonceonces.litmus
Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()
rcu: Don't return a value from rcu_assign_pointer()
rcu: Force inlining of rcu_read_lock()
rcu: Fix irritating whitespace error in rcu_assign_pointer()
rcu: Upgrade sync_exp_work_done() to smp_mb()
rcutorture: Upper case solves the case of the vanishing NULL pointer
torture: Suppress propagating trace_printk() warning
rcutorture: Dump trace buffer for callback pipe drain failures
torture: Add --trust-make to suppress "make clean"
torture: Make --cpus override idleness calculations
torture: Run kernel build in source directory
torture: Add function graph-tracing cheat sheet
torture: Capture qemu output
rcutorture: Tweak kvm options
rcutorture: Add trivial RCU implementation
...
Pull x86 vsyscall updates from Thomas Gleixner:
"Further hardening of the legacy vsyscall by providing support for
execute only mode and switching the default to it.
This prevents a certain class of attacks which rely on the vsyscall
page being accessible at a fixed address in the canonical kernel
address space"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Add a test for process_vm_readv() on the vsyscall page
x86/vsyscall: Add __ro_after_init to global variables
x86/vsyscall: Change the default vsyscall mode to xonly
selftests/x86/vsyscall: Verify that vsyscall=none blocks execution
x86/vsyscall: Document odd SIGSEGV error code for vsyscalls
x86/vsyscall: Show something useful on a read fault
x86/vsyscall: Add a new vsyscall=xonly mode
Documentation/admin: Remove the vsyscall=native documentation
commit 243e25112d ("powerpc/xive: Native exploitation of the XIVE
interrupt controller") added an option to turn off Linux native XIVE
usage via the xive=off kernel command line option.
This documents this option.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With vsyscall emulation on, a readable vsyscall page is still exposed that
contains syscall instructions that validly implement the vsyscalls.
This is required because certain dynamic binary instrumentation tools
attempt to read the call targets of call instructions in the instrumented
code. If the instrumented code uses vsyscalls, then the vsyscall page needs
to contain readable code.
Unfortunately, leaving readable memory at a deterministic address can be
used to help various ASLR bypasses, so some hardening value can be gained
by disallowing vsyscall reads.
Given how rarely the vsyscall page needs to be readable, add a mechanism to
make the vsyscall page be execute only.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/d17655777c21bc09a7af1bbcf74e6f2b69a51152.1561610354.git.luto@kernel.org
This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
page table functions.
This enables huge (2MB and 1GB) ioremap mappings. I don't have a
benchmark for this change, but huge vmap will be used by a later core
kernel change to enable huge vmalloc memory mappings. This improves
cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB
size dentry cache hash.
Profiling git diff dTLB misses with a vanilla kernel:
81.75% git [kernel.vmlinux] [k] __d_lookup_rcu
7.21% git [kernel.vmlinux] [k] strncpy_from_user
1.77% git [kernel.vmlinux] [k] find_get_entry
1.59% git [kernel.vmlinux] [k] kmem_cache_free
40,168 dTLB-miss
0.100342754 seconds time elapsed
With powerpc huge vmalloc:
2,987 dTLB-miss
0.095933138 seconds time elapsed
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Convert the PM documents to ReST, in order to allow them to
build with Sphinx.
The conversion is actually:
- add blank lines and indentation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Sphinx need to know when a paragraph ends. So, do some adjustments
at the file for it to be properly parsed.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
that's said, I believe that this file should be moved to the
GPU/DRM documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Convert those documents and prepare them to be part of the kernel
API book, as most of the stuff there are related to the
Kernel interfaces.
Still, in the future, it would make sense to split the docs,
as some of the stuff is clearly focused on sysadmin tasks.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Convert the cgroup-v1 files to ReST format, in order to
allow a later addition to the admin-guide.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Convert kdump documentation to ReST and add it to the
user faced manual, as the documents are mainly focused on
sysadmins that would be enabling kdump.
Note: the vmcoreinfo.rst has one very long title on one of its
sub-sections:
PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PAGE_BUDDY_MAPCOUNT_VALUE(~PG_buddy)|PAGE_OFFLINE_MAPCOUNT_VALUE(~PG_offline)
I opted to break this one, into two entries with the same content,
in order to make it easier to display after being parsed in html and PDF.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Also, removed the Maintained by, as requested by Geert.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Convert all text files with s390 documentation to ReST format.
Tried to preserve as much as possible the original document
format. Still, some of the files required some work in order
for it to be visible on both plain text and after converted
to html.
The conversion is actually:
- add blank lines and identation in order to identify paragraphs;
- fix tables markups;
- add some lists markups;
- mark literal blocks;
- adjust title markups.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The hisax driver got removed on 85993b8c97 ("isdn: remove hisax driver"),
but a left-over was kept at kernel-parameters.txt.
Fixes: 85993b8c97 ("isdn: remove hisax driver")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: Wolfram Sang <wsa@the-dreams.de>
Reviewed-by: Sven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
Acked-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The default behavior of hardlockup depends on the config of
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC.
Fix the description of nmi_watchdog to make it clear.
Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Add kprobe_event= boot parameter to define kprobe events
at boot time.
The definition syntax is similar to tracefs/kprobe_events
interface, but use ',' and ';' instead of ' ' and '\n'
respectively. e.g.
kprobe_event=p,vfs_read,$arg1,$arg2
This puts a probe on vfs_read with argument1 and 2, and
enable the new event.
Link: http://lkml.kernel.org/r/155851395498.15728.830529496248543583.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Some workloads need to change kthread priority for RCU core processing
without affecting other softirq work. This commit therefore introduces
the rcutree.use_softirq kernel boot parameter, which moves the RCU core
work from softirq to a per-CPU SCHED_OTHER kthread named rcuc. Use of
SCHED_OTHER approach avoids the scalability problems that appeared
with the earlier attempt to move RCU core processing to from softirq
to kthreads. That said, kernels built with RCU_BOOST=y will run the
rcuc kthreads at the RCU-boosting priority.
Note that rcutree.use_softirq=0 must be specified to move RCU core
processing to the rcuc kthreads: rcutree.use_softirq=1 is the default.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked
from RCU core processing, in contrast to softirq->rcuc transition
in old mainline RCU priority boosting. ]
[ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock()
while holding rq or pi locks, also possibly fixing a pre-existing latent
bug involving raise_softirq()-induced wakeups. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Currently on panic, kernel will lower the loglevel and print out pending
printk msg only with console_flush_on_panic().
Add an option for users to configure the "panic_print" to replay all
dmesg in buffer, some of which they may have never seen due to the
loglevel setting, which will help panic debugging .
[feng.tang@intel.com: keep the original console_flush_on_panic() inside panic()]
Link: http://lkml.kernel.org/r/1556199137-14163-1-git-send-email-feng.tang@intel.com
[feng.tang@intel.com: use logbuf lock to protect the console log index]
Link: http://lkml.kernel.org/r/1556269868-22654-1-git-send-email-feng.tang@intel.com
Link: http://lkml.kernel.org/r/1556095872-36838-1-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Aaro Koskinen <aaro.koskinen@nokia.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXNxbogAKCRCAXGG7T9hj
vpyFAQCUWBVb3vHQqqqsboKYA86cJg/t8fjdhw+vFieDcLs7ZwEA4nBDP9JfoHiV
HkDjhD3SEPS3kftsrR1PVGLrv/dIqgo=
=4YnV
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.2b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- some minor cleanups
- two small corrections for Xen on ARM
- two fixes for Xen PVH guest support
- a patch for a new command line option to tune virtual timer handling
* tag 'for-linus-5.2b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/arm: Use p2m entry with lock protection
xen/arm: Free p2m entry if fail to add it to RB tree
xen/pvh: correctly setup the PV EFI interface for dom0
xen/pvh: set xen_domain_type to HVM in xen_pvh_init
xenbus: drop useless LIST_HEAD in xenbus_write_watch() and xenbus_file_write()
xen-netfront: mark expected switch fall-through
xen: xen-pciback: fix warning Using plain integer as NULL pointer
x86/xen: Add "xen_timer_slop" command line option
The maximum number of unique System V IPC identifiers was limited to
32k. That limit should be big enough for most use cases.
However, there are some users out there requesting for more, especially
those that are migrating from Solaris which uses 24 bits for unique
identifiers. To satisfy the need of those users, a new boot time kernel
option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This
is a 512X increase which should be big enough for users out there that
need a large number of unique IPC identifier.
The use of this new option will change the pattern of the IPC
identifiers returned by functions like shmget(2). An application that
depends on such pattern may not work properly. So it should only be
used if the users really need more than 32k of unique IPC numbers.
This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 128. So it is a trade-off.
The computation of a new IPC id is not done in the performance critical
path. So a little bit of additional overhead shouldn't have any real
performance impact.
Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow specifying reboot_mode for panic only. This is needed on systems
where ramoops is used to store panic logs, and user wants to use warm
reset to preserve those, while still having cold reset on normal
reboots.
Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: Randomize free memory", v10.
This patch (of 3):
Randomization of the page allocator improves the average utilization of
a direct-mapped memory-side-cache. Memory side caching is a platform
capability that Linux has been previously exposed to in HPC
(high-performance computing) environments on specialty platforms. In
that instance it was a smaller pool of high-bandwidth-memory relative to
higher-capacity / lower-bandwidth DRAM. Now, this capability is going
to be found on general purpose server platforms where DRAM is a cache in
front of higher latency persistent memory [1].
Robert offered an explanation of the state of the art of Linux
interactions with memory-side-caches [2], and I copy it here:
It's been a problem in the HPC space:
http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
A kernel module called zonesort is available to try to help:
https://software.intel.com/en-us/articles/xeon-phi-software
and this abandoned patch series proposed that for the kernel:
https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com
Dan's patch series doesn't attempt to ensure buffers won't conflict, but
also reduces the chance that the buffers will. This will make performance
more consistent, albeit slower than "optimal" (which is near impossible
to attain in a general-purpose kernel). That's better than forcing
users to deploy remedies like:
"To eliminate this gradual degradation, we have added a Stream
measurement to the Node Health Check that follows each job;
nodes are rebooted whenever their measured memory bandwidth
falls below 300 GB/s."
A replacement for zonesort was merged upstream in commit cc9aec03e5
("x86/numa_emulation: Introduce uniform split capability"). With this
numa_emulation capability, memory can be split into cache sized
("near-memory" sized) numa nodes. A bind operation to such a node, and
disabling workloads on other nodes, enables full cache performance.
However, once the workload exceeds the cache size then cache conflicts
are unavoidable. While HPC environments might be able to tolerate
time-scheduling of cache sized workloads, for general purpose server
platforms, the oversubscribed cache case will be the common case.
The worst case scenario is that a server system owner benchmarks a
workload at boot with an un-contended cache only to see that performance
degrade over time, even below the average cache performance due to
excessive conflicts. Randomization clips the peaks and fills in the
valleys of cache utilization to yield steady average performance.
Here are some performance impact details of the patches:
1/ An Intel internal synthetic memory bandwidth measurement tool, saw a
3X speedup in a contrived case that tries to force cache conflicts.
The contrived cased used the numa_emulation capability to force an
instance of the benchmark to be run in two of the near-memory sized
numa nodes. If both instances were placed on the same emulated they
would fit and cause zero conflicts. While on separate emulated nodes
without randomization they underutilized the cache and conflicted
unnecessarily due to the in-order allocation per node.
2/ A well known Java server application benchmark was run with a heap
size that exceeded cache size by 3X. The cache conflict rate was 8%
for the first run and degraded to 21% after page allocator aging. With
randomization enabled the rate levelled out at 11%.
3/ A MongoDB workload did not observe measurable difference in
cache-conflict rates, but the overall throughput dropped by 7% with
randomization in one case.
4/ Mel Gorman ran his suite of performance workloads with randomization
enabled on platforms without a memory-side-cache and saw a mix of some
improvements and some losses [3].
While there is potentially significant improvement for applications that
depend on low latency access across a wide working-set, the performance
may be negligible to negative for other workloads. For this reason the
shuffle capability defaults to off unless a direct-mapped
memory-side-cache is detected. Even then, the page_alloc.shuffle=0
parameter can be specified to disable the randomization on those systems.
Outside of memory-side-cache utilization concerns there is potentially
security benefit from randomization. Some data exfiltration and
return-oriented-programming attacks rely on the ability to infer the
location of sensitive data objects. The kernel page allocator, especially
early in system boot, has predictable first-in-first out behavior for
physical pages. Pages are freed in physical address order when first
onlined.
Quoting Kees:
"While we already have a base-address randomization
(CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
memory layouts would certainly be using the predictability of
allocation ordering (i.e. for attacks where the base address isn't
important: only the relative positions between allocated memory).
This is common in lots of heap-style attacks. They try to gain
control over ordering by spraying allocations, etc.
I'd really like to see this because it gives us something similar
to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
caches it leaves vast bulk of memory to be predictably in order allocated.
However, it should be noted, the concrete security benefits are hard to
quantify, and no known CVE is mitigated by this randomization.
Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
are initially populated with free memory at boot and at hotplug time. Do
this based on either the presence of a page_alloc.shuffle=Y command line
parameter, or autodetection of a memory-side-cache (to be added in a
follow-on patch).
The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1 i.e. 10,
4MB this trades off randomization granularity for time spent shuffling.
MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
while still showing memory-side cache behavior improvements, and the
expectation that the security implications of finer granularity
randomization is mitigated by CONFIG_SLAB_FREELIST_RANDOM. The
performance impact of the shuffling appears to be in the noise compared to
other memory initialization work.
This initial randomization can be undone over time so a follow-on patch is
introduced to inject entropy on page free decisions. It is reasonable to
ask if the page free entropy is sufficient, but it is not enough due to
the in-order initial freeing of pages. At the start of that process
putting page1 in front or behind page0 still keeps them close together,
page2 is still near page1 and has a high chance of being adjacent. As
more pages are added ordering diversity improves, but there is still high
page locality for the low address pages and this leads to no significant
impact to the cache conflict rate.
[1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
[2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
[3]: https://lkml.org/lkml/2018/10/12/309
[dan.j.williams@intel.com: fix shuffle enable]
Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
[cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Robert Elliott <elliott@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 MDS mitigations from Thomas Gleixner:
"Microarchitectural Data Sampling (MDS) is a hardware vulnerability
which allows unprivileged speculative access to data which is
available in various CPU internal buffers. This new set of misfeatures
has the following CVEs assigned:
CVE-2018-12126 MSBDS Microarchitectural Store Buffer Data Sampling
CVE-2018-12130 MFBDS Microarchitectural Fill Buffer Data Sampling
CVE-2018-12127 MLPDS Microarchitectural Load Port Data Sampling
CVE-2019-11091 MDSUM Microarchitectural Data Sampling Uncacheable Memory
MDS attacks target microarchitectural buffers which speculatively
forward data under certain conditions. Disclosure gadgets can expose
this data via cache side channels.
Contrary to other speculation based vulnerabilities the MDS
vulnerability does not allow the attacker to control the memory target
address. As a consequence the attacks are purely sampling based, but
as demonstrated with the TLBleed attack samples can be postprocessed
successfully.
The mitigation is to flush the microarchitectural buffers on return to
user space and before entering a VM. It's bolted on the VERW
instruction and requires a microcode update. As some of the attacks
exploit data structures shared between hyperthreads, full protection
requires to disable hyperthreading. The kernel does not do that by
default to avoid breaking unattended updates.
The mitigation set comes with documentation for administrators and a
deeper technical view"
* 'x86-mds-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/speculation/mds: Fix documentation typo
Documentation: Correct the possible MDS sysfs values
x86/mds: Add MDSUM variant to the MDS documentation
x86/speculation/mds: Add 'mitigations=' support for MDS
x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
x86/speculation/mds: Fix comment
x86/speculation/mds: Add SMT warning message
x86/speculation: Move arch_smt_update() call to after mitigation decisions
x86/speculation/mds: Add mds=full,nosmt cmdline option
Documentation: Add MDS vulnerability documentation
Documentation: Move L1TF to separate directory
x86/speculation/mds: Add mitigation mode VMWERV
x86/speculation/mds: Add sysfs reporting for MDS
x86/speculation/mds: Add mitigation control for MDS
x86/speculation/mds: Conditionally clear CPU buffers on idle entry
x86/kvm/vmx: Add MDS protection when L1D Flush is not active
x86/speculation/mds: Clear CPU buffers on exit to user
x86/speculation/mds: Add mds_clear_cpu_buffers()
x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
x86/speculation/mds: Add BUG_MSBDS_ONLY
...
Highlights:
- Support for Kernel Userspace Access/Execution Prevention (like
SMAP/SMEP/PAN/PXN) on some 64-bit and 32-bit CPUs. This prevents the kernel
from accidentally accessing userspace outside copy_to/from_user(), or
ever executing userspace.
- KASAN support on 32-bit.
- Rework of where we map the kernel, vmalloc, etc. on 64-bit hash to use the
same address ranges we use with the Radix MMU.
- A rewrite into C of large parts of our idle handling code for 64-bit Book3S
(ie. power8 & power9).
- A fast path entry for syscalls on 32-bit CPUs, for a 12-17% speedup in the
null_syscall benchmark.
- On 64-bit bare metal we have support for recovering from errors with the time
base (our clocksource), however if that fails currently we hang in __delay()
and never crash. We now have support for detecting that case and short
circuiting __delay() so we at least panic() and reboot.
- Add support for optionally enabling the DAWR on Power9, which had to be
disabled by default due to a hardware erratum. This has the effect of
enabling hardware breakpoints for GDB, the downside is a badly behaved
program could crash the machine by pointing the DAWR at cache inhibited
memory. This is opt-in obviously.
- xmon, our crash handler, gets support for a read only mode where operations
that could change memory or otherwise disturb the system are disabled.
Plus many clean-ups, reworks and minor fixes etc.
Thanks to:
Christophe Leroy, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Andrew
Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Ben Hutchings,
Bo YU, Breno Leitao, Cédric Le Goater, Christopher M. Riedl, Christoph
Hellwig, Colin Ian King, David Gibson, Ganesh Goudar, Gautham R. Shenoy,
George Spelvin, Greg Kroah-Hartman, Greg Kurz, Horia Geantă, Jagadeesh
Pagadala, Joel Stanley, Joe Perches, Julia Lawall, Laurentiu Tudor, Laurent
Vivier, Lukas Bulwahn, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu
Malaterre, Michael Neuling, Mukesh Ojha, Nathan Fontenot, Nathan Lynch,
Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Peng Hao, Qian Cai, Ravi
Bangoria, Rick Lindsley, Russell Currey, Sachin Sant, Stewart Smith, Sukadev
Bhattiprolu, Thomas Huth, Tobin C. Harding, Tyrel Datwyler, Valentin
Schneider, Wei Yongjun, Wen Yang, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc1WbwAAoJEFHr6jzI4aWAv5cP/iDskai4Az/GCa6yLj4b+det
7mc7tTOaEzhUtvfrYYfHgvvdNNzo1ETv7rqTdZqtWJ3xfwdeowLFXXZwSywZKUDB
bi4pcl2v55Qlf9kxgx9RDr6+4fTwGG4nhO2qPDJDR1umEih9mG/2HJ7d+Wnq6Va2
E9srd+R6Fa0ty88+9vzBtdyllnDK1XHu3ahsxCH62aRm79ucuVrxyydWmbbs5lJe
a7g/OQIPgZmObHhfXvw9DFkOvkp5Pm6hfHOeyQH2nTB5X6k0judWv00uoHTJgOuP
DKxZtDhaGnajUfuhQYboDPOuFjY7lkfgEXaagyZsjdudqridTMmv1iU1o7iy8BT4
AId4DyJbvFFgqRJkCwKzhKRRHPfFMfM7KTJ38GPZuPmniuULk9uiIy6JyY0tXO+l
UQEclPzOTPkAE12FBaOBuqZqTRuBQuokWQF8ZDPOxbNAixHgFoRd4Z9diNwCPpLu
+KoyCwd2Gm5DyX+mC85sWG28IPKi9Hhhw2XBOA5F4A2kH6uFa1BnERSRGYomx+pc
BvEXHglf/vgV0XUQZfDCsiOecIKYuWxgre0/liLhhU5qMss2pxHczzffH4KtdykS
9y7o3mVRcS7Moitbmb6SAJoQxbR5QhzfN832DbSd6jEfKdg1ytZlfHTG0WZYHKDs
PHs6V1N+cQANdukutrJz
=cUkd
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.2-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Slightly delayed due to the issue with printk() calling
probe_kernel_read() interacting with our new user access prevention
stuff, but all fixed now.
The only out-of-area changes are the addition of a cpuhp_state, small
additions to Documentation and MAINTAINERS updates.
Highlights:
- Support for Kernel Userspace Access/Execution Prevention (like
SMAP/SMEP/PAN/PXN) on some 64-bit and 32-bit CPUs. This prevents
the kernel from accidentally accessing userspace outside
copy_to/from_user(), or ever executing userspace.
- KASAN support on 32-bit.
- Rework of where we map the kernel, vmalloc, etc. on 64-bit hash to
use the same address ranges we use with the Radix MMU.
- A rewrite into C of large parts of our idle handling code for
64-bit Book3S (ie. power8 & power9).
- A fast path entry for syscalls on 32-bit CPUs, for a 12-17% speedup
in the null_syscall benchmark.
- On 64-bit bare metal we have support for recovering from errors
with the time base (our clocksource), however if that fails
currently we hang in __delay() and never crash. We now have support
for detecting that case and short circuiting __delay() so we at
least panic() and reboot.
- Add support for optionally enabling the DAWR on Power9, which had
to be disabled by default due to a hardware erratum. This has the
effect of enabling hardware breakpoints for GDB, the downside is a
badly behaved program could crash the machine by pointing the DAWR
at cache inhibited memory. This is opt-in obviously.
- xmon, our crash handler, gets support for a read only mode where
operations that could change memory or otherwise disturb the system
are disabled.
Plus many clean-ups, reworks and minor fixes etc.
Thanks to: Christophe Leroy, Akshay Adiga, Alastair D'Silva, Alexey
Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
Anton Blanchard, Ben Hutchings, Bo YU, Breno Leitao, Cédric Le Goater,
Christopher M. Riedl, Christoph Hellwig, Colin Ian King, David Gibson,
Ganesh Goudar, Gautham R. Shenoy, George Spelvin, Greg Kroah-Hartman,
Greg Kurz, Horia Geantă, Jagadeesh Pagadala, Joel Stanley, Joe
Perches, Julia Lawall, Laurentiu Tudor, Laurent Vivier, Lukas Bulwahn,
Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael
Neuling, Mukesh Ojha, Nathan Fontenot, Nathan Lynch, Nicholas Piggin,
Nick Desaulniers, Oliver O'Halloran, Peng Hao, Qian Cai, Ravi
Bangoria, Rick Lindsley, Russell Currey, Sachin Sant, Stewart Smith,
Sukadev Bhattiprolu, Thomas Huth, Tobin C. Harding, Tyrel Datwyler,
Valentin Schneider, Wei Yongjun, Wen Yang, YueHaibing"
* tag 'powerpc-5.2-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (205 commits)
powerpc/64s: Use early_mmu_has_feature() in set_kuap()
powerpc/book3s/64: check for NULL pointer in pgd_alloc()
powerpc/mm: Fix hugetlb page initialization
ocxl: Fix return value check in afu_ioctl()
powerpc/mm: fix section mismatch for setup_kup()
powerpc/mm: fix redundant inclusion of pgtable-frag.o in Makefile
powerpc/mm: Fix makefile for KASAN
powerpc/kasan: add missing/lost Makefile
selftests/powerpc: Add a signal fuzzer selftest
powerpc/booke64: set RI in default MSR
ocxl: Provide global MMIO accessors for external drivers
ocxl: move event_fd handling to frontend
ocxl: afu_irq only deals with IRQ IDs, not offsets
ocxl: Allow external drivers to use OpenCAPI contexts
ocxl: Create a clear delineation between ocxl backend & frontend
ocxl: Don't pass pci_dev around
ocxl: Split pci.c
ocxl: Remove some unused exported symbols
ocxl: Remove superfluous 'extern' from headers
ocxl: read_pasid never returns an error, so make it void
...
Pull intgrity updates from James Morris:
"This contains just three patches, the remainder were either included
in other pull requests (eg. audit, lockdown) or will be upstreamed via
other subsystems (eg. kselftests, Power).
Included here is one bug fix, one documentation update, and extending
the x86 IMA arch policy rules to coordinate the different kernel
module signature verification methods"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
doc/kernel-parameters.txt: Deprecate ima_appraise_tcb
x86/ima: add missing include
x86/ima: require signed kernel modules
Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user handlers
- Non-critical fixes and cleanup
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlzMFGgACgkQt6xw3ITB
YzTicAf/TX1h1+ecbx4WJAa4qeiOCPoNpG9efldQumqJhKL44MR5bkhuShna5mwE
ptm5qUXkZCxLTjzssZKnbdbgwa3t+emW8Of3D91IfI9akiZbMoDx5FGgcNbqjazb
RLrhOFHwgontA38yppZN+DrL+sXbvif/CVELdHahkEx6KepSGaS2lmPXRmz/W56v
4yIRy/zxc3Dhjgfm3wKh72nBwoZdLiIc4mchd5pthNlR9E2idrYkQegG1C+gA00r
o8uZRVOWgoh7H+QJE+xLUc8PaNCg8xqRRXOuZYg9GOz6hh7zSWhm+f1nRz9S2tIR
gIgsCHNqoO2I3E1uJpAQXDGtt2kFhA==
=ulpJ
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Mostly just incremental improvements here:
- Introduce AT_HWCAP2 for advertising CPU features to userspace
- Expose SVE2 availability to userspace
- Support for "data cache clean to point of deep persistence" (DC PODP)
- Honour "mitigations=off" on the cmdline and advertise status via
sysfs
- CPU timer erratum workaround (Neoverse-N1 #1188873)
- Introduce perf PMU driver for the SMMUv3 performance counters
- Add config option to disable the kuser helpers page for AArch32 tasks
- Futex modifications to ensure liveness under contention
- Rework debug exception handling to seperate kernel and user
handlers
- Non-critical fixes and cleanup"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
Documentation: Add ARM64 to kernel-parameters.rst
arm64/speculation: Support 'mitigations=' cmdline option
arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSB
arm64: enable generic CPU vulnerabilites support
arm64: add sysfs vulnerability show for speculative store bypass
arm64: Fix size of __early_cpu_boot_status
clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters
clocksource/arm_arch_timer: Remove use of workaround static key
clocksource/arm_arch_timer: Drop use of static key in arch_timer_reg_read_stable
clocksource/arm_arch_timer: Direcly assign set_next_event workaround
arm64: Use arch_timer_read_counter instead of arch_counter_get_cntvct
watchdog/sbsa: Use arch_timer_read_counter instead of arch_counter_get_cntvct
ARM: vdso: Remove dependency with the arch_timer driver internals
arm64: Apply ARM64_ERRATUM_1188873 to Neoverse-N1
arm64: Add part number for Neoverse N1
arm64: Make ARM64_ERRATUM_1188873 depend on COMPAT
arm64: Restrict ARM64_ERRATUM_1188873 mitigation to AArch32
arm64: mm: Remove pte_unmap_nested()
arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variable
arm64: compat: Reduce address limit for 64K pages
...
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this
will allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc0CCEAAoJEDjwexyKj9rgjmkH/A3e2drvuP/hSF3xfCKTQFdx
/PoLHQVCqENB3HU3FA/ljoXuG6jMgwj61looqlxBNumXFpIfTg0E1JC5S4wRGJ+K
cOVhIKV53gcuZkRcCJQp0WMnGzpk1Daf7iYXYmAl+7e+mREUPxOuJ0Ei6vXvRGZS
8cQrUCGrtPgkAeLlndypHI2M2TDDGJIMczOGbOZau8+8Lo7Wq9zt5y0h/v0ew37g
ogA0eGh6koU1435dt2pclZRiZ1XOcar3Uin9ioT+RnSgJ4pr1Pza/F6IGO0RdQa+
rva990lqGFp5r9lE4rMCwK9LWb/rfHdVPd35t9XPwphnQ/ORoWUwLk3uc5XOHow=
=dbuy
-----END PGP SIGNATURE-----
Merge tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- Support for kernel address space layout randomization
- Add support for kernel image signature verification
- Convert s390 to the generic get_user_pages_fast code
- Convert s390 to the stack unwind API analog to x86
- Add support for CPU directed interrupts for PCI devices
- Provide support for MIO instructions to the PCI base layer, this will
allow the use of direct PCI mappings in user space code
- Add the basic KVM guest ultravisor interface for protected VMs
- Add AT_HWCAP bits for several new hardware capabilities
- Update the CPU measurement facility counter definitions to SVN 6
- Arnds cleanup patches for his quest to get LLVM compiles working
- A vfio-ccw update with bug fixes and support for halt and clear
- Improvements for the hardware TRNG code
- Another round of cleanup for the QDIO layer
- Numerous cleanups and bug fixes
* tag 's390-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (98 commits)
s390/vdso: drop unnecessary cc-ldoption
s390: fix clang -Wpointer-sign warnigns in boot code
s390: drop CONFIG_VIRT_TO_BUS
s390: boot, purgatory: pass $(CLANG_FLAGS) where needed
s390: only build for new CPUs with clang
s390: simplify disabled_wait
s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
s390/unwind: introduce stack unwind API
s390/opcodes: add missing instructions to the disassembler
s390/bug: add entry size to the __bug_table section
s390: use proper expoline sections for .dma code
s390/nospec: rename assembler generated expoline thunks
s390: add missing ENDPROC statements to assembler functions
locking/lockdep: check for freed initmem in static_obj()
s390/kernel: add support for kernel address space layout randomization (KASLR)
s390/kernel: introduce .dma sections
s390/sclp: do not use static sccbs
s390/kprobes: use static buffer for insn_page
s390/kernel: convert SYSCALL and PGM_CHECK handlers to .quad
s390/kernel: build a relocatable kernel
...