So far the pseries platforms has always been using IOMMU making
SWIOTLB unnecessary. Now we want secure guests which means devices can
only access certain areas of guest physical memory; we are going to
use SWIOTLB for this purpose.
This allows SWIOTLB for pseries. By default there is no change in
behavior.
This enables SWIOTLB when the "swiotlb" kernel parameter is set to
"force".
With the SWIOTLB enabled, the kernel creates a directly mapped DMA
window (using the usual DDW mechanism) and implements SWIOTLB on top
of that.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the dma_get_mask() helper from dma-mapping.h instead, as they are
functionally identical.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
A couple of bugs in queue_hotplug_event():
1. Unchecked kmalloc result which could lead to an oops.
2. Use of GFP_KERNEL allocations in interrupt context (this code's
only caller is ras_hotplug_interrupt()).
Use kmemdup to avoid open-coding the allocation+copy and check for
failure; use GFP_ATOMIC for both allocations.
Ultimately it probably would be better to avoid or reduce allocations
in this path if possible.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's common for the platform to replace the cache device nodes after a
migration. Since the cacheinfo code is never informed about this, it
never drops its references to the source system's cache nodes, causing
it to wind up in an inconsistent state resulting in warnings and oopses
as soon as CPU online/offline occurs after the migration, e.g.
cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
[...]
NIP release_cache+0x1bc/0x1d0
LR release_cache+0x1b8/0x1d0
Call Trace:
release_cache+0x1b8/0x1d0 (unreliable)
cacheinfo_cpu_offline+0x1c4/0x2c0
unregister_cpu_online+0x1b8/0x260
cpuhp_invoke_callback+0x114/0xf40
cpuhp_thread_fun+0x270/0x310
smpboot_thread_fn+0x2c8/0x390
kthread+0x1b8/0x1c0
ret_from_kernel_thread+0x5c/0x68
Using device tree notifiers won't work since we want to rebuild the
hierarchy only after all the removals and additions have occurred and
the device tree is in a consistent state. Call cacheinfo_teardown()
before processing device tree updates, and rebuild the hierarchy
afterward.
Fixes: 410bccf978 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
CPU online/offline code paths are sensitive to parts of the device
tree (various cpu node properties, cache nodes) that can be changed as
a result of a migration.
Prevent CPU hotplug while the device tree potentially is inconsistent.
Fixes: 410bccf978 ("powerpc/pseries: Partition migration in the kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
During post-migration device tree updates, we can oops in
pseries_update_drconf_memory() if the source device tree has an
ibm,dynamic-memory-v2 property and the destination has a
ibm,dynamic_memory (v1) property. The notifier processes an "update"
for the ibm,dynamic-memory property but it's really an add in this
scenario. So make sure the old property object is there before
dereferencing it.
Fixes: 2b31e3aec1 ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
While developing KASAN for 64-bit book3s, I hit the following stack
over-read.
It occurs because the hypercall to put characters onto the terminal
takes 2 longs (128 bits/16 bytes) of characters at a time, and so
hvc_put_chars() would unconditionally copy 16 bytes from the argument
buffer, regardless of supplied length. However, udbg_hvc_putc() can
call hvc_put_chars() with a single-byte buffer, leading to the error.
==================================================================
BUG: KASAN: stack-out-of-bounds in hvc_put_chars+0xdc/0x110
Read of size 8 at addr c0000000023e7a90 by task swapper/0
CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.0-rc2-next-20190528-02824-g048a6ab4835b #113
Call Trace:
dump_stack+0x104/0x154 (unreliable)
print_address_description+0xa0/0x30c
__kasan_report+0x20c/0x224
kasan_report+0x18/0x30
__asan_report_load8_noabort+0x24/0x40
hvc_put_chars+0xdc/0x110
hvterm_raw_put_chars+0x9c/0x110
udbg_hvc_putc+0x154/0x200
udbg_write+0xf0/0x240
console_unlock+0x868/0xd30
register_console+0x970/0xe90
register_early_udbg_console+0xf8/0x114
setup_arch+0x108/0x790
start_kernel+0x104/0x784
start_here_common+0x1c/0x534
Memory state around the buggy address:
c0000000023e7980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0000000023e7a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
>c0000000023e7a80: f1 f1 01 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
^
c0000000023e7b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0000000023e7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
Document that a 16-byte buffer is requred, and provide it in udbg.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In dlpar_parse_cc_property(), 'prop->name' is allocated by kstrdup().
kstrdup() may return NULL, so it should be checked and handle error.
And prop should be freed if 'prop->name' is NULL.
Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When removing memory we need to remove the memory from the node
it was added to instead of looking up the node it should be in
in the device tree.
During testing we have seen scenarios where the affinity for a
LMB changes due to a partition migration or PRRN event. In these
cases the node the LMB exists in may not match the node the device
tree indicates it belongs in. This can lead to a system crash
when trying to DLPAR remove the LMB after a migration or PRRN
event. The current code looks up the node in the device tree to
remove the LMB from, the crash occurs when we try to offline this
node and it does not have any data, i.e. node_data[nid] == NULL.
36:mon> e
cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
lr: c0000000003a14ec: remove_memory+0xbc/0x110
sp: c0000001828b7a90
msr: 800000000280b033
dar: 9a28
dsisr: 40000000
current = 0xc0000006329c4c80
paca = 0xc000000007a55200 softe: 0 irq_happened: 0x01
pid = 76926, comm = kworker/u320:3
36:mon> t
[link register ] c0000000003a14ec remove_memory+0xbc/0x110
[c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
[c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
[c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
[c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
[c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
[c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
[c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
[c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
[c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
[c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
To resolve this we need to track the node a LMB belongs to when
it is added to the system so we can remove it from that node instead
of the node that the device tree indicates it should belong to.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add support to hwpoison the pages upon hitting machine check
exception.
This patch queues the address where UE is hit to percpu array
and schedules work to plumb it into memory poison infrastructure.
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
[mpe: Combine #ifdefs, drop PPC_BIT8(), and empty inline stub]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The commit 4c5d87db49 ("powerpc/pseries: PAPR persistent memory
support") set a local variable "count" in dlpar_hp_pmem() but never
use it.
arch/powerpc/platforms/pseries/pmem.c: In function 'dlpar_hp_pmem':
arch/powerpc/platforms/pseries/pmem.c:109:6: warning: variable 'count' set but not used
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The commit b7d6bf4fdd ("powerpc/pseries/pci: Remove obsolete SW
invalidate") left 2 variables unused.
arch/powerpc/platforms/pseries/iommu.c:108:17: warning: variable 'tces' set but not used
__be64 *tcep, *tces;
^~~~
arch/powerpc/platforms/pseries/iommu.c:132:17: warning: variable 'tces' set but not used
__be64 *tcep, *tces;
^~~~
Also, the commit 68c0449ea1 ("powerpc/pseries/iommu: Use memory@
nodes in max RAM address calculation") set "ranges" in
ddw_memory_hotplug_max() but never use it.
arch/powerpc/platforms/pseries/iommu.c: In function 'ddw_memory_hotplug_max':
arch/powerpc/platforms/pseries/iommu.c:948:7: warning: variable 'ranges' set but not used
int ranges, n_mem_addr_cells, n_mem_size_cells, len;
^~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
resize_hpt_for_hotplug() reports a warning when it cannot
resize the hash page table ("Unable to resize hash page
table to target order") but in some cases it's not a problem
and can make user thinks something has not worked properly.
This patch moves the warning to arch_remove_memory() to
only report the problem when it is needed.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is a follow up to the patch that fixed misleading print for TLB
mutlihit due to wrongly populated mc_err_types[] array. Convert all the
static array initialization to '[x] = val' style for better
readability of array indexing and avoid any further confusion.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent,
we currently use of_read_property() to obtain the pointer to the array
corresponding to the property "ibm,drc-indexes". The elements of this
array are of type __be32, but are accessed without any conversion to
the OS-endianness, which is buggy on a Little Endian OS.
Fix this by using of_property_read_u32_index() accessor function to
safely read the elements of the array.
Fixes: e83636ac33 ("pseries/drc-info: Search DRC properties for CPU indexes")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Replace the /sys/class/dax device model with /sys/bus/dax, and include
a compat driver so distributions can opt-in to the new ABI.
* Allow for an alternative driver for the device-dax address-range
* Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
* Arrange for the device-dax target-node to be onlined so that the newly
added memory range can be uniquely referenced by numa apis.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJchWpGAAoJEB7SkWpmfYgCJk8P/0Q1DINszUDO/vKjJ09cDs9P
Jw3it6GBIL50rDOu9QdcprSpwYDD0h1mLAV/m6oa3bVO+p4uWGvnxaxRx2HN2c/v
vhZFtUDpHlqR63vzWMNVKRprYixCRJDUr6xQhhCcE3ak/ELN6w7LWfikKVWv15UL
MfR96IQU38f+xRda/zSXnL9606Dvkvu/inEHj84lRcHIwj3sQAUalrE8bR3O32gZ
bDg/l5kzT49o8ZXUo/TegvRSSSZpJmOl2DD0RW+ax5q3NI2bOXFrVDUKBKxf/hcQ
E/V9i57TrqQx0GqRhnU7rN/v53cFZGGs31TEEIB/xs3bzCnADxwXcjL5b5K005J6
vJjBA2ODBewHFK3uVx46Hy1iV4eCtZWj4QrMnrjdSrjXOfbF5GTbWOhPFgoq7TWf
S7VqFEf3I2gDPaMq4o8Ej1kLH4HMYeor2NSOZjyvGn87rSZ3ZIQguwbaNIVl+itz
gdDt0ZOU0BgOBkV+rZIeZDaGdloWCHcDPL15CkZaOZyzdWhfEZ7dod6ad+9udilU
EUPH62RgzXZtfm5zpebYyjNVLbb9pLZ0nT+UypyGR6zqWx1SqU3mXi63NFXPco+x
XA9j//edPeI6NHg2CXLEh8DLuCg3dG1zWRJANkiF+niBwyCR8CHtGWAoY6soXbKe
2UrXGcIfXxyJ8V9v8v4q
=hfa3
-----END PGP SIGNATURE-----
Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".
Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.
One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.
Summary:
- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.
- Allow for an alternative driver for the device-dax address-range
- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"
NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.
And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.
Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.
I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"
* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida
Merge more updates from Andrew Morton:
- some of the rest of MM
- various misc things
- dynamic-debug updates
- checkpatch
- some epoll speedups
- autofs
- rapidio
- lib/, lib/lzo/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
samples/mic/mpssd/mpssd.h: remove duplicate header
kernel/fork.c: remove duplicated include
include/linux/relay.h: fix percpu annotation in struct rchan
arch/nios2/mm/fault.c: remove duplicate include
unicore32: stop printing the virtual memory layout
MAINTAINERS: fix GTA02 entry and mark as orphan
mm: create the new vm_fault_t type
arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
arch: simplify several early memory allocations
openrisc: simplify pte_alloc_one_kernel()
sh: prefer memblock APIs returning virtual address
microblaze: prefer memblock API returning virtual address
powerpc: prefer memblock APIs returning virtual address
lib/lzo: separate lzo-rle from lzo
lib/lzo: implement run-length encoding
lib/lzo: fast 8-byte copy on arm64
lib/lzo: 64-bit CTZ on arm64
lib/lzo: tidy-up ifdefs
ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
ipc: annotate implicit fall through
...
Patch series "memblock: simplify several early memory allocation", v4.
These patches simplify some of the early memory allocations by replacing
usage of older memblock APIs with newer and shinier ones.
Quite a few places in the arch/ code allocated memory using a memblock
API that returns a physical address of the allocated area, then
converted this physical address to a virtual one and then used memset(0)
to clear the allocated range.
More recent memblock APIs do all the three steps in one call and their
usage simplifies the code.
It's important to note that regardless of API used, the core allocation
is nearly identical for any set of memblock allocators: first it tries
to find a free memory with all the constraints specified by the caller
and then falls back to the allocation with some or all constraints
disabled.
The first three patches perform the conversion of call sites that have
exact requirements for the node and the possible memory range.
The fourth patch is a bit one-off as it simplifies openrisc's
implementation of pte_alloc_one_kernel(), and not only the memblock
usage.
The fifth patch takes care of simpler cases when the allocation can be
satisfied with a simple call to memblock_alloc().
The sixth patch removes one-liner wrappers for memblock_alloc on arm and
unicore32, as suggested by Christoph.
This patch (of 6):
There are a several places that allocate memory using memblock APIs that
return a physical address, convert the returned address to the virtual
address and frequently also memset(0) the allocated range.
Update these places to use memblock allocators already returning a
virtual address. Use memblock functions that clear the allocated memory
instead of calling memset(0) where appropriate.
The calls to memblock_alloc_base() that were not followed by memset(0)
are replaced with memblock_alloc_try_nid_raw(). Since the latter does
not panic() when the allocation fails, the appropriate panic() calls are
added to the call sites.
Link: http://lkml.kernel.org/r/1546248566-14910-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mark Salter <msalter@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Notable changes:
- Enable THREAD_INFO_IN_TASK to move thread_info off the stack.
- A big series from Christoph reworking our DMA code to use more of the generic
infrastructure, as he said:
"This series switches the powerpc port to use the generic swiotlb and
noncoherent dma ops, and to use more generic code for the coherent direct
mapping, as well as removing a lot of dead code."
- Increase our vmalloc space to 512T with the Hash MMU on modern CPUs, allowing
us to support machines with larger amounts of total RAM or distance between
nodes.
- Two series from Christophe, one to optimise TLB miss handlers on 6xx, and
another to optimise the way STRICT_KERNEL_RWX is implemented on some 32-bit
CPUs.
- Support for KCOV coverage instrumentation which means we can run syzkaller
and discover even more bugs in our code.
And as always many clean-ups, reworks and minor fixes etc.
Thanks to:
Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea Arcangeli, Andrew
Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir Singh, Brajeswar Ghosh,
Breno Leitao, Christian Lamparter, Christian Zigotzky, Christophe Leroy,
Christoph Hellwig, Corentin Labbe, Daniel Axtens, David Gibson, Diana Craciun,
Firoz Khan, Gustavo A. R. Silva, Igor Stoppa, Joe Lawrence, Joel Stanley,
Jonathan Neuschäfer, Jordan Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh
Salgaonkar, Mark Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce,
Meelis Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot, Nicholas
Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran, Paul Mackerras,
Peter Xu, PrasannaKumar Muralidharan, Qian Cai, Rashmica Gupta, Reza Arbab,
Robert P. J. Day, Russell Currey, Sabyasachi Gupta, Sam Bobroff, Sandipan Das,
Sergey Senozhatsky, Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav
Jain, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcgRJlAAoJEFHr6jzI4aWAL9oP+gPlrZgyaAg/51lmubLtlbtk
QuGU8EiuJZoJD1OHrMPtppBOY7rQZOxJe58AoPig8wTvs+j/TxJ25fmiZncnf5U2
PC8QAjbj0UmQHgy+K30sUeOnDg9tdkHKHJ5/ecjJcvykkqsjyMnV7biFQ1cOA0HT
LflXHEEtiG9P9u7jZoAhtnfpgn1/l9mhTYMe26J1fqvC0164qMDFaXDTQXyDfyvG
gmuqccGMawSk7IdagmQxwXtwyfwOnarmGn+n31XKRejApGZ/pjiEA23JOJOaJcia
m76Jy3roao6sEtCUNpBFXEtwOy9POy3OiGy6yg/9896tDMvG84OuO6ltV1nFGawL
PmwE+ug63L4g/HWxZyAeb26T2oTTp/YIaKQPtsq4d286pvg/qr2KPNzFoAEhmJqU
yLrebv276pVeiLpLmCLPvcPj9t76vWKZaUm0FoE+zUDg7Rl7Alow8A/c4tdjOI6y
QwpbCiYseyiJ32lCZZdbN7Cy6+iM6vb3i1oNKc8MVqhBGTwLJnTU0ruPBSvCaRvD
NoQWO1RWpNu/BuivuLEKS9q3AoxenGwiqowxGhdVmI3Oc9jGWcEYlduR00VDYPVp
/RCfwtTY5NyC++h5cnbz8aLJ1hBXG5m79CXfprV+zPWeiLPCaMT6w9Y5QUS2wqA+
EZ734NknDJOjaHc4cGdZ
=Z9bb
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Enable THREAD_INFO_IN_TASK to move thread_info off the stack.
- A big series from Christoph reworking our DMA code to use more of
the generic infrastructure, as he said:
"This series switches the powerpc port to use the generic swiotlb
and noncoherent dma ops, and to use more generic code for the
coherent direct mapping, as well as removing a lot of dead
code."
- Increase our vmalloc space to 512T with the Hash MMU on modern
CPUs, allowing us to support machines with larger amounts of total
RAM or distance between nodes.
- Two series from Christophe, one to optimise TLB miss handlers on
6xx, and another to optimise the way STRICT_KERNEL_RWX is
implemented on some 32-bit CPUs.
- Support for KCOV coverage instrumentation which means we can run
syzkaller and discover even more bugs in our code.
And as always many clean-ups, reworks and minor fixes etc.
Thanks to: Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea
Arcangeli, Andrew Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir
Singh, Brajeswar Ghosh, Breno Leitao, Christian Lamparter, Christian
Zigotzky, Christophe Leroy, Christoph Hellwig, Corentin Labbe, Daniel
Axtens, David Gibson, Diana Craciun, Firoz Khan, Gustavo A. R. Silva,
Igor Stoppa, Joe Lawrence, Joel Stanley, Jonathan Neuschäfer, Jordan
Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark
Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce, Meelis
Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot,
Nicholas Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran,
Paul Mackerras, Peter Xu, PrasannaKumar Muralidharan, Qian Cai,
Rashmica Gupta, Reza Arbab, Robert P. J. Day, Russell Currey,
Sabyasachi Gupta, Sam Bobroff, Sandipan Das, Sergey Senozhatsky,
Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav Jain,
YueHaibing"
* tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits)
powerpc/32: Clear on-stack exception marker upon exception return
powerpc: Remove export of save_stack_trace_tsk_reliable()
powerpc/mm: fix "section_base" set but not used
powerpc/mm: Fix "sz" set but not used warning
powerpc/mm: Check secondary hash page table
powerpc: remove nargs from __SYSCALL
powerpc/64s: Fix unrelocated interrupt trampoline address test
powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
powerpc/fsl: Fix the flush of branch predictor.
powerpc/powernv: Make opal log only readable by root
powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C
powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
powerpc/64s: system reset interrupt preserve HSRRs
powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search
powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
selftests/powerpc: Remove duplicate header
powerpc sstep: Add support for modsd, modud instructions
...
Here is the big driver core patchset for 5.1-rc1
More patches than "normal" here this merge window, due to some work in
the driver core by Alexander Duyck to rework the async probe
functionality to work better for a number of devices, and independant
work from Rafael for the device link functionality to make it work
"correctly".
Also in here is:
- lots of BUS_ATTR() removals, the macro is about to go away
- firmware test fixups
- ihex fixups and simplification
- component additions (also includes i915 patches)
- lots of minor coding style fixups and cleanups.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXH+euQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynyTgCfbV8CLums843sBnT8NnWrTMTdTCcAn1K4re0m
ep8g+6oRLxJy414hogxQ
=bLs2
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core patchset for 5.1-rc1
More patches than "normal" here this merge window, due to some work in
the driver core by Alexander Duyck to rework the async probe
functionality to work better for a number of devices, and independant
work from Rafael for the device link functionality to make it work
"correctly".
Also in here is:
- lots of BUS_ATTR() removals, the macro is about to go away
- firmware test fixups
- ihex fixups and simplification
- component additions (also includes i915 patches)
- lots of minor coding style fixups and cleanups.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
driver core: platform: remove misleading err_alloc label
platform: set of_node in platform_device_register_full()
firmware: hardcode the debug message for -ENOENT
driver core: Add missing description of new struct device_link field
driver core: Fix PM-runtime for links added during consumer probe
drivers/component: kerneldoc polish
async: Add cmdline option to specify drivers to be async probed
driver core: Fix possible supplier PM-usage counter imbalance
PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
driver: platform: Support parsing GpioInt 0 in platform_get_irq()
selftests: firmware: fix verify_reqs() return value
Revert "selftests: firmware: remove use of non-standard diff -Z option"
Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
device: Fix comment for driver_data in struct device
kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
sysfs: remove unused include of kernfs-internal.h
driver core: Postpone DMA tear-down until after devres release
driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
PM-runtime: Take suppliers into account in __pm_runtime_set_status()
device.h: Add __cold to dev_<level> logging functions
...
The Processor Utilzation of Resource Registers (PURR) provide an
estimate of resources used by a cpu thread. Section 7.6 in Book III of
the ISA outlines how to calculate the percentage of shared resources
for threads using the ratio of the PURR delta and Timebase Register
delta for a sampled period.
This calculation is currently done erroneously by the lparstat tool
from the powerpc-utils package. This patch exports the current
timebase value after we sample the PURRs and exposes it to userspace
accounting tools via /proc/ppc64/lparcfg.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no good reason for this helper, just opencode it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we've switched all the powerpc nommu and swiotlb methods to
use the generic dma_direct_* calls we can remove these ops vectors
entirely and rely on the common direct mapping bypass that avoids
indirect function calls entirely. This also allows to remove a whole
lot of boilerplate code related to setting up these operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The coherent cache version of this function already is functionally
identicall to the default version, and by defining the
arch_dma_coherent_to_pfn hook the same is ture for the noncoherent
version as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the generic iommu bypass code instead of overriding set_dma_mask.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Call dma_get_required_mask_pSeriesLP directly instead of dma_iommu_ops
to simply the code a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
vio_dma_mapping_ops currently does a lot of indirect calls through
dma_iommu_ops, which not only make the code harder to follow but are
also expensive in the post-spectre world. Unwind the indirect calls
by calling the ppc_iommu_* or iommu_* APIs directly applicable, or
just use the dma_iommu_* methods directly where we can.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When binding an SCM volume to a physical address the hypervisor has the
option to return early with a continue token with the expectation that
the guest will resume the bind operation until it completes. A quirk of
this interface is that the bind address will only be returned by the
first bind h-call and the subsequent calls will return
0xFFFF_FFFF_FFFF_FFFF for the bind address.
We currently do not save the address returned by the first h-call. As a
result we will use the junk address as the base of the bound region if
the hypervisor decides to split the bind across multiple h-calls. This
bug was found when testing with very large SCM volumes where the bind
process would take more time than they hypervisor's internal h-call time
limit would allow. This patch fixes the issue by saving the bind address
from the first call.
Cc: stable@vger.kernel.org
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On pseries systems, performing a partition migration can result in
altering the nodes a CPU is assigned to on the destination system. For
exampl, pre-migration on the source system CPUs are in node 1 and 3,
post-migration on the destination system CPUs are in nodes 2 and 3.
Handling the node change for a CPU can cause corruption in the slab
cache if we hit a timing where a CPUs node is changed while cache_reap()
is invoked. The corruption occurs because the slab cache code appears
to rely on the CPU and slab cache pages being on the same node.
The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c
does not prevent us from hitting this scenario.
Changing the device tree property update notification handler that
recognizes an affinity change for a CPU to do a full DLPAR remove and
add of the CPU instead of dynamically changing its node resolves this
issue.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are trying to get rid of BUS_ATTR() and the usage of that in
ibmebus.c can be trivially converted to use BUS_ATTR_WO(), so use that
instead.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adopt nvram module to reduce code duplication. This means CONFIG_NVRAM
becomes available to PPC64 builds. Previously it was only available to
PPC32 builds because it depended on CONFIG_GENERIC_NVRAM.
The IOC_NVRAM_GET_OFFSET ioctl as implemented on PPC64 validates the
offset returned by pmac_get_partition(). Do the same in the nvram module.
Note that the old PPC32 generic_nvram module lacked this test.
So when CONFIG_PPC32 && CONFIG_PPC_PMAC, the IOC_NVRAM_GET_OFFSET ioctl
would have returned 0 (always). But when CONFIG_PPC64 && CONFIG_PPC_PMAC,
the IOC_NVRAM_GET_OFFSET ioctl would have returned -1 (which is -EPERM)
when the requested partition was not found.
With this patch, the result is now -EINVAL on both PPC32 and PPC64 when
the requested PowerMac NVRAM partition is not found. This is a userspace-
visible change, in the non-existent partition case, which would be in
an error path for an IOC_NVRAM_GET_OFFSET ioctl syscall.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3be2df00e2 ("powerpc/pseries/npu: Enable platform support")
added a call to pnv_npu2_init() in pseries code. This causes a build
break if we build with CONFIG_PPC_PSERIES && !CONFIG_PPC_POWERNV:
powerpc64le-pc-linux-gnu-ld: arch/powerpc/platforms/pseries/pci.o: in function `pSeries_final_fixup':
pci.c:(.init.text+0x1b0): undefined reference to `pnv_npu2_init'
This commit therefore wraps that line in an ifdef, so that pseries
builds without powernv.
Fixes: 3be2df00e2 ("powerpc/pseries/npu: Enable platform support")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Frob change log a bit to blame a different commit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
Interface Table), is the first known instance of a memory range
described by a unique "target" proximity domain. Where "initiator" and
"target" proximity domains is an approach that the ACPI HMAT
(Heterogeneous Memory Attributes Table) uses to described the unique
performance properties of a memory range relative to a given initiator
(e.g. CPU or DMA device).
Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
char-device follows the traditional notion of 'numa-node' where the
attribute conveys the closest online numa-node. That numa-node attribute
is useful for cpu-binding and memory-binding processes *near* the
device. However, when the memory range backing a 'pmem', or 'dax' device
is onlined (memory hot-add) the memory-only-numa-node representing that
address needs to be differentiated from the set of online nodes. In
other words, the numa-node association of the device depends on whether
you can bind processes *near* the cpu-numa-node in the offline
device-case, or bind process *on* the memory-range directly after the
backing address range is onlined.
Allow for the case that platform firmware describes persistent memory
with a unique proximity domain, i.e. when it is distinct from the
proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
numa-node translation of that proximity through the libnvdimm region
device to namespaces that are in device-dax mode. With this in place the
proposed kmem driver [1] can optionally discover a unique numa-node
number for the address range as it transitions the memory from an
offline state managed by a device-driver to an online memory range
managed by the core-mm.
[1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com
Reported-by: Fan Du <fan.du@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries
by Christoph Hellwig.
Currently, every architecture that wants to provide common peripheral
busses needs to add some boilerplate code and include the right Kconfig
files. This series instead just selects the presence (when needed) and
then handles everything in the bus-specific Kconfig file under drivers/.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJcJilwAAoJED2LAQed4NsGt1YP/RMTEUqbCSwS/CnTLrE+aVTC
O2aWwB80ZlVwpeBbHLW5/M88OvOev0UaCr+gyzgpFRl5ITzS7Jevb8VbpGzblbH7
bFxIEyZFGQiy9oEWw3Lfu9JRSsLm3jNo7hkmdBSn2Rw3KkEd/YF7K3q9GuA7BpCS
ZxAirebvEpr4KYEzkuc57NqCYx2Tc8G+JWr5D7pZCFaq9vxYt3TddGqw/c7iQVSQ
1Og1809IdhGyCSlA/ExfaqaBMaJHMRAOHX5GgkqZw1EbFcizUFhAAsKCrGL5nBtX
NiWF9jhgHR1M+L69jfctOstrmGQD2KicNgWQf1aS5RQkPfjuqIKGT/i9g6J1pVyX
TaW1J36Hcl8PpsKoPBnnrixd1T41O3/PuqtEJRm7LCBYOQiwS9sEmLO09RDRjER8
SPAAyvkhE8oq+0RHiTYN4tm8dyJc1djZ5wzgLnwFPAnU6SR+mbN02RzBMsYZXD+x
RNbBSGBRJFQDBw6Rn+ktcIQvcKYmUqe1k1YNHMy6kG3QqvhBaDy+8PA/YjIKPQYQ
B/NNUAMEJMys1OQrRL2UDXb2ysaCpzwMmlrBW2IwYsQrX5OwbPkNuQ5Mbe1Lr+mc
4NXR+HubvojsHaAby+OhFbrUX2Jcz3wqYj7aannb9sMRmw0VJXV5dPYUqje3ZhPS
P2AovKT8O9nWsEttqER5
=WxId
-----END PGP SIGNATURE-----
Merge tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig file consolidation from Masahiro Yamada:
"Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries by
Christoph Hellwig.
Currently, every architecture that wants to provide common peripheral
busses needs to add some boilerplate code and include the right
Kconfig files. This series instead just selects the presence (when
needed) and then handles everything in the bus-specific Kconfig file
under drivers/"
* tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
pcmcia: remove per-arch PCMCIA config entry
eisa: consolidate EISA Kconfig entry in drivers/eisa
rapidio: consolidate RAPIDIO config entry in drivers/rapidio
pcmcia: allow PCMCIA support independent of the architecture
PCI: consolidate the PCI_SYSCALL symbol
PCI: consolidate the PCI_DOMAINS and PCI_DOMAINS_GENERIC config options
PCI: consolidate PCI config entry in drivers/pci
MIPS: remove the HT_PCI config option
Pull Devicetree updates from Rob Herring:
"The biggest highlight here is the start of using json-schema for DT
bindings. Being able to validate bindings has been discussed for years
with little progress.
- Initial support for DT bindings using json-schema language. This is
the start of converting DT bindings from free-form text to a
structured format.
- Reworking of initrd address initialization. This moves to using the
phys address instead of virt addr in the DT parsing code. This
rework was motivated by CONFIG_DEV_BLK_INITRD causing unnecessary
rebuilding of lots of files.
- Fix stale phandle entries in phandle cache
- DT overlay validation improvements. This exposed several memory
leak bugs which have been fixed.
- Use node name and device_type helper functions in DT code
- Last remaining conversions to using %pOFn printk specifier instead
of device_node.name directly
- Create new common RTC binding doc and move all trivial RTC devices
out of trivial-devices.txt.
- New bindings for Freescale MAG3110 magnetometer, Cadence Sierra
PHY, and Xen shared memory
- Update dtc to upstream version v1.4.7-57-gf267e674d145"
* tag 'devicetree-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (68 commits)
of: __of_detach_node() - remove node from phandle cache
of: of_node_get()/of_node_put() nodes held in phandle cache
gpio-omap.txt: add reg and interrupts properties
dt-bindings: mrvl,intc: fix a trivial typo
dt-bindings: iio: magnetometer: add dt-bindings for freescale mag3110
dt-bindings: Convert trivial-devices.txt to json-schema
dt-bindings: arm: mrvl: amend Browstone compatible string
dt-bindings: arm: Convert Tegra board/soc bindings to json-schema
dt-bindings: arm: Convert ZTE board/soc bindings to json-schema
dt-bindings: arm: Add missing Xilinx boards
dt-bindings: arm: Convert Xilinx board/soc bindings to json-schema
dt-bindings: arm: Convert VIA board/soc bindings to json-schema
dt-bindings: arm: Convert ST STi board/soc bindings to json-schema
dt-bindings: arm: Convert SPEAr board/soc bindings to json-schema
dt-bindings: arm: Convert CSR SiRF board/soc bindings to json-schema
dt-bindings: arm: Convert QCom board/soc bindings to json-schema
dt-bindings: arm: Convert TI nspire board/soc bindings to json-schema
dt-bindings: arm: Convert TI davinci board/soc bindings to json-schema
dt-bindings: arm: Convert Calxeda board/soc bindings to json-schema
dt-bindings: arm: Convert Altera board/soc bindings to json-schema
...
Merge misc updates from Andrew Morton:
- large KASAN update to use arm's "software tag-based mode"
- a few misc things
- sh updates
- ocfs2 updates
- just about all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
memcg, oom: notify on oom killer invocation from the charge path
mm, swap: fix swapoff with KSM pages
include/linux/gfp.h: fix typo
mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
memory_hotplug: add missing newlines to debugging output
mm: remove __hugepage_set_anon_rmap()
include/linux/vmstat.h: remove unused page state adjustment macro
mm/page_alloc.c: allow error injection
mm: migrate: drop unused argument of migrate_page_move_mapping()
blkdev: avoid migration stalls for blkdev pages
mm: migrate: provide buffer_migrate_page_norefs()
mm: migrate: move migrate_page_lock_buffers()
mm: migrate: lock buffers before migrate_page_move_mapping()
mm: migration: factor out code to compute expected number of page references
mm, page_alloc: enable pcpu_drain with zone capability
kmemleak: add config to select auto scan
mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
...
A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for architectures
that have devices that perform DMA that is not cache coherent. Based
on the existing arm64 implementation and also used for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel data
leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwctQgLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMxgQ//dBpAfS4/J76CdAbYry2zqgcOUU9hIrD6NHiEMWov
ltJxyvEl3LsUmIdEj3aCrYL9jZN0qsnCzn5BVj2c3jDIVgD64fAr7HDf/PbEEfKb
j6/GgEnVLPZV+sQMvhNA5jOzHrkseaqPa4/pNLFZ/l8jnuZ2d+btusDWJpMoVDer
TXVwtIfgeIu0gTygYOShLYXd5qptWKWsZEpbTZOO2sE6+x+ZJX7yQYUxYDTlcOIj
JWVO2l5QNHPc5T9o2at+6L5aNUvnZOxT79sWgyZLn0Kc+FagKAVwfLqUEl0v7foG
8k/xca5/8p3afB1DfrIrtplJqis7cVgdyGxriwuuoO8X4F0nPyWwpGmxsBhrWwwl
xTqC4UorEJ7QwoP6Azopk/vYI2QXIUBLjuCJCuFXZj9+2BGf4IfvBY1S2cLM9qLs
HMcxQonuXJii044KEFS96ePEuiT+igVINweIFBKWcgNCEG0UQtyL6RQ1U5297ipF
JiWZAqD+p9X52UdKS+oKfAiZEekMXn6Xyo97+YCiNpfOo0GP5eEcwhL+JpY4AiRq
apPXtsRy2o1s8yfjdraUIM2Mc2n62vFKb35oUbGCd/QO9piPrFQHl6T0HHcHk4YR
XrUXcHieFZBCYqh7ZVa4RL8Msq1wvGuTL4Dxl43mXdsMoUFRR6eSNWLoAV4IpOLZ
WgA=
=in72
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping
Pull DMA mapping updates from Christoph Hellwig:
"A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for
architectures that have devices that perform DMA that is not cache
coherent. Based on the existing arm64 implementation and also used
for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel
data leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script"
* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
dma-mapping: fix inverted logic in dma_supported
dma-mapping: deprecate dma_zalloc_coherent
dma-mapping: zero memory returned from dma_alloc_*
sparc/iommu: fix ->map_sg return value
sparc/io-unit: fix ->map_sg return value
arm64: default to the direct mapping in get_arch_dma_ops
PCI: Remove unused attr variable in pci_dma_configure
ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
dma-mapping: bypass indirect calls for dma-direct
vmd: use the proper dma_* APIs instead of direct methods calls
dma-direct: merge swiotlb_dma_ops into the dma_direct code
dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
dma-direct: improve addressability error reporting
swiotlb: remove dma_mark_clean
swiotlb: remove SWIOTLB_MAP_ERROR
ACPI / scan: Refactor _CCA enforcement
dma-mapping: factor out dummy DMA ops
dma-mapping: always build the direct mapping code
dma-mapping: move dma_cache_sync out of line
dma-mapping: move various slow path functions out of line
...
totalram_pages and totalhigh_pages are made static inline function.
Main motivation was that managed_page_count_lock handling was complicating
things. It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.
A couple of open coded iterating thru the child node names are converted
to use for_each_child_of_node() instead.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation to remove the node name pointer from struct
device_node, convert printf users to use the %pOFn format specifier.
pmem.c was recently added and missed the initial conversion.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In update_lmb_associativity_index() we lookup dr_node using
of_find_node_by_path() which takes a reference for us. In the
non-error case we forget to drop the reference. Note that
find_aa_index() does modify properties of the node, but doesn't need
an extra reference held once it's returned.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The powernv platform registers IOMMU groups and adds devices to them
from the pci_controller_ops::setup_bridge() hook except one case when
virtual functions (SRIOV VFs) are added from a bus notifier.
The pseries platform registers IOMMU groups from
the pci_controller_ops::dma_bus_setup() hook and adds devices from
the pci_controller_ops::dma_dev_setup() hook. The very same bus notifier
used for powernv does not add devices for pseries though as
__of_scan_bus() adds devices first, then it does the bus/dev DMA setup.
Both platforms use iommu_add_device() which takes a device and expects
it to have a valid IOMMU table struct with an iommu_table_group pointer
which in turn points the iommu_group struct (which represents
an IOMMU group). Although the helper seems easy to use, it relies on
some pre-existing device configuration and associated data structures
which it does not really need.
This simplifies iommu_add_device() to take the table_group pointer
directly. Pseries already has a table_group pointer handy and the bus
notified is not used anyway. For powernv, this copies the existing bus
notifier, makes it work for powernv only which means an easy way of
getting to the table_group pointer. This was tested on VFs but should
also support physical PCI hotplug.
Since iommu_add_device() receives the table_group pointer directly,
pseries does not do TCE cache invalidation (the hypervisor does) nor
allow multiple groups per a VFIO container (in other words sharing
an IOMMU table between partitionable endpoints), this removes
iommu_table_group_link from pseries.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>