The commit 4c5d87db49 ("powerpc/pseries: PAPR persistent memory
support") set a local variable "count" in dlpar_hp_pmem() but never
use it.
arch/powerpc/platforms/pseries/pmem.c: In function 'dlpar_hp_pmem':
arch/powerpc/platforms/pseries/pmem.c:109:6: warning: variable 'count' set but not used
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The commit b7d6bf4fdd ("powerpc/pseries/pci: Remove obsolete SW
invalidate") left 2 variables unused.
arch/powerpc/platforms/pseries/iommu.c:108:17: warning: variable 'tces' set but not used
__be64 *tcep, *tces;
^~~~
arch/powerpc/platforms/pseries/iommu.c:132:17: warning: variable 'tces' set but not used
__be64 *tcep, *tces;
^~~~
Also, the commit 68c0449ea1 ("powerpc/pseries/iommu: Use memory@
nodes in max RAM address calculation") set "ranges" in
ddw_memory_hotplug_max() but never use it.
arch/powerpc/platforms/pseries/iommu.c: In function 'ddw_memory_hotplug_max':
arch/powerpc/platforms/pseries/iommu.c:948:7: warning: variable 'ranges' set but not used
int ranges, n_mem_addr_cells, n_mem_size_cells, len;
^~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
resize_hpt_for_hotplug() reports a warning when it cannot
resize the hash page table ("Unable to resize hash page
table to target order") but in some cases it's not a problem
and can make user thinks something has not worked properly.
This patch moves the warning to arch_remove_memory() to
only report the problem when it is needed.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This is a follow up to the patch that fixed misleading print for TLB
mutlihit due to wrongly populated mc_err_types[] array. Convert all the
static array initialization to '[x] = val' style for better
readability of array indexing and avoid any further confusion.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In cpu_to_drc_index() in the case when FW_FEATURE_DRC_INFO is absent,
we currently use of_read_property() to obtain the pointer to the array
corresponding to the property "ibm,drc-indexes". The elements of this
array are of type __be32, but are accessed without any conversion to
the OS-endianness, which is buggy on a Little Endian OS.
Fix this by using of_property_read_u32_index() accessor function to
safely read the elements of the array.
Fixes: e83636ac33 ("pseries/drc-info: Search DRC properties for CPU indexes")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Pavithra R. Prakash <pavrampu@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[mpe: Make the WARN_ON a WARN_ON_ONCE so it's not retriggerable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Replace the /sys/class/dax device model with /sys/bus/dax, and include
a compat driver so distributions can opt-in to the new ABI.
* Allow for an alternative driver for the device-dax address-range
* Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
* Arrange for the device-dax target-node to be onlined so that the newly
added memory range can be uniquely referenced by numa apis.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJchWpGAAoJEB7SkWpmfYgCJk8P/0Q1DINszUDO/vKjJ09cDs9P
Jw3it6GBIL50rDOu9QdcprSpwYDD0h1mLAV/m6oa3bVO+p4uWGvnxaxRx2HN2c/v
vhZFtUDpHlqR63vzWMNVKRprYixCRJDUr6xQhhCcE3ak/ELN6w7LWfikKVWv15UL
MfR96IQU38f+xRda/zSXnL9606Dvkvu/inEHj84lRcHIwj3sQAUalrE8bR3O32gZ
bDg/l5kzT49o8ZXUo/TegvRSSSZpJmOl2DD0RW+ax5q3NI2bOXFrVDUKBKxf/hcQ
E/V9i57TrqQx0GqRhnU7rN/v53cFZGGs31TEEIB/xs3bzCnADxwXcjL5b5K005J6
vJjBA2ODBewHFK3uVx46Hy1iV4eCtZWj4QrMnrjdSrjXOfbF5GTbWOhPFgoq7TWf
S7VqFEf3I2gDPaMq4o8Ej1kLH4HMYeor2NSOZjyvGn87rSZ3ZIQguwbaNIVl+itz
gdDt0ZOU0BgOBkV+rZIeZDaGdloWCHcDPL15CkZaOZyzdWhfEZ7dod6ad+9udilU
EUPH62RgzXZtfm5zpebYyjNVLbb9pLZ0nT+UypyGR6zqWx1SqU3mXi63NFXPco+x
XA9j//edPeI6NHg2CXLEh8DLuCg3dG1zWRJANkiF+niBwyCR8CHtGWAoY6soXbKe
2UrXGcIfXxyJ8V9v8v4q
=hfa3
-----END PGP SIGNATURE-----
Merge tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull device-dax updates from Dan Williams:
"New device-dax infrastructure to allow persistent memory and other
"reserved" / performance differentiated memories, to be assigned to
the core-mm as "System RAM".
Some users want to use persistent memory as additional volatile
memory. They are willing to cope with potential performance
differences, for example between DRAM and 3D Xpoint, and want to use
typical Linux memory management apis rather than a userspace memory
allocator layered over an mmap() of a dax file. The administration
model is to decide how much Persistent Memory (pmem) to use as System
RAM, create a device-dax-mode namespace of that size, and then assign
it to the core-mm. The rationale for device-dax is that it is a
generic memory-mapping driver that can be layered over any "special
purpose" memory, not just pmem. On subsequent boots udev rules can be
used to restore the memory assignment.
One implication of using pmem as RAM is that mlock() no longer keeps
data off persistent media. For this reason it is recommended to enable
NVDIMM Security (previously merged for 5.0) to encrypt pmem contents
at rest. We considered making this recommendation an actively enforced
requirement, but in the end decided to leave it as a distribution /
administrator policy to allow for emulation and test environments that
lack security capable NVDIMMs.
Summary:
- Replace the /sys/class/dax device model with /sys/bus/dax, and
include a compat driver so distributions can opt-in to the new ABI.
- Allow for an alternative driver for the device-dax address-range
- Introduce the 'kmem' driver to hotplug / assign a device-dax
address-range to the core-mm.
- Arrange for the device-dax target-node to be onlined so that the
newly added memory range can be uniquely referenced by numa apis"
NOTE! I'm not entirely happy with the whole "PMEM as RAM" model because
we currently have special - and very annoying rules in the kernel about
accessing PMEM only with the "MC safe" accessors, because machine checks
inside the regular repeat string copy functions can be fatal in some
(not described) circumstances.
And apparently the PMEM modules can cause that a lot more than regular
RAM. The argument is that this happens because PMEM doesn't necessarily
get scrubbed at boot like RAM does, but that is planned to be added for
the user space tooling.
Quoting Dan from another email:
"The exposure can be reduced in the volatile-RAM case by scanning for
and clearing errors before it is onlined as RAM. The userspace tooling
for that can be in place before v5.1-final. There's also runtime
notifications of errors via acpi_nfit_uc_error_notify() from
background scrubbers on the DIMM devices. With that mechanism the
kernel could proactively clear newly discovered poison in the volatile
case, but that would be additional development more suitable for v5.2.
I understand the concern, and the need to highlight this issue by
tapping the brakes on feature development, but I don't see PMEM as RAM
making the situation worse when the exposure is also there via DAX in
the PMEM case. Volatile-RAM is arguably a safer use case since it's
possible to repair pages where the persistent case needs active
application coordination"
* tag 'devdax-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: "Hotplug" persistent memory for use like normal RAM
mm/resource: Let walk_system_ram_range() search child resources
mm/memory-hotplug: Allow memory resources to be children
mm/resource: Move HMM pr_debug() deeper into resource code
mm/resource: Return real error codes from walk failures
device-dax: Add a 'modalias' attribute to DAX 'bus' devices
device-dax: Add a 'target_node' attribute
device-dax: Auto-bind device after successful new_id
acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node
device-dax: Add /sys/class/dax backwards compatibility
device-dax: Add support for a dax override driver
device-dax: Move resource pinning+mapping into the common driver
device-dax: Introduce bus + driver model
device-dax: Start defining a dax bus model
device-dax: Remove multi-resource infrastructure
device-dax: Kill dax_region base
device-dax: Kill dax_region ida
Merge more updates from Andrew Morton:
- some of the rest of MM
- various misc things
- dynamic-debug updates
- checkpatch
- some epoll speedups
- autofs
- rapidio
- lib/, lib/lzo/ updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
samples/mic/mpssd/mpssd.h: remove duplicate header
kernel/fork.c: remove duplicated include
include/linux/relay.h: fix percpu annotation in struct rchan
arch/nios2/mm/fault.c: remove duplicate include
unicore32: stop printing the virtual memory layout
MAINTAINERS: fix GTA02 entry and mark as orphan
mm: create the new vm_fault_t type
arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
arch: simplify several early memory allocations
openrisc: simplify pte_alloc_one_kernel()
sh: prefer memblock APIs returning virtual address
microblaze: prefer memblock API returning virtual address
powerpc: prefer memblock APIs returning virtual address
lib/lzo: separate lzo-rle from lzo
lib/lzo: implement run-length encoding
lib/lzo: fast 8-byte copy on arm64
lib/lzo: 64-bit CTZ on arm64
lib/lzo: tidy-up ifdefs
ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
ipc: annotate implicit fall through
...
Patch series "memblock: simplify several early memory allocation", v4.
These patches simplify some of the early memory allocations by replacing
usage of older memblock APIs with newer and shinier ones.
Quite a few places in the arch/ code allocated memory using a memblock
API that returns a physical address of the allocated area, then
converted this physical address to a virtual one and then used memset(0)
to clear the allocated range.
More recent memblock APIs do all the three steps in one call and their
usage simplifies the code.
It's important to note that regardless of API used, the core allocation
is nearly identical for any set of memblock allocators: first it tries
to find a free memory with all the constraints specified by the caller
and then falls back to the allocation with some or all constraints
disabled.
The first three patches perform the conversion of call sites that have
exact requirements for the node and the possible memory range.
The fourth patch is a bit one-off as it simplifies openrisc's
implementation of pte_alloc_one_kernel(), and not only the memblock
usage.
The fifth patch takes care of simpler cases when the allocation can be
satisfied with a simple call to memblock_alloc().
The sixth patch removes one-liner wrappers for memblock_alloc on arm and
unicore32, as suggested by Christoph.
This patch (of 6):
There are a several places that allocate memory using memblock APIs that
return a physical address, convert the returned address to the virtual
address and frequently also memset(0) the allocated range.
Update these places to use memblock allocators already returning a
virtual address. Use memblock functions that clear the allocated memory
instead of calling memset(0) where appropriate.
The calls to memblock_alloc_base() that were not followed by memset(0)
are replaced with memblock_alloc_try_nid_raw(). Since the latter does
not panic() when the allocation fails, the appropriate panic() calls are
added to the call sites.
Link: http://lkml.kernel.org/r/1546248566-14910-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mark Salter <msalter@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Notable changes:
- Enable THREAD_INFO_IN_TASK to move thread_info off the stack.
- A big series from Christoph reworking our DMA code to use more of the generic
infrastructure, as he said:
"This series switches the powerpc port to use the generic swiotlb and
noncoherent dma ops, and to use more generic code for the coherent direct
mapping, as well as removing a lot of dead code."
- Increase our vmalloc space to 512T with the Hash MMU on modern CPUs, allowing
us to support machines with larger amounts of total RAM or distance between
nodes.
- Two series from Christophe, one to optimise TLB miss handlers on 6xx, and
another to optimise the way STRICT_KERNEL_RWX is implemented on some 32-bit
CPUs.
- Support for KCOV coverage instrumentation which means we can run syzkaller
and discover even more bugs in our code.
And as always many clean-ups, reworks and minor fixes etc.
Thanks to:
Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea Arcangeli, Andrew
Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir Singh, Brajeswar Ghosh,
Breno Leitao, Christian Lamparter, Christian Zigotzky, Christophe Leroy,
Christoph Hellwig, Corentin Labbe, Daniel Axtens, David Gibson, Diana Craciun,
Firoz Khan, Gustavo A. R. Silva, Igor Stoppa, Joe Lawrence, Joel Stanley,
Jonathan Neuschäfer, Jordan Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh
Salgaonkar, Mark Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce,
Meelis Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot, Nicholas
Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran, Paul Mackerras,
Peter Xu, PrasannaKumar Muralidharan, Qian Cai, Rashmica Gupta, Reza Arbab,
Robert P. J. Day, Russell Currey, Sabyasachi Gupta, Sam Bobroff, Sandipan Das,
Sergey Senozhatsky, Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav
Jain, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcgRJlAAoJEFHr6jzI4aWAL9oP+gPlrZgyaAg/51lmubLtlbtk
QuGU8EiuJZoJD1OHrMPtppBOY7rQZOxJe58AoPig8wTvs+j/TxJ25fmiZncnf5U2
PC8QAjbj0UmQHgy+K30sUeOnDg9tdkHKHJ5/ecjJcvykkqsjyMnV7biFQ1cOA0HT
LflXHEEtiG9P9u7jZoAhtnfpgn1/l9mhTYMe26J1fqvC0164qMDFaXDTQXyDfyvG
gmuqccGMawSk7IdagmQxwXtwyfwOnarmGn+n31XKRejApGZ/pjiEA23JOJOaJcia
m76Jy3roao6sEtCUNpBFXEtwOy9POy3OiGy6yg/9896tDMvG84OuO6ltV1nFGawL
PmwE+ug63L4g/HWxZyAeb26T2oTTp/YIaKQPtsq4d286pvg/qr2KPNzFoAEhmJqU
yLrebv276pVeiLpLmCLPvcPj9t76vWKZaUm0FoE+zUDg7Rl7Alow8A/c4tdjOI6y
QwpbCiYseyiJ32lCZZdbN7Cy6+iM6vb3i1oNKc8MVqhBGTwLJnTU0ruPBSvCaRvD
NoQWO1RWpNu/BuivuLEKS9q3AoxenGwiqowxGhdVmI3Oc9jGWcEYlduR00VDYPVp
/RCfwtTY5NyC++h5cnbz8aLJ1hBXG5m79CXfprV+zPWeiLPCaMT6w9Y5QUS2wqA+
EZ734NknDJOjaHc4cGdZ
=Z9bb
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Enable THREAD_INFO_IN_TASK to move thread_info off the stack.
- A big series from Christoph reworking our DMA code to use more of
the generic infrastructure, as he said:
"This series switches the powerpc port to use the generic swiotlb
and noncoherent dma ops, and to use more generic code for the
coherent direct mapping, as well as removing a lot of dead
code."
- Increase our vmalloc space to 512T with the Hash MMU on modern
CPUs, allowing us to support machines with larger amounts of total
RAM or distance between nodes.
- Two series from Christophe, one to optimise TLB miss handlers on
6xx, and another to optimise the way STRICT_KERNEL_RWX is
implemented on some 32-bit CPUs.
- Support for KCOV coverage instrumentation which means we can run
syzkaller and discover even more bugs in our code.
And as always many clean-ups, reworks and minor fixes etc.
Thanks to: Alan Modra, Alexey Kardashevskiy, Alistair Popple, Andrea
Arcangeli, Andrew Donnellan, Aneesh Kumar K.V, Aravinda Prasad, Balbir
Singh, Brajeswar Ghosh, Breno Leitao, Christian Lamparter, Christian
Zigotzky, Christophe Leroy, Christoph Hellwig, Corentin Labbe, Daniel
Axtens, David Gibson, Diana Craciun, Firoz Khan, Gustavo A. R. Silva,
Igor Stoppa, Joe Lawrence, Joel Stanley, Jonathan Neuschäfer, Jordan
Niethe, Laurent Dufour, Madhavan Srinivasan, Mahesh Salgaonkar, Mark
Cave-Ayland, Masahiro Yamada, Mathieu Malaterre, Matteo Croce, Meelis
Roos, Michael W. Bringmann, Nathan Chancellor, Nathan Fontenot,
Nicholas Piggin, Nick Desaulniers, Nicolai Stange, Oliver O'Halloran,
Paul Mackerras, Peter Xu, PrasannaKumar Muralidharan, Qian Cai,
Rashmica Gupta, Reza Arbab, Robert P. J. Day, Russell Currey,
Sabyasachi Gupta, Sam Bobroff, Sandipan Das, Sergey Senozhatsky,
Souptick Joarder, Stewart Smith, Tyrel Datwyler, Vaibhav Jain,
YueHaibing"
* tag 'powerpc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits)
powerpc/32: Clear on-stack exception marker upon exception return
powerpc: Remove export of save_stack_trace_tsk_reliable()
powerpc/mm: fix "section_base" set but not used
powerpc/mm: Fix "sz" set but not used warning
powerpc/mm: Check secondary hash page table
powerpc: remove nargs from __SYSCALL
powerpc/64s: Fix unrelocated interrupt trampoline address test
powerpc/powernv/ioda: Fix locked_vm counting for memory used by IOMMU tables
powerpc/fsl: Fix the flush of branch predictor.
powerpc/powernv: Make opal log only readable by root
powerpc/xmon: Fix opcode being uninitialized in print_insn_powerpc
powerpc/powernv: move OPAL call wrapper tracing and interrupt handling to C
powerpc/64s: Fix data interrupts vs d-side MCE reentrancy
powerpc/64s: Prepare to handle data interrupts vs d-side MCE reentrancy
powerpc/64s: system reset interrupt preserve HSRRs
powerpc/64s: Fix HV NMI vs HV interrupt recoverability test
powerpc/mm/hash: Handle mmap_min_addr correctly in get_unmapped_area topdown search
powerpc/hugetlb: Handle mmap_min_addr correctly in get_unmapped_area callback
selftests/powerpc: Remove duplicate header
powerpc sstep: Add support for modsd, modud instructions
...
Here is the big driver core patchset for 5.1-rc1
More patches than "normal" here this merge window, due to some work in
the driver core by Alexander Duyck to rework the async probe
functionality to work better for a number of devices, and independant
work from Rafael for the device link functionality to make it work
"correctly".
Also in here is:
- lots of BUS_ATTR() removals, the macro is about to go away
- firmware test fixups
- ihex fixups and simplification
- component additions (also includes i915 patches)
- lots of minor coding style fixups and cleanups.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXH+euQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynyTgCfbV8CLums843sBnT8NnWrTMTdTCcAn1K4re0m
ep8g+6oRLxJy414hogxQ
=bLs2
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core patchset for 5.1-rc1
More patches than "normal" here this merge window, due to some work in
the driver core by Alexander Duyck to rework the async probe
functionality to work better for a number of devices, and independant
work from Rafael for the device link functionality to make it work
"correctly".
Also in here is:
- lots of BUS_ATTR() removals, the macro is about to go away
- firmware test fixups
- ihex fixups and simplification
- component additions (also includes i915 patches)
- lots of minor coding style fixups and cleanups.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (65 commits)
driver core: platform: remove misleading err_alloc label
platform: set of_node in platform_device_register_full()
firmware: hardcode the debug message for -ENOENT
driver core: Add missing description of new struct device_link field
driver core: Fix PM-runtime for links added during consumer probe
drivers/component: kerneldoc polish
async: Add cmdline option to specify drivers to be async probed
driver core: Fix possible supplier PM-usage counter imbalance
PM-runtime: Fix __pm_runtime_set_status() race with runtime resume
driver: platform: Support parsing GpioInt 0 in platform_get_irq()
selftests: firmware: fix verify_reqs() return value
Revert "selftests: firmware: remove use of non-standard diff -Z option"
Revert "selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to config"
device: Fix comment for driver_data in struct device
kernfs: Allocating memory for kernfs_iattrs with kmem_cache.
sysfs: remove unused include of kernfs-internal.h
driver core: Postpone DMA tear-down until after devres release
driver core: Document limitation related to DL_FLAG_RPM_ACTIVE
PM-runtime: Take suppliers into account in __pm_runtime_set_status()
device.h: Add __cold to dev_<level> logging functions
...
The Processor Utilzation of Resource Registers (PURR) provide an
estimate of resources used by a cpu thread. Section 7.6 in Book III of
the ISA outlines how to calculate the percentage of shared resources
for threads using the ratio of the PURR delta and Timebase Register
delta for a sampled period.
This calculation is currently done erroneously by the lparstat tool
from the powerpc-utils package. This patch exports the current
timebase value after we sample the PURRs and exposes it to userspace
accounting tools via /proc/ppc64/lparcfg.
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no good reason for this helper, just opencode it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we've switched all the powerpc nommu and swiotlb methods to
use the generic dma_direct_* calls we can remove these ops vectors
entirely and rely on the common direct mapping bypass that avoids
indirect function calls entirely. This also allows to remove a whole
lot of boilerplate code related to setting up these operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The coherent cache version of this function already is functionally
identicall to the default version, and by defining the
arch_dma_coherent_to_pfn hook the same is ture for the noncoherent
version as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the generic iommu bypass code instead of overriding set_dma_mask.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Call dma_get_required_mask_pSeriesLP directly instead of dma_iommu_ops
to simply the code a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
vio_dma_mapping_ops currently does a lot of indirect calls through
dma_iommu_ops, which not only make the code harder to follow but are
also expensive in the post-spectre world. Unwind the indirect calls
by calling the ppc_iommu_* or iommu_* APIs directly applicable, or
just use the dma_iommu_* methods directly where we can.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When binding an SCM volume to a physical address the hypervisor has the
option to return early with a continue token with the expectation that
the guest will resume the bind operation until it completes. A quirk of
this interface is that the bind address will only be returned by the
first bind h-call and the subsequent calls will return
0xFFFF_FFFF_FFFF_FFFF for the bind address.
We currently do not save the address returned by the first h-call. As a
result we will use the junk address as the base of the bound region if
the hypervisor decides to split the bind across multiple h-calls. This
bug was found when testing with very large SCM volumes where the bind
process would take more time than they hypervisor's internal h-call time
limit would allow. This patch fixes the issue by saving the bind address
from the first call.
Cc: stable@vger.kernel.org
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On pseries systems, performing a partition migration can result in
altering the nodes a CPU is assigned to on the destination system. For
exampl, pre-migration on the source system CPUs are in node 1 and 3,
post-migration on the destination system CPUs are in nodes 2 and 3.
Handling the node change for a CPU can cause corruption in the slab
cache if we hit a timing where a CPUs node is changed while cache_reap()
is invoked. The corruption occurs because the slab cache code appears
to rely on the CPU and slab cache pages being on the same node.
The current dynamic updating of a CPUs node done in arch/powerpc/mm/numa.c
does not prevent us from hitting this scenario.
Changing the device tree property update notification handler that
recognizes an affinity change for a CPU to do a full DLPAR remove and
add of the CPU instead of dynamically changing its node resolves this
issue.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
Tested-by: Michael W. Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We are trying to get rid of BUS_ATTR() and the usage of that in
ibmebus.c can be trivially converted to use BUS_ATTR_WO(), so use that
instead.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Adopt nvram module to reduce code duplication. This means CONFIG_NVRAM
becomes available to PPC64 builds. Previously it was only available to
PPC32 builds because it depended on CONFIG_GENERIC_NVRAM.
The IOC_NVRAM_GET_OFFSET ioctl as implemented on PPC64 validates the
offset returned by pmac_get_partition(). Do the same in the nvram module.
Note that the old PPC32 generic_nvram module lacked this test.
So when CONFIG_PPC32 && CONFIG_PPC_PMAC, the IOC_NVRAM_GET_OFFSET ioctl
would have returned 0 (always). But when CONFIG_PPC64 && CONFIG_PPC_PMAC,
the IOC_NVRAM_GET_OFFSET ioctl would have returned -1 (which is -EPERM)
when the requested partition was not found.
With this patch, the result is now -EINVAL on both PPC32 and PPC64 when
the requested PowerMac NVRAM partition is not found. This is a userspace-
visible change, in the non-existent partition case, which would be in
an error path for an IOC_NVRAM_GET_OFFSET ioctl syscall.
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3be2df00e2 ("powerpc/pseries/npu: Enable platform support")
added a call to pnv_npu2_init() in pseries code. This causes a build
break if we build with CONFIG_PPC_PSERIES && !CONFIG_PPC_POWERNV:
powerpc64le-pc-linux-gnu-ld: arch/powerpc/platforms/pseries/pci.o: in function `pSeries_final_fixup':
pci.c:(.init.text+0x1b0): undefined reference to `pnv_npu2_init'
This commit therefore wraps that line in an ifdef, so that pseries
builds without powernv.
Fixes: 3be2df00e2 ("powerpc/pseries/npu: Enable platform support")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[mpe: Frob change log a bit to blame a different commit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
Interface Table), is the first known instance of a memory range
described by a unique "target" proximity domain. Where "initiator" and
"target" proximity domains is an approach that the ACPI HMAT
(Heterogeneous Memory Attributes Table) uses to described the unique
performance properties of a memory range relative to a given initiator
(e.g. CPU or DMA device).
Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
char-device follows the traditional notion of 'numa-node' where the
attribute conveys the closest online numa-node. That numa-node attribute
is useful for cpu-binding and memory-binding processes *near* the
device. However, when the memory range backing a 'pmem', or 'dax' device
is onlined (memory hot-add) the memory-only-numa-node representing that
address needs to be differentiated from the set of online nodes. In
other words, the numa-node association of the device depends on whether
you can bind processes *near* the cpu-numa-node in the offline
device-case, or bind process *on* the memory-range directly after the
backing address range is onlined.
Allow for the case that platform firmware describes persistent memory
with a unique proximity domain, i.e. when it is distinct from the
proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
numa-node translation of that proximity through the libnvdimm region
device to namespaces that are in device-dax mode. With this in place the
proposed kmem driver [1] can optionally discover a unique numa-node
number for the address range as it transitions the memory from an
offline state managed by a device-driver to an online memory range
managed by the core-mm.
[1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com
Reported-by: Fan Du <fan.du@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined
like this:
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif
We can improve this by testing 'asm goto' support in Kconfig, then
make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries
by Christoph Hellwig.
Currently, every architecture that wants to provide common peripheral
busses needs to add some boilerplate code and include the right Kconfig
files. This series instead just selects the presence (when needed) and
then handles everything in the bus-specific Kconfig file under drivers/.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJcJilwAAoJED2LAQed4NsGt1YP/RMTEUqbCSwS/CnTLrE+aVTC
O2aWwB80ZlVwpeBbHLW5/M88OvOev0UaCr+gyzgpFRl5ITzS7Jevb8VbpGzblbH7
bFxIEyZFGQiy9oEWw3Lfu9JRSsLm3jNo7hkmdBSn2Rw3KkEd/YF7K3q9GuA7BpCS
ZxAirebvEpr4KYEzkuc57NqCYx2Tc8G+JWr5D7pZCFaq9vxYt3TddGqw/c7iQVSQ
1Og1809IdhGyCSlA/ExfaqaBMaJHMRAOHX5GgkqZw1EbFcizUFhAAsKCrGL5nBtX
NiWF9jhgHR1M+L69jfctOstrmGQD2KicNgWQf1aS5RQkPfjuqIKGT/i9g6J1pVyX
TaW1J36Hcl8PpsKoPBnnrixd1T41O3/PuqtEJRm7LCBYOQiwS9sEmLO09RDRjER8
SPAAyvkhE8oq+0RHiTYN4tm8dyJc1djZ5wzgLnwFPAnU6SR+mbN02RzBMsYZXD+x
RNbBSGBRJFQDBw6Rn+ktcIQvcKYmUqe1k1YNHMy6kG3QqvhBaDy+8PA/YjIKPQYQ
B/NNUAMEJMys1OQrRL2UDXb2ysaCpzwMmlrBW2IwYsQrX5OwbPkNuQ5Mbe1Lr+mc
4NXR+HubvojsHaAby+OhFbrUX2Jcz3wqYj7aannb9sMRmw0VJXV5dPYUqje3ZhPS
P2AovKT8O9nWsEttqER5
=WxId
-----END PGP SIGNATURE-----
Merge tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kconfig file consolidation from Masahiro Yamada:
"Consolidation of bus (PCI, PCMCIA, EISA, RapidIO) config entries by
Christoph Hellwig.
Currently, every architecture that wants to provide common peripheral
busses needs to add some boilerplate code and include the right
Kconfig files. This series instead just selects the presence (when
needed) and then handles everything in the bus-specific Kconfig file
under drivers/"
* tag 'kconfig-v4.21-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
pcmcia: remove per-arch PCMCIA config entry
eisa: consolidate EISA Kconfig entry in drivers/eisa
rapidio: consolidate RAPIDIO config entry in drivers/rapidio
pcmcia: allow PCMCIA support independent of the architecture
PCI: consolidate the PCI_SYSCALL symbol
PCI: consolidate the PCI_DOMAINS and PCI_DOMAINS_GENERIC config options
PCI: consolidate PCI config entry in drivers/pci
MIPS: remove the HT_PCI config option
Pull Devicetree updates from Rob Herring:
"The biggest highlight here is the start of using json-schema for DT
bindings. Being able to validate bindings has been discussed for years
with little progress.
- Initial support for DT bindings using json-schema language. This is
the start of converting DT bindings from free-form text to a
structured format.
- Reworking of initrd address initialization. This moves to using the
phys address instead of virt addr in the DT parsing code. This
rework was motivated by CONFIG_DEV_BLK_INITRD causing unnecessary
rebuilding of lots of files.
- Fix stale phandle entries in phandle cache
- DT overlay validation improvements. This exposed several memory
leak bugs which have been fixed.
- Use node name and device_type helper functions in DT code
- Last remaining conversions to using %pOFn printk specifier instead
of device_node.name directly
- Create new common RTC binding doc and move all trivial RTC devices
out of trivial-devices.txt.
- New bindings for Freescale MAG3110 magnetometer, Cadence Sierra
PHY, and Xen shared memory
- Update dtc to upstream version v1.4.7-57-gf267e674d145"
* tag 'devicetree-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (68 commits)
of: __of_detach_node() - remove node from phandle cache
of: of_node_get()/of_node_put() nodes held in phandle cache
gpio-omap.txt: add reg and interrupts properties
dt-bindings: mrvl,intc: fix a trivial typo
dt-bindings: iio: magnetometer: add dt-bindings for freescale mag3110
dt-bindings: Convert trivial-devices.txt to json-schema
dt-bindings: arm: mrvl: amend Browstone compatible string
dt-bindings: arm: Convert Tegra board/soc bindings to json-schema
dt-bindings: arm: Convert ZTE board/soc bindings to json-schema
dt-bindings: arm: Add missing Xilinx boards
dt-bindings: arm: Convert Xilinx board/soc bindings to json-schema
dt-bindings: arm: Convert VIA board/soc bindings to json-schema
dt-bindings: arm: Convert ST STi board/soc bindings to json-schema
dt-bindings: arm: Convert SPEAr board/soc bindings to json-schema
dt-bindings: arm: Convert CSR SiRF board/soc bindings to json-schema
dt-bindings: arm: Convert QCom board/soc bindings to json-schema
dt-bindings: arm: Convert TI nspire board/soc bindings to json-schema
dt-bindings: arm: Convert TI davinci board/soc bindings to json-schema
dt-bindings: arm: Convert Calxeda board/soc bindings to json-schema
dt-bindings: arm: Convert Altera board/soc bindings to json-schema
...
Merge misc updates from Andrew Morton:
- large KASAN update to use arm's "software tag-based mode"
- a few misc things
- sh updates
- ocfs2 updates
- just about all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
memcg, oom: notify on oom killer invocation from the charge path
mm, swap: fix swapoff with KSM pages
include/linux/gfp.h: fix typo
mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
memory_hotplug: add missing newlines to debugging output
mm: remove __hugepage_set_anon_rmap()
include/linux/vmstat.h: remove unused page state adjustment macro
mm/page_alloc.c: allow error injection
mm: migrate: drop unused argument of migrate_page_move_mapping()
blkdev: avoid migration stalls for blkdev pages
mm: migrate: provide buffer_migrate_page_norefs()
mm: migrate: move migrate_page_lock_buffers()
mm: migrate: lock buffers before migrate_page_move_mapping()
mm: migration: factor out code to compute expected number of page references
mm, page_alloc: enable pcpu_drain with zone capability
kmemleak: add config to select auto scan
mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
...
A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for architectures
that have devices that perform DMA that is not cache coherent. Based
on the existing arm64 implementation and also used for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel data
leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwctQgLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMxgQ//dBpAfS4/J76CdAbYry2zqgcOUU9hIrD6NHiEMWov
ltJxyvEl3LsUmIdEj3aCrYL9jZN0qsnCzn5BVj2c3jDIVgD64fAr7HDf/PbEEfKb
j6/GgEnVLPZV+sQMvhNA5jOzHrkseaqPa4/pNLFZ/l8jnuZ2d+btusDWJpMoVDer
TXVwtIfgeIu0gTygYOShLYXd5qptWKWsZEpbTZOO2sE6+x+ZJX7yQYUxYDTlcOIj
JWVO2l5QNHPc5T9o2at+6L5aNUvnZOxT79sWgyZLn0Kc+FagKAVwfLqUEl0v7foG
8k/xca5/8p3afB1DfrIrtplJqis7cVgdyGxriwuuoO8X4F0nPyWwpGmxsBhrWwwl
xTqC4UorEJ7QwoP6Azopk/vYI2QXIUBLjuCJCuFXZj9+2BGf4IfvBY1S2cLM9qLs
HMcxQonuXJii044KEFS96ePEuiT+igVINweIFBKWcgNCEG0UQtyL6RQ1U5297ipF
JiWZAqD+p9X52UdKS+oKfAiZEekMXn6Xyo97+YCiNpfOo0GP5eEcwhL+JpY4AiRq
apPXtsRy2o1s8yfjdraUIM2Mc2n62vFKb35oUbGCd/QO9piPrFQHl6T0HHcHk4YR
XrUXcHieFZBCYqh7ZVa4RL8Msq1wvGuTL4Dxl43mXdsMoUFRR6eSNWLoAV4IpOLZ
WgA=
=in72
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping
Pull DMA mapping updates from Christoph Hellwig:
"A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for
architectures that have devices that perform DMA that is not cache
coherent. Based on the existing arm64 implementation and also used
for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel
data leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script"
* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
dma-mapping: fix inverted logic in dma_supported
dma-mapping: deprecate dma_zalloc_coherent
dma-mapping: zero memory returned from dma_alloc_*
sparc/iommu: fix ->map_sg return value
sparc/io-unit: fix ->map_sg return value
arm64: default to the direct mapping in get_arch_dma_ops
PCI: Remove unused attr variable in pci_dma_configure
ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
dma-mapping: bypass indirect calls for dma-direct
vmd: use the proper dma_* APIs instead of direct methods calls
dma-direct: merge swiotlb_dma_ops into the dma_direct code
dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
dma-direct: improve addressability error reporting
swiotlb: remove dma_mark_clean
swiotlb: remove SWIOTLB_MAP_ERROR
ACPI / scan: Refactor _CCA enforcement
dma-mapping: factor out dummy DMA ops
dma-mapping: always build the direct mapping code
dma-mapping: move dma_cache_sync out of line
dma-mapping: move various slow path functions out of line
...
totalram_pages and totalhigh_pages are made static inline function.
Main motivation was that managed_page_count_lock handling was complicating
things. It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.
[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert string compares of DT node names to use of_node_name_eq helper
instead. This removes direct access to the node name pointer.
A couple of open coded iterating thru the child node names are converted
to use for_each_child_of_node() instead.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation to remove the node name pointer from struct
device_node, convert printf users to use the %pOFn format specifier.
pmem.c was recently added and missed the initial conversion.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In update_lmb_associativity_index() we lookup dr_node using
of_find_node_by_path() which takes a reference for us. In the
non-error case we forget to drop the reference. Note that
find_aa_index() does modify properties of the node, but doesn't need
an extra reference held once it's returned.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The powernv platform registers IOMMU groups and adds devices to them
from the pci_controller_ops::setup_bridge() hook except one case when
virtual functions (SRIOV VFs) are added from a bus notifier.
The pseries platform registers IOMMU groups from
the pci_controller_ops::dma_bus_setup() hook and adds devices from
the pci_controller_ops::dma_dev_setup() hook. The very same bus notifier
used for powernv does not add devices for pseries though as
__of_scan_bus() adds devices first, then it does the bus/dev DMA setup.
Both platforms use iommu_add_device() which takes a device and expects
it to have a valid IOMMU table struct with an iommu_table_group pointer
which in turn points the iommu_group struct (which represents
an IOMMU group). Although the helper seems easy to use, it relies on
some pre-existing device configuration and associated data structures
which it does not really need.
This simplifies iommu_add_device() to take the table_group pointer
directly. Pseries already has a table_group pointer handy and the bus
notified is not used anyway. For powernv, this copies the existing bus
notifier, makes it work for powernv only which means an easy way of
getting to the table_group pointer. This was tested on VFs but should
also support physical PCI hotplug.
Since iommu_add_device() receives the table_group pointer directly,
pseries does not do TCE cache invalidation (the hypervisor does) nor
allow multiple groups per a VFIO container (in other words sharing
an IOMMU table between partitionable endpoints), this removes
iommu_table_group_link from pseries.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The pci_dma_bus_setup_pSeries and pci_dma_dev_setup_pSeries hooks are
registered for the pseries platform which does not have FW_FEATURE_LPAR;
these would be pre-powernv platforms which we never supported PCI pass
through for anyway so remove it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We already changed NPU API for GPUs to not to call OPAL and the remaining
bit is initializing NPU structures.
This searches for POWER9 NVLinks attached to any device on a PHB and
initializes an NPU structure if any found.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We might have memory@ nodes with "linux,usable-memory" set to zero
(for example, to replicate powernv's behaviour for GPU coherent memory)
which means that the memory needs an extra initialization but since
it can be used afterwards, the pseries platform will try mapping it
for DMA so the DMA window needs to cover those memory regions too;
if the window cannot cover new memory regions, the memory onlining fails.
This walks through the memory nodes to find the highest RAM address to
let a huge DMA window cover that too in case this memory gets onlined
later.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. Now that fadump uses CMA for reserved
area, this memory area is now not protected from hot-remove operations
unless it is cma allocated. Hence, fadump service can fail to re-register
after the hot-remove operation, if hot-removed memory belongs to fadump
reserved region. To avoid this make sure that memory from fadump reserved
area is not hot-removable if fadump is registered.
However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge our fixes branch again, this has a couple of build fixes and also
a change to do_syscall_trace_enter() that will conflict with a patch we
want to apply in next.
The interleave set cookie is used to determine if a label stored in the
metadata space should be applied to the current region. This is
important in the case of NVDIMMs since the firmware may change the
interleaving configuration of a DIMM which would invalidate the existing
labels. In our case the hypervisor hides those details from us so we
don't really care, but libnvdimm still requires the interleave set
cookie to be non-zero.
For our purposes we just need the set cookie to be unique and fixed for
a given PAPR SCM region and using the unit-guid (really a UUID) is fine
for this purpose.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Use kernel types (u64)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When a new nvdimm device is registered with libnvdimm via
nvdimm_create() it is added as a device on the nvdimm bus. The probe
function for the DIMM driver is potentially quite slow so actually
registering and probing the device is done in an async domain rather
than immediately after device creation. This can result in a race where
the region device (created 2nd) is probed first and fails to activate at
boot.
To fix this we use the same approach as the ACPI/NFIT driver which is to
check that all the DIMM devices registered successfully. LibNVDIMM
provides the nvdimm_bus_count_dimms() function which synchronises with
the async domain and verifies that the dimm was successfully registered
with the bus.
If either of these does not occur then we bail.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The return values of a h-call are returned in the CPU registers and
written to the provided buffer by the plpar_hcall() wrapper. As a result
the values written to memory are always in the native endian and should
not be byte swapped.
The inital implementation of the H-Call interface was done in qemu and
the returned values were byte swapped unnecessarily in both the
hypervisor and in the driver so this was only noticed when bringing up
the PowerVM implementation.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The ibm,unit-sizes property was originally specified as an array of two
u32s corresponding to the memory block size, and the number of blocks
available in that region. A fairly last-minute change to the SCM DT
specification was splitting that into two seperate u64 properties:
ibm,block-sizes and ibm,number-of-blocks that convey the same
information. No firmware / hypervisor that emitted the ibm,unit-size
property ever appeared in the wild.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Use kernel types (u32/u64)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix an off-by-one error in the memory resource range. This resource is
used to determine the address range of the memory to be hot-plugged as
ZONE_DEVICE memory. The current end address results in the kernel
attempting to map an additional memblock and the hypervisor may reject
the mapping resulting in the entire hot-plug failing.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Making PAPR_SCM select LIBNVDIMM results in circular dependencies in
Kconfig when another symbol depends on it. Fix this by replacing the
select with a depends.
Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Reported-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The powerpc iommu code already returns (~(dma_addr_t)0x0) on mapping
failures, so we can switch over to returning DMA_MAPPING_ERROR and let
the core dma-mapping code handle the rest.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove directly accessing device_node.type pointer and use the
accessors instead. This will eventually allow removing the type
pointer.
Replace the open coded iterating over child nodes with
for_each_child_of_node() while we're here.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There is no good reason to duplicate the PCI menu in every architecture.
Instead provide a selectable HAVE_PCI symbol that indicates availability
of PCI support, and a FORCE_PCI symbol to for PCI on and the handle the
rest in drivers/pci.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The previous commit, "of: overlay: add missing of_node_get() in
__of_attach_node_sysfs" added a missing of_node_get() to
__of_attach_node_sysfs(). This results in a refcount imbalance
for nodes attached with dlpar_attach_node(). The calling sequence
from dlpar_attach_node() to __of_attach_node_sysfs() is:
dlpar_attach_node()
of_attach_node()
__of_attach_node_sysfs()
For more detailed description of the node refcount, see
commit 68baf692c4 ("powerpc/pseries: Fix of_node_put() underflow
during DLPAR remove").
Tested-by: Alan Tull <atull@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Frank Rowand <frank.rowand@sony.com>
Some things that I missed due to travel, or that came in late.
Two fixes also going to stable:
- A revert of a buggy change to the 8xx TLB miss handlers.
- Our flushing of SPE (Signal Processing Engine) registers on fork was broken.
Other changes:
- A change to the KVM decrementer emulation to use proper APIs.
- Some cleanups to the way we do code patching in the 8xx code.
- Expose the maximum possible memory for the system in /proc/powerpc/lparcfg.
- Merge some updates from Scott: "a couple device tree updates, and a fix for a
missing prototype warning."
A few other minor fixes and a handful of fixes for our selftests.
Thanks to:
Aravinda Prasad, Breno Leitao, Camelia Groza, Christophe Leroy, Felipe Rechia,
Joel Stanley, Naveen N. Rao, Paul Mackerras, Scott Wood, Tyrel Datwyler.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJb3C+uAAoJEFHr6jzI4aWAPJgQAIX0aD/PiYfEUI/rm/Q0vnJI
HO3FCKroi+LVF/URU24+NLA/1KGCBfO9by9m6D/nnmHl+vi2P69fFgokywO0Ajru
nf9a+9Gx53IbO7EEUf1fZVwxCMobBqU8eWq1hIBndd5HTz9QEftc/RXgpgqZcQ/x
x8xN1FNMSUT9NwMk750QDO7CFBrSfSjFC+/WrkViBaMiRWx2rwle+2tDipQ/fegY
Tsu4wg0qEzWMT//MFP4yOlkcLV8M6d2Sw65Km59rWHA1I2wqsTek1yQ68Epo9Co0
RdJh9Nt1kjLC5XXteneFhe18UUPKRmrXbYDFByw5CUhs5VI99Dq4w5kamh197XLr
+jA3XHAeAyaXf21I9zmmZXbhHanowCPZGyzZqZXWJ86bVJp5v328wXmnxtKrb0Nz
pH7fjQ6zjzsZgIcN9i2CFpIvuDQ/z1A/QyHdBnRvJ8HoXlTerZCn22JTgY7d2VJu
XJn1n+VABG2BrzJexW/7quY3Z7V6tvdkloWwOA3PdAwkcoImd4BfYq9K2DU//diN
BnXPnDs2K7JwDG9s+cgUEHnrP6DOKsxT+mmYpqXf0Ta0wtyoZJ1zgSdfZAUOmjnb
MhcK46l+F4E891qjnsuuVNnspqI4yPMLmAGmife5OUrfcoFdm/vFbM75FaXameBx
cMOmidrJJhO6z5eWSKvO
=VCkW
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some things that I missed due to travel, or that came in late.
Two fixes also going to stable:
- A revert of a buggy change to the 8xx TLB miss handlers.
- Our flushing of SPE (Signal Processing Engine) registers on fork
was broken.
Other changes:
- A change to the KVM decrementer emulation to use proper APIs.
- Some cleanups to the way we do code patching in the 8xx code.
- Expose the maximum possible memory for the system in
/proc/powerpc/lparcfg.
- Merge some updates from Scott: "a couple device tree updates, and a
fix for a missing prototype warning"
A few other minor fixes and a handful of fixes for our selftests.
Thanks to: Aravinda Prasad, Breno Leitao, Camelia Groza, Christophe
Leroy, Felipe Rechia, Joel Stanley, Naveen N. Rao, Paul Mackerras,
Scott Wood, Tyrel Datwyler"
* tag 'powerpc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
selftests/powerpc: Fix compilation issue due to asm label
selftests/powerpc/cache_shape: Fix out-of-tree build
selftests/powerpc/switch_endian: Fix out-of-tree build
selftests/powerpc/pmu: Link ebb tests with -no-pie
selftests/powerpc/signal: Fix out-of-tree build
selftests/powerpc/ptrace: Fix out-of-tree build
powerpc/xmon: Relax frame size for clang
selftests: powerpc: Fix warning for security subdir
selftests/powerpc: Relax L1d miss targets for rfi_flush test
powerpc/process: Fix flush_all_to_thread for SPE
powerpc/pseries: add missing cpumask.h include file
selftests/powerpc: Fix ptrace tm failure
KVM: PPC: Use exported tb_to_ns() function in decrementer emulation
powerpc/pseries: Export maximum memory value
powerpc/8xx: Use patch_site for perf counters setup
powerpc/8xx: Use patch_site for memory setup patching
powerpc/code-patching: Add a helper to get the address of a patch_site
Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"
powerpc/8xx: add missing header in 8xx_mmu.c
powerpc/8xx: Add DT node for using the SEC engine of the MPC885
...
add_memory() currently does not take the device_hotplug_lock, however
is aleady called under the lock from
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
to synchronize against CPU hot-remove and similar.
In general, we should hold the device_hotplug_lock when adding memory to
synchronize against online/offline request (e.g. from user space) - which
already resulted in lock inversions due to device_lock() and
mem_hotplug_lock - see 30467e0b3b ("mm, hotplug: fix concurrent memory
hot-add deadlock"). add_memory()/add_memory_resource() will create memory
block devices, so this really feels like the right thing to do.
Holding the device_hotplug_lock makes sure that a memory block device
can really only be accessed (e.g. via .online/.state) from user space,
once the memory has been fully added to the system.
The lock is not held yet in
drivers/xen/balloon.c
arch/powerpc/platforms/powernv/memtrace.c
drivers/s390/char/sclp_cmd.c
drivers/hv/hv_balloon.c
So, let's either use the locked variants or take the lock.
Don't export add_memory_resource(), as it once was exported to be used by
XEN, which is never built as a module. If somebody requires it, we also
have to export a locked variant (as device_hotplug_lock is never
exported).
Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.
Reading through the code and studying how mem_hotplug_lock is to be used,
I noticed that there are two places where we can end up calling
device_online()/device_offline() - online_pages()/offline_pages() without
the mem_hotplug_lock. And there are other places where we call
device_online()/device_offline() without the device_hotplug_lock.
While e.g.
echo "online" > /sys/devices/system/memory/memory9/state
is fine, e.g.
echo 1 > /sys/devices/system/memory/memory9/online
Will not take the mem_hotplug_lock. However the device_lock() and
device_hotplug_lock.
E.g. via memory_probe_store(), we can end up calling
add_memory()->online_pages() without the device_hotplug_lock. So we can
have concurrent callers in online_pages(). We e.g. touch in
online_pages() basically unprotected zone->present_pages then.
Looks like there is a longer history to that (see Patch #2 for details),
and fixing it to work the way it was intended is not really possible. We
would e.g. have to take the mem_hotplug_lock in device/base/core.c, which
sounds wrong.
Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
More details can be found in patch 3 and patch 6.
I propose the general rules (documentation added in patch 6):
1. add_memory/add_memory_resource() must only be called with
device_hotplug_lock.
2. remove_memory() must only be called with device_hotplug_lock. This is
already documented and holds for all callers.
3. device_online()/device_offline() must only be called with
device_hotplug_lock. This is already documented and true for now in core
code. Other callers (related to memory hotplug) have to be fixed up.
4. mem_hotplug_lock is taken inside of add_memory/remove_memory/
online_pages/offline_pages.
To me, this looks way cleaner than what we have right now (and easier to
verify). And looking at the documentation of remove_memory, using
lock_device_hotplug also for add_memory() feels natural.
This patch (of 6):
remove_memory() is exported right now but requires the
device_hotplug_lock, which is not exported. So let's provide a variant
that takes the lock and only export that one.
The lock is already held in
arch/powerpc/platforms/pseries/hotplug-memory.c
drivers/acpi/acpi_memhotplug.c
arch/powerpc/platforms/powernv/memtrace.c
Apart from that, there are not other users in the tree.
Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Rashmica Gupta <rashmica.g@gmail.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch exports the maximum possible amount of memory
configured on the system via /proc/powerpc/lparcfg.
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch exports the raw per-CPU VPA data via debugfs.
A per-CPU file is created which exports the VPA data of
that CPU to help debug some of the VPA related issues or
to analyze the per-CPU VPA related statistics.
v3: Removed offline CPU check.
v2: Included offline CPU check and other review comments.
Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Adds a driver that implements support for enabling and accessing PAPR
SCM regions. Unfortunately due to how the PAPR interface works we can't
use the existing of_pmem driver (yet) because:
a) The guest is required to use the H_SCM_BIND_MEM h-call to add
add the SCM region to it's physical address space, and
b) There is currently no mechanism for relating a bare of_pmem region
to the backing DIMM (or not-a-DIMM for our case).
Both of these are easily handled by rolling the functionality into a
seperate driver so here we are...
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch implements support for discovering storage class memory
devices at boot and for handling hotplug of new regions via RTAS
hotplug events.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Fix CONFIG_MEMORY_HOTPLUG=n build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The powerpc mobility code may receive RTAS requests to perform PRRN
(Platform Resource Reassignment Notification) topology changes at any
time, including during LPAR migration operations.
In some configurations where the affinity of CPUs or memory is being
changed on that platform, the PRRN requests may apply or refer to
outdated information prior to the complete update of the device-tree.
This patch changes the duration for which topology updates are
suppressed during LPAR migrations from just the rtas_ibm_suspend_me()
/ 'ibm,suspend-me' call(s) to cover the entire migration_store()
operation to allow all changes to the device-tree to be applied prior
to accepting and applying any PRRN requests.
For tracking purposes, pr_info notices are added to the functions
start_topology_update() and stop_topology_update() of 'numa.c'.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The variable 'aa_index' is defined as an unsigned value in
update_lmb_associativity_index(), but find_aa_index() may return -1
when dlpar_clone_property() fails. So change find_aa_index() to return
a bool, which indicates whether 'aa_index' was found or not.
Fixes: c05a5a4096 ("powerpc/pseries: Dynamic add entires to associativity lookup array")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Nathan Fontenot nfont@linux.vnet.ibm.com>
[mpe: Tweak changelog, rename is_found to just found]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The wait_state member of eeh_ops does not need to be platform
dependent; it's just logic around eeh_ops.get_state(). Therefore,
merge the two (slightly different!) platform versions into a new
function, eeh_wait_state() and remove the eeh_ops member.
While doing this, also correct:
* The wait logic, so that it never waits longer than max_wait.
* The wait logic, so that it never waits less than
EEH_STATE_MIN_WAIT_TIME.
* One call site where the result is treated like a bit field before
it's checked for negative error values.
* In pseries_eeh_get_state(), rename the "state" parameter to "delay"
because that's what it is.
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instances of struct eeh_pe are placed in a tree structure using the
fields "child_list" and "child", so place these next to each other
in the definition.
The field "child" is a list entry, so remove the unnecessary and
misleading use of the list initializer, LIST_HEAD(), on it.
The eeh_dev struct contains two list entry fields, called "list" and
"rmv_list". Rename them to "entry" and "rmv_entry" and, as above, stop
initializing them with LIST_HEAD().
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently a flag, EEH_POSTPONED_PROBE, is used to prevent an incorrect
message "EEH: No capable adapters found" from being displayed during
the boot of powernv systems.
It is necessary because, on powernv, the call to eeh_probe_devices()
made from eeh_init() is too early and EEH can't yet be enabled. A
second call is made later from eeh_pnv_post_init(), which succeeds.
(On pseries, the first call succeeds because PCI devices are set up
early enough and no second call is made.)
This can be simplified by moving the early call to eeh_probe_devices()
from eeh_init() (where it's seen by both platforms) to
pSeries_final_fixup(), so that each platform only calls
eeh_probe_devices() once, at a point where it can succeed.
This is slightly later in the boot sequence, but but still early
enough and it is now in the same place in the sequence for both
platforms (the pcibios_fixup hook).
The display of the message can be cleaned up as well, by moving it
into eeh_probe_devices().
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.
Also since commit f467c5640c ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:
...
One side effect of (and the main motivation for) this change is making
the following two definitions behave exactly the same:
config FOO
bool
config FOO
bool
default n
With this change, neither of these will generate a
'# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
That might make it clearer to people that a bare 'default n' is
redundant.
...
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we look up dtl_idx in
the lppaca to determine the number of entries in the buffer. Since
lppaca is in big endian, we need to do an endian conversion before using
this in our calculation to determine the number of entries in the
buffer. Without this, we do not iterate over the existing entries in the
DTL buffer properly.
Fixes: 7c105b63bd ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not set, we register the DTL
buffer for a cpu when the associated file under powerpc/dtl in debugfs
is opened. When doing so, we need to set the size of the buffer being
registered in the second u32 word of the buffer. This needs to be in big
endian, but we are not doing the conversion resulting in the below error
showing up in dmesg:
dtl_start: DTL registration for cpu 0 (hw 0) failed with -4
Fix this in the obvious manner.
Fixes: 7c105b63bd ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation to remove the node name pointer from struct device_node,
convert printf users to use the %pOFn format specifier.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instead of calling both of_irq_parse_one() and
irq_create_of_mapping(), call of_irq_get() instead which does
essentially the same thing. of_irq_get() also calls irq_find_host()
for deferred probe support, but this should be fine as
irq_create_of_mapping() also calls that internally. This gets us
closer to making the former 2 functions static.
In the process of simplifying request_event_sources_irqs(), combine
the the pr_err() and WARN_ON() calls to just a WARN().
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In lparcfg_write we hard code kbuf_sz and then use this as the variable
length of kbuf creating a variable length array. Since we're hard coding
the length anyway just define the array using this as the length and
remove the need for kbuf_sz, thus removing the variable length array.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are three instances in which dlpar hotplug events are invoked;
handling a hotplug interrupt (in a kvm guest), handling a dlpar
request through sysfs, and updating LMB affinity when handling a
PRRN event. Only in the case of handling a hotplug interrupt do we
have to put the work on a workqueue, the other cases can handle the
dlpar request directly.
This patch exports the handle_dlpar_errorlog() function so that
dlpar hotplug events can be handled directly and updates the two
instances mentioned above to use the direct invocation.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The updates to powerpc numa and memory hotplug code now use the
in-kernel LMB array instead of the device tree. This change allows the
pseries memory DLPAR code to only update the device tree once after
successfully handling a DLPAR request.
Prior to the in-kernel LMB array, the numa code looked up the affinity
for memory being added in the device tree, the code now looks this up
in the LMB array. This change means the memory hotplug code can just
update the affinity for an LMB in the LMB array instead of updating
the device tree.
This also provides a savings in kernel memory. When updating the
device tree old properties are never free'ed since there is no
usecount on properties. This behavior leads to a new copy of the
property being allocated every time a LMB is added or removed (i.e. a
request to add 100 LMBs creates 100 new copies of the property). With
this update only a single new property is created when a DLPAR request
completes successfully.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Extract the MCE error details from RTAS extended log and display it to
console.
With this patch you should now see mce logs like below:
[ 142.371818] Severe Machine check interrupt [Recovered]
[ 142.371822] NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
[ 142.371822] Initiator: CPU
[ 142.371823] Error type: SLB [Multihit]
[ 142.371824] Effective address: d00000000ca70000
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On pseries, as of today system crashes if we get a machine check
exceptions due to SLB errors. These are soft errors and can be fixed
by flushing the SLBs so the kernel can continue to function instead of
system crash. We do this in real mode before turning on MMU. Otherwise
we would run into nested machine checks. This patch now fetches the
rtas error log in real mode and flushes the SLBs on SLB/ERAT errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On pseries, the machine check error details are part of RTAS extended
event log passed under Machine check exception section. This patch adds
the definition of rtas MCE event section and related helper
functions.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This hypervisor's call allows to remove up to 8 ptes with only call to
tlbie.
The virtual pages must be all within the same naturally aligned 8 pages
virtual address block and have the same page and segment size encodings.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This part of code will be called also when dealing with H_BLOCK_REMOVE.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This feature tells if the hcall H_BLOCK_REMOVE is available.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Notable changes:
- A fix for a bug in our page table fragment allocator, where a page table page
could be freed and reallocated for something else while still in use, leading
to memory corruption etc. The fix reuses pt_mm in struct page (x86 only) for
a powerpc only refcount.
- Fixes to our pkey support. Several are user-visible changes, but bring us in
to line with x86 behaviour and/or fix outright bugs. Thanks to Florian Weimer
for reporting many of these.
- A series to improve the hvc driver & related OPAL console code, which have
been seen to cause hardlockups at times. The hvc driver changes in particular
have been in linux-next for ~month.
- Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
- Remove Power8 DD1 and Power9 DD1 support, neither chip should be in use
anywhere other than as a paper weight.
- An optimised memcmp implementation using Power7-or-later VMX instructions
- Support for barrier_nospec on some NXP CPUs.
- Support for flushing the count cache on context switch on some IBM CPUs
(controlled by firmware), as a Spectre v2 mitigation.
- A series to enhance the information we print on unhandled signals to bring it
into line with other arches, including showing the offending VMA and dumping
the instructions around the fault.
Thanks to:
Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Alexey
Spirkov, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar,
Arnd Bergmann, Bartosz Golaszewski, Benjamin Herrenschmidt, Bharat Bhushan,
Bjoern Noetel, Boqun Feng, Breno Leitao, Bryant G. Ly, Camelia Groza,
Christophe Leroy, Christoph Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt,
Darren Stevens, Dave Young, David Gibson, Diana Craciun, Finn Thain, Florian
Weimer, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel Stanley,
Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh Salgaonkar, Markus
Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues, Michael Hanselmann, Michael
Neuling, Michael Schmitz, Mukesh Ojha, Murilo Opsfelder Araujo, Nicholas
Piggin, Parth Y Shah, Paul Mackerras, Paul Menzel, Ram Pai, Randy Dunlap,
Rashmica Gupta, Reza Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff,
Scott Wood, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson,
Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat Rao
B, zhong jiang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAlt2O6cTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgC7hD/4+cj796Df7GsVsIMxzQm7SS9dklIdO
JuKj2Nr5HRzTH59jWlXukLG9mfTNCFgFJB4gEpK1ArDOTcHTCI9RRsLZTZ/kum66
7Pd+7T40dLYXB5uecuUs0vMXa2fI3syKh1VLzACSXv3Dh9BBIKQBwW/aD2eww4YI
1fS5LnXZ2PSxfr6KNAC6ogZnuaiD0sHXOYrtGHq+S/TFC7+Z6ySa6+AnPS+hPVoo
/rHDE1Khr66aj7uk+PP2IgUrCFj6Sbj6hTVlS/iAuwbMjUl9ty6712PmvX9x6wMZ
13hJQI+g6Ci+lqLKqmqVUpXGSr6y4NJGPS/Hko4IivBTJApI+qV/tF2H9nxU+6X0
0RqzsMHPHy13n2torA1gC7ttzOuXPI4hTvm6JWMSsfmfjTxLANJng3Dq3ejh6Bqw
76EMowpDLexwpy7/glPpqNdsP4ySf2Qm8yq3mR7qpL4m3zJVRGs11x+s5DW8NKBL
Fl5SqZvd01abH+sHwv6NLaLkEtayUyohxvyqu2RU3zu5M5vi7DhqstybTPjKPGu0
icSPh7b2y10WpOUpC6lxpdi8Me8qH47mVc/trZ+SpgBrsuEmtJhGKszEnzRCOqos
o2IhYHQv3lQv86kpaAFQlg/RO+Lv+Lo5qbJ209V+hfU5nYzXpEulZs4dx1fbA+ze
fK8GEh+u0L4uJg==
=PzRz
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- A fix for a bug in our page table fragment allocator, where a page
table page could be freed and reallocated for something else while
still in use, leading to memory corruption etc. The fix reuses
pt_mm in struct page (x86 only) for a powerpc only refcount.
- Fixes to our pkey support. Several are user-visible changes, but
bring us in to line with x86 behaviour and/or fix outright bugs.
Thanks to Florian Weimer for reporting many of these.
- A series to improve the hvc driver & related OPAL console code,
which have been seen to cause hardlockups at times. The hvc driver
changes in particular have been in linux-next for ~month.
- Increase our MAX_PHYSMEM_BITS to 128TB when SPARSEMEM_VMEMMAP=y.
- Remove Power8 DD1 and Power9 DD1 support, neither chip should be in
use anywhere other than as a paper weight.
- An optimised memcmp implementation using Power7-or-later VMX
instructions
- Support for barrier_nospec on some NXP CPUs.
- Support for flushing the count cache on context switch on some IBM
CPUs (controlled by firmware), as a Spectre v2 mitigation.
- A series to enhance the information we print on unhandled signals
to bring it into line with other arches, including showing the
offending VMA and dumping the instructions around the fault.
Thanks to: Aaro Koskinen, Akshay Adiga, Alastair D'Silva, Alexey
Kardashevskiy, Alexey Spirkov, Alistair Popple, Andrew Donnellan,
Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Bartosz Golaszewski,
Benjamin Herrenschmidt, Bharat Bhushan, Bjoern Noetel, Boqun Feng,
Breno Leitao, Bryant G. Ly, Camelia Groza, Christophe Leroy, Christoph
Hellwig, Cyril Bur, Dan Carpenter, Daniel Klamt, Darren Stevens, Dave
Young, David Gibson, Diana Craciun, Finn Thain, Florian Weimer,
Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven, Geoff Levand,
Guenter Roeck, Gustavo Romero, Haren Myneni, Hari Bathini, Joel
Stanley, Jonathan Neuschäfer, Kees Cook, Madhavan Srinivasan, Mahesh
Salgaonkar, Markus Elfring, Mathieu Malaterre, Mauro S. M. Rodrigues,
Michael Hanselmann, Michael Neuling, Michael Schmitz, Mukesh Ojha,
Murilo Opsfelder Araujo, Nicholas Piggin, Parth Y Shah, Paul
Mackerras, Paul Menzel, Ram Pai, Randy Dunlap, Rashmica Gupta, Reza
Arbab, Rodrigo R. Galvao, Russell Currey, Sam Bobroff, Scott Wood,
Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stan Johnson, Thiago
Jung Bauermann, Tyrel Datwyler, Vaibhav Jain, Vasant Hegde, Venkat
Rao, zhong jiang"
* tag 'powerpc-4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (234 commits)
powerpc/mm/book3s/radix: Add mapping statistics
powerpc/uaccess: Enable get_user(u64, *p) on 32-bit
powerpc/mm/hash: Remove unnecessary do { } while(0) loop
powerpc/64s: move machine check SLB flushing to mm/slb.c
powerpc/powernv/idle: Fix build error
powerpc/mm/tlbflush: update the mmu_gather page size while iterating address range
powerpc/mm: remove warning about ‘type’ being set
powerpc/32: Include setup.h header file to fix warnings
powerpc: Move `path` variable inside DEBUG_PROM
powerpc/powermac: Make some functions static
powerpc/powermac: Remove variable x that's never read
cxl: remove a dead branch
powerpc/powermac: Add missing include of header pmac.h
powerpc/kexec: Use common error handling code in setup_new_fdt()
powerpc/xmon: Add address lookup for percpu symbols
powerpc/mm: remove huge_pte_offset_and_shift() prototype
powerpc/lib: Use patch_site to patch copy_32 functions once cache is enabled
powerpc/pseries: Fix endianness while restoring of r3 in MCE handler.
powerpc/fadump: merge adjacent memory ranges to reduce PT_LOAD segements
powerpc/fadump: handle crash memory ranges array index overflow
...
During Machine Check interrupt on pseries platform, register r3 points
RTAS extended event log passed by hypervisor. Since hypervisor uses r3
to pass pointer to rtas log, it stores the original r3 value at the
start of the memory (first 8 bytes) pointed by r3. Since hypervisor
stores this info and rtas log is in BE format, linux should make
sure to restore r3 value in correct endian format.
Without this patch when MCE handler, after recovery, returns to code that
that caused the MCE may end up with Data SLB access interrupt for invalid
address followed by kernel panic or hang.
Severe Machine check interrupt [Recovered]
NIP [d00000000ca301b8]: init_module+0x1b8/0x338 [bork_kernel]
Initiator: CPU
Error type: SLB [Multihit]
Effective address: d00000000ca70000
cpu 0xa: Vector: 380 (Data SLB Access) at [c0000000fc7775b0]
pc: c0000000009694c0: vsnprintf+0x80/0x480
lr: c0000000009698e0: vscnprintf+0x20/0x60
sp: c0000000fc777830
msr: 8000000002009033
dar: a803a30c000000d0
current = 0xc00000000bc9ef00
paca = 0xc00000001eca5c00 softe: 3 irq_happened: 0x01
pid = 8860, comm = insmod
vscnprintf+0x20/0x60
vprintk_emit+0xb4/0x4b0
vprintk_func+0x5c/0xd0
printk+0x38/0x4c
init_module+0x1c0/0x338 [bork_kernel]
do_one_initcall+0x54/0x230
do_init_module+0x8c/0x248
load_module+0x12b8/0x15b0
sys_finit_module+0xa8/0x110
system_call+0x58/0x6c
--- Exception: c00 (System Call) at 00007fff8bda0644
SP (7fffdfbfe980) is in userspace
This patch fixes this issue.
Fixes: a08a53ea4c ("powerpc/le: Enable RTAS events support")
Cc: stable@vger.kernel.org # v3.15+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In Makefiles if we're testing a CONFIG_FOO symbol for equality with 'y'
we can instead just use ifdef. The latter reads easily, so convert to
it where possible.
Signed-off-by: Rodrigo R. Galvao <rosattig@linux.vnet.ibm.com>
Reviewed-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use the existing hypercall to determine the appropriate settings for
the count cache flush, and then call the generic powerpc code to set
it up based on the security feature flags.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we require platform code to call setup_barrier_nospec(). But
if we add an empty definition for the !CONFIG_PPC_BARRIER_NOSPEC case
then we can call it in setup_arch().
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
rtas_log_buf is a buffer to hold RTAS event data that are communicated
to kernel by hypervisor. This buffer is then used to pass RTAS event
data to user through proc fs. This buffer is allocated from
vmalloc (non-linear mapping) area.
On Machine check interrupt, register r3 points to RTAS extended event
log passed by hypervisor that contains the MCE event. The pseries
machine check handler then logs this error into rtas_log_buf. The
rtas_log_buf is a vmalloc-ed (non-linear) buffer we end up taking up a
page fault (vector 0x300) while accessing it. Since machine check
interrupt handler runs in NMI context we can not afford to take any
page fault. Page faults are not honored in NMI context and causes
kernel panic. Apart from that, as Nick pointed out,
pSeries_log_error() also takes a spin_lock while logging error which
is not safe in NMI context. It may endup in deadlock if we get another
MCE before releasing the lock. Fix this by deferring the logging of
rtas error to irq work queue.
Current implementation uses two different buffers to hold rtas error
log depending on whether extended log is provided or not. This makes
bit difficult to identify which buffer has valid data that needs to
logged later in irq work. Simplify this using single buffer, one per
paca, and copy rtas log to it irrespective of whether extended log is
provided or not. Allocate this buffer below RMA region so that it can
be accessed in real mode mce handler.
Fixes: b96672dd84 ("powerpc: Machine check interrupt is a non-maskable interrupt")
Cc: stable@vger.kernel.org # v4.14+
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The global mce data buffer that used to copy rtas error log is of 2048
(RTAS_ERROR_LOG_MAX) bytes in size. Before the copy we read
extended_log_length from rtas error log header, then use max of
extended_log_length and RTAS_ERROR_LOG_MAX as a size of data to be copied.
Ideally the platform (phyp) will never send extended error log with
size > 2048. But if that happens, then we have a risk of buffer overrun
and corruption. Fix this by using min_t instead.
Fixes: d368514c30 ("powerpc: Fix corruption when grabbing FWNMI data")
Reported-by: Michal Suchanek <msuchanek@suse.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's identical to xive_teardown_cpu() so just use the latter
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When a PCI device is detected, pdev->is_added is set to 1 and proc and
sysfs entries are created.
When the device is removed, pdev->is_added is checked for one and then
device is detached with clearing of proc and sys entries and at end,
pdev->is_added is set to 0.
is_added and is_busmaster are bit fields in pci_dev structure sharing same
memory location.
A strange issue was observed with multiple removal and rescan of a PCIe
NVMe device using sysfs commands where is_added flag was observed as zero
instead of one while removing device and proc,sys entries are not cleared.
This causes issue in later device addition with warning message
"proc_dir_entry" already registered.
Debugging revealed a race condition between the PCI core setting the
is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
setting the is_busmaster bit in pci_set_master(). As these fields are not
handled atomically, that clears the is_added bit.
Move the is_added bit to a separate private flag variable and use atomic
functions to set and retrieve the device addition state. This avoids the
race because is_added no longer shares a memory location with is_busmaster.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
EEH recovery currently fails on pSeries for some IOV capable PCI
devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
certain device tree properties for the device. (Found on an IOV
capable device using the ipr driver.)
Recovery fails in pci_enable_resources() at the check on r->parent,
because r->flags is set and r->parent is not. This state is due to
sriov_init() setting the start, end and flags members of the IOV BARs
but the parent not being set later in
pseries_pci_fixup_iov_resources(), because the
"ibm,open-sriov-vf-bar-info" property is missing.
Correct this by zeroing the resource flags for IOV BARs when they
can't be configured (this is the same method used by sriov_init() and
__pci_read_base()).
VFs cleared this way can't be enabled later, because that requires
another device tree property, "ibm,number-of-configurable-vfs" as well
as support for the RTAS function "ibm_map_pes". These are all part of
hypervisor support for IOV and it seems unlikely that a hypervisor
would ever partially, but not fully, support it. (None are currently
provided by QEMU/KVM.)
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
asm/tlbflush.h is only needed for:
- using functions xxx_flush_tlb_xxx()
- using MMU_NO_CONTEXT
- including asm-generic/pgtable.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
files not using feature fixup don't need asm/feature-fixups.h
files using feature fixup need asm/feature-fixups.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Only include linux/stringify.h is files using __stringify()
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch moves ASM_CONST() and stringify_in_c() into
dedicated asm-const.h, then cleans all related inclusions.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: asm-compat.h should include asm-const.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This patch adds error reporting to H_ENTER and H_READ hcalls. A
failure for both these hcalls are mostly fatal and it would be good to
log the failure reason.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Switch from printk to pr_fmt() / pr_xxx().
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Notable changes:
- Support for split PMD page table lock on 64-bit Book3S (Power8/9).
- Add support for HAVE_RELIABLE_STACKTRACE, so we properly support live
patching again.
- Add support for patching barrier_nospec in copy_from_user() and syscall entry.
- A couple of fixes for our data breakpoints on Book3S.
- A series from Nick optimising TLB/mm handling with the Radix MMU.
- Numerous small cleanups to squash sparse/gcc warnings from Mathieu Malaterre.
- Several series optimising various parts of the 32-bit code from Christophe Leroy.
- Removal of support for two old machines, "SBC834xE" and "C2K" ("GEFanuc,C2K"),
which is why the diffstat has so many deletions.
And many other small improvements & fixes.
There's a few out-of-area changes. Some minor ftrace changes OK'ed by Steve, and
a fix to our powernv cpuidle driver. Then there's a series touching mm, x86 and
fs/proc/task_mmu.c, which cleans up some details around pkey support. It was
ack'ed/reviewed by Ingo & Dave and has been in next for several weeks.
Thanks to:
Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al Viro, Andrew
Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd Bergmann, Balbir Singh,
Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Dave
Hansen, Fabio Estevam, Finn Thain, Frederic Barrat, Gautham R. Shenoy, Haren
Myneni, Hari Bathini, Ingo Molnar, Jonathan Neuschäfer, Josh Poimboeuf,
Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu
Malaterre, Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica Gupta, Ravi
Bangoria, Russell Currey, Sam Bobroff, Samuel Mendoza-Jonas, Segher
Boessenkool, Shilpasri G Bhat, Simon Guo, Souptick Joarder, Stewart Smith,
Thiago Jung Bauermann, Torsten Duwe, Vaibhav Jain, Wei Yongjun, Wolfram Sang,
Yisheng Xie, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJbGQKBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBq
TRAAioK7rz5xYMkxaM3Ng3ybobEeNAwQqOolz98xvmnB9SfDWNuc99vf8cGu0/fQ
zc8AKZ5RcnwipOjyGlxW9oa1ZhVq0xtYnQPiYLEKMdLQmh5D+C7+KpvAd1UElweg
ub40/xDySWfMujfuMSF9JDCWPIXyojt4Xg5nJKIVRrAm/3YMe/+i5Am7NWHuMCEb
aQmZtlYW5Mz81XY0968hjpUO6eKFRmsaM7yFAhGTXx6+oLRpGj1PZB4AwdRIKS2L
Ak7q/VgxtE4W+s3a0GK2s+eXIhGKeFuX9AVnx3nti+8/K1OqrqhDcLMUC/9JpCpv
EvOtO7dxPnZujHjdu4Eai/xNoo4h6zRy7bWqve9LoBM40CP5jljKzu1lwqqb5yO0
jC7/aXhgiSIxxcRJLjoI/TYpZPu40MifrkydmczykdPyPCnMIWEJDcj4KsRL/9Y8
9SSbJzRNC/SgQNTbUYPZFFi6G0QaMmlcbCb628k8QT+Gn3Xkdf/ZtxzqEyoF4Irq
46kFBsiSSK4Bu0rVlcUtJQLgdqytWULO6NKEYnD67laxYcgQd8pGFQ8SjZhRZLgU
q5LA3HIWhoAI4M0wZhOnKXO6JfiQ1UbO8gUJLsWsfF0Fk5KAcdm+4kb4jbI1H4Qk
Vol9WNRZwEllyaiqScZN9RuVVuH0GPOZeEH1dtWK+uWi0lM=
=ZlBf
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Support for split PMD page table lock on 64-bit Book3S (Power8/9).
- Add support for HAVE_RELIABLE_STACKTRACE, so we properly support
live patching again.
- Add support for patching barrier_nospec in copy_from_user() and
syscall entry.
- A couple of fixes for our data breakpoints on Book3S.
- A series from Nick optimising TLB/mm handling with the Radix MMU.
- Numerous small cleanups to squash sparse/gcc warnings from Mathieu
Malaterre.
- Several series optimising various parts of the 32-bit code from
Christophe Leroy.
- Removal of support for two old machines, "SBC834xE" and "C2K"
("GEFanuc,C2K"), which is why the diffstat has so many deletions.
And many other small improvements & fixes.
There's a few out-of-area changes. Some minor ftrace changes OK'ed by
Steve, and a fix to our powernv cpuidle driver. Then there's a series
touching mm, x86 and fs/proc/task_mmu.c, which cleans up some details
around pkey support. It was ack'ed/reviewed by Ingo & Dave and has
been in next for several weeks.
Thanks to: Akshay Adiga, Alastair D'Silva, Alexey Kardashevskiy, Al
Viro, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Arnd
Bergmann, Balbir Singh, Cédric Le Goater, Christophe Leroy, Christophe
Lombard, Colin Ian King, Dave Hansen, Fabio Estevam, Finn Thain,
Frederic Barrat, Gautham R. Shenoy, Haren Myneni, Hari Bathini, Ingo
Molnar, Jonathan Neuschäfer, Josh Poimboeuf, Kamalesh Babulal,
Madhavan Srinivasan, Mahesh Salgaonkar, Mark Greer, Mathieu Malaterre,
Matthew Wilcox, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Nicolai Stange, Olof Johansson, Paul Gortmaker, Paul
Mackerras, Peter Rosin, Pridhiviraj Paidipeddi, Ram Pai, Rashmica
Gupta, Ravi Bangoria, Russell Currey, Sam Bobroff, Samuel
Mendoza-Jonas, Segher Boessenkool, Shilpasri G Bhat, Simon Guo,
Souptick Joarder, Stewart Smith, Thiago Jung Bauermann, Torsten Duwe,
Vaibhav Jain, Wei Yongjun, Wolfram Sang, Yisheng Xie, YueHaibing"
* tag 'powerpc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (251 commits)
powerpc/64s/radix: Fix missing ptesync in flush_cache_vmap
cpuidle: powernv: Fix promotion from snooze if next state disabled
powerpc: fix build failure by disabling attribute-alias warning in pci_32
ocxl: Fix missing unlock on error in afu_ioctl_enable_p9_wait()
powerpc-opal: fix spelling mistake "Uniterrupted" -> "Uninterrupted"
powerpc: fix spelling mistake: "Usupported" -> "Unsupported"
powerpc/pkeys: Detach execute_only key on !PROT_EXEC
powerpc/powernv: copy/paste - Mask SO bit in CR
powerpc: Remove core support for Marvell mv64x60 hostbridges
powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
powerpc/boot: Remove support for Marvell mv64x60 i2c controller
powerpc/boot: Remove support for Marvell MPSC serial controller
powerpc/embedded6xx: Remove C2K board support
powerpc/lib: optimise PPC32 memcmp
powerpc/lib: optimise 32 bits __clear_user()
powerpc/time: inline arch_vtime_task_switch()
powerpc/Makefile: set -mcpu=860 flag for the 8xx
powerpc: Implement csum_ipv6_magic in assembly
powerpc/32: Optimise __csum_partial()
powerpc/lib: Adjust .balign inside string functions for PPC32
...
Check what firmware told us and enable/disable the barrier_nospec as
appropriate.
We err on the side of enabling the barrier, as it's no-op on older
systems, see the comment for more detail.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs.
Currently this is done by reading PURR in context switch and timer
interrupt, and storing that into a per-CPU variable. These are summed
to provide the value.
This does not work with all timer schemes (e.g., NO_HZ_FULL), and it
is sub-optimal for performance because it reads the PURR register on
every context switch, although that's been difficult to distinguish
from noise in the contxt_switch microbenchmark.
This patch implements the sum by calling a function on each CPU, to
read and add PURR values of each CPU.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CPUs we can prevent a vulnerability related to store-to-load
forwarding by preventing store forwarding between privilege domains,
by inserting a barrier in kernel entry and exit paths.
This is known to be the case on at least Power7, Power8 and Power9
powerpc CPUs.
Barriers must be inserted generally before the first load after moving
to a higher privilege, and after the last store before moving to a
lower privilege, HV and PR privilege transitions must be protected.
Barriers are added as patch sections, with all kernel/hypervisor entry
points patched, and the exit points to lower privilge levels patched
similarly to the RFI flush patching.
Firmware advertisement is not implemented yet, so CPU flush types
are hard coded.
Thanks to Michal Suchánek for bug fixes and review.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hcall H_INT_RESET should be called to make sure XIVE is fully
reseted.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hcall_exit() tracepoint has retval defined as unsigned long. That
leads to humours results like:
bash-3686 [009] d..2 854.134094: hcall_entry: opcode=24
bash-3686 [009] d..2 854.134095: hcall_exit: opcode=24 retval=18446744073709551609
It's normal for some hcalls to return negative values, displaying them
as unsigned isn't very helpful. So change it to signed.
bash-3711 [001] d..2 471.691008: hcall_entry: opcode=24
bash-3711 [001] d..2 471.691008: hcall_exit: opcode=24 retval=-7
Which can be more easily compared to H_NOT_FOUND in hvcall.h
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
After migration the security feature flags might have changed (e.g.,
destination system with unpatched firmware), but some flags are not
set/clear again in init_cpu_char_feature_flags() because it assumes
the security flags to be the defaults.
Additionally, if the H_GET_CPU_CHARACTERISTICS hypercall fails then
init_cpu_char_feature_flags() does not run again, which potentially
might leave the system in an insecure or sub-optimal configuration.
So, just restore the security feature flags to the defaults assumed
by init_cpu_char_feature_flags() so it can set/clear them correctly,
and to ensure safe settings are in place in case the hypercall fail.
Fixes: f636c14790 ("powerpc/pseries: Set or clear security feature flags")
Depends-on: 19887d6a28e2 ("powerpc: Move default security feature flags")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Bring in yet another series that touches KVM code, and might need to
be merged into the kvm-ppc branch to resolve conflicts.
This required some changes in pnv_power9_force_smt4_catch/release()
due to the paca array becomming an array of pointers.
The H_CPU_BEHAV_* flags should be checked for in the 'behaviour' field
of 'struct h_cpu_char_result' -- 'character' is for H_CPU_CHAR_*
flags.
Found by playing around with QEMU's implementation of the hypercall:
H_CPU_CHAR=0xf000000000000000
H_CPU_BEHAV=0x0000000000000000
This clears H_CPU_BEHAV_FAVOUR_SECURITY and H_CPU_BEHAV_L1D_FLUSH_PR
so pseries_setup_rfi_flush() disables 'rfi_flush'; and it also
clears H_CPU_CHAR_L1D_THREAD_PRIV flag. So there is no RFI flush
mitigation at all for cpu_show_meltdown() to report; but currently
it does:
Original kernel:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Mitigation: RFI Flush
Patched kernel:
# cat /sys/devices/system/cpu/vulnerabilities/meltdown
Not affected
H_CPU_CHAR=0x0000000000000000
H_CPU_BEHAV=0xf000000000000000
This sets H_CPU_BEHAV_BNDS_CHK_SPEC_BAR so cpu_show_spectre_v1() should
report vulnerable; but currently it doesn't:
Original kernel:
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Not affected
Patched kernel:
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v1
Vulnerable
Brown-paper-bag-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: f636c14790 ("powerpc/pseries: Set or clear security feature flags")
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We no longer allocate lppacas in an array, so this patch removes the
1kB static alignment for the structure, and enforces the PAPR
alignment requirements at allocation time. We can not reduce the 1kB
allocation size however, due to existing KVM hypervisors.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change the paca array into an array of pointers to pacas. Allocate
pacas individually.
This allows flexibility in where the PACAs are allocated. Future work
will allocate them node-local. Platforms that don't have address limits
on PACAs would be able to defer PACA allocations until later in boot
rather than allocate all possible ones up-front then freeing unused.
This is slightly more overhead (one additional indirection) for cross
CPU paca references, but those aren't too common.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With enabled DEBUG, there is a compile error:
"error: ‘flags’ is used uninitialized in this function".
This moves pr_devel() little further where @flags are initialized.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we have the security flags we can simplify the code in
pseries_setup_rfi_flush() because the security flags have pessimistic
defaults.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we have feature flags for security related things, set or
clear them based on what we receive from the hypercall.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We might have migrated to a machine that uses a different flush type,
or doesn't need flushing at all.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This ensures the fallback flush area is always allocated on pseries,
so in case a LPAR is migrated from a patched to an unpatched system,
it is possible to enable the fallback flush in the target system.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On POWER9, since commit cc3d294013 ("powerpc/64: Enable use of radix
MMU under hypervisor on POWER9", 2017-01-30), we set both the radix and
HPT bits in the client-architecture-support (CAS) vector, which tells
the hypervisor that we can do either radix or HPT. According to PAPR,
if we use this combination we are promising to do a H_REGISTER_PROC_TBL
hcall later on to let the hypervisor know whether we are doing radix
or HPT. We currently do this call if we are doing radix but not if
we are doing HPT. If the hypervisor is able to support both radix
and HPT guests, it would be entitled to defer allocation of the HPT
until the H_REGISTER_PROC_TBL call, and to fail any attempts to create
HPTEs until the H_REGISTER_PROC_TBL call. Thus we need to do a
H_REGISTER_PROC_TBL call when we are doing HPT; otherwise we may
crash at boot time.
This adds the code to call H_REGISTER_PROC_TBL in this case, before
we attempt to create any HPT entries using H_ENTER.
Fixes: cc3d294013 ("powerpc/64: Enable use of radix MMU under hypervisor on POWER9")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Back in 2013 we added some hypercall wrappers which misspelled
"plpar" (P-series Logical PARtition) as "plapr".
Visually they're hard to distinguish and it almost doesn't matter, but
it is confusing when grepping to miss some calls because of the typo.
They've also started spreading, so before they take over let's fix
them all to be "plpar".
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
smp_query_cpu_stopped() and related #defines are currently in
plpar_wrappers.h. The function actually does an RTAS call, not an
hcall, and basically has nothing to do with plpar_wrappers.h
Move it into pseries.h, where it can easily be used by the only two
callers in pseries/smp.c and pseries/hotplug-cpu.c.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some versions of firmware will have a setting that can be configured
to disable the RFI flush, add support for it.
Fixes: 8989d56878 ("powerpc/pseries: Query hypervisor for RFI flush settings")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit e67e02a544 ("powerpc/pseries: Fix cpu hotplug crash with
memoryless nodes") adds an unconditional call to
find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR
is enabled. This results in the following build error if this is not
the case.
arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu':
arch/powerpc/platforms/pseries/hotplug-cpu.c:369:
undefined reference to `.find_and_online_cpu_nid'
Follow the guideline provided by similar functions and provide a dummy
function if CONFIG_PPC_SPLPAR is not enabled. This also moves the
external function declaration into an include file where it should be.
Fixes: e67e02a544 ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[mpe: Change subject to emphasise the build fix]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently if the kernel receives a memory hot-unplug event early
enough, it may get stuck in an infinite loop in
dissolve_free_huge_pages(). This appears as a stall just after:
pseries-hotplug-mem: Attempting to hot-remove XX LMB(s) at YYYYYYYY
It appears to be caused by "minimum_order" being uninitialized, due to
init_ras_IRQ() executing before hugetlb_init().
To correct this, extract the part of init_ras_IRQ() that enables
hotplug event processing and place it in the machine_late_initcall
phase, which is guaranteed to be after hugetlb_init() is called.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Reorder the functions to make the diff readable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When DLPAR removing a CPU, the unmapping of the cpu from a node in
unmap_cpu_from_node() should also invalidate the CPUs entry in the
numa_cpu_lookup_table. There is not a guarantee that on a subsequent
DLPAR add of the CPU the associativity will be the same and thus
could be in a different node. Invalidating the entry in the
numa_cpu_lookup_table causes the associativity to be read from the
device tree at the time of the add.
The current behavior of not invalidating the CPUs entry in the
numa_cpu_lookup_table can result in scenarios where the the topology
layout of CPUs in the partition does not match the device tree
or the topology reported by the HMC.
This bug looks like it was introduced in 2004 in the commit titled
"ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the
linux-fullhist tree. Hence tag it for all stable releases.
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Highlights:
- Enable support for memory protection keys aka "pkeys" on Power7/8/9 when
using the hash table MMU.
- Extend our interrupt soft masking to support masking PMU interrupts as well
as "normal" interrupts, and then use that to implement local_t for a ~4x
speedup vs the current atomics-based implementation.
- A new driver "ocxl" for "Open Coherent Accelerator Processor Interface
(OpenCAPI)" devices.
- Support for new device tree properties on PowerVM to describe hotpluggable
memory and devices.
- Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO.
- Freescale updates from Scott:
"Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a
minor cleanup patch."
As well as quite a lot of other changes all over the place, and small fixes and
cleanups as always.
Thanks to:
Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas
Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman
Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater,
Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes
do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G.
Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim
Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright,
Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre,
Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai,
Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart
Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl
Gomonovych.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJadF6wExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYA2
nBAAnguCEyAIYpc+ffE3WU9xJEWxa6bKuVufHcUFVntGiGD+igmMS+SHp4ay3Aos
HcA4WFrpzNb2KZ++kmFWtAKWnMfCiW9xuYJNicjr7X5ZiVBEhLWN/mQCwBKs3p6L
5+HhvytcdkKVbEcyVjEGvRL40AyxXNOI02o6Co9X8vanHsmWB4q0eWe4PHstZqlg
6K6kazMp+NTvEFYwKNXDOvuHouKSL57l14SLROH7CpJkNTOQ9s+W59/LmnuCjRlu
o70b7iWOAEbF9tvMma1ksDZVNj7mSyaymLYCyOXu4CkuuleJacZYJ9oQGNddoIbC
wk7l93vPT/yze7DYg8x3uXpKcaDEvEepPuQ/ubz+UXFQWuJtl5ej6Cv+0eOmyZIs
+bjWhGHKdNttnsiPlTRCX/gWD13RE1dB6xXJlfOJ7Oz9OnXXK8ZKc1NTREbQXRWM
8tClAwf9upWpm86GHPVnyrgYbgZo5b1os4SoS8e3kESzakrQVQP7J376u2DtccRq
2AGqjJ+tl5tYPnhm8zG1cNrpqHHpgkNGqLS7DvWRg3EPmEKVQcltN1b/0aKaAjHA
aTRofjrVo+jJ4MX1uyEo59yNCEQPfjkmHRQdLwm+xjWTzEPfIMzpWyXm14tawDQf
OjcAe90W/qQ18brw4z+2BI14J76XziOSX/QcunOn1u/sqaM=
=3rYn
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights:
- Enable support for memory protection keys aka "pkeys" on Power7/8/9
when using the hash table MMU.
- Extend our interrupt soft masking to support masking PMU interrupts
as well as "normal" interrupts, and then use that to implement
local_t for a ~4x speedup vs the current atomics-based
implementation.
- A new driver "ocxl" for "Open Coherent Accelerator Processor
Interface (OpenCAPI)" devices.
- Support for new device tree properties on PowerVM to describe
hotpluggable memory and devices.
- Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit
VDSO.
- Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI
erratum workaround, plus a minor cleanup patch.
As well as quite a lot of other changes all over the place, and small
fixes and cleanups as always.
Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy,
Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V,
Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann,
Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G.
Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur,
David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic
Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva,
Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh
Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal,
Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael
Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot,
Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud,
Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee,
Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann,
Vaibhav Jain, Vasyl Gomonovych"
* tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits)
powerpc/mm/radix: Fix build error when RADIX_MMU=n
macintosh/ams-input: Use true and false for boolean values
macintosh: change some data types from int to bool
powerpc/watchdog: Print the NIP in soft_nmi_interrupt()
powerpc/watchdog: regs can't be null in soft_nmi_interrupt()
powerpc/watchdog: Tweak watchdog printks
powerpc/cell: Remove axonram driver
rtc-opal: Fix handling of firmware error codes, prevent busy loops
powerpc/mpc52xx_gpt: make use of raw_spinlock variants
macintosh/adb: Properly mark continued kernel messages
powerpc/pseries: Fix cpu hotplug crash with memoryless nodes
powerpc/numa: Ensure nodes initialized for hotplug
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
powerpc/kernel: Block interrupts when updating TIDR
powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn
powerpc/mm/nohash: do not flush the entire mm when range is a single page
powerpc/pseries: Add Initialization of VF Bars
powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV
powerpc/eeh: Add EEH notify resume sysfs
powerpc/eeh: Add EEH operations to notify resume
...
This pull requests contains a consolidation of the generic no-IOMMU code,
a well as the glue code for swiotlb. All the code is based on the x86
implementation with hooks to allow all architectures that aren't cache
coherent to use it. The x86 conversion itself has been deferred because
the x86 maintainers were a little busy in the last months.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlpxcVoLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN/Lw/+Je9teM4NPQ8lU/ncbJN/bUzCFGJ6dFt2eVX/6xs3
sfl8vBdeHt6CBM02rRNecEr31z3+orjQes5JnlEJFYeG3jumV0zCPw/zbxqjzbJ1
3n6cckLxbxzy8Ca1G/BVjHLAUX5eWp1ujn/Q4d03VKVQZhJvFYlqDbP3TrNVx7xn
k86u37p/o+ngjwX66UdZ3C4iIBF8zqy6n2kkpv4HUQtHHzPwEvliN39eNilovb56
iGOzjDX1UWHAu4xCTVnPHSG4fA4XU41NWzIN3DIVPE25lYSISSl9TFAdR8GeZA0G
0Yj6sW53pRSoUwco1ocoS44/FgrPOB5/vHIL06pABvicXBiomje1QylqcK7zAczk
esjkfPEZrmZuu99GtqFyDNKEvKKdy+aBGaTZ3y+NxsuBs+0xS2Owz1IE4Tk28xaw
xh7zn+CVdk2fJh6ZIdw5Eu9b9VN08UriqDmDzO/ylDlcNGcDi7wcxiSTEkHJ1ON/
g9nletV6f3egL0wljDcOnhCJCHTvmWEeq3z8lE55QzPzSH0hHpnGQ2WD0tKrroxz
kjOZp0TdXa4F5iysOHe2xl2sftOH0zIkBQJ+oBcK12mTaLu21+yeuCggQXJ/CBdk
1Ol7l9g9T0TDuZPfiTHt5+6jmECQs92LElWA8x7uF7Fpix3BpnafWaaSMSsosF3F
D1Y=
=Nrl9
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
"Except for a runtime warning fix from Christian this is all about
consolidation of the generic no-IOMMU code, a well as the glue code
for swiotlb.
All the code is based on the x86 implementation with hooks to allow
all architectures that aren't cache coherent to use it.
The x86 conversion itself has been deferred because the x86
maintainers were a little busy in the last months"
* tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping: (57 commits)
MAINTAINERS: add the iommu list for swiotlb and xen-swiotlb
arm64: use swiotlb_alloc and swiotlb_free
arm64: replace ZONE_DMA with ZONE_DMA32
mips: use swiotlb_{alloc,free}
mips/netlogic: remove swiotlb support
tile: use generic swiotlb_ops
tile: replace ZONE_DMA with ZONE_DMA32
unicore32: use generic swiotlb_ops
ia64: remove an ifdef around the content of pci-dma.c
ia64: clean up swiotlb support
ia64: use generic swiotlb_ops
ia64: replace ZONE_DMA with ZONE_DMA32
swiotlb: remove various exports
swiotlb: refactor coherent buffer allocation
swiotlb: refactor coherent buffer freeing
swiotlb: wire up ->dma_supported in swiotlb_dma_ops
swiotlb: add common swiotlb_map_ops
swiotlb: rename swiotlb_free to swiotlb_exit
x86: rename swiotlb_dma_ops
powerpc: rename swiotlb_dma_ops
...
On powerpc systems with shared configurations of CPUs and memory and
memoryless nodes at boot, an event ordering problem was observed on a
SLES12 build platforms with the hot-add of CPUs to the memoryless
nodes.
* The most common error occurred when the memory SLAB driver attempted
to reference the memoryless node to which a CPU was being added
before the kernel had finished initializing all of the data
structures for the CPU and exited 'device_online' under
DLPAR/hot-add.
Normally the memoryless node would be initialized through the call
path device_online ... arch_update_cpu_topology ... find_cpu_nid ...
try_online_node. This patch ensures that the powerpc node will be
initialized as early as possible, even if it was memoryless and
CPU-less at the point when we are trying to hot-add a new CPU to it.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When enabling SR-IOV in pseries platform, the VF bar properties for a
PF are reported on the device node in the device tree.
This patch adds the IOV Bar resources to Linux structures from the
device tree for later use when configuring SR-IOV by PF driver.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
After initial validation of SR-IOV resources, firmware will associate
PEs to the dynamic VFs created within this call. This patch adds the
association of PEs to the PF array of PE numbers indexed by VF.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When pseries SR-IOV is enabled and after a PF driver has resumed from
EEH, platform has to be notified of the event so the child VFs can be
allowed to resume their normal recovery path.
This patch makes the EEH operation allow unfreeze platform dependent
code and adds the call to pseries EEH code.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To correctly use EEH code one has to make sure that the EEH_PE_VF is
set for dynamic created VFs. Therefore this patch allocates an eeh_pe
of eeh type EEH_PE_VF and associates PE with parent.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add EEH platform operations for pseries to update VF config space.
With this change after EEH, the VF will have updated config space for
pseries platform.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Platforms with a panic handler that halts the system can have problems
getting kernel messages out, because the panic notifiers are called
before kernel/panic.c does its flushing of printk buffers an console
etc.
This was attempted to be solved with commit a3b2cb30f2 ("powerpc: Do
not call ppc_md.panic in fadump panic notifier"), but that wasn't the
right approach and caused other problems, and was reverted by commit
ab9dbf771f.
Instead, the powernv shutdown paths have already had a similar
problem, fixed by taking the message flushing sequence from
kernel/panic.c. That's a little bit ugly, but while we have the code
duplicated, it will work for this case as well. So have ppc panic
handlers do the same flushing before they terminate.
Without this patch, a qemu pseries_le_defconfig guest stops silently
when issued the nmi command when xmon is off and no crash dumpers
enabled. Afterwards, an oops is printed by each CPU as expected.
Fixes: ab9dbf771f ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Symbolic macros are unintuitive and hard to read, whereas octal constants
are much easier to interpret. Replace macros for the basic permission
flags (user/group/other read/write/execute) with numeric constants
instead, across the whole powerpc tree.
Introducing a significant number of changes across the tree for no runtime
benefit isn't exactly desirable, but so long as these macros are still
used in the tree people will keep sending patches that add them. Not only
are they hard to parse at a glance, there are multiple ways of coming to
the same value (as you can see with 0444 and 0644 in this patch) which
hurts readability.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge our fixes branch from the 4.15 cycle.
Unusually the fixes branch saw some significant features merged,
notably the RFI flush patches, so we want the code in next to be
tested against that, to avoid any surprises when the two are merged.
There's also some other work on the panic handling that was reverted
in fixes and we now want to do properly in next, which would conflict.
And we also fix a few other minor merge conflicts.
pseries/drc-info: Provide parallel routines to convert between
drc_index and CPU numbers at runtime, using the older device-tree
properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types"
and "ibm,drc-power-domains"), or the new property "ibm,drc-info".
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Firmware Features: Define new bit flag representing the presence of
new device tree property "ibm,drc-info". The flag is used to tell
the front end processor whether the Linux kernel supports the new
property, and by the front end processor to tell the Linux kernel
that the new property is present in the device tree.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instead of manually coding the loop with of_find_node_by_type(), let's
switch to the standard macro for iterating over nodes with given type.
Also fixed a couple of refcount leaks in the aforementioned loops.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add required bits to the architecture vector to enable support
of the ibm,dynamic-memory-v2 device tree property.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that the powerpc code parses dynamic reconfiguration memory
LMB information from the LMB array and not the device tree
directly we can move the of_drconf_cell struct to drmem.h where
it fits better.
In addition, the struct is renamed to of_drconf_cell_v1 in
anticipation of upcoming support for version 2 of the dynamic
reconfiguration property and the members are typed as __be*
values to reflect how they exist in the device tree.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Update the pseries memory hotplug code to use the newly added
dynamic reconfiguration LMB array. Doing this is required for the
upcoming support of version 2 of the dynamic reconfiguration
device tree property.
In addition, making this change cleans up the code that parses the
LMB information as we no longer need to worry about device tree
format. This allows us to discard one of the first steps on memory
hotplug where we make a working copy of the device tree property and
convert the entire property to cpu format. Instead we just use the
LMB array directly while holding the memory hotplug lock.
This patch also moves the updating of the device tree property to
powerpc/mm/drmem.c. This allows to the hotplug code to work without
needing to know the device tree format and provides a single
routine for updating the device tree property. This new routine
will handle determination of the proper device tree format and
generate a properly formatted device tree property.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We want to use the dma_direct_ namespace for a generic implementation,
so rename powerpc to the second best choice: dma_nommu_.
Signed-off-by: Christoph Hellwig <hch@lst.de>
A new hypervisor call is available which tells the guest settings
related to the RFI flush. Use it to query the appropriate flush
instruction(s), and whether the flush is required.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hotplug code uses its own workqueue to handle IRQ requests
(pseries_hp_wq), however that workqueue is initialized after
init_ras_IRQ(). That can lead to a kernel panic if any hotplug
interrupts fire after init_ras_IRQ() but before pseries_hp_wq is
initialised. eg:
UDP-Lite hash table entries: 2048 (order: 0, 65536 bytes)
NET: Registered protocol family 1
Unpacking initramfs...
(qemu) object_add memory-backend-ram,id=mem1,size=10G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
Unable to handle kernel paging request for data at address 0xf94d03007c421378
Faulting instruction address: 0xc00000000012d744
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.15.0-rc2-ziviani+ #26
task: (ptrval) task.stack: (ptrval)
NIP: c00000000012d744 LR: c00000000012d744 CTR: 0000000000000000
REGS: (ptrval) TRAP: 0380 Not tainted (4.15.0-rc2-ziviani+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 28088042 XER: 20040000
CFAR: c00000000012d3c4 SOFTE: 0
...
NIP [c00000000012d744] __queue_work+0xd4/0x5c0
LR [c00000000012d744] __queue_work+0xd4/0x5c0
Call Trace:
[c0000000fffefb90] [c00000000012d744] __queue_work+0xd4/0x5c0 (unreliable)
[c0000000fffefc70] [c00000000012dce4] queue_work_on+0xb4/0xf0
This commit makes the RAS IRQ registration explicitly dependent on the
creation of the pseries_hp_wq.
Reported-by: Min Deng <mdeng@redhat.com>
Reported-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Tested-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Add a pci_vf_drivers_autoprobe() interface. Setting autoprobe to false
on the PF prevents drivers from binding to VFs when they are enabled.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add calls for pseries platform to configure/deconfigure SR-IOV.
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Juan J. Alvarez <jjalvare@us.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This reverts commit a3b2cb30f2.
That commit tried to fix problems with panic on powerpc in certain
circumstances, where some output from the generic panic code was being
dropped.
Unfortunately, it breaks things worse in other circumstances. In
particular when running a PAPR guest, it will now attempt to reboot
instead of informing the hypervisor (KVM or PowerVM) that the guest
has crashed. The crash notification is important to some
virtualization management layers.
Revert it for now until we can come up with a better solution.
Fixes: a3b2cb30f2 ("powerpc: Do not call ppc_md.panic in fadump panic notifier")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[mpe: Tweak change log a bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
At some point, pr_warning will be removed so all logging messages use
a consistent <prefix>_warn style.
Update arch/powerpc/
Miscellanea:
o Coalesce formats
o Realign arguments
o Use %s, __func__ instead of embedded function names
o Remove unnecessary line continuations
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoff@infradead.org>
[mpe: Rebase due to some %pOF changes.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs in our
implementation and to bring the semantics exactly into line with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a true NMI
(ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors can be
reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM to notify
the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on some Power9
processors (PPC_FEATURE2_HTM_NO_SUSPEND).
- Support for emulating vector loads form cache-inhibited memory, on some
Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a CONFIG), we
believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting for long
running operations performed by OPAL firmware, and changes to the
powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are using
transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on Power9.
- Improvements to the VAS facility used to access coprocessors on Power9, and
related improvements to the way the NX crypto driver handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew Donnellan, Aneesh
Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Breno Leitao,
Christophe Leroy, Christophe Lombard, Cyril Bur, Frederic Barrat, Gautham R.
Shenoy, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo Romero, Haren
Myneni, Joel Stanley, Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami
Hiramatsu, Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia Franco de
Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee, Shriya, Stephen
Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, William A. Kennington III.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJaDXGuAAoJEFHr6jzI4aWAEqwP/0TA35KFAK6wqfkCf67z4q+O
I+5piI4eDV4jdCakfoIN1JfjhQRULNePSoCHTccan30mu/bm30p69xtOLL2/h5xH
Mhz/eDBAOo0lrT20nyZfYMW3FnM66wnNf++qJ0O+8L052r4WOB02J0k1uM1ST01D
5Lb5mUoxRLRzCgKRYAYWJifn+IFPUB9NMsvMTym94krAFlIjIzMEQXhDoln+jJMr
QmY5f1BTA/fLfXobn0zwoc/C1oa2PUtxd+rxbwGrLoZ6G843mMqUi90SMr5ybhXp
RzepnBTj4by3vOsnk/X1mANyaZfLsunp75FwnjHdPzKrAS/TuPp8D/iSxxE/PzEq
cLwJFBnFXSgQMefDErXxhHSDz2dAg5r14rsTpDcq2Ko8TPV4rPsuSfmbd9Txekb0
yWHsjoJUBBMl2QcWqIHl+AlV8j1RklF6solcTBcGnH1CZJMfa05VKXV7xGEvOHa0
RJ+/xPyR9KjoB/SUp++9Vmx/M6SwQYFOJlr3Zpg9LNtR8WpoPYu1E6eO+u1Hhzny
eJqaNstH+i+VdY9eqszkAsEBh8o9M/+b+7Wx7TetvU+v368CbXtgFYs9qy2oZjPF
t9sY/BHaHZ8eZ7I00an77a0fVV5B1PVASUtIz5CqkwGpMvX6Z6W2K/XUUFI61kuu
E06HS6Ht8UPJAzrAPUMl
=Rq81
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"A bit of a small release, I suspect in part due to me travelling for
KS. But my backlog of patches to review is smaller than usual, so I
think in part folks just didn't send as much this cycle.
Non-highlights:
- Five fixes for the >128T address space handling, both to fix bugs
in our implementation and to bring the semantics exactly into line
with x86.
Highlights:
- Support for a new OPAL call on bare metal machines which gives us a
true NMI (ie. is not masked by MSR[EE]=0) for debugging etc.
- Support for Power9 DD2 in the CXL driver.
- Improvements to machine check handling so that uncorrectable errors
can be reported into the generic memory_failure() machinery.
- Some fixes and improvements for VPHN, which is used under PowerVM
to notify the Linux partition of topology changes.
- Plumbing to enable TM (transactional memory) without suspend on
some Power9 processors (PPC_FEATURE2_HTM_NO_SUSPEND).
- Support for emulating vector loads form cache-inhibited memory, on
some Power9 revisions.
- Disable the fast-endian switch "syscall" by default (behind a
CONFIG), we believe it has never had any users.
- A major rework of the API drivers use when initiating and waiting
for long running operations performed by OPAL firmware, and changes
to the powernv_flash driver to use the new API.
- Several fixes for the handling of FP/VMX/VSX while processes are
using transactional memory.
- Optimisations of TLB range flushes when using the radix MMU on
Power9.
- Improvements to the VAS facility used to access coprocessors on
Power9, and related improvements to the way the NX crypto driver
handles requests.
- Implementation of PMEM_API and UACCESS_FLUSHCACHE for 64-bit.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Allen Pais, Andrew
Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Balbir Singh, Benjamin
Herrenschmidt, Breno Leitao, Christophe Leroy, Christophe Lombard,
Cyril Bur, Frederic Barrat, Gautham R. Shenoy, Geert Uytterhoeven,
Guilherme G. Piccoli, Gustavo Romero, Haren Myneni, Joel Stanley,
Kamalesh Babulal, Kautuk Consul, Markus Elfring, Masami Hiramatsu,
Michael Bringmann, Michael Neuling, Michal Suchanek, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pedro Miraglia
Franco de Carvalho, Philippe Bergheaud, Sandipan Das, Seth Forshee,
Shriya, Stephen Rothwell, Stewart Smith, Sukadev Bhattiprolu, Tyrel
Datwyler, Vaibhav Jain, Vaidyanathan Srinivasan, and William A.
Kennington III"
* tag 'powerpc-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (151 commits)
powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
powerpc/64s: Fix masking of SRR1 bits on instruction fault
powerpc/64s: mm_context.addr_limit is only used on hash
powerpc/64s/radix: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Allow MAP_FIXED allocations to cross 128TB boundary
powerpc/64s/hash: Fix fork() with 512TB process address space
powerpc/64s/hash: Fix 128TB-512TB virtual address boundary case allocation
powerpc/64s/hash: Fix 512T hint detection to use >= 128T
powerpc: Fix DABR match on hash based systems
powerpc/signal: Properly handle return value from uprobe_deny_signal()
powerpc/fadump: use kstrtoint to handle sysfs store
powerpc/lib: Implement UACCESS_FLUSHCACHE API
powerpc/lib: Implement PMEM API
powerpc/powernv/npu: Don't explicitly flush nmmu tlb
powerpc/powernv/npu: Use flush_all_mm() instead of flush_tlb_mm()
powerpc/powernv/idle: Round up latency and residency values
powerpc/kprobes: refactor kprobe_lookup_name for safer string operations
powerpc/kprobes: Blacklist emulate_update_regs() from kprobes
powerpc/kprobes: Do not disable interrupts for optprobes and kprobes_on_ftrace
powerpc/kprobes: Disable preemption before invoking probe handler for optprobes
...
Summary of modules changes for the 4.15 merge window:
- Treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- Minor code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
fbGX8juZWP4ypbK+irTB
=c8vb
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
- kbuild cleanups and improvements for dtbs
- Code clean-up of overlay code and fixing for some long standing memory
leak and race condition in applying overlays
- Improvements to DT memory usage making sysfs/kobjects optional and
skipping unflattening of disabled nodes. This is part of kernel
tinification efforts.
- Final piece of removing storing the full path for every DT node. The
prerequisite conversion of printk's to use device_node format
specifier happened in 4.14.
- Sync with current upstream dtc. This brings additional checks to dtb
compiling.
- Binding doc tree wide removal of leading 0s from examples
- RTC binding documentation adding missing devices and some
consolidation of duplicated bindings
- Vendor prefix documentation for nutsboard, Silicon Storage Technology,
shimafuji, Tecon Microprocessor Technologies, DH electronics GmbH,
Opal Kelly, and Next Thing
-----BEGIN PGP SIGNATURE-----
iQItBAABCAAXBQJaCwaSEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcNzeA/8
C8uQhSsX2+UQZvFzcEA8KQAMGT3kYdrcf+gidRKwCEUWg1qscUEpTb3n3Rm5NUbU
RPD1s6GSlh6fJCMHDTQ6Tti/T59L7nZa2/AIGmUishGu4x4q1o18AobpFJmYP/EM
SJPwnmm5RV9WcZFao1y+sY3Xtn8DStxHO4cS+dyF5/EvPN9D8nbLJfu7bgTBAZww
HktIMB9kx+GTipRQZBvBwXoy5MJjthIZub4XwzesA4tGananj4cXlc0xaVxpdYy3
5bO6q5F7cbrZ2uyrF+oIChpCENK4VaXh80m0WHc8EzaG++shzEkR4he1vYkwnV+I
OYo4vsUg9dP8rBksUG1eYhS8fJKPvEBRNP7ETT5utVBy5I/tDEbo/crmQZRTIDIC
hZbhcdZlISZj0DzkMK2ZHQV9UYtRWzXrJbZHFIPP12GCyvXVxYJUIWb9iYnUYSon
KugygsFSpZHMWmfAhemw5/ctJZ19qhM5UIl2KZk5tMBHAf466ILmZjg0me6fYkOp
eADfwHJ1dLMdK79CVMHSfp+vArcZXp35B16c3sWpJB36Il97Mc/9siEufCL4GKX7
IBBnQBlbpSBKBejWVyI7Ip/Xp5u4qAQD+ZMJ9oLqBRqfWerHbDuOERlEOgwGqJYr
9v4HvP7V8eVUvAdqXka4EBfCyAgUzXDAxG2Dfmv9vGU=
=jgpN
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
"A bigger diffstat than usual with the kbuild changes and a tree wide
fix in the binding documentation.
Summary:
- kbuild cleanups and improvements for dtbs
- Code clean-up of overlay code and fixing for some long standing
memory leak and race condition in applying overlays
- Improvements to DT memory usage making sysfs/kobjects optional and
skipping unflattening of disabled nodes. This is part of kernel
tinification efforts.
- Final piece of removing storing the full path for every DT node.
The prerequisite conversion of printk's to use device_node format
specifier happened in 4.14.
- Sync with current upstream dtc. This brings additional checks to
dtb compiling.
- Binding doc tree wide removal of leading 0s from examples
- RTC binding documentation adding missing devices and some
consolidation of duplicated bindings
- Vendor prefix documentation for nutsboard, Silicon Storage
Technology, shimafuji, Tecon Microprocessor Technologies, DH
electronics GmbH, Opal Kelly, and Next Thing"
* tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
dt-bindings: usb: add #phy-cells to usb-nop-xceiv
dt-bindings: Remove leading zeros from bindings notation
kbuild: handle dtb-y and CONFIG_OF_ALL_DTBS natively in Makefile.lib
MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
kbuild: clean up *.dtb and *.dtb.S patterns from top-level Makefile
.gitignore: move *.dtb and *.dtb.S patterns to the top-level .gitignore
.gitignore: sort normal pattern rules alphabetically
dt-bindings: add vendor prefix for Next Thing Co.
scripts/dtc: Update to upstream version v1.4.5-6-gc1e55a5513e9
of: dynamic: fix memory leak related to properties of __of_node_dup
of: overlay: make pr_err() string unique
of: overlay: pr_err from return NOTIFY_OK to overlay apply/remove
of: overlay: remove unneeded check for NULL kbasename()
of: overlay: remove a dependency on device node full_name
of: overlay: simplify applying symbols from an overlay
of: overlay: avoid race condition between applying multiple overlays
of: overlay: loosen overly strict phandle clash check
of: overlay: expand check of whether overlay changeset can be removed
of: overlay: detect cases where device tree may become corrupt
of: overlay: minor restructuring
...
CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
found in "server" processors, from IBM mainly.
Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
annoying to have two symbols that always have the same value, it's not
quite annoying enough to bother removing one.
However with the arrival of Power9, we now have the situation where
CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
Radix MMU - *not* the "standard" MMU. So it is now actively confusing
to use it, because it implies that code is disabled or inactive when
the Radix MMU is in use, however that is not necessarily true.
So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
formatting updates of some of the affected lines.
This will be a pain for backports, but c'est la vie.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When a vdevice is DLPAR removed from the system the vio subsystem
doesn't bother unmapping the virq from the irq_domain. As a result we
have a virq mapped to a hardware irq that is no longer valid for the
irq_domain. A side effect is that we are left with /proc/irq/<irq#>
affinity entries, and attempts to modify the smp_affinity of the irq
will fail.
In the following observed example the kernel log is spammed by
ics_rtas_set_affinity errors after the removal of a VSCSI adapter.
This is a result of irqbalance trying to adjust the affinity every 10
seconds.
rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed
ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3
This patch fixes the issue by calling irq_dispose_mapping() on the
virq of the viodev on unregister.
Fixes: f2ab621996 ("powerpc/pseries: Add PFO support to the VIO bus")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:
@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@
module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:
drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.c
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Although kfree(NULL) is legal, it's a bit lazy to rely on that to
implement the error handling. So do it the normal Linux way using
labels for each failure path.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Squash a few patches and rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/hotplug: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources. During hotplug
CPU operations, this patch resets the timer on topology update work
function to a small value to better ensure that the CPU topology is
detected and configured sooner.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With dependencies on full_name containing the entire device node path
removed, stop storing the full_name in nodes created by
dlpar_configure_connector() and pSeries_reconfig_add_node().
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
A reference to the parent device node is held by add_dt_node() for the
node to be added. If the call to dlpar_configure_connector() fails
add_dt_node() returns ENOENT and that reference is not freed.
Add a call to of_node_put(parent_dn) prior to bailing out after a
failed dlpar_configure_connector() call.
Fixes: 8d5ff32076 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 215ee763f8 ("powerpc: pseries: remove dlpar_attach_node
dependency on full path") reworked dlpar_attach_node() to no longer
look up the parent node "/cpus", but instead to have the parent node
passed by the caller in the function parameter list.
As a result dlpar_attach_node() is no longer responsible for freeing
the reference to the parent node. However, commit 215ee763f8 failed
to remove the of_node_put(parent) call in dlpar_attach_node(), or to
take into account that the reference to the parent in the caller
dlpar_cpu_add() needs to be held until after dlpar_attach_node()
returns.
As a result doing repeated cpu add/remove dlpar operations will
eventually result in the following error:
OF: ERROR: Bad of_node_put() on /cpus
CPU: 0 PID: 10896 Comm: drmgr Not tainted 4.13.0-autotest #1
Call Trace:
dump_stack+0x15c/0x1f8 (unreliable)
of_node_release+0x1a4/0x1c0
kobject_put+0x1a8/0x310
kobject_del+0xbc/0xf0
__of_detach_node_sysfs+0x144/0x210
of_detach_node+0xf0/0x180
dlpar_detach_node+0xc4/0x120
dlpar_cpu_remove+0x280/0x560
dlpar_cpu_release+0xbc/0x1b0
arch_cpu_release+0x6c/0xb0
cpu_release_store+0xa0/0x100
dev_attr_store+0x68/0xa0
sysfs_kf_write+0xa8/0xf0
kernfs_fop_write+0x2cc/0x400
__vfs_write+0x5c/0x340
vfs_write+0x1a8/0x3d0
SyS_write+0xa8/0x1a0
system_call+0x58/0x6c
Fix the issue by removing the of_node_put(parent) call from
dlpar_attach_node(), and ensuring that the reference to the parent
node is properly held and released by the caller dlpar_cpu_add().
Fixes: 215ee763f8 ("powerpc: pseries: remove dlpar_attach_node dependency on full path")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
[mpe: Add a comment in the code and frob the change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
GFP_TEMPORARY was introduced by commit e12ba74d8f ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the framework for using XIVE in a PowerVM guest. The support
is very similar to the native one in a much simpler form.
Each source is associated with an Event State Buffer (ESB). This is a
two bit state machine which is used to trigger events. The bits are
named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
The Guest OS registers event (or notifications) queues on which the HW
will post event data for a target to notify.
Instead of OPAL calls, a set of Hypervisors call are used to configure
the interrupt sources and the event/notification queues of the guest:
- H_INT_GET_SOURCE_INFO
used to obtain the address of the MMIO page of the Event State
Buffer (PQ bits) entry associated with the source.
- H_INT_SET_SOURCE_CONFIG
assigns a source to a "target".
- H_INT_GET_SOURCE_CONFIG
determines to which "target" and "priority" is assigned to a source
- H_INT_GET_QUEUE_INFO
returns the address of the notification management page associated
with the specified "target" and "priority".
- H_INT_SET_QUEUE_CONFIG
sets or resets the event queue for a given "target" and "priority".
It is also used to set the notification config associated with the
queue, only unconditional notification for the moment. Reset is
performed with a queue size of 0 and queueing is disabled in that
case.
- H_INT_GET_QUEUE_CONFIG
returns the queue settings for a given "target" and "priority".
- H_INT_RESET
resets all of the partition's interrupt exploitation structures to
their initial state, losing all configuration set via the hcalls
H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
- H_INT_SYNC
issue a synchronisation on a source to make sure sure all
notifications have reached their queue.
As for XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE :
- "reg"
contains the base address and size of the thread interrupt
managnement areas (TIMA), also called rings, for the User level and
for the Guest OS level. Only the Guest OS level is taken into
account today.
- "ibm,xive-eq-sizes"
the size of the event queues. One cell per size supported, contains
log2 of size, in ascending order.
- "ibm,xive-lisn-ranges"
the interrupt numbers ranges assigned to the guest. These are
allocated using a simple bitmap.
and also :
- "/ibm,plat-res-int-priorities"
contains a list of priorities that the hypervisor has reserved for
its own use.
Tested with a QEMU XIVE model for pseries and with the Power hypervisor.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Check if an LMB is assigned before attempting to call dlpar_acquire_drc
in order to avoid any unnecessary rtas calls. This substantially
reduces the running time of memory hot add on lpars with large amounts
of memory.
[mpe: We need to explicitly set rc to 0 in the success case, otherwise
the compiler might think we use rc without initialising it.]
Fixes: c21f515c74 ("powerpc/pseries: Make the acquire/release of the drc for memory a seperate step")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
.llong is an undocumented PPC specific directive. The generic
equivalent is .quad, but even better (because it's self describing) is
.8byte.
Convert all .llong directives to .8byte.
Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The check_req() helper uses pci_get_pdn() to get an OF node pointer.
pci_get_pdn() returns a pci_dn pointer which either:
1) from the OF node returned by pci_device_to_OF_node();
2) from the parent child_list where entries don't have OF node pointers.
Since check_req() does not care about 2), it can call
pci_device_to_OF_node() directly, hence the change.
The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
also stored in edev->pdev so let's take a shortcut and call
pci_device_to_OF_node() directly.
With these 2 changes, we can finally get rid of the OF node back pointer.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.
This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.
Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The eeh_dev struct already holds a pointer to pci_dn which it does not
exist without and pci_dn itself holds the very same pointer so just
use it.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some PowerVM firmware when delivering a system reset interrupt to a
little endian OS will mess up SRR registers. They are byteswapped, and
SRR1 is incorrect. An example from a crash:
NIP: 14dd0900000000c0
MSR: 1000000200000080
It's possible to detect this pattern in SRR1 (that would never happen
in normal operation), and at least fix the NIP. After this patch, the
same interrupt reports NIP properly:
NIP [c00000000009dd14] plpar_hcall_norets+0x1c/0x28
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.
Currently, system reset injection causes the guest to silently stop.
Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.
Remove ppc_md.panic. Move fadump panic notifier into fadump code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation to stop storing the full node path in full_name, remove the
dependency on full_name from dlpar_attach_node(). Callers of
dlpar_attach_node() already have the parent device_node, so just pass the
parent node into dlpar_attach_node instead of the path. This avoids doing
a lookup of the parent node by the path.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Now that we have a custom printf format specifier, convert users of
full_name to use %pOF instead. This is preparation to remove storing
of the full path string for each node.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Scott Wood <oss@buserror.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently in the vio.c code we use a comparision against the parent
device node's full path to decide if the device is a PFO or VIO family
device.
Both the ibm,platform-facilities and vdevice nodes are defined by PAPR,
and must have a matching device_type. So instead of using the path we
can instead compare the device_type.
I've checked Qemu and kvmtool both do this correctly, and all the
PowerVM systems I have access to do also. So it seems to be safe.
This removes the dependency on full_name, which is being removed
upstream.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
This driver currently reports the H_BEST_ENERGY hypervisor call is
unsupported (even when booting in a non-virtualised environment). This
is not something the administrator can do much with, and not
significant for debugging.
Remove it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When DLPAR adding or removing memory we need to check the device
offline status before trying to online/offline the memory. This is
needed because calls to device_online() and device_offline() will
return non-zero for memory that is already online and offline
respectively.
This update resolves two scenarios. First, for a kernel built with
auto-online memory enabled (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y),
memory will be onlined as part of calls to add_memory(). After adding
the memory the pseries DLPAR code tries to online it and fails since
the memory is already online. The DLPAR code then tries to remove the
memory which produces the oops message below because the memory is not
offline.
The second scenario occurs when removing memory that is already
offline, i.e. marking memory offline (via sysfs) and then trying to
remove that memory. This doesn't work because offlining the already
offline memory does not succeed and the DLPAR code then fails the
DLPAR remove operation.
The fix for both scenarios is to check the device.offline status
before making the calls to device_online() or device_offline().
kernel BUG at mm/memory_hotplug.c:1936!
...
NIP [c0000000002ca428] .remove_memory+0xb8/0xc0
LR [c0000000002ca3cc] .remove_memory+0x5c/0xc0
Call Trace:
.remove_memory+0x5c/0xc0 (unreliable)
.dlpar_add_lmb+0x384/0x400
.dlpar_memory+0x5dc/0xca0
.handle_dlpar_errorlog+0x74/0xe0
.pseries_hp_work_fn+0x2c/0x90
.process_one_work+0x17c/0x460
.worker_thread+0x88/0x500
.kthread+0x15c/0x1a0
.ret_from_kernel_thread+0x58/0xc0
Fixes: 943db62c31 ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
[mpe: Use bool, add explicit rc=0 case, change log typos & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As for commit 68baf692c4 ("powerpc/pseries: Fix of_node_put()
underflow during DLPAR remove"), the call to of_node_put() must be
removed from pSeries_reconfig_remove_node().
dlpar_detach_node() and pSeries_reconfig_remove_node() both call
of_detach_node(), and thus the node should not be released in both
cases.
Fixes: 0829f6d1f6 ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All cases initialise rv, and if they didn't that would be a bug. By
dropping the initialisation we give the compiler the chance to catch
those bugs for us.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use memdup_user_nul() helper instead of open-coding to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to:
Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
Thiago Jung Bauermann, Yang Li.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
bBDvrRnvupusz8qCWrxe
=w8rj
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Highlights include:
- Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
- Platform support for FSP2 (476fpe) board
- Enable ZONE_DEVICE on 64-bit server CPUs.
- Generic & powerpc spin loop primitives to optimise busy waiting
- Convert VDSO update function to use new update_vsyscall() interface
- Optimisations to hypercall/syscall/context-switch paths
- Improvements to the CPU idle code on Power8 and Power9.
As well as many other fixes and improvements.
Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
Bauermann, Yang Li"
* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
powerpc/mm/hash: Implement mark_rodata_ro() for hash
powerpc/vmlinux.lds: Align __init_begin to 16M
powerpc/lib/code-patching: Use alternate map for patch_instruction()
powerpc/xmon: Add patch_instruction() support for xmon
powerpc/kprobes/optprobes: Use patch_instruction()
powerpc/kprobes: Move kprobes over to patch_instruction()
powerpc/mm/radix: Fix execute permissions for interrupt_vectors
powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
powerpc/64s: Blacklist rtas entry/exit from kprobes
powerpc/64s: Blacklist functions invoked on a trap
powerpc/64s: Un-blacklist system_call() from kprobes
powerpc/64s: Move system_call() symbol to just after setting MSR_EE
powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
powerpc/64s: Convert .L__replay_interrupt_return to a local label
powerpc64/elfv1: Only dereference function descriptor for non-text symbols
cxl: Export library to support IBM XSL
powerpc/dts: Use #include "..." to include local DT
powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
...
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls
to ->mapping_error so that the dma_map_ops instances are
more self contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
+i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
R4o=
=0Fso
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping infrastructure from Christoph Hellwig:
"This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls to
->mapping_error so that the dma_map_ops instances are more self
contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)"
* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
ARM: dma-mapping: Remove traces of NOMMU code
ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
ARM: NOMMU: Introduce dma operations for noMMU
drivers: dma-mapping: allow dma_common_mmap() for NOMMU
drivers: dma-coherent: Introduce default DMA pool
drivers: dma-coherent: Account dma_pfn_offset when used with device tree
dma: Take into account dma_pfn_offset
dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
dma-mapping: remove dmam_free_noncoherent
crypto: qat - avoid an uninitialized variable warning
au1100fb: remove a bogus dma_free_nonconsistent call
MAINTAINERS: add entry for dma mapping helpers
powerpc: merge __dma_set_mask into dma_set_mask
dma-mapping: remove the set_dma_mask method
powerpc/cell: use the dma_supported method for ops switching
powerpc/cell: clean up fixed mapping dma_ops initialization
tile: remove dma_supported and mapping_error methods
xen-swiotlb: remove xen_swiotlb_set_dma_mask
arm: implement ->dma_supported instead of ->set_dma_mask
mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
...
Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWVpX4A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymobgCfd0d13IfpZoq1N41wc6z2Z0xD7cwAnRMeH1/p
kEeISGpHPYP9f8PBh9FO
=Hfqt
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues"
* tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (56 commits)
arm: mach-rpc: ecard: fix build error
zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()
driver-core: remove struct bus_type.dev_attrs
powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type
powerpc: vio: use dev_groups and not dev_attrs for bus_type
USB: usbip: convert to use DRIVER_ATTR_RW
s390: drivers: convert to use DRIVER_ATTR_RO/WO
platform: thinkpad_acpi: convert to use DRIVER_ATTR_RO/RW
pcmcia: ds: convert to use DRIVER_ATTR_RO
wireless: ipw2x00: convert to use DRIVER_ATTR_RW
net: ehea: convert to use DRIVER_ATTR_RO
net: caif: convert to use DRIVER_ATTR_RO
TTY: hvc: convert to use DRIVER_ATTR_RW
PCI: pci-driver: convert to use DRIVER_ATTR_WO
IB: nes: convert to use DRIVER_ATTR_RW
HID: hid-core: convert to use DRIVER_ATTR_RO and drv_groups
arm: ecard: fix dev_groups patch typo
tty: serdev: use dev_groups and not dev_attrs for bus_type
sparc: vio: use dev_groups and not dev_attrs for bus_type
hid: intel-ish-hid: use dev_groups and not dev_attrs for bus_type
...
Once upon a time there were only two PP (page protection) bits. In ISA
2.03 an additional PP bit was added, but because of the layout of the
HPTE it could not be made contiguous with the existing PP bits.
The result is that we now have three PP bits, named pp0, pp1, pp2,
where pp0 occupies bit 63 of dword 1 of the HPTE and pp1 and pp2
occupy bits 1 and 0 respectively. Until recently Linux hasn't used
pp0, however with the addition of _PAGE_KERNEL_RO we started using it.
The problem arises in the LPAR code, where we need to translate the PP
bits into the argument for the H_PROTECT hypercall. Currently the code
only passes bits 0-2 of newpp, which covers pp1, pp2 and N (no
execute), meaning pp0 is not passed to the hypervisor at all.
We can't simply pass it through in bit 63, as that would collide with a
different field in the flags argument, as defined in PAPR. Instead we
have to shift it down to bit 8 (IBM bit 55).
Fixes: e58e87adc8 ("powerpc/mm: Update _PAGE_KERNEL_RO")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Simplify the test, rework change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge our fixes branch, a few of them are tripping people up while
working on top of next, and we also have a dependency between the CXL
fixes and new CXL code we want to merge into next.
POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
DMA_ERROR_CODE is going to go away, so don't rely on it. Instead
define a ->mapping_error method for all IOMMU based dma operation
instances. The direct ops don't ever return an error and don't
need a ->mapping_error method.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
nr_cpu_ids can be limited by nr_cpus boot parameter, whereas NR_CPUS is a
compile time constant, which shouldn't be compared against during cpu kick.
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
During secondary start, we do not need to BUG_ON if an invalid CPU number
is passed. We already print an error if secondary cannot be started, so
just return an error instead.
Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To register fadump, boot memory area - the size of low memory chunk that
is required for a kernel to boot successfully when booted with restricted
memory, is assumed to have no holes. But this memory area is currently
not protected from hot-remove operations. So, fadump could fail to
re-register after a memory hot-remove operation, if memory is removed
from boot memory area. To avoid this, ensure that memory from boot
memory area is not hot-removed when fadump is registered.
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: <linuxppc-dev@lists.ozlabs.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The dev_attrs field has long been "depreciated" and is finally being
removed, so move the driver to use the "correct" dev_groups field
instead for struct bus_type.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Johan Hovold <johan@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There was only 2 remaining users of CLASS_ATTR() so let's finally get
rid of them and force everyone to use the correct RW/RO/WO versions
instead.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Collation of some spelling fixes from Colin.
Attemping -> Attempting
intialized -> initialized
missmanaged -> mismanaged
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding or removing memory, the aa_index (affinity value) for the
memblock must also be converted to match the endianness of the rest
of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval
of the attribute will likely lead to non-existent nodes, followed by
using the default node in the code inappropriately.
Fixes: 5f97b2a0d1 ("powerpc/pseries: Implement memory hotplug add in the kernel")
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- Fix sparse warnings in drivers/of/.
- Add more overlay unittests.
- Update dtc to v1.4.4-8-g756ffc4f52f6. This adds more checks on dts
files such as unit-address formatting and stricter character sets for
node and property names.
- Add a common DT modalias function.
- Move trivial-devices.txt up and out of i2c dir.
- ARM NVIC interrupt controller binding.
- Vendor prefixes for Sensirion, Dioo, Nordic, ROHM.
- Correct some binding file locations.
-----BEGIN PGP SIGNATURE-----
iQItBAABCAAXBQJZDM+bEBxyb2JoQGtlcm5lbC5vcmcACgkQ+vtdtY28YcM7Yw/+
NPgcfP2iFXWTC/D54neh/QliH7n5jO1YILQOd5ZJulTaKVCv1sNf4JCFVDQO/vuO
592f2jq/blhhh8yKrH0uzHvTADYO7G+XEv3D64Lu/MpmnBHTjut9WeG5h8SZWJD+
jGoz3Kx+Cmgxh/w1Ud/fh2qO3T0Es74TH+Ovi/AfpiywiYD8AqMlw0Tk+UtMRb12
NsihgiYXo8nT/BKe9aOFxOJjV9Pp+pSmyX5Mu2IdOtLnl0MXk5Rvn0mXLU2/oN5n
MONTyielvLB9opxaaSeSBadv3iXcVTagH6MinYjeQRwsLGWDy2YzxLccC/jBkyR5
v8X/IJtpivUzCm3ji+mFEKje5u5c+N3ZLhKmTkeyNBJGwrnD7zj0gVcM5GGdK5aW
Q4exqECklSgmpiCvlL1KWyXi3QsgMuu/wbsv5H5PgDe1wgMAtfKPrIm4kPpbLpW2
Dp+scFfL1v9drfvbcEum6PNw2/EGZY/okaRlcr0zFn8eMsa+yBlPBIXNLzzO7arp
6/lU6O5jDaSFRVKeKZ4qYSsc3GvN81XV+d9go6R3WR964xK4JdEkyt9Hntr1H0Hh
lBwyhSWH4nWnsXunc4GepRPVw+cdnOQdrj6T68bqLM5Gd6XWjh88WIDXqLynH6Kn
OBBBP/lOD7Su3antNkjnlaX9TP6BF8rNQePWne3AzzA=
=ca1M
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
- fix sparse warnings in drivers/of/
- add more overlay unittests
- update dtc to v1.4.4-8-g756ffc4f52f6. This adds more checks on dts
files such as unit-address formatting and stricter character sets for
node and property names
- add a common DT modalias function
- move trivial-devices.txt up and out of i2c dir
- ARM NVIC interrupt controller binding
- vendor prefixes for Sensirion, Dioo, Nordic, ROHM
- correct some binding file locations
* tag 'devicetree-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (24 commits)
of: fix sparse warnings in fdt, irq, reserved mem, and resolver code
of: fix sparse warning in of_pci_range_parser_one
of: fix sparse warnings in of_find_next_cache_node
of/unittest: Missing unlocks on error
of: fix uninitialized variable warning for overlay test
of: fix unittest build without CONFIG_OF_OVERLAY
of: Add unit tests for applying overlays
of: per-file dtc compiler flags
fpga: region: add missing DT documentation for config complete timeout
of: Add vendor prefix for ROHM Semiconductor
of: fix "/cpus" reference leak in of_numa_parse_cpu_nodes()
of: Add vendor prefix for Nordic Semiconductor
dt-bindings: arm,nvic: Binding for ARM NVIC interrupt controller on Cortex-M
dtc: update warning settings for new bus and node/property name checks
scripts/dtc: Update to upstream version v1.4.4-8-g756ffc4f52f6
scripts/dtc: automate getting dtc version and log in update script
of: Add function for generating a DT modalias with a newline
of: fix of_device_get_modalias returned length when truncating buffers
Documentation: devicetree: move trivial-devices out of I2C realm
dt-bindings: add vendor prefix for Dioo
..
This enables VFIO on pseries host in order to allow VFIO in nested guest under
PR KVM or DPDK in a HV guest. This adds support of the VFIO_SPAPR_TCE_IOMMU
type.
This adds exchange() callback to allow TCE updates by the SPAPR TCE IOMMU
driver in VFIO.
This initializes DMA32 window parameters in iommu_table_group as as this does
not implement VFIO_SPAPR_TCE_v2_IOMMU and VFIO_SPAPR_TCE_IOMMU just reuses the
existing DMA32 window.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Have the NMI IPI code use this op when the platform defines it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Historically struct device_node references were tracked using a kref embedded as
a struct field. Commit 75b57ecf9d ("of: Make device nodes kobjects so they
show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that
the device tree could by more simply exposed to userspace using sysfs.
Commit 0829f6d1f6 ("of: device_node kobject lifecycle fixes") (Mar 2014)
followed up these changes to better control the kobject lifecycle and in
particular the referecne counting via of_node_get(), of_node_put(), and
of_node_init().
A result of this second commit was that it introduced an of_node_put() call when
a dynamic node is detached, in of_node_remove(), that removes the initial kobj
reference created by of_node_init().
Traditionally as the original dynamic device node user the pseries code had
assumed responsibilty for releasing this final reference in its platform
specific DLPAR detach code.
This patch fixes a refcount underflow introduced by commit 0829f6d1f6, and
recently exposed by the upstreaming of the recount API.
Messages like the following are no longer seen in the kernel log with this
patch following DLPAR remove operations of cpus and pci devices.
rpadlpar_io: slot PHB 72 removed
refcount_t: underflow; use-after-free.
------------[ cut here ]------------
WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110
Fixes: 0829f6d1f6 ("of: device_node kobject lifecycle fixes")
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
[mpe: Make change log commit references more verbose]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The pseries platform supports Power4 and later CPUs, all of which are
multithreaded and/or multicore.
In practice no one ever builds a SMP=n kernel for these machines. So as
we did for powernv, have the pseries platform imply SMP=y.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Change the doorbell callers to know about their msgsnd addressing,
rather than have them set a per-cpu target data tag at boot that gets
sent to the cause_ipi functions. The data is only used for doorbell IPI
functions, no other IPI types, so it makes sense to keep that detail
local to doorbell.
Have the platform code understand doorbell IPIs, rather than the
interrupt controller code understand them. Platform code can look at
capabilities it has available and decide which to use.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.
Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.
Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>