We do them at the start of tlb flush, and we are sure a pte update will be
followed by a tlbflush. Hence we can skip the ptesync in pte update helpers.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This helps us to do some optimization for application exit case, where we can
skip the DD1 style pte update sequence.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In the kernel we do follow the below sequence in different code paths.
pte = ptep_get_clear(ptep)
....
set_pte_at(ptep, pte)
We do that for mremap, autonuma protection update and softdirty clearing. This
implies our optimization to skip a tlb flush when clearing a pte update is
not valid, because for DD1 system that followup set_pte_at will be done witout
doing the required tlbflush. Fix that by always doing the dd1 style pte update
irrespective of new_pte value. In a later patch we will optimize the application
exit case.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With radix, we can get page fault with DSISR_PROTFAULT value set in case of
PROT_NONE or autonuma mapping. The PROT_NONE case in handled by the vma check
where we consider the access bad. For autonuma we should fall through and fixup
the access mask correctly.
Without this patch we trigger the WARN_ON() on radix. This code moves that
WARN_ON() within a radix_enabled() check. I also moved the WARN_ON() outside
the if condition making it apply for all type of faults (exec/write/read). It
is also conditionalized for book3s, because BOOK3E can also get a PROTFAULT to
handle the D/I cache sync.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently xmon data-breakpoint feature is broken.
Whenever there is a watchpoint match occurs, hw_breakpoint_handler will
be called by do_break via notifier chains mechanism. If watchpoint is
registered by xmon, hw_breakpoint_handler won't find any associated
perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break
also returns without notifying to xmon.
Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not
find any perf_event associated with matched watchpoint, rather than
NOTIFY_STOP, which tells the core code to continue calling the other
breakpoint handlers including the xmon one.
Cc: stable@vger.kernel.org
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The recently merged HPT (Hash Page Table) resize support broke the build
when BOOK3S_64=n (ie. 32-bit or 64-bit Book3E) and MEMORY_HOTPLUG=y:
arch/powerpc/mm/mem.o: In function `.arch_add_memory':
(.text+0x4e4): undefined reference to `.resize_hpt_for_hotplug'
Fix it by adding a dummy version.
Fixes: 438cc81a41 ("powerpc/pseries: Automatically resize HPT for memory hot add/remove")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y:
arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’:
arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function)
if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
^~~~~~~~~~~
Fix it by using the existing is_migrate_cma_page(), which evaulates to
false when CMA=n.
Fixes: 2e5bbb5461 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If we enable RADIX but disable HUGETLBFS, the build breaks with:
arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge'
arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge'
Fix it by stubbing those functions when HUGETLBFS=n.
Fixes: 4b5d62ca17 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix typo in "hotplug_delay" parameter description. This allows modinfo
to match the help text to the parameter.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
... as the generic weak variant will do.
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The final paragraph of the help text is reversed. We want to enable
this option by default, and disable it if the toolchain has a working
-mprofile-kernel.
Fixes: 8c50b72a3b ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently the opal_exit tracepoint usually shows the opcode as 0:
<idle>-0 [047] d.h. 635.654292: opal_entry: opcode=63
<idle>-0 [047] d.h. 635.654296: opal_exit: opcode=0 retval=0
kopald-1209 [019] d... 636.420943: opal_entry: opcode=10
kopald-1209 [019] d... 636.420959: opal_exit: opcode=0 retval=0
This is because we incorrectly load the opcode into r0 before calling
__trace_opal_exit(), whereas it expects the opcode in r3 (first function
parameter). In fact we are leaving the retval in r3, so opcode and
retval will always show the same value.
Instead load the opcode into r3, resulting in:
<idle>-0 [040] d.h. 636.618625: opal_entry: opcode=63
<idle>-0 [040] d.h. 636.618627: opal_exit: opcode=63 retval=0
Fixes: c49f63530b ("powernv: Add OPAL tracepoints")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently we get a warning that _mcount() can't be versioned:
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
Add a prototype to asm-prototypes.h to fix it.
The prototype is not really correct, mcount() is not a normal function,
it has a special ABI. But for the purpose of versioning it doesn't
matter.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The generic implementation of of_node_to_nid() is EXPORT_SYMBOL, added
in commit 298535c00a ("of, numa: Add NUMA of binding
implementation.").
The powerpc implementation added in commit 953039c8df ("[PATCH]
powerpc: Allow devices to register with numa topology") is
EXPORT_SYMBOL_GPL.
This creates an inconsistency for of_node_to_nid() callers across
architectures.
Update the powerpc implementation to be exported consistently with the
generic implementation.
Signed-off-by: Shailendra Singh <shailendras@nvidia.com>
Reviewed-by: Andy Ritger <aritger@nvidia.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Kprobe placed on the kretprobe_trampoline() during boot time can be
optimized, since the instruction at probe point is a 'nop'.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Current infrastructure of kprobe uses the unconditional trap instruction
to probe a running kernel. Optprobe allows kprobe to replace the trap
with a branch instruction to a detour buffer. Detour buffer contains
instructions to create an in memory pt_regs. Detour buffer also has a
call to optimized_callback() which in turn call the pre_handler(). After
the execution of the pre-handler, a call is made for instruction
emulation. The NIP is determined in advanced through dummy instruction
emulation and a branch instruction is created to the NIP at the end of
the trampoline.
To address the limitation of branch instruction in POWER architecture,
detour buffer slot is allocated from a reserved area. For the time
being, 64KB is reserved in memory for this purpose.
Instructions which can be emulated using analyse_instr() are the
candidates for optimization. Before optimization ensure that the address
range between the detour buffer allocated and the instruction being
probed is within +/- 32MB.
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
- one, having to do with a missing include for linux/module.h for
MODULE_NAME_LEN -- this didn't show up previously since the only
users of kprobe_lookup_name were in kprobes.c, which included
linux/module.h through other headers, and
- two, with a missing const qualifier for a local variable which ends
up referring a string literal. Again, this is unique to how
kprobe_lookup_name is being invoked in optprobes.c
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
To permit the use of relative branch instruction in powerpc, the target
address has to be relatively nearby, since the address is specified in an
immediate field (24 bit filed) in the instruction opcode itself. Here
nearby refers to 32MB on either side of the current instruction.
This patch verifies whether the target address is within +/- 32MB
range or not.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.
This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed. This tries to always keep the HPT at
a reasonable size for our current memory size.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to make full advantage of it for memory hotplug.
If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added. Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expends to that maximum
size.
This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface. We use bit 5 of byte 6 of
option vector 5 for this purpose, as defined in the PAPR ACR "HPT
resizing option".
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds support for using two hypercalls to change the size of the
main hash page table while running as a PAPR guest. For now these
hypercalls are only in experimental qemu versions.
The interface is two part: first H_RESIZE_HPT_PREPARE is used to
allocate and prepare the new hash table. This may be slow, but can be
done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
new hash table. This requires that no CPUs be concurrently updating the
HPT, and so must be run under stop_machine().
This also adds a debugfs file which can be used to manually control
HPT resizing or testing purposes.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
[mpe: Rename the debugfs file to "hpt_order"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.
These hypercall numbers are defined in the PAPR ACR "HPT resizing
option".
It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Recent versions of OPAL can provide names for the various OPAL interrupts,
so let's use them. This also modernises the code that fetches the
interrupt array to use the helpers provided by the generic code instead
of hand-parsing the property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Free irqs on error, check allocation of names, consolidate error
handling, whitespace.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On some CAPP errors we see console messages that prints unknown HMIs for
which CAPI recovery is in progress. This patch fixes this by printing
correct error info for HMI generated due to CAPP recovery.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
These are common on bare metal machines, so put them in the defconfig.
This adds 216KB to the vmlinux size
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All entry points already read the MSR so they can easily do
the right thing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Without this we will always find the feature disabled.
Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
start,size has the benefit of being easier to search for (start,end
usually gives you the preceeding vector from the one you want, as first
result).
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Somewhere along the line, search/replace left some naming garbled,
and untidy alignment (aka. mpe stuffed it up). Might as well fix them
all up now while git blame history doesn't extend too far.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds AUX vectors for the L1I,D, L2 and L3 cache levels
providing for each cache level the size of the cache in bytes
and the geometry (line size and number of ways).
We chose to not use the existing alpha/sh definition which
packs all the information in a single entry per cache level as
it is too restricted to represent some of the geometries used
on POWER.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All shipping firmware versions have it wrong in the device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Retrieved from device-tree when available
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We have two set of identical struct members for the I and D sides
and mostly identical bunches of code to parse the device-tree to
populate them. Instead make a ppc_cache_info structure with one
copy for I and one for D
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It will be used to calculate the associativity
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In a number of places we called "cache line size" what is actually
the cache block size, which in the powerpc architecture, means the
effective size to use with cache management instructions (it can
be different from the actual cache line size).
We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
We don't patch instructions based on the cache lines or block
sizes these days.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The variables are defined twice in setup_32.c and setup_64.c, do it
once in setup-common.c instead
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
It's an kernel private macro, it doesn't belong there
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Enable support for GCC plugins on powerpc.
Add an additional version check in gcc-plugins-check to advise users to
upgrade to gcc 5.2+ on powerpc to avoid issues with header files (gcc <=
4.6) or missing copies of rs6000-cpus.def (4.8 to 5.1 on 64-bit
targets).
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 38addce8b6 ("gcc-plugins: Add latent_entropy plugin") excludes
certain powerpc early boot code from the latent entropy plugin by adding
appropriate CFLAGS. It looks like this was supposed to cover
prom_init.o, but ended up saying init.o (which doesn't exist) instead.
Fix the typo.
Fixes: 38addce8b6 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
As we add the ability to do DLPAR of additional devices through
the sysfs interface we need to know which devices are supported.
This adds the reporting of supported devices with a comma separated
list reported in the existing /sys/kernel/dlpar.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Extend the existing PRRN infrastructure to perform the actual affinity
updating for cpus and memory in addition to the device tree updating.
For cpus, dynamic affinity updating already appears to exist in the
kernel in the form of arch_update_cpu_topology(). For memory, we must
place a READD operation on the hotplug queue for any phandle included in
the PRRN event that is determined to be an LMB.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When adding and removing LMBs we should make the acquire/release of
the DRC a separate step to allow for a few improvements. First
this will ensure that LMBs removed during a remove by count operation
are all available if a error occurs and we need to add them back. By
first removeing all the LMBs from the kernel before releasing their
DRCs the LMBs are available to add back should an error occur.
Also, this will allow for faster re-add operations of memory for
PRRN event handling since we can skip the unneeded step of having
to release the DRC and the acquire it back.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>