Commit Graph

678293 Commits

Author SHA1 Message Date
Balbir Singh
d924cc3fed powerpc/vmlinux.lds: Align __init_begin to 16M
For CONFIG_STRICT_KERNEL_RWX align __init_begin to 16M. We use 16M
since its the larger of 2M on radix and 16M on hash for our linear
mapping. The plan is to have .text, .rodata and everything upto
__init_begin marked as RX. Note we still have executable read only
data. We could further align rodata to another 16M boundary. I've used
keeping text plus rodata as read-only-executable as a trade-off to
doing read-only-executable for text and read-only for rodata.

We don't use multi PT_LOAD in PHDRS because we are not sure if all
bootloaders support them. This patch keeps PHDRS in vmlinux.lds.S as
the same they are with just one PT_LOAD for all of the kernel marked
as RWX (7).

mpe: What this means is the added alignment bloats the resulting
binary on disk, a powernv kernel goes from 17M to 22M.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Balbir Singh
37bc3e5fd7 powerpc/lib/code-patching: Use alternate map for patch_instruction()
This patch creates the window using text_poke_area, allocated via
get_vm_area(). text_poke_area is per CPU to avoid locking.
text_poke_area for each cpu is setup using late_initcall, prior to
setup of these alternate mapping areas, we continue to use direct
write to change/modify kernel text. With the ability to use alternate
mappings to write to kernel text, it provides us the freedom to then
turn text read-only and implement CONFIG_STRICT_KERNEL_RWX.

This code is CPU hotplug aware to ensure that the we have mappings for
any new cpus as they come online and tear down mappings for any CPUs
that go offline.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Balbir Singh
efe4fbb1ac powerpc/xmon: Add patch_instruction() support for xmon
Move from mwrite() to patch_instruction() for xmon for
breakpoint addition and removal.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Balbir Singh
f3eca95638 powerpc/kprobes/optprobes: Use patch_instruction()
So that we can implement STRICT_RWX, use patch_instruction() in
optprobes.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Balbir Singh
d07df82c43 powerpc/kprobes: Move kprobes over to patch_instruction()
arch_arm/disarm_probe() use direct assignment for copying
instructions, replace them with patch_instruction(). We don't need to
call flush_icache_range() because patch_instruction() does it for us.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Balbir Singh
7f6d498ed3 powerpc/mm/radix: Fix execute permissions for interrupt_vectors
Commit 9abcc981de ("powerpc/mm/radix: Only add X for pages
overlapping kernel text") changed the linear mapping on Radix to only
mark the kernel text executable.

However if the kernel is run relocated, for example as a kdump kernel,
then the exception vectors are split from the kernel text, ie. they
remain at real address 0.

We tend to get away with it, because the kernel itself will usually be
below 1G, which means the 1G page at 0-1G is marked executable and
everything works OK. However if the kernel is loaded above 1G, or the
system has less than 1G in total (meaning we can't use a 1G page),
then the exception vectors will not be marked executable and the
kernel will fail to boot.

Fix it by also checking if the address range overlaps the exception
vectors when deciding if we should add PAGE_KERNEL_X.

Fixes: 9abcc981de ("powerpc/mm/radix: Only add X for pages overlapping kernel text")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Combine with the existing check, rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Balbir Singh
e71ff982ae powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
Once upon a time there were only two PP (page protection) bits. In ISA
2.03 an additional PP bit was added, but because of the layout of the
HPTE it could not be made contiguous with the existing PP bits.

The result is that we now have three PP bits, named pp0, pp1, pp2,
where pp0 occupies bit 63 of dword 1 of the HPTE and pp1 and pp2
occupy bits 1 and 0 respectively. Until recently Linux hasn't used
pp0, however with the addition of _PAGE_KERNEL_RO we started using it.

The problem arises in the LPAR code, where we need to translate the PP
bits into the argument for the H_PROTECT hypercall. Currently the code
only passes bits 0-2 of newpp, which covers pp1, pp2 and N (no
execute), meaning pp0 is not passed to the hypervisor at all.

We can't simply pass it through in bit 63, as that would collide with a
different field in the flags argument, as defined in PAPR. Instead we
have to shift it down to bit 8 (IBM bit 55).

Fixes: e58e87adc8 ("powerpc/mm: Update _PAGE_KERNEL_RO")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Simplify the test, rework change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:19 +10:00
Naveen N. Rao
90653a8405 powerpc/64s: Blacklist rtas entry/exit from kprobes
We can't take traps with relocation off, so blacklist enter_rtas() and
rtas_return_loc(). However, instead of blacklisting all of enter_rtas(),
introduce a new symbol __enter_rtas from where on we can't take a trap
and blacklist that.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:02 +10:00
Naveen N. Rao
15770a13be powerpc/64s: Blacklist functions invoked on a trap
Blacklist all functions involved while handling a trap. We:
- convert some of the symbols into private symbols, and
- blacklist most functions involved while handling a trap.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:01 +10:00
Naveen N. Rao
3639d6619c powerpc/64s: Un-blacklist system_call() from kprobes
It is actually safe to probe system_call() in entry_64.S, but only till
we unset MSR_RI. To allow this, add a new symbol system_call_exit()
after the mtmsrd and blacklist that.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:01 +10:00
Naveen N. Rao
266de3a842 powerpc/64s: Move system_call() symbol to just after setting MSR_EE
It is common to get a PMU interrupt right after the mtmsr instruction that
enables interrupts. Due to this, the stack trace profile gets needlessly split
across system_call_common() and system_call().

Previously, system_call() symbol was at the current place to hide a few
earlier symbols which have since been made private or removed entirely.

So, let's move system_call() slightly higher up, right after the mtmsr
instruction that enables interrupts. Convert existing references to
system_call to a local syscall symbol.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:12:00 +10:00
Naveen N. Rao
cf7d6fb067 powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
Convert some of the symbols into private symbols and blacklist
system_call_common() and system_call() from kprobes. We can't take a
trap at parts of these functions as either MSR_RI is unset or the
kernel stack pointer is not yet setup.

Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Don't convert system_call_common to _GLOBAL()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:11:25 +10:00
Naveen N. Rao
9d6c452352 powerpc/64s: Convert .L__replay_interrupt_return to a local label
Commit b48bbb82e2 ("powerpc/64s: Don't unbalance the return branch
predictor in __replay_interrupt()") introduced __replay_interrupt_return
symbol with '.L' prefix in hopes of keeping it private. However, due to
the use of LOAD_REG_ADDR(), the assembler kept this symbol visible. Fix
the same by instead using the local label '1'.

Fixes: Commit b48bbb82e2 ("powerpc/64s: Don't unbalance the return branch
predictor in __replay_interrupt()")
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:08:51 +10:00
Naveen N. Rao
83e840c770 powerpc64/elfv1: Only dereference function descriptor for non-text symbols
Currently, we assume that the function pointer we receive in
ppc_function_entry() points to a function descriptor. However, this is
not always the case. In particular, assembly symbols without the right
annotation do not have an associated function descriptor. Some of these
symbols are added to the kprobe blacklist using _ASM_NOKPROBE_SYMBOL().

When such addresses are subsequently processed through
arch_deref_entry_point() in populate_kprobe_blacklist(), we see the
below errors during bootup:
    [    0.663963] Failed to find blacklist at 7d9b02a648029b6c
    [    0.663970] Failed to find blacklist at a14d03d0394a0001
    [    0.663972] Failed to find blacklist at 7d5302a6f94d0388
    [    0.663973] Failed to find blacklist at 48027d11e8610178
    [    0.663974] Failed to find blacklist at f8010070f8410080
    [    0.663976] Failed to find blacklist at 386100704801f89d
    [    0.663977] Failed to find blacklist at 7d5302a6f94d00b0

Fix this by checking if the function pointer we receive in
ppc_function_entry() already points to kernel text. If so, we just
return it as is. If not, we assume that this is a function descriptor
and proceed to dereference it.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:08:50 +10:00
Christophe Lombard
3ced8d7300 cxl: Export library to support IBM XSL
This patch exports a in-kernel 'library' API which can be called by
other drivers to help interacting with an IBM XSL on a POWER9 system.

The XSL (Translation Service Layer) is a stripped down version of the
PSL (Power Service Layer) used in some cards such as the Mellanox CX5.
Like the PSL, it implements the CAIA architecture, but has a number
of differences, mostly in it's implementation dependent registers.

The XSL also uses a special DMA cxl mode, which uses a slightly
different init sequence for the CAPP and PHB.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:07:03 +10:00
Michael Ellerman
218ea31039 Merge branch 'fixes' into next
Merge our fixes branch, a few of them are tripping people up while
working on top of next, and we also have a dependency between the CXL
fixes and new CXL code we want to merge into next.
2017-07-03 23:05:43 +10:00
Masahiro Yamada
5405c92bc2 powerpc/dts: Use #include "..." to include local DT
Most of DT files in PowerPC use #include "..." to make pre-processor
include DT in the same directory, but we have 3 exceptional files
that use #include <...> for that.

Fix them to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from
dtc_cpp_flags.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:00:33 +10:00
Thiago Jung Bauermann
bfaa7834b6 powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
On POWER9 SMT8 the 24x7 API returns two result elements for physical core
and virtual CPU events and we need to add their counts to get the final
result.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:33 +10:00
Thiago Jung Bauermann
2e6553aae3 powerpc/perf/hv-24x7: Support v2 of the hypervisor API
POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:33 +10:00
Thiago Jung Bauermann
ebd4a5a3eb powerpc/perf/hv-24x7: Minor improvements
There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request.

There's also an HV_PERF_DOMAIN_MAX constant, so use it in
h_24x7_event_init. This makes the comment above the check redundant,
so remove it.

In add_event_to_24x7_request, a statement is terminated with a comma
instead of a semicolon. Fix it.

In hv-24x7.h, improve comments in struct hv_24x7_result.

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:32 +10:00
Thiago Jung Bauermann
38d8184610 powerpc/perf/hv-24x7: Fix return value of hcalls
The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix
this in the code.

The H_GET_24X7_DATA hcall can return a signed error code, so fix this in
the code. Also, don't truncate it to 32 bit to use as return value for
make_24x7_request. In case of error h_24x7_event_commit_txn passes that
return value to generic code, so it should be a proper errno. The other
caller of make_24x7_request is single_24x7_request, whose callers don't
actually care which error code is returned so they are not affected by this
change.

Finally, h_24x7_get_value doesn't use the error code from
single_24x7_request, so there's no need to store it.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:32 +10:00
Thiago Jung Bauermann
62714a1492 powerpc-perf/hx-24x7: Don't log failed hcall twice
make_24x7_request already calls log_24x7_hcall if it fails, so callers
don't have to do it again.

In fact, since the latter is now only called from the former, there's no
need for a separate log_24x7_hcall anymore so remove it.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:31 +10:00
Thiago Jung Bauermann
41f577eb01 powerpc/perf/hv-24x7: Properly iterate through results
hv-24x7.h has a comment mentioning that result_buffer->results can't be
indexed as a normal array because it may contain results of variable sizes,
so fix the loop in h_24x7_event_commit_txn to take the variation into
account when iterating through results.

Another problem in that loop is that it sets h24x7hw->events[i] to NULL.
This assumes that only the i'th result maps to the i'th request, but that
is not guaranteed to be true. We need to leave the event in the array so
that we don't dereference a NULL pointer in case more than one result maps
to one request.

We still assume that each result has only one result element, so warn if
that assumption is violated.

Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:30 +10:00
Thiago Jung Bauermann
36c8fb2c61 powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check
request_buffer can hold 254 requests, so if it already has that number of
entries we can't add a new one.

Also, define constant to show where the number comes from.

Fixes: e3ee15dc5d ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()")
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:30 +10:00
Thiago Jung Bauermann
12bf85a710 powerpc/perf/hv-24x7: Fix passing of catalog version number
H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from
the first catalog page obtained previously. This is a 64 bit number, but
create_events_from_catalog truncates it to 32-bit.

This worked on POWER8, but POWER9 actually uses the upper bits so the call
fails with H_P3 because the hypervisor doesn't recognize the version.

This patch also adds the hcall return code to the error message, which is
helpful when debugging the problem.

Fixes: 5c5cd7b502 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events")
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:29 +10:00
Oliver O'Halloran
c07424414c powerpc/mm: Enable ZONE_DEVICE on powerpc
Flip the switch. Running around and screaming "IT'S ALIVE" is optional,
but recommended.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:29 +10:00
Anton Blanchard
1b644f57b3 powerpc/mm: Wire up hpte_removebolted for powernv
Adds support for removing bolted (i.e kernel linear mapping) mappings on
powernv. This is needed to support memory hot unplug operations which
are required for the teardown of DAX/PMEM devices.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:28 +10:00
Oliver O'Halloran
ebd3119793 powerpc/mm: Add devmap support for ppc64
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S.  This
is used to differentiate device backed memory from transparent huge
pages since they are handled in more or less the same manner by the core
mm code.

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:28 +10:00
Oliver O'Halloran
b584c25440 powerpc/vmemmap: Add altmap support
Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An
altmap is a driver provided region that is used to provide the backing
storage for the struct pages of ZONE_DEVICE memory. In situations where
large amount of ZONE_DEVICE memory is being added to the system the
altmap reduces pressure on main system memory by allowing the mm/
metadata to be stored on the device itself rather in main memory.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:27 +10:00
Oliver O'Halloran
d7d9b612f1 powerpc/vmemmap: Reshuffle vmemmap_free()
Removes an indentation level and shuffles some code around to make the
following patch cleaner. No functional changes.

Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:26 +10:00
Oliver O'Halloran
65f7d04978 mm, x86: Add ARCH_HAS_ZONE_DEVICE to Kconfig
Currently ZONE_DEVICE depends on X86_64 and this will get unwieldly as
new architectures (and platforms) get ZONE_DEVICE support. Move to an
arch selected Kconfig option to save us the trouble.

Cc: linux-mm@kvack.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:26 +10:00
Oliver O'Halloran
7a849a6cf3 powerpc/hugetlbfs: Export HPAGE_SHIFT
Export it so it can be referenced inside a module.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:25 +10:00
Andrew Donnellan
8c7d0a0406 MAINTAINERS: cxl: update maintainership
As Ian's stepping down from his maintainer role now that he's leaving IBM,
Frederic has asked me to add myself to the cxl maintainer list. Updating
accordingly.

Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Cc: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:25 +10:00
Ian Munsie
5fc3a7f754 MAINTAINERS: Remove myself as cxl maintainer
I am no longer employed by IBM and will no longer have access to cxl
hardware, so remove myself as a cxl maintainer.

If anyone needs to contact me in the future, please use my personal
email address darkstarsword@gmail.com

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:24 +10:00
Nicholas Piggin
4e287e655e powerpc: use spin loop primitives in some functions
Use the different spin loop primitives in some simple powerpc
spin loops, including those which will spin as a common case.

This will help to test the spin loop primitives before more
conversions are done.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add some includes of <linux/processor.h>]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:24 +10:00
Nicholas Piggin
ede8e2bbb0 powerpc/64: implement spin loop primitives
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:17 +10:00
Nicholas Piggin
fd851a3cdc spin loop primitives for busy waiting
Current busy-wait loops are implemented by repeatedly calling cpu_relax()
to give an arch option for a low-latency option to improve power and/or
SMT resource contention.

This poses some difficulties for powerpc, which has SMT priority setting
instructions (priorities determine how ifetch cycles are apportioned).
powerpc's cpu_relax() is implemented by setting a low priority then
setting normal priority. This has several problems:

 - Changing thread priority can have some execution cost and potential
   impact to other threads in the core. It's inefficient to execute them
   every time around a busy-wait loop.

 - Depending on implementation details, a `low ; medium` sequence may
   not have much if any affect. Some software with similar pattern
   actually inserts a lot of nops between, in order to cause a few fetch
   cycles with the low priority.

 - The busy-wait loop runs with regular priority. This might only be a few
   fetch cycles, but if there are several threads running such loops, they
   could cause a noticable impact on a non-idle thread.

Implement spin_begin, spin_end primitives that can be used around busy
wait loops, which default to no-ops. And spin_cpu_relax which defaults to
cpu_relax.

This will allow architectures to hook the entry and exit of busy-wait
loops, and will allow powerpc to set low SMT priority at entry, and
normal priority at exit.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 22:49:11 +10:00
Akshay Adiga
4d0d7c02df powerpc/powernv/idle: Clear r12 on wakeup from stop lite
pnv_wakeup_noloss() expects r12 to contain SRR1 value to determine if the wakeup
reason is an HMI in CHECK_HMI_INTERRUPT.

When we wakeup with ESL=0, SRR1 will not contain the wakeup reason, so there is
no point setting r12 to SRR1.

However, we don't set r12 at all so r12 contains garbage (likely a kernel
pointer), and is still used to check HMI assuming that it contained SRR1. This
causes the OPAL msglog to be filled with the following print:

  HMI: Received HMI interrupt: HMER = 0x0040000000000000

This patch clears r12 after waking up from stop with ESL=EC=0, so that we don't
accidentally enter the HMI handler in pnv_wakeup_noloss() if the value of
r12[42:45] corresponds to HMI as wakeup reason.

Prior to commit 9d29250136 ("powerpc/64s/idle: Avoid SRR usage in idle
sleep/wake paths") this bug existed, in that we would incorrectly look at SRR1
to check for a HMI when SRR1 didn't contain a wakeup reason. However the SRR1
value would just happen to never have bits 42:45 set.

Fixes: 9d29250136 ("powerpc/64s/idle: Avoid SRR usage in idle sleep/wake paths")
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log and comment massaging]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 22:46:06 +10:00
Anshuman Khandual
39e4675183 powerpc/mm: Add comments on vmemmap physical mapping
Adds some explaination on how the vmemmap based struct page layout's
physical mapping is allocated and tracked through linked list. It
also keeps note of a possible race condition.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:17 +10:00
Anshuman Khandual
b0f36c10de powerpc/mm: Add comments to the vmemmap layout
Add some explaination to the layout of vmemmap virtual address
space and how physical page mapping is only used for valid PFNs
present at any point on the system.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:17 +10:00
Santosh Sivaraj
c642af9c41 powerpc/smp: Convert NR_CPUS to nr_cpu_ids
nr_cpu_ids can be limited by nr_cpus boot parameter, whereas NR_CPUS is a
compile time constant, which shouldn't be compared against during cpu kick.

Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:16 +10:00
Santosh Sivaraj
f8d0d5dc64 powerpc/smp: Do not BUG_ON if invalid CPU during kick
During secondary start, we do not need to BUG_ON if an invalid CPU number
is passed. We already print an error if secondary cannot be started, so
just return an error instead.

Signed-off-by: Santosh Sivaraj <santosh@fossix.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:16 +10:00
Javier Martinez Canillas
adeb8667ea powerpc/44x: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:15 +10:00
Javier Martinez Canillas
8d0590cefd powerpc/83xx: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:14 +10:00
Javier Martinez Canillas
9b40916827 powerpc/512x: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:14 +10:00
Javier Martinez Canillas
226b9391d1 powerpc/fsl: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:13 +10:00
Javier Martinez Canillas
fd39318806 powerpc/5200: Add generic compatible string for I2C EEPROM
The at24 driver allows to register I2C EEPROM chips using different vendor
and devices, but the I2C subsystem does not take the vendor into account
when matching using the I2C table since it only has device entries.

But when matching using an OF table, both the vendor and device has to be
taken into account so the driver defines only a set of compatible strings
using the "atmel" vendor as a generic fallback for compatible I2C devices.

So add this generic fallback to the device node compatible string to make
the device to match the driver using the OF device ID table.

Signed-off-by: Javier Martinez Canillas <javier@dowhile0.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:13 +10:00
Nicholas Piggin
7ded429152 cpuidle: powerpc: no memory barrier after break from idle
A memory barrier is not required after the task wakes up,
only if we clear the polling flag before waking. The case
where we have work to do is the important one, so optimise
for it.

Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:12 +10:00
Nicholas Piggin
624e46d035 cpuidle: powerpc: read mostly for common globals
Ensure these don't get put into bouncing cachelines.

Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:12 +10:00
Nicholas Piggin
3fc5ee927f cpuidle: powerpc: cpuidle set polling before enabling irqs
local_irq_enable can cause interrupts to be taken which could
take significant amount of processing time. The idle process
should set its polling flag before this, so another process that
wakes it during this time will not have to send an IPI.

Expand the TIF_POLLING_NRFLAG coverage to as large as possible.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:11 +10:00