Commit Graph

19443 Commits

Author SHA1 Message Date
Suraj Jitindar Singh
8400f87406 KVM: PPC: Book3S HV: Align gfn to L1 page size when inserting nest-rmap entry
Nested rmap entries are used to store the translation from L1 gpa to L2
gpa when entries are inserted into the shadow (nested) page tables. This
rmap list is located by indexing the rmap array in the memslot by L1
gfn. When we come to search for these entries we only know the L1 page size
(which could be PAGE_SIZE, 2M or a 1G page) and so can only select a gfn
aligned to that size. This means that when we insert the entry, we need
to align the gfn we use to select the rmap list to the L1 page size as
well, so that we can find the entry later.

By not doing this we were missing nested rmap entries when modifying L1
ptes which were for a page also passed through to an L2 guest.
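
The alignment described above can be sketched as follows. This is a minimal illustration, not the kernel code: the 64 KiB base page size and the helper name `align_gfn` are assumptions made for the example.

```python
PAGE_SHIFT = 16  # assumption: 64 KiB base pages, chosen only for illustration


def align_gfn(gfn: int, l1_page_shift: int) -> int:
    """Round a gfn down to the first base page of the enclosing L1 page.

    l1_page_shift is log2 of the L1 page size (16 for 64K, 21 for 2M, 30 for 1G).
    """
    pages_per_l1_page = 1 << (l1_page_shift - PAGE_SHIFT)
    return gfn & ~(pages_per_l1_page - 1)


# Insertion and lookup must derive the same rmap index from the same L1 page.
insert_key = align_gfn(0x12345, 21)  # entry inserted for a gfn inside a 2M page
lookup_key = align_gfn(0x12340, 21)  # later search only knows the 2M-aligned gfn
```

With both sides aligned the same way, `insert_key` and `lookup_key` select the same rmap list, which is the property the fix relies on.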

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-21 14:37:43 +11:00
Suraj Jitindar Singh
bec6e03b5e KVM: PPC: Book3S HV: Hold kvm->mmu_lock across updating nested pte rc bits
We already hold the kvm->mmu_lock spin lock across updating the rc bits
in the pte for the L1 guest. Continue to hold the lock across updating
the rc bits in the pte for the nested guest as well to prevent
invalidations from occurring.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-21 14:37:43 +11:00
Mahesh Salgaonkar
0db6896ff6 powerpc/fadump: Do not allow hot-remove memory from fadump reserved area.
For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. Now that fadump uses CMA for reserved
area, this memory area is now not protected from hot-remove operations
unless it is cma allocated. Hence, fadump service can fail to re-register
after the hot-remove operation, if hot-removed memory belongs to fadump
reserved region. To avoid this make sure that memory from fadump reserved
area is not hot-removable if fadump is registered.

However, if the user still wants to remove that memory, they can do so by
manually stopping the fadump service before the hot-remove operation.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Mahesh Salgaonkar
f86593be1e powerpc/fadump: Throw proper error message on fadump registration failure
fadump fails to register when there are holes in the reserved memory area.
This can happen if the user has hot-removed memory that falls in the
fadump reserved memory area. Throw a meaningful error message to the
user in such a case.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: is_reserved_memory_area_contiguous() returns bool, unsplit string]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Mahesh Salgaonkar
a4e92ce8e4 powerpc/fadump: Reservationless firmware assisted dump
One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.

In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.

In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".

We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump.

Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it. With this fadump will still be able
to capture all of the kernel memory and most of the user space memory
except the user pages that were present in CMA region.

Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7557         193        6822          12         541        6725
Swap:          4095           0        4095

With this patch:
[root@zzxx-yy10 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           8133         194        7464          12         475        7338
Swap:          4095           0        4095
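
As a quick worked check of the two listings above, the difference can be computed directly. The numbers are copied from the `free -m` output; the variable names are only illustrative.

```python
# Values (in MiB) copied from the two `free -m` listings above.
before = {"total": 7557, "free": 6822, "available": 6725}
after = {"total": 8133, "free": 7464, "available": 7338}

# Memory gained per column once the reservation is backed by CMA.
gained = {column: after[column] - before[column] for column in before}
```

On this LPAR the total memory visible to the system grows by 576 MiB.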

Changes made here are completely transparent to how fadump has
traditionally worked.

Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.

TODO:
- Handle case where CMA reservation spans nodes.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Mahesh Salgaonkar
08fb726df1 powerpc/powernv: Move opal_power_control_init() call in opal_init().
opal_power_control_init() depends on the opal message notifier being
initialized, which is done in opal_init()->opal_message_init(). But both
of these initializations are called through machine initcalls, so it all
depends on the order in which they are called. So far they have been called
in the correct order (maybe we got lucky) and we never saw any issue. But it
is clearer to control the initialization order explicitly by moving
opal_power_control_init() into opal_init().

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Markus Elfring
ae6263cc33 powerpc/4xx: Delete an unnecessary return statement in two functions
The script "checkpatch.pl" pointed out information like the following.

WARNING: void function return statements are not generally useful

Thus remove such a statement in the affected functions.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Markus Elfring
a8d5dadae5 powerpc/4xx: Delete error message for a ENOMEM in two functions
Omit an extra message for a memory allocation failure in these
functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Markus Elfring
52930bc6e8 powerpc/4xx: Use seq_putc() in ocm_debugfs_show()
A single character (line break) should be put into a sequence.
Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Markus Elfring
b52106a040 powerpc/4xx: Combine four seq_printf() calls into two in ocm_debugfs_show()
Some data were printed into a sequence by four separate function calls.
Print the same data by two single function calls instead.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Christophe Leroy
96d19d70e1 powerpc/8xx: Allow pinning IMMR TLB when using early debug console
CONFIG_EARLY_DEBUG_CPM requires IMMR area TLB to be pinned
otherwise it doesn't survive MMU_init, and the boot fails.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Oliver O'Halloran
5f639e5fad powerpc/powernv: Remove PCI_MSI ifdef checks
CONFIG_PCI_MSI was made mandatory by commit a311e738b6
("powerpc/powernv: Make PCI non-optional") so the #ifdef
checks around CONFIG_PCI_MSI here can be removed entirely.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Alexandre Belloni
a083787680 powerpc/fsl-rio: fix spelling mistake "reserverd" -> "reserved"
Fix a spelling mistake in a register description.

Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
Ravi Bangoria
0c9108b083 Powerpc/perf: Wire up PMI throttling
Commit 14c63f17b1 ("perf: Drop sample rate when sampling is too
slow") introduced a way to throttle PMU interrupts if we're spending
too much time just processing those. Wire up powerpc PMI handler to
use this infrastructure.

We have throttling of the *rate* of interrupts, but this adds
throttling based on the *time taken* to process the interrupts.
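
The idea of throttling on handler time rather than interrupt rate can be modelled roughly as below. This is a simplified sketch and not the algorithm used by the perf core: the class name, the moving-average window and the halving policy are all assumptions for illustration.

```python
class TimeThrottle:
    """Toy model: lower the sample rate when interrupt handling gets too slow."""

    def __init__(self, budget_ns: int, sample_rate: int, window: int = 4):
        self.budget_ns = budget_ns      # allowed average handler time
        self.sample_rate = sample_rate  # current samples per second
        self.window = window            # smoothing factor of the moving average
        self.avg_ns = 0.0

    def interrupt_took(self, duration_ns: int) -> int:
        # Update an exponential moving average of the handler duration.
        self.avg_ns += (duration_ns - self.avg_ns) / self.window
        # Over budget: halve the sample rate (hypothetical policy).
        if self.avg_ns > self.budget_ns and self.sample_rate > 1:
            self.sample_rate //= 2
        return self.sample_rate
```

A handler would report its own duration after each interrupt, so a slow handler reduces the rate even when the raw interrupt count is within limits.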

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-21 11:32:49 +11:00
David S. Miller
2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, but happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Radim Krčmář
cfdfaf4a86 PPC KVM update for 4.21
Merge tag 'kvm-ppc-next-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

PPC KVM update for 4.21 from Paul Mackerras

The main new feature this time is support in HV nested KVM for passing
a device that is emulated by a level 0 hypervisor and presented to
level 1 as a PCI device through to a level 2 guest using VFIO.

Apart from that there are improvements for migration of radix guests
under HV KVM and some other fixes and cleanups.
2018-12-20 14:54:09 +01:00
YueHaibing
8c6c942d33 powerpc/eeh: Fix debugfs_simple_attr.cocci warnings
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy
c20577014f powerpc/powernv/eeh/npu: Fix uninitialized variables in opal_pci_eeh_freeze_status
The current implementation of the OPAL_PCI_EEH_FREEZE_STATUS call in
skiboot's NPU driver does not touch the pci_error_type parameter so
it might have garbage but the powernv code analyzes it nevertheless.

This initializes pcierr and fstate to zero in all call sites.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy
a25de7af34 powerpc/powernv/ioda: Reduce a number of hooks in pnv_phb
fixup_phb() is never used, this removes it.

pick_m64_pe() and reserve_m64_pe() are always defined for all powernv
PHBs: they are initialized by pnv_ioda_parse_m64_window() which is
called unconditionally from pnv_pci_init_ioda_phb() which initializes
all known PHB types on powernv so we can open code them.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
dfa88658fb powerpc/fsl: Update Spectre v2 reporting
Report branch predictor state flush as a mitigation for
Spectre variant 2.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy
f21b0a45e4 powerpc/powernv/ioda1: Remove dead code for a single device PE
At the moment PNV_IODA_PE_DEV is only used for NPU PEs which are not
present on IODA1 machines (i.e. POWER7) so let's remove a piece of
dead code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
3bc8ea8603 powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
If the user chooses not to use the mitigations, replace
the code sequence with nops.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
e7aa61f47b powerpc/fsl: Flush branch predictor when entering KVM
Switching from the guest to host is another place
where the speculative accesses can be exploited.
Flush the branch predictor when entering KVM.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy
fa1ada7889 powerpc/powernv/npu: Remove unused headers and a macro.
The macro and few headers are not used so remove them.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
7fef436295 powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
In order to protect against speculation attacks on
indirect branches, the branch predictor is flushed at
kernel entry to protect against the following situations:
- userspace process attacking another userspace process
- userspace process attacking the kernel
Basically when the privilege level changes (i.e. the kernel
is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Alexey Kardashevskiy
bdbf649efe powerpc/powernv/ioda: Allocate indirect TCE levels of cached userspace addresses on demand
The powernv platform maintains 2 TCE tables for VFIO - a hardware TCE
table and a table with userspace addresses; the latter is used for
marking pages dirty when corresponding TCEs are unmapped from
the hardware table.

a68bd1267b ("powerpc/powernv/ioda: Allocate indirect TCE levels
on demand") enabled on-demand allocation of the hardware table,
however it missed the other table so it has still been fully allocated
at the boot time. This fixes the issue by allocating a single level,
just like we do for the hardware table.

Fixes: a68bd1267b ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
10c5e83afd powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
In order to protect against speculation attacks on
indirect branches, the branch predictor is flushed at
kernel entry to protect against the following situations:
- userspace process attacking another userspace process
- userspace process attacking the kernel
Basically when the privilege level changes (i.e. the
kernel is entered), the branch predictor state is flushed.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
f633a8ad63 powerpc/fsl: Add nospectre_v2 command line argument
When the command line argument is present, the Spectre variant 2
mitigations are disabled.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
98518c4d87 powerpc/fsl: Emulate SPRN_BUCSR register
In order to flush the branch predictor the guest kernel performs
writes to the BUCSR register, which is hypervisor privileged. However,
the branch predictor is flushed at each KVM entry, so it has already
been flushed; just return to the guest as soon as possible.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
[mpe: Tweak comment formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
7d8bad99ba powerpc/fsl: Fix spectre_v2 mitigations reporting
Currently for CONFIG_PPC_FSL_BOOK3E the spectre_v2 file is incorrect:

  $ cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
  "Mitigation: Software count cache flush"

Which is wrong. Fix it to report vulnerable for now.

Fixes: ee13cb249f ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:59:03 +11:00
Diana Craciun
1cbf8990d7 powerpc/fsl: Add macro to flush the branch predictor
The BUCSR register can be used to invalidate the entries in the
branch prediction mechanisms.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:58:57 +11:00
Diana Craciun
76a5eaa38b powerpc/fsl: Add infrastructure to fixup branch predictor flush
In order to protect against speculation attacks (Spectre
variant 2) on NXP PowerPC platforms, the branch predictor
should be flushed when the privilege level is changed.
This patch adds the infrastructure to fix up at runtime
the code sections that perform the branch predictor flush
depending on a boot arg parameter which is added later in a
separate patch.

Signed-off-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:53:39 +11:00
Christophe Leroy
f242e0ac95 powerpc/prom: move the device tree if not in declared memory.
If the device tree doesn't reside in the memory which is declared
inside it, it has to be moved as well as this memory will not be
mapped by the kernel.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Michael Ellerman
2b874a5c7b powerpc/configs: Don't enable PPC_EARLY_DEBUG in defconfigs
This reverts the remains of commit b9ef7d6b11 ("powerpc: Update
default configurations").

That commit was preceded by a commit which added a config option to
control use of BOOTX for early debug, ie. PPC_EARLY_DEBUG_BOOTX, and
then the update of the defconfigs was intended to not change behaviour
by then enabling the new config option.

However enabling PPC_EARLY_DEBUG had other consequences, notably
causing us to register the udbg console at the end of udbg_early_init().

This means on a system which doesn't have anything that BOOTX can
use (most systems), we register the udbg console very early but the
bootx code just throws everything away, meaning early boot messages
are never printed to the console.

What we want to happen is for the udbg console to only be registered
later (from setup_arch()) once we've setup udbg_putc, and then all
early boot messages will be replayed.

Fixes: b9ef7d6b11 ("powerpc: Update default configurations")
Reported-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Arnd Bergmann
2fea82db11 powerpc: eeh_event: convert semaphore to completion
For this use case, completions and semaphores are equivalent,
but semaphores are an awkward interface that should generally
be avoided, so use the completion instead.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Benjamin Herrenschmidt
3cfb9ebe90 powerpc/44x/bamboo: Fix PCI range
The bamboo dts has a bug: it uses a non-naturally aligned range
for PCI memory space. This isn't supported by the code, thus
causing PCI to break on this system.

This is due to the fact that while the chip memory map has 1G
reserved for PCI memory, it's only 512M aligned. The code doesn't
know how to split that into 2 different PMMs and fails, so limit
the region to 512M.
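
The natural-alignment condition described above can be illustrated with a small check. This is a sketch under stated assumptions: the helper name is made up for the example, and the base address `0xA0000000` is a hypothetical value picked because it is 512M-aligned but not 1G-aligned; the message itself does not give the address.

```python
def is_naturally_aligned(base: int, size: int) -> bool:
    """True if size is a power of two and base is a multiple of size."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and base % size == 0


SZ_512M = 1 << 29
SZ_1G = 1 << 30
HYPOTHETICAL_BASE = 0xA0000000  # assumption: illustrative base, 512M-aligned only

full_window_ok = is_naturally_aligned(HYPOTHETICAL_BASE, SZ_1G)
limited_window_ok = is_naturally_aligned(HYPOTHETICAL_BASE, SZ_512M)
```

Under these assumptions the 1G window fails the check while the 512M window passes, which matches the reasoning for limiting the region.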

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Darren Stevens
51f4cc2047 powerpc/pasemi: Add Nemo board IRQ init routine
Add an IRQ init routine for the Nemo board which inits and attaches
the i8259 found in the SB600, and a cascade routine to dispatch the
interrupts.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Darren Stevens
656fdf3ad8 powerpc/pasemi: Add Nemo board device init code.
Add routines for Nemo specific devices to init at boot time, these
being board level power-off and SB600's rtc.

Also add a run time variable to prevent these being activated
if we boot on a reference board.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Darren Stevens
0428a5f494 powerpc/pasemi: Add Nemo board IRQ init routine
Add an IRQ init routine for the Nemo board which inits and attaches
the i8259 found in the SB600, and a cascade routine to dispatch the
interrupts.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Darren Stevens
68f211a4d1 powerpc/pasemi: Add PCI initialisation for Nemo board.
The A-Eon Amigaone X1000's Nemo motherboard has an AMD SB600
connected to one of the PCI-e root ports on its PaSemi
Pwrficient 1628M SoC. Normally the SB600 southbridge would be
connected to a hidden PCI-e port on the system's northbridge,
and as a result doesn't fully comply with the PCI-e spec.

Add code to relax the PCI-e detection in both the root port
and the Linux kernel allowing on board devices to be detected.

Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christophe Leroy
49a502ea23 powerpc/mm: Make NULL pointer dereferences explicit on bad page faults.
As on several other arches including x86, this patch makes it explicit
that a bad page fault is a NULL pointer dereference when the fault
address is lower than PAGE_SIZE.

At the same time, this patch makes all bad_page_fault() messages
shorter so that they remain on one single line. And it prefixes them
with "BUG: " so that they are easily grepped.
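
The classification rule can be sketched like this. It is a minimal model rather than the kernel implementation: the 4 KiB page size, the function name and the exact message wording are assumptions used only to show the address test and the single-line "BUG: " prefix.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; the real value depends on the kernel config


def describe_bad_fault(address: int) -> str:
    """Return a one-line, grep-friendly description of a bad kernel fault."""
    if address < PAGE_SIZE:
        # Faults in the first page are reported as NULL pointer dereferences.
        what = "Kernel NULL pointer dereference"
    else:
        what = "Unable to handle kernel data access"
    return f"BUG: {what} at 0x{address:08x}"
```

Because every message starts with the same prefix and stays on one line, a plain `grep "BUG: "` over the log finds all of them.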

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Avoid pr_cont()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Dmitry V. Levin
8dbdec0bcb powerpc/ptrace: Combine SYSCALL_EMU & SYSCALL_TRACE handling
Combine the SYSCALL_EMU and SYSCALL_TRACE handling so that we only
call tracehook_report_syscall_entry() in one place.

Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
[mpe: Flesh out change log, s/cached_flags/flags/, reflow comments]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
25078dc1f7 powerpc: use mm zones more sensibly
Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes.

Move to a scheme closer to what other architectures use (and I dare to
say the intent of the system):

 - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
 - ZONE_NORMAL: everything addressable by the kernel
 - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels

Also provide information on how ZONE_DMA is used by defining
ARCH_ZONE_DMA_BITS.

Contains various fixes from Benjamin Herrenschmidt.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
44a0337b32 powerpc/dma: split the two __dma_alloc_coherent implementations
The implementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
any code with the one for systems with coherent caches.  Split it off
and merge it with the helpers in dma-noncoherent.c that have no other
callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
9c15a87cfc powerpc/dma: remove the unused dma_iommu_ops export
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
acddff9dc4 powerpc/dma: remove the unused ISA_DMA_THRESHOLD export
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
0e652390fb powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
0aeba2d0d2 powerpc/dma: properly wire up the unmap_page and unmap_sg methods
The unmap methods need to transfer memory ownership back from the
device to the cpu by identical means as dma_sync_*_to_cpu. I'm not
sure powerpc needs to do any work in this transfer direction, but
given that it does invalidate the caches in dma_sync_*_to_cpu already
we should make sure we also do so on unmapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[mpe: s/dir/direction in dma_nommu_unmap_page()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christoph Hellwig
9286356907 powerpc: allow NOT_COHERENT_CACHE for amigaone
AMIGAONE selects NOT_COHERENT_CACHE, so we better allow it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Christophe Leroy
b18f0ae92b powerpc/prom: fix early DEBUG messages
This patch fixes early DEBUG messages in prom.c:
- Use %px instead of %p to see the addresses
- Cast memblock_phys_mem_size() with (unsigned long long) to
avoid build failure when phys_addr_t is not 64 bits.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 22:21:20 +11:00
Joel Stanley
72e7bcc2cd powerpc/32: Avoid unsupported flags with clang
When building for ppc32 with clang these flags are unsupported:

  -ffixed-r2 and -mmultiple

llvm's lib/Target/PowerPC/PPCRegisterInfo.cpp marks r2 as reserved
when building for SVR4ABI and !ppc64:

  // The SVR4 ABI reserves r2 and r13
  if (Subtarget.isSVR4ABI()) {
    // We only reserve r2 if we need to use the TOC pointer. If we have no
    // explicit uses of the TOC pointer (meaning we're a leaf function with
    // no constant-pool loads, etc.) and we have no potential uses inside an
    // inline asm block, then we can treat r2 has an ordinary callee-saved
    // register.
    const PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
    if (!TM.isPPC64() || FuncInfo->usesTOCBasePtr() || MF.hasInlineAsm())
      markSuperRegs(Reserved, PPC::R2);  // System-reserved register
    markSuperRegs(Reserved, PPC::R13); // Small Data Area pointer register
  }

This means we can safely omit -ffixed-r2 when building for 32-bit
targets.

The -mmultiple/-mno-multiple flags are not supported by clang, so
platforms that might support multiple miss out on using multiple word
instructions.

We wrap these flags in cc-option so that when Clang gains support the
kernel will be able to use these flags.

Clang 8 can then build a ppc44x_defconfig which boots in Qemu:

  make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-  ppc44x_defconfig
  ./scripts/config -e CONFIG_DEVTMPFS -d DEVTMPFS_MOUNT
  make CC=clang-8 ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu-

  qemu-system-ppc -M bamboo \
   -kernel arch/powerpc/boot/zImage \
   -dtb arch/powerpc/boot/dts/bamboo.dtb \
   -initrd ~/ppc32-440-rootfs.cpio \
   -nographic -serial stdio -monitor pty -append "console=ttyS0"

Link: https://github.com/ClangBuiltLinux/linux/issues/261
Link: https://bugs.llvm.org/show_bug.cgi?id=39556
Link: https://bugs.llvm.org/show_bug.cgi?id=39555
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Madhavan Srinivasan
3757cba80a powerpc/perf: Remove l2 bus events from HW cache event array
Remove PM_L2_ST_MISS and PM_L2_ST from the HW cache event array since
these are bus events, which need to be programmed in groups.

Fixes: f1fb60bfde ('powerpc/perf: Export Power9 generic and cache events to sysfs')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Madhavan Srinivasan
59029136d7 powerpc/perf: Add constraints for power9 l2/l3 bus events
In previous generation processors, both bus events and direct
events of the performance monitoring unit can be individually
programmed and monitored in PMCs.

But in Power9, L2/L3 bus events are always available as a
"bank" of 4 events. To obtain the counts for any of the
l2/l3 bus events in a given bank, the user will have to
program PMC4 with corresponding l2/l3 bus event for that
bank.

The patch enforces two constraints in case of L2/L3 bus events.

1) Any L2/L3 event, when programmed, is also expected to program the
corresponding PMC4 event from that group.
2) The PMC4 event should always be programmed first due to a group
constraint logic limitation.

For ex. consider these L3 bus events

PM_L3_PF_ON_CHIP_MEM (0x460A0),
PM_L3_PF_MISS_L3 (0x160A0),
PM_L3_CO_MEM (0x260A0),
PM_L3_PF_ON_CHIP_CACHE (0x360A0),

1) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
	perf stat -e "{r160A0,r260A0,r360A0}" < >

And this is a VALID group for L3 Bus events:
	perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >

2) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
	perf stat -e "{r260A0,r360A0}" < >

And this is a VALID group for L3 Bus events:
	perf stat -e "{r460A0,r260A0,r360A0}" < >

3) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
	perf stat -e "{r360A0}" < >

And this is a VALID group for L3 Bus events:
	perf stat -e "{r460A0,r360A0}" < >

Patch here implements group constraint logic suggested by Michael Ellerman.
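
The valid and invalid groups listed above can be checked with a small model. This is only a sketch of the two stated constraints, not the kernel's constraint logic; it assumes the PMC number sits in bits 16-19 of the raw event code, which is consistent with the codes listed (0x460A0 is the PMC4 event), and the helper names are made up.

```python
PMC_SHIFT = 16  # assumption: PMC field occupies bits 16-19 of the raw event code
PMC_MASK = 0xF


def event_pmc(raw_code: int) -> int:
    """Extract the PMC number encoded in a raw event code."""
    return (raw_code >> PMC_SHIFT) & PMC_MASK


def is_valid_bus_group(raw_codes: list) -> bool:
    """A bus event group must contain the PMC4 event, and it must come first."""
    return bool(raw_codes) and event_pmc(raw_codes[0]) == 4


valid = is_valid_bus_group([0x460A0, 0x160A0, 0x260A0, 0x360A0])
invalid = is_valid_bus_group([0x160A0, 0x260A0, 0x360A0])
```

The model ignores every other scheduling constraint and only mirrors the examples given in this message.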

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Madhavan Srinivasan
2d46d4877b powerpc/perf: Fix unit_sel/cache_sel checks
The raw event code has a couple of fields, "unit" and "cache", to capture
the "unit" to monitor for a given pmcxsel and the cache reload qualifier to
program in MMCR1.

isa207_get_constraint() refers to the "unit" field to update the MMCRC (L2/L3)
Event bus control fields with the "cache" bits of the raw event code.
These are power8 specific and not supported by the PowerISA v3.0 pmu. So wrap
the checks to be power8 specific. Also, the "cache" bit field is referred to
when updating MMCR1[16:17], and this check can be power8 specific.

Fixes: 7ffd948fae ('powerpc/perf: factor out power8 pmu functions')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Madhavan Srinivasan
8c31459d61 powerpc/perf: Cleanup cache_sel bits comment
Update the raw event code comment in power9-pmu.c with respect to
"cache" bits, since power9 MMCRC does not support these.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Madhavan Srinivasan
333804dc3b powerpc/perf: Update perf_regs structure to include SIER
On each sample, the Sample Instruction Event Register (SIER) content
is saved in pt_regs. SIER does not have an entry as-is in pt_regs;
instead, the SIER content is saved in the "dar" register of pt_regs.

This patch adds another entry to the perf_regs structure to include the "SIER"
printing, which internally maps to the "dar" of pt_regs.

It also checks for SIER availability on the platform and presents the
value accordingly.
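
A minimal sketch of that mapping follows. The structure, the register
indices and the has_sier flag are hypothetical stand-ins, and reporting 0
when SIER is unavailable is an assumption of the sketch.

```c
#include <stdint.h>

/* Simplified stand-in for pt_regs; only the field used here. */
struct regs_sketch {
	uint64_t dar;	/* holds the SIER content on a PMU sample */
};

/* Hypothetical register indices for this sketch only. */
enum reg_idx_sketch { REG_DAR, REG_SIER };

/*
 * Return the value for 'idx'. SIER is read from the dar slot, or
 * reported as 0 (an assumption) when the platform has no SIER.
 */
static uint64_t reg_value_sketch(const struct regs_sketch *regs,
				 enum reg_idx_sketch idx, int has_sier)
{
	if (idx == REG_SIER)
		return has_sier ? regs->dar : 0;
	return regs->dar;
}
```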

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Madhavan Srinivasan
17cfccc915 powerpc/perf: Fix thresholding counter data for unknown type
MMCRA[34:36] and MMCRA[38:44] expose the thresholding counter value.
The thresholding counter can be used to count latency cycles such as
load miss to reload. But the threshold counter value is not relevant
when the sampled instruction type is unknown or reserved. This patch
sets the thresholding counter value to zero when the sampled instruction
type is unknown or reserved.

Fixes: 170a315f41c6('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:53:11 +11:00
Aneesh Kumar K.V
374f3f5979 powerpc/mm/hash: Handle user access of kernel address gracefully
In commit 2865d08dd9 ("powerpc/mm: Move the DSISR_PROTFAULT sanity
check") we moved the protection fault access check before the vma
lookup. That means we hit that WARN_ON when user space accesses a
kernel address. Before that commit this was handled by find_vma() not
finding vma for the kernel address and considering that access as bad
area access.

Avoid the confusing WARN_ON and convert that to a ratelimited printk.

With the patch we now get:

for load:
  a.out[5997]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
  a.out[5997]: segfault (11) at c00000000000dea0 nip 1317c0798 lr 7fff80d6441c code 1 in a.out[1317c0000+10000]
  a.out[5997]: code: 60000000 60420000 3c4c0002 38427790 4bffff20 3c4c0002 38427784 fbe1fff8
  a.out[5997]: code: f821ffc1 7c3f0b78 60000000 e9228030 <89290000> 993f002f 60000000 383f0040

for exec:
  a.out[6067]: User access of kernel address (c00000000000dea0) - exploit attempt? (uid: 1000)
  a.out[6067]: segfault (11) at c00000000000dea0 nip c00000000000dea0 lr 129d507b0 code 1
  a.out[6067]: Bad NIP, not dumping instructions.
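
The check behind the first line of each report can be sketched in
userspace as below. The kernel start address is an assumption taken from
the addresses in the log lines, and both function names are hypothetical.
The rate limiting done by the kernel is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: kernel addresses start at 0xc000000000000000, which matches
 * the c00000000000dea0 address in the log lines of this commit message. */
#define KERNEL_START_SKETCH 0xc000000000000000ULL

static bool is_kernel_addr_sketch(uint64_t addr)
{
	return addr >= KERNEL_START_SKETCH;
}

/*
 * Format the report into 'buf' instead of warning. Returns true if the
 * access was to a kernel address and a message was produced.
 */
static bool report_bad_user_access(char *buf, size_t len, uint64_t addr,
				   unsigned int uid)
{
	if (!is_kernel_addr_sketch(addr))
		return false;
	snprintf(buf, len,
		 "User access of kernel address (%llx) - exploit attempt? (uid: %u)",
		 (unsigned long long)addr, uid);
	return true;
}
```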

Fixes: 2865d08dd9 ("powerpc/mm: Move the DSISR_PROTFAULT sanity check")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Tested-by: Breno Leitao <leitao@debian.org>
[mpe: Don't split printk() string across lines]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-20 20:52:54 +11:00
Joerg Roedel
03ebe48e23 Merge branches 'iommu/fixes', 'arm/renesas', 'arm/mediatek', 'arm/tegra', 'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next
2018-12-20 10:05:20 +01:00
Christophe Leroy
385e89d5b2 powerpc/mm: add exec protection on powerpc 603
The 603 doesn't have a HASH table; TLB misses are handled by
software. It is then possible to generate a page fault when
_PAGE_EXEC is not set, as in nohash/32.

There is one "reserved" PTE bit available, this patch uses
it for _PAGE_EXEC.

In order to support it, set_pte_filter() and
set_access_flags_filter() are made common, and the handling
is made dependent on MMU_FTR_HPTE_TABLE

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
badb9687ce powerpc/mm: define an empty slice_init_new_context_exec()
Define slice_init_new_context_exec() at all time to avoid

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
05a4ab8239 powerpc/uaccess: fix warning/error with access_ok()
With the following piece of code, the following compilation warning
is encountered:

	if (_IOC_DIR(ioc) != _IOC_NONE) {
		int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

		if (!access_ok(verify, ioarg, _IOC_SIZE(ioc))) {

drivers/platform/test/dev.c: In function 'my_ioctl':
drivers/platform/test/dev.c:219:7: warning: unused variable 'verify' [-Wunused-variable]
   int verify = _IOC_DIR(ioc) & _IOC_READ ? VERIFY_WRITE : VERIFY_READ;

This patch fixes it by referencing 'type' in the macro, although
doing nothing with it.
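
A sketch of the idiom, with hypothetical macro and helper names standing
in for the kernel definitions: evaluating the argument in a void cast
counts as a use, so the compiler no longer sees 'verify' as unused.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions. */
#define VERIFY_READ  0
#define VERIFY_WRITE 1

static bool range_ok_sketch(const void *addr, size_t size)
{
	(void)size;
	return addr != NULL;	/* placeholder check for the sketch */
}

/* Before: 'type' is dropped by the preprocessor, so a variable used only
 * as this argument looks unused to the compiler. */
#define access_ok_before(type, addr, size) range_ok_sketch((addr), (size))

/* After: the comma expression evaluates 'type' and discards it, which
 * counts as a use without changing the result. */
#define access_ok_after(type, addr, size) \
	((void)(type), range_ok_sketch((addr), (size)))
```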

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
c62ce9ef97 powerpc: remove remaining bits from CONFIG_APUS
commit f21f49ea63 ("[POWERPC] Remove the dregs of APUS support from
arch/powerpc") removed CONFIG_APUS, but forgot to remove the logic
which adapts tophys() and tovirt() for it.

This patch removes the last stale pieces.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
32c8c4c621 powerpc/xmon: fix dump_segments()
mfsrin() takes segment num from bits 31-28 (IBM bits 0-3).
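
In other words, the segment number has to be shifted into the top nibble
of the 32-bit argument. A sketch with a hypothetical helper name:

```c
#include <stdint.h>

/* Build the 32-bit value whose top four bits (31-28, IBM bits 0-3)
 * carry segment number 'seg' (0..15). */
static uint32_t segment_to_mfsrin_arg(unsigned int seg)
{
	return (uint32_t)(seg & 0xf) << 28;
}
```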

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Clarify bit numbering]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
0ed5b55884 powerpc/8xx: add exception frame marker
This patch adds STACK_FRAME_REGS_MARKER in the stack at exception entry
in order to see interrupts in call traces as below:

[    0.013964] Call Trace:
[    0.014014] [c0745db0] [c007a9d4] tick_periodic.constprop.5+0xd8/0x104 (unreliable)
[    0.014086] [c0745dc0] [c007aa20] tick_handle_periodic+0x20/0x9c
[    0.014181] [c0745de0] [c0009cd0] timer_interrupt+0xa0/0x264
[    0.014258] [c0745e10] [c000e484] ret_from_except+0x0/0x14
[    0.014390] --- interrupt: 901 at console_unlock.part.7+0x3f4/0x528
[    0.014390]     LR = console_unlock.part.7+0x3f0/0x528
[    0.014455] [c0745ee0] [c0050334] console_unlock.part.7+0x114/0x528 (unreliable)
[    0.014542] [c0745f30] [c00524e0] register_console+0x3d8/0x44c
[    0.014625] [c0745f60] [c0675aac] cpm_uart_console_init+0x18/0x2c
[    0.014709] [c0745f70] [c06614f4] console_init+0x114/0x1cc
[    0.014795] [c0745fb0] [c0658b68] start_kernel+0x300/0x3d8
[    0.014864] [c0745ff0] [c00022cc] start_here+0x44/0x98

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
e93ba1b7eb powerpc/book3s/32: fix number of bats in p/v_block_mapped()
This patch fixes the loop in p_block_mapped() and v_block_mapped()
to scan the entire bat_addrs[] array.
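
One common way to keep such a loop in step with the array is to derive
the bound from the array itself. The sketch below uses a made-up table and
hypothetical names; whether the patch uses ARRAY_SIZE() is not stated
here.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table modelled on bat_addrs[]; contents are made up. */
struct bat_sketch { uint32_t start, limit, phys; };

static const struct bat_sketch bats[] = {
	{ 0xc0000000u, 0xc1000000u, 0x00000000u },
	{ 0, 0, 0 },
	{ 0, 0, 0 },
	{ 0xc8000000u, 0xc8800000u, 0x08000000u },
};

/* Return the physical address for 'va' if a block maps it, else 0.
 * ARRAY_SIZE() keeps the loop bound tied to the array itself. */
static uint32_t v_block_mapped_sketch(uint32_t va)
{
	size_t b;

	for (b = 0; b < ARRAY_SIZE(bats); b++)
		if (va >= bats[b].start && va < bats[b].limit)
			return bats[b].phys + (va - bats[b].start);
	return 0;
}
```

A loop bound smaller than the array size would miss the last entry of
this table.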

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
712877f874 powerpc/mm: Eliminate not possible mmu features at compile time
Depending on the CONFIG selected, many of the MMU features are
not possible. Let's only get the possible ones in MMU_FTRS_POSSIBLE.

This allows gcc to get rid, at compile time, of code related to
features that are not possible.
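
The effect can be sketched as follows. All names and bit values are
hypothetical. The point is only that masking with a compile-time constant
lets the compiler fold the test.

```c
#include <stdbool.h>

/* Hypothetical feature bits and configuration for this sketch. */
#define FTR_A 0x1u
#define FTR_B 0x2u
#define FTRS_POSSIBLE (FTR_A)	/* FTR_B is ruled out by the config */

static unsigned int cur_features = FTR_A | FTR_B;	/* runtime value */

/* 'FTRS_POSSIBLE & f' is a compile-time constant for a constant 'f', so the
 * compiler can fold the whole test to false and drop the guarded code. */
static inline bool has_feature_sketch(unsigned int f)
{
	return (FTRS_POSSIBLE & f) && (cur_features & f);
}
```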

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
8a01960fb5 powerpc/smp: Use code patching to restore reset vector
Instead of hardcoding reset vector restore, use patch_instruction()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
6c16816b91 powerpc/44x: use patch_sites for TLB handlers patching
Use patch sites and associated helpers to manage TLB handlers
patching instead of hardcoding.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
d16952a629 powerpc/signal: Use code patching instead of hardcoding
Instead of hardcoding code modifications, use code patching functions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
002cdfc2c7 powerpc/8xx: use modify_instruction_site()
Instead of hardcoding the TLB handlers patching, use
the newly created modify_instruction_site() helper.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
9efc74ff52 powerpc/book3s/32: Use patch_site to patch hash functions
Use patch_sites and the new modify_instruction_site() function
instead of hardcoding hash functions patching.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
4a3a224c5a powerpc/book3s/32: Use MMU_FTR_HPTE_TABLE in head_32.S
Instead of manually patching a blr at hash_page() entry in
MMU_init_hw(), this patch adds a features section in head_32.S

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
04b0a72f28 powerpc/32: use patch_site_addr() in machine_init()
Use patch_site_addr() instead of hardcoding the
address calculation in machine_init()

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
36b08b431e powerpc: add modify_instruction() and modify_instruction_site()
Add two helpers to avoid hardcoding instruction modifications.
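
The operation such a helper performs can be sketched in userspace as a
clear-then-set on the instruction word. The parameter names are
assumptions, and the kernel helper would go through the code patching
machinery rather than a plain store.

```c
#include <stdint.h>

/* Userspace sketch: clear the 'clr' bits, then set the 'set' bits of the
 * instruction word at 'addr'. A plain store stands in for the kernel's
 * code patching. */
static void modify_instruction_sketch(uint32_t *addr, uint32_t clr,
				      uint32_t set)
{
	*addr = (*addr & ~clr) | set;
}
```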

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
45090c2661 powerpc: simplify patch_instruction_site() and patch_branch_site()
Using patch_site_addr() helper, patch_instruction_site() and
patch_branch_site() can be simplified and inlined.
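
A sketch of the address calculation, assuming a patch site is a signed
32-bit offset stored relative to its own location. The function name is a
stand-in for the kernel helper.

```c
#include <stdint.h>

/* Assumption: a patch site is a signed 32-bit offset stored in memory,
 * relative to its own location, pointing at the instruction to patch. */
static uintptr_t patch_site_addr_sketch(const int32_t *site)
{
	return (uintptr_t)((intptr_t)site + *site);
}
```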

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
584dbc7727 powerpc/mm: remove unused variable
In file included from ./include/linux/hugetlb.h:445:0,
                 from arch/powerpc/kernel/setup-common.c:37:
./arch/powerpc/include/asm/hugetlb.h: In function ‘huge_ptep_clear_flush’:
./arch/powerpc/include/asm/hugetlb.h:154:8: error: variable ‘pte’ set but not used [-Werror=unused-but-set-variable]

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:32 +11:00
Christophe Leroy
6bf752daca powerpc: implement CONFIG_DEBUG_VIRTUAL
This patch implements CONFIG_DEBUG_VIRTUAL to warn about
incorrect use of virt_to_phys() and page_to_phys()

Below is the result of test_debug_virtual:

[    1.438746] WARNING: CPU: 0 PID: 1 at ./arch/powerpc/include/asm/io.h:808 test_debug_virtual_init+0x3c/0xd4
[    1.448156] CPU: 0 PID: 1 Comm: swapper Not tainted 4.20.0-rc5-00560-g6bfb52e23a00-dirty #532
[    1.457259] NIP:  c066c550 LR: c0650ccc CTR: c066c514
[    1.462257] REGS: c900bdb0 TRAP: 0700   Not tainted  (4.20.0-rc5-00560-g6bfb52e23a00-dirty)
[    1.471184] MSR:  00029032 <EE,ME,IR,DR,RI>  CR: 48000422  XER: 20000000
[    1.477811]
[    1.477811] GPR00: c0650ccc c900be60 c60d0000 00000000 006000c0 c9000000 00009032 c7fa0020
[    1.477811] GPR08: 00002400 00000001 09000000 00000000 c07b5d04 00000000 c00037d8 00000000
[    1.477811] GPR16: 00000000 00000000 00000000 00000000 c0760000 c0740000 00000092 c0685bb0
[    1.477811] GPR24: c065042c c068a734 c0685b8c 00000006 00000000 c0760000 c075c3c0 ffffffff
[    1.512711] NIP [c066c550] test_debug_virtual_init+0x3c/0xd4
[    1.518315] LR [c0650ccc] do_one_initcall+0x8c/0x1cc
[    1.523163] Call Trace:
[    1.525595] [c900be60] [c0567340] 0xc0567340 (unreliable)
[    1.530954] [c900be90] [c0650ccc] do_one_initcall+0x8c/0x1cc
[    1.536551] [c900bef0] [c0651000] kernel_init_freeable+0x1f4/0x2cc
[    1.542658] [c900bf30] [c00037ec] kernel_init+0x14/0x110
[    1.547913] [c900bf40] [c000e1d0] ret_from_kernel_thread+0x14/0x1c
[    1.553971] Instruction dump:
[    1.556909] 3ca50100 bfa10024 54a5000e 3fa0c076 7c0802a6 3d454000 813dc204 554893be
[    1.564566] 7d294010 7d294910 90010034 39290001 <0f090000> 7c3e0b78 955e0008 3fe0c062
[    1.572425] ---[ end trace 6f6984225b280ad6 ]---
[    1.577467] PA: 0x09000000 for VA: 0xc9000000
[    1.581799] PA: 0x061e8f50 for VA: 0xc61e8f50
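
The kind of check involved can be sketched as below. The linear map base
is an assumption consistent with the PA/VA pairs printed above, the end
of lowmem is made up, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for this sketch: a 32-bit linear map starting at 0xc0000000
 * (consistent with the PA/VA pairs printed above) and a made-up end of
 * lowmem at 0xc8000000. */
#define PAGE_OFFSET_SKETCH 0xc0000000u
#define HIGH_MEMORY_SKETCH 0xc8000000u

/* True if 'va' lies in the linear mapping, where va - PAGE_OFFSET is valid. */
static bool virt_addr_valid_sketch(uint32_t va)
{
	return va >= PAGE_OFFSET_SKETCH && va < HIGH_MEMORY_SKETCH;
}

/* Translate unconditionally, reporting through '*warn' whether a debug
 * build would have flagged the call. */
static uint32_t virt_to_phys_sketch(uint32_t va, bool *warn)
{
	*warn = !virt_addr_valid_sketch(va);
	return va - PAGE_OFFSET_SKETCH;
}
```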

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-19 18:56:26 +11:00
Mathieu Malaterre
ebd1d3b74f powerpc/32: Move the old 6xx -mcpu logic before the TARGET_CPU logic
The code:

  ifdef CONFIG_6xx
  KBUILD_CFLAGS          += -mcpu=powerpc
  endif

was added in 2006 in commit f48b8296b3 ("[PATCH] powerpc32: Set cpu
explicitly in kernel compiles"). This change was acceptable since the
TARGET_CPU logic was 64-bit only.

Since commit 0e00a8c9fd ("powerpc: Allow CPU selection
also on PPC32") this logic is no longer acceptable after the TARGET_CPU
specific logic. It currently appends -mcpu=powerpc at the end of the command
line, after any TARGET_CPU specific option:

  gcc -Wp,-MD,init/.do_mounts.o.d ...
    -mcpu=powerpc -mbig-endian -m32 ...
    -mcpu=e300c2 ...
    -mcpu=powerpc ...
    ../init/do_mounts.c

Fixes: 0e00a8c9fd ("powerpc: Allow CPU selection also on PPC32")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-17 22:12:30 +11:00
Michael Ellerman
c7e900c05b powerpc/ipic: Remove unused ipic_set_priority()
ipic_set_priority() has been unused since 2006 when the last usage was
removed in commit b9f0f1bb2b ("[POWERPC] Adapt ipic driver to new
host_ops interface, add set_irq_type to set IRQ sense").

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-17 22:12:30 +11:00
Michael Ellerman
4d6a198273 Merge branch 'fixes' into next
Merge our fixes branch again, this has a couple of build fixes and also
a change to do_syscall_trace_enter() that will conflict with a patch we
want to apply in next.
2018-12-17 22:11:54 +11:00
Joerg Roedel
bf8763d8f8 powerpc/iommu: Use device_iommu_mapped()
Use the new function to replace the open-coded iommu check.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell Currey <ruscur@russell.cc>
Cc: Sam Bobroff <sbobroff@linux.ibm.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2018-12-17 10:38:43 +01:00
Suraj Jitindar Singh
95d386c2d2 KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L3 guest
Previously when a device was being emulated by an L1 guest for an L2
guest, that device couldn't then be passed through to an L3 guest. This
was because the L1 guest had no method for accessing L3 memory.

The hcall H_COPY_TOFROM_GUEST provides this access. Thus this setup for
passthrough can now be allowed.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:50 +11:00
Suraj Jitindar Singh
6ff887b8bd KVM: PPC: Book3S: Introduce new hcall H_COPY_TOFROM_GUEST to access quadrants 1 & 2
A guest cannot access quadrants 1 or 2 as this would result in an
exception. Thus introduce the hcall H_COPY_TOFROM_GUEST to be used by a
guest when it wants to perform an access to quadrants 1 or 2, for
example when it wants to access memory for one of its nested guests.

Also provide an implementation for the kvm-hv module.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:50 +11:00
Suraj Jitindar Singh
873db2cd9a KVM: PPC: Book3S HV: Allow passthrough of an emulated device to an L2 guest
Allow for a device which is being emulated at L0 (the host) for an L1
guest to be passed through to a nested (L2) guest.

The existing kvmppc_hv_emulate_mmio function can be used here. The main
challenge is that for a load the result must be stored into the L2 gpr,
not an L1 gpr as would normally be the case after going out to qemu to
complete the operation. This presents a challenge as at this point the
L2 gpr state has been written back into L1 memory.

To work around this we store the address in L1 memory of the L2 gpr
where the result of the load is to be stored and use the new io_gpr
value KVM_MMIO_REG_NESTED_GPR to indicate that this is a nested load for
which completion must be done when returning back into the kernel. Then
in kvmppc_complete_mmio_load() the resultant value is written into L1
memory at the location of the indicated L2 gpr.

Note that we don't currently let an L1 guest emulate a device for an L2
guest which is then passed through to an L3 guest.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:50 +11:00
Suraj Jitindar Singh
cc6929cc84 KVM: PPC: Update kvmppc_st and kvmppc_ld to use quadrants
The functions kvmppc_st and kvmppc_ld are used to access guest memory
from the host using a guest effective address. They do so by translating
through the process table to obtain a guest real address and then using
kvm_read_guest or kvm_write_guest to make the access with the guest real
address.

This method of access however only works for L1 guests and gives
incorrect results for a nested guest.

We can however use the store_to_eaddr and load_from_eaddr kvmppc_ops to
perform the access for a nested guest (and an L1 guest). So attempt this
method first and fall back to the old method if this fails and we aren't
running a nested guest.

At this stage there is no fall back method to perform the access for a
nested guest and this is left as a future improvement. For now we will
return to the nested guest and rely on the fact that a translation
should be faulted in before retrying the access.
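
The control flow described above can be sketched as follows. The types,
the return code and the stub functions are hypothetical simplifications,
not the kvmppc definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real structures carry far more. */
struct vcpu_sketch;
struct ops_sketch {
	/* optional hook: returns 0 on success, negative on failure */
	int (*load_from_eaddr)(struct vcpu_sketch *v, unsigned long ea,
			       void *buf, size_t n);
};
struct vcpu_sketch {
	const struct ops_sketch *ops;
	bool nested;
};

#define RESUME_GUEST_SKETCH 1	/* made-up code: return to guest and retry */

/* Stand-in for the old process-table based path; always succeeds here. */
static int old_ld_sketch(struct vcpu_sketch *v, unsigned long ea,
			 void *buf, size_t n)
{
	(void)v; (void)ea; (void)buf; (void)n;
	return 0;
}

/* Example hook that always fails, to exercise the fallback paths. */
static int failing_load(struct vcpu_sketch *v, unsigned long ea,
			void *buf, size_t n)
{
	(void)v; (void)ea; (void)buf; (void)n;
	return -1;
}

static int ld_sketch(struct vcpu_sketch *v, unsigned long ea,
		     void *buf, size_t n)
{
	int rc = -1;

	if (v->ops->load_from_eaddr)
		rc = v->ops->load_from_eaddr(v, ea, buf, n);
	if (rc == 0)
		return 0;			/* new method worked */
	if (v->nested)
		return RESUME_GUEST_SKETCH;	/* no fallback for nested guests */
	return old_ld_sketch(v, ea, buf, n);	/* L1 guest: old method */
}
```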

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:50 +11:00
Suraj Jitindar Singh
dceadcf91b KVM: PPC: Add load_from_eaddr and store_to_eaddr to the kvmppc_ops struct
The kvmppc_ops struct is used to store function pointers to kvm
implementation specific functions.

Introduce two new functions load_from_eaddr and store_to_eaddr to be
used to load from and store to a guest effective address respectively.

Also implement these for the kvm-hv module. If we are using the radix
mmu then we can call the functions to access quadrant 1 and 2.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:50 +11:00
Suraj Jitindar Singh
d7b4561522 KVM: PPC: Book3S HV: Implement functions to access quadrants 1 & 2
The POWER9 radix mmu has the concept of quadrants. The quadrant number
is the two high bits of the effective address and determines the fully
qualified address to be used for the translation. The fully qualified
address consists of the effective lpid, the effective pid and the
effective address. This then gives 4 possible quadrants: 0, 1, 2, and 3.

When accessing these quadrants the fully qualified address is obtained
as follows:

Quadrant		| Hypervisor		| Guest
--------------------------------------------------------------------------
			| EA[0:1] = 0b00	| EA[0:1] = 0b00
0			| effLPID = 0		| effLPID = LPIDR
			| effPID  = PIDR	| effPID  = PIDR
--------------------------------------------------------------------------
			| EA[0:1] = 0b01	|
1			| effLPID = LPIDR	| Invalid Access
			| effPID  = PIDR	|
--------------------------------------------------------------------------
			| EA[0:1] = 0b10	|
2			| effLPID = LPIDR	| Invalid Access
			| effPID  = 0		|
--------------------------------------------------------------------------
			| EA[0:1] = 0b11	| EA[0:1] = 0b11
3			| effLPID = 0		| effLPID = LPIDR
			| effPID  = 0		| effPID  = 0
--------------------------------------------------------------------------

In the Guest:
Quadrant 3 is normally used to address the operating system since this
uses effPID=0 and effLPID=LPIDR, meaning the PID register doesn't need to
be switched.
Quadrant 0 is normally used to address user space since the effLPID and
effPID are taken from the corresponding registers.

In the Host:
Quadrant 0 and 3 are used as above, however the effLPID is always 0 to
address the host.

Quadrants 1 and 2 can be used by the host to address guest memory using
a guest effective address. Since the effLPID comes from the LPID register,
the host loads the LPID of the guest it would like to access (and the
PID of the process) and can perform accesses to a guest effective
address.

This means quadrant 1 can be used to address the guest user space and
quadrant 2 can be used to address the guest operating system from the
hypervisor, using a guest effective address.
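
The Hypervisor column of the table can be written out as a small sketch.
The structure and function names are hypothetical. EA[0:1] uses IBM bit
numbering, so it is the two most significant bits of the 64-bit effective
address.

```c
#include <stdint.h>

struct fqa_sketch {
	unsigned int quadrant;
	uint32_t eff_lpid;
	uint32_t eff_pid;
};

/* Hypervisor column of the table: EA[0:1] are the two most significant
 * bits of the 64-bit effective address (IBM bit numbering). */
static struct fqa_sketch hv_qualify(uint64_t ea, uint32_t lpidr,
				    uint32_t pidr)
{
	struct fqa_sketch f;

	f.quadrant = (unsigned int)(ea >> 62);
	/* quadrants 1 and 2 take the LPID from LPIDR, 0 and 3 use 0 */
	f.eff_lpid = (f.quadrant == 1 || f.quadrant == 2) ? lpidr : 0;
	/* quadrants 0 and 1 take the PID from PIDR, 2 and 3 use 0 */
	f.eff_pid = (f.quadrant == 0 || f.quadrant == 1) ? pidr : 0;
	return f;
}
```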

Access to the quadrants can cause a Hypervisor Data Storage Interrupt
(HDSI) due to being unable to perform partition scoped translation.
Previously this could only be generated from a guest and so the code
path expects us to take the KVM trampoline in the interrupt handler.
This is no longer the case so we modify the handler to call
bad_page_fault() to check if we were expecting this fault so we can
handle it gracefully and just return with an error code. In the hash mmu
case we still raise an unknown exception since quadrants aren't defined
for the hash mmu.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:50 +11:00
Suraj Jitindar Singh
d232afebf9 KVM: PPC: Book3S HV: Add function kvmhv_vcpu_is_radix()
There exists a function kvm_is_radix() which is used to determine if a
kvm instance is using the radix mmu. However this only applies to the
first level (L1) guest. Add a function kvmhv_vcpu_is_radix() which can
be used to determine if the current execution context of the vcpu is
radix, accounting for if the vcpu is running a nested guest.

Currently all nested guests must be radix but this may change in the
future.
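
A sketch of the decision, using hypothetical simplified structures in
place of the kvm ones: the nested guest's mode is used when the vcpu is
running one, otherwise the L1 guest's mode applies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified structures for illustration only. */
struct nested_sketch { bool radix; };	/* nested guest MMU mode */
struct kvm_sketch    { bool radix; };	/* L1 guest MMU mode */
struct vcpu_sketch {
	const struct kvm_sketch *kvm;
	const struct nested_sketch *nested;	/* NULL when not nested */
};

static bool vcpu_is_radix_sketch(const struct vcpu_sketch *v)
{
	if (v->nested)
		return v->nested->radix;
	return v->kvm->radix;
}
```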

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:49 +11:00
Suraj Jitindar Singh
693ac10a88 KVM: PPC: Book3S: Only report KVM_CAP_SPAPR_TCE_VFIO on powernv machines
The kvm capability KVM_CAP_SPAPR_TCE_VFIO is used to indicate the
availability of in kernel tce acceleration for vfio. However it is
currently the case that this is only available on a powernv machine,
not for a pseries machine.

Thus make this capability dependent on having the cpu feature
CPU_FTR_HVMODE.

[paulus@ozlabs.org - fixed compilation for Book E.]

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 11:33:49 +11:00
Paul Mackerras
5af3e9d06d KVM: PPC: Book3S HV: Flush guest mappings when turning dirty tracking on/off
This adds code to flush the partition-scoped page tables for a radix
guest when dirty tracking is turned on or off for a memslot.  Only the
guest real addresses covered by the memslot are flushed.  The reason
for this is to get rid of any 2M PTEs in the partition-scoped page
tables that correspond to host transparent huge pages, so that page
dirtiness is tracked at a system page (4k or 64k) granularity rather
than a 2M granularity.  The page tables are also flushed when turning
dirty tracking off so that the memslot's address space can be
repopulated with THPs if possible.

To do this, we add a new function kvmppc_radix_flush_memslot().  Since
this does what's needed for kvmppc_core_flush_memslot_hv() on a radix
guest, we now make kvmppc_core_flush_memslot_hv() call the new
kvmppc_radix_flush_memslot() rather than calling kvm_unmap_radix()
for each page in the memslot.  This has the effect of fixing a bug in
that kvmppc_core_flush_memslot_hv() was previously calling
kvm_unmap_radix() without holding the kvm->mmu_lock spinlock, which
is required to be held.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 10:58:51 +11:00
Paul Mackerras
c43c3a8683 KVM: PPC: Book3S HV: Cleanups - constify memslots, fix comments
This adds 'const' to the declarations for the struct kvm_memory_slot
pointer parameters of some functions, which will make it possible to
call those functions from kvmppc_core_commit_memory_region_hv()
in the next patch.

This also fixes some comments about locking.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 10:58:43 +11:00
Paul Mackerras
f460f6791a KVM: PPC: Book3S HV: Map single pages when doing dirty page logging
For radix guests, this makes KVM map guest memory as individual pages
when dirty page logging is enabled for the memslot corresponding to the
guest real address.  Having a separate partition-scoped PTE for each
system page mapped to the guest means that we have a separate dirty
bit for each page, thus making the reported dirty bitmap more accurate.
Without this, if part of guest memory is backed by transparent huge
pages, the dirty status is reported at a 2MB granularity rather than
a 64kB (or 4kB) granularity for that part, causing userspace to have
to transmit more data when migrating the guest.
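
The cost difference is simple arithmetic, sketched below with a
hypothetical helper. It considers the worst case where every dirty page
falls in a different tracking unit.

```c
#include <stdint.h>

/* Bytes userspace must resend when 'dirty_pages' distinct system pages
 * were written, if dirtiness is tracked in units of 'granule' bytes.
 * Worst case: every dirty page falls in a different tracking unit. */
static uint64_t worst_case_resend_bytes(uint64_t dirty_pages,
					uint64_t granule)
{
	return dirty_pages * granule;
}
```

For a single dirty page, 2MB tracking resends 32 times as much data as
64kB tracking and 512 times as much as 4kB tracking.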

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 10:58:33 +11:00
Bharata B Rao
f032b73459 KVM: PPC: Pass change type down to memslot commit function
Currently, kvm_arch_commit_memory_region() gets called with a
parameter indicating what type of change is being made to the memslot,
but it doesn't pass it down to the platform-specific memslot commit
functions.  This adds the `change' parameter to the lower-level
functions so that they can use it in future.

[paulus@ozlabs.org - fix book E also.]

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-17 10:57:27 +11:00
Linus Torvalds
4645453cef powerpc fixes for 4.20 #4
Merge tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One notable fix for our change to split pt_regs between user/kernel,
  we forgot to update BPF to use the user-visible type which was an ABI
  break for BPF programs.

  A slightly ugly but minimal fix to do_syscall_trace_enter() so that we
  use tracehook_report_syscall_entry() properly. We'll rework the code
  in next to avoid the empty if body.

  Seven commits fixing bugs in the new papr_scm (Storage Class Memory)
  driver. The driver was finally able to be tested on the other
  hypervisor which exposed several bugs. The fixes are all fairly
  minimal at least.

  Fix a crash in our MSI code if an MSI-capable device is plugged into a
  non-MSI capable PHB, only seen on older hardware (MPC8378).

  Fix our legacy serial code to look for "stdout-path" since the device
  trees were updated to use that instead of "linux,stdout-path".

  A change to the COFF zImage code to fix booting old powermacs.

  A couple of minor build fixes.

  Thanks to: Benjamin Herrenschmidt, Daniel Axtens, Dmitry V. Levin,
  Elvira Khabirova, Oliver O'Halloran, Paul Mackerras, Radu Rendec, Rob
  Herring, Sandipan Das"

* tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
  powerpc/mm: Fallback to RAM if the altmap is unusable
  powerpc/papr_scm: Use ibm,unit-guid as the iset cookie
  powerpc/papr_scm: Fix DIMM device registration race
  powerpc/papr_scm: Remove endian conversions
  powerpc/papr_scm: Update DT properties
  powerpc/papr_scm: Fix resource end address
  powerpc/papr_scm: Use depend instead of select
  powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT
  powerpc/boot: Fix build failures with -j 1
  powerpc: Look for "stdout-path" when setting up legacy consoles
  powerpc/msi: Fix NULL pointer access in teardown code
  powerpc/mm: Fix linux page tables build with some configs
  powerpc: Fix COFF zImage booting on old powermacs
2018-12-14 09:33:34 -08:00
Paolo Bonzini
e5d83c74a5 kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic
The first such capability to be handled in virt/kvm/ will be manual
dirty page reprotection.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-12-14 12:34:18 +01:00
Suraj Jitindar Singh
6142236cd9 KVM: PPC: Book3S PR: Set hflag to indicate that POWER9 supports 1T segments
When booting a kvm-pr guest on a POWER9 machine the following message is
observed:
"qemu-system-ppc64: KVM does not support 1TiB segments which guest expects"

This is because the guest expects to be able to use 1T segments, but we
don't indicate support for them: on POWER9 we don't set the
BOOK3S_HFLAG_MULTI_PGSIZE flag in the hflags in kvmppc_set_pvr_pr().

POWER9 does indeed have support for 1T segments, so add a case for
POWER9 to the switch statement to ensure it is set.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14 15:39:47 +11:00
Yangtao Li
0f6ddf34be KVM: PPC: Book3S HV: Change to use DEFINE_SHOW_ATTRIBUTE macro
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14 15:39:47 +11:00
Paul Mackerras
234ff0b729 KVM: PPC: Book3S HV: Fix race between kvm_unmap_hva_range and MMU mode switch
Testing has revealed an occasional crash which appears to be caused
by a race between kvmppc_switch_mmu_to_hpt and kvm_unmap_hva_range_hv.
The symptom is a NULL pointer dereference in __find_linux_pte() called
from kvm_unmap_radix() with kvm->arch.pgtable == NULL.

Looking at kvmppc_switch_mmu_to_hpt(), it does indeed clear
kvm->arch.pgtable (via kvmppc_free_radix()) before setting
kvm->arch.radix to NULL, and there is nothing to prevent
kvm_unmap_hva_range_hv() or the other MMU callback functions from
being called concurrently with kvmppc_switch_mmu_to_hpt() or
kvmppc_switch_mmu_to_radix().

This patch therefore adds calls to spin_lock/unlock on the kvm->mmu_lock
around the assignments to kvm->arch.radix, and makes sure that the
partition-scoped radix tree or HPT is only freed after changing
kvm->arch.radix.

This also takes the kvm->mmu_lock in kvmppc_rmap_reset() to make sure
that the clearing of each rmap array (one per memslot) doesn't happen
concurrently with use of the array in the kvm_unmap_hva_range_hv()
or the other MMU callbacks.

Fixes: 18c3640cef ("KVM: PPC: Book3S HV: Add infrastructure for running HPT guests on radix host")
Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-12-14 15:33:15 +11:00
Christoph Hellwig
55897af630 dma-direct: merge swiotlb_dma_ops into the dma_direct code
While the dma-direct code is (relatively) clean and simple we actually
have to use the swiotlb ops for the mapping on many architectures due
to devices with addressing limits.  Instead of keeping two
implementations around this commit allows the dma-direct
implementation to call the swiotlb bounce buffering functions and
thus share the guts of the mapping implementation.  This also
simplified the dma-mapping setup on a few architectures where we
don't have to differentiate which implementation to use.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:17 +01:00
Christoph Hellwig
7249c1a52d dma-mapping: move various slow path functions out of line
There is no need to have all setup and coherent allocation / freeing
routines inline.  Move them out of line to keep the implementation
nicely encapsulated and save some kernel text size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-12-13 21:06:10 +01:00
David S. Miller
addb067983 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-12-11

The following pull-request contains BPF updates for your *net-next* tree.

It has three minor merge conflicts, resolutions:

1) tools/testing/selftests/bpf/test_verifier.c

 Take first chunk with alignment_prevented_execution.

2) net/core/filter.c

  [...]
  case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
  case bpf_ctx_range(struct __sk_buff, wire_len):
        return false;
  [...]

3) include/uapi/linux/bpf.h

  Take the second chunk for the two cases each.

The main changes are:

1) Add support for BPF line info via BTF and extend libbpf as well
   as bpftool's program dump to annotate output with BPF C code to
   facilitate debugging and introspection, from Martin.

2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
   and all JIT backends, from Jiong.

3) Improve BPF test coverage on archs with no efficient unaligned
   access by adding an "any alignment" flag to the BPF program load
   to forcefully disable verifier alignment checks, from David.

4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
   proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.

5) Extend tc BPF programs to use a new __sk_buff field called wire_len
   for more accurate accounting of packets going to wire, from Petar.

6) Improve bpftool to allow dumping the trace pipe from it and add
   several improvements in bash completion and map/prog dump,
   from Quentin.

7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
   kernel addresses and add a dedicated BPF JIT backend allocator,
   from Ard.

8) Add a BPF helper function for IR remotes to report mouse movements,
   from Sean.

9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
   member naming consistent with existing conventions, from Yonghong
   and Song.

10) Misc cleanups and improvements in allowing to pass interface name
    via cmdline for xdp1 BPF example, from Matteo.

11) Fix a potential segfault in BPF sample loader's kprobes handling,
    from Daniel T.

12) Fix SPDX license in libbpf's README.rst, from Andrey.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 18:00:43 -08:00
David S. Miller
4cc1feeb6f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts, seemingly all over the place.

I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.

The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while changing the initial
argument in the function call in the moved code.

The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.

cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classification.

__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy.  Or at least I think it was :-)

Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.

The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-09 21:43:31 -08:00
Elvira Khabirova
a225f15674 powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
Arch code should use tracehook_*() helpers, as documented in
include/linux/tracehook.h, ptrace_report_syscall() is not expected to
be used outside that file.

The patch does not look very nice, but at least it is correct
and opens the way for PTRACE_GET_SYSCALL_INFO API.

Co-authored-by: Dmitry V. Levin <ldv@altlinux.org>
Fixes: 5521eb4bca ("powerpc/ptrace: Add support for PTRACE_SYSEMU")
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
[mpe: Take this as a minimal fix for 4.20, we'll rework it later]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-10 15:19:58 +11:00
Linus Torvalds
d48f782e4f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A decent batch of fixes here. I'd say about half are for problems that
  have existed for a while, and half are for new regressions added in
  the 4.20 merge window.

   1) Fix 10G SFP phy module detection in mvpp2, from Baruch Siach.

   2) Revert bogus emac driver change, from Benjamin Herrenschmidt.

   3) Handle BPF exported data structure with pointers when building
      32-bit userland, from Daniel Borkmann.

   4) Memory leak fix in act_police, from Davide Caratti.

   5) Check RX checksum offload in RX descriptors properly in aquantia
      driver, from Dmitry Bogdanov.

   6) SKB unlink fix in various spots, from Edward Cree.

   7) ndo_dflt_fdb_dump() only works with ethernet, enforce this, from
      Eric Dumazet.

   8) Fix FID leak in mlxsw driver, from Ido Schimmel.

   9) IOTLB locking fix in vhost, from Jean-Philippe Brucker.

  10) Fix SKB truesize accounting in ipv4/ipv6/netfilter frag memory
      limits otherwise namespace exit can hang. From Jiri Wiesner.

  11) Address block parsing length fixes in x25 from Martin Schiller.

  12) IRQ and ring accounting fixes in bnxt_en, from Michael Chan.

  13) For tun interfaces, only iface delete works with rtnl ops, enforce
      this by disallowing add. From Nicolas Dichtel.

  14) Use after free in liquidio, from Pan Bian.

  15) Fix SKB use after passing to netif_receive_skb(), from Prashant
      Bhole.

  16) Static key accounting and other fixes in XPS from Sabrina Dubroca.

  17) Partially initialized flow key passed to ip6_route_output(), from
      Shmulik Ladkani.

  18) Fix RTNL deadlock during reset in ibmvnic driver, from Thomas
      Falcon.

  19) Several small TCP fixes (off-by-one on window probe abort, NULL
      deref in tail loss probe, SNMP mis-estimations) from Yuchung
      Cheng"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
  net/sched: cls_flower: Reject duplicated rules also under skip_sw
  bnxt_en: Fix _bnxt_get_max_rings() for 57500 chips.
  bnxt_en: Fix NQ/CP rings accounting on the new 57500 chips.
  bnxt_en: Keep track of reserved IRQs.
  bnxt_en: Fix CNP CoS queue regression.
  net/mlx4_core: Correctly set PFC param if global pause is turned off.
  Revert "net/ibm/emac: wrong bit is used for STA control"
  neighbour: Avoid writing before skb->head in neigh_hh_output()
  ipv6: Check available headroom in ip6_xmit() even without options
  tcp: lack of available data can also cause TSO defer
  ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output
  mlxsw: spectrum_switchdev: Fix VLAN device deletion via ioctl
  mlxsw: spectrum_router: Relax GRE decap matching check
  mlxsw: spectrum_switchdev: Avoid leaking FID's reference count
  mlxsw: spectrum_nve: Remove easily triggerable warnings
  ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes
  sctp: frag_point sanity check
  tcp: fix NULL ref in tail loss probe
  tcp: Do not underestimate rwnd_limited
  net: use skb_list_del_init() to remove from RX sublists
  ...
2018-12-09 15:12:33 -08:00
Masahiro Yamada
63fea0af43 x86, powerpc: Remove -funit-at-a-time compiler option entirely
GCC 4.6 manual says:

  -funit-at-a-time
    This option is left for compatibility reasons. -funit-at-a-time has
    no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder
    and -fno-section-anchors. Enabled by default.

Remove it.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Richard Weinberger <richard@sigma-star.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1541990120-9643-3-git-send-email-yamada.masahiro@socionext.com
2018-12-09 11:55:32 +01:00
Oliver O'Halloran
9ef34630a4 powerpc/mm: Fallback to RAM if the altmap is unusable
The "altmap" is used to provide a pool of memory that is reserved for
the vmemmap backing of hot-plugged memory. This is useful when adding
large amount of ZONE_DEVICE memory to a system with a limited amount of
normal memory.

On ppc64 we use huge pages to map the vmemmap which requires the backing
storage to be contiguous and aligned to the hugepage size. The altmap
implementation allows for the altmap provider to reserve a few PFNs at
the start of the range for its own uses and when this occurs the
first chunk of the altmap is not usable for hugepage mappings. On hash
there is no sane way to fall back to a normal sized page mapping so we
fail the allocation. This results in memory hotplug failing with
ENOMEM when the new range doesn't fall into an existing vmemmap block.

This patch handles this case by falling back to using system memory
rather than failing if we cannot allocate from the altmap. This
fallback should only ever be used for the first vmemmap block so it
should not cause excess memory consumption.

Fixes: 7b73d978a5 ("mm: pass the vmem_altmap to vmemmap_populate")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09 21:33:21 +11:00
Oliver O'Halloran
43001c52b6 powerpc/papr_scm: Use ibm,unit-guid as the iset cookie
The interleave set cookie is used to determine if a label stored in the
metadata space should be applied to the current region. This is
important in the case of NVDIMMs since the firmware may change the
interleaving configuration of a DIMM which would invalidate the existing
labels. In our case the hypervisor hides those details from us so we
don't really care, but libnvdimm still requires the interleave set
cookie to be non-zero.

For our purposes we just need the set cookie to be unique and fixed for
a given PAPR SCM region and using the unit-guid (really a UUID) is fine
for this purpose.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Use kernel types (u64)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09 21:32:51 +11:00
Oliver O'Halloran
b0d65a8cbc powerpc/papr_scm: Fix DIMM device registration race
When a new nvdimm device is registered with libnvdimm via
nvdimm_create() it is added as a device on the nvdimm bus. The probe
function for the DIMM driver is potentially quite slow so actually
registering and probing the device is done in an async domain rather
than immediately after device creation. This can result in a race where
the region device (created 2nd) is probed first and fails to activate at
boot.

To fix this we use the same approach as the ACPI/NFIT driver which is to
check that all the DIMM devices registered successfully. LibNVDIMM
provides the nvdimm_bus_count_dimms() function which synchronises with
the async domain and verifies that the dimm was successfully registered
with the bus.

If either of these does not occur then we bail.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09 21:32:30 +11:00
Oliver O'Halloran
409dd7dc83 powerpc/papr_scm: Remove endian conversions
The return values of a h-call are returned in the CPU registers and
written to the provided buffer by the plpar_hcall() wrapper. As a result
the values written to memory are always in the native endian and should
not be byte swapped.

The initial implementation of the H-Call interface was done in qemu and
the returned values were byte swapped unnecessarily in both the
hypervisor and in the driver so this was only noticed when bringing up
the PowerVM implementation.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09 21:32:30 +11:00
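
The endianness point in the papr_scm commit above can be illustrated with a small user-space sketch. It is not the driver code: fake_hcall() and swap64() are hypothetical names, and the unconditional swap only stands in for the unnecessary conversion the commit removes. A value that arrives in a register and is stored to memory is already in native byte order, so swapping it afterwards corrupts it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an h-call wrapper: the hypervisor leaves its
 * results in CPU registers and the wrapper stores them into retbuf.
 */
void fake_hcall(uint64_t reg_value, uint64_t *retbuf)
{
	assert(retbuf != NULL);
	retbuf[0] = reg_value;	/* plain store: the value is already native endian */
}

/* Unconditional 64-bit byte swap, shown only to illustrate the bug. */
uint64_t swap64(uint64_t v)
{
	uint64_t out = 0;

	for (int i = 0; i < 8; i++) {
		out = (out << 8) | (v & 0xff);
		v >>= 8;
	}
	return out;
}
```

Reading retbuf[0] back yields the register value unchanged; passing it through swap64() does not. If both sides swap, as the commit describes for the qemu implementation, the two errors cancel out and the problem stays hidden.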
Oliver O'Halloran
683ec0e04a powerpc/papr_scm: Update DT properties
The ibm,unit-sizes property was originally specified as an array of two
u32s corresponding to the memory block size, and the number of blocks
available in that region. A fairly last-minute change to the SCM DT
specification was splitting that into two separate u64 properties:
ibm,block-sizes and ibm,number-of-blocks that convey the same
information. No firmware / hypervisor that emitted the ibm,unit-size
property ever appeared in the wild.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Use kernel types (u32/u64)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-09 21:32:16 +11:00
Jiong Wang
44cf43c04b ppc: bpf: implement jitting of BPF_ALU | BPF_ARSH | BPF_*
This patch implements code-gen for BPF_ALU | BPF_ARSH | BPF_*.

Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-07 13:30:48 -08:00
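
For readers unfamiliar with BPF_ARSH, the following is a sketch of the semantics a JIT has to reproduce for the 32-bit case, not the powerpc code generation itself. The function name alu32_arsh() is hypothetical, and masking the shift count to 0..31 is an assumption made to keep the C well defined. An arithmetic shift right copies the sign bit into the vacated high bits, unlike a logical shift, which fills them with zeros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable 32-bit arithmetic shift right. C leaves ">>" on negative
 * signed values implementation-defined, so build the result from an
 * unsigned (logical) shift plus an explicit sign fill.
 */
uint32_t alu32_arsh(uint32_t dst, uint32_t shift)
{
	uint32_t result;

	shift &= 31;		/* assumption: shift count is taken modulo 32 */
	assert(shift < 32);

	result = dst >> shift;	/* logical shift: high bits become zero */
	if ((dst & 0x80000000u) && shift != 0)
		result |= ~(0xffffffffu >> shift);	/* refill with sign bits */
	return result;
}
```

For example, shifting 0xfffffff0 (-16 as a signed 32-bit value) right by 2 gives 0xfffffffc (-4), whereas a logical shift would give 0x3ffffffc.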
Oliver O'Halloran
5961352611 powerpc/papr_scm: Fix resource end address
Fix an off-by-one error in the memory resource range. This resource is
used to determine the address range of the memory to be hot-plugged as
ZONE_DEVICE memory. The current end address results in the kernel
attempting to map an additional memblock and the hypervisor may reject
the mapping resulting in the entire hot-plug failing.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-07 23:32:02 +11:00
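
A minimal sketch of the off-by-one described in the commit above, assuming the bug had the usual form of computing the end as start + size. struct mem_range is a hypothetical stand-in for a resource whose end address is inclusive, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory resource; end is inclusive. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* address of the last valid byte */
};

/* Build a range covering exactly size bytes starting at start. */
struct mem_range make_range(uint64_t start, uint64_t size)
{
	assert(size > 0);
	struct mem_range r = { start, start + size - 1 };
	return r;
}

/* Number of block_size-sized blocks the range touches. */
uint64_t blocks_spanned(struct mem_range r, uint64_t block_size)
{
	assert(block_size > 0);
	return r.end / block_size - r.start / block_size + 1;
}
```

With an inclusive end, a range of one block spans one block. If the end is set to start + size instead, the same range touches the first byte of the following block, which matches the commit's description of the kernel trying to map an additional memblock.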
Oliver O'Halloran
14ebfec071 powerpc/papr_scm: Use depend instead of select
Making PAPR_SCM select LIBNVDIMM results in circular dependencies in
Kconfig when another symbol depends on it. Fix this by replacing the
select with a depends.

Fixes: b5beae5e22 ("powerpc/pseries: Add driver for PAPR SCM regions")
Reported-by: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-07 23:32:01 +11:00
Sandipan Das
a6460b03f9 powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT
Now that there are different variants of pt_regs for userspace and
kernel, the uapi for the BPF_PROG_TYPE_PERF_EVENT program type must be
changed by exporting the user_pt_regs structure instead of the pt_regs
structure that is in-kernel only.

Fixes: 002af9391b ("powerpc: Split user/kernel definitions of struct pt_regs")
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-07 23:19:04 +11:00
Christoph Hellwig
7c703e54cc arch: switch the default on ARCH_HAS_SG_CHAIN
These days architectures are mostly out of the business of dealing with
struct scatterlist at all, unless they have architecture specific iommu
drivers.  Replace the ARCH_HAS_SG_CHAIN symbol with a ARCH_NO_SG_CHAIN
one only enabled for architectures with horrible legacy iommu drivers
like alpha and parisc, and conditionally for arm which wants to keep it
disabled for legacy platforms.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
2018-12-06 07:04:56 -08:00
Christoph Hellwig
d11e3d3d03 powerpc/iommu: remove the mapping_error dma_map_ops method
The powerpc iommu code already returns (~(dma_addr_t)0x0) on mapping
failures, so we can switch over to returning DMA_MAPPING_ERROR and let
the core dma-mapping code handle the rest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-06 06:56:38 -08:00
Christoph Hellwig
b0cbeae494 dma-direct: remove the mapping_error dma_map_ops method
The dma-direct code already returns (~(dma_addr_t)0x0) on mapping
failures, so we can switch over to returning DMA_MAPPING_ERROR and let
the core dma-mapping code handle the rest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-06 06:56:36 -08:00
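
The idea behind the two mapping_error removals above can be sketched in user space as follows. The all-ones sentinel is taken from the commit messages; the 64-bit dma_addr_t typedef, fake_map() and mapping_failed() are assumptions for illustration, not kernel code. Once every implementation returns the same sentinel on failure, a single common check can replace the per-implementation mapping_error callback.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* assumption: 64-bit DMA addresses */

/* Same value the commit messages quote: (~(dma_addr_t)0x0). */
#define DMA_MAPPING_ERROR (~(dma_addr_t)0x0)

/* Hypothetical mapping routine: fails when phys exceeds the device limit. */
dma_addr_t fake_map(uint64_t phys, uint64_t device_limit)
{
	if (phys > device_limit)
		return DMA_MAPPING_ERROR;
	return (dma_addr_t)phys;
}

/* One shared check instead of a per-implementation callback. */
int mapping_failed(dma_addr_t addr)
{
	return addr == DMA_MAPPING_ERROR;
}
```

A caller only compares the returned address against the sentinel and does not need to know which implementation produced it.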
AKASHI Takahiro
735c2f90e3 powerpc, kexec_file: factor out memblock-based arch_kexec_walk_mem()
Memblock list is another source for usable system memory layout.
So move powerpc's arch_kexec_walk_mem() to common code so that other
memblock-based architectures, particularly arm64, can also utilise it.
A moved function is now renamed to kexec_walk_memblock() and integrated
into kexec_locate_mem_hole(), which will now be usable for all
architectures with no need for overriding arch_kexec_walk_mem().

With this change, arch_kexec_walk_mem() no longer needs to be a weak function,
and is now renamed to kexec_walk_resources().

Since powerpc doesn't support kdump in its kexec_file_load(), the current
kexec_walk_memblock() won't work for kdump either in this form, this will
be fixed in the next patch.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Acked-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 14:38:50 +00:00
Michael Ellerman
e41b93a6be powerpc/boot: Fix build failures with -j 1
In commit 5e9dcb6188 ("powerpc/boot: Expose Kconfig symbols to
wrapper") we added a dependency to serial.c on autoconf.h:

  $(obj)/serial.c: $(obj)/autoconf.h

This works when building in-tree (ie. with KBUILD_OUTPUT unset)
because the obj tree is the src tree.

But when building with eg. O=build and -j 1 the build fails:

  gcc ... -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o arch/powerpc/boot/serial.c
  gcc: error: arch/powerpc/boot/serial.c: No such file or directory

Why this is only happening with -j 1 is not clear, when building with
-j greater than 1 somehow we decide to look for serial.c in the src
tree (../), eg:

  gcc -I../arch/powerpc/boot -c -o arch/powerpc/boot/serial.o ../arch/powerpc/boot/serial.c

Regardless we shouldn't be specifying a dependency on serial.c in the
build tree, we want to add a dependency to the version in $(srctree)
so fix the rule to say that.

Fixes: 5e9dcb6188 ("powerpc/boot: Expose Kconfig symbols to wrapper")
Tested-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-06 16:10:15 +11:00
David S. Miller
e37d05a538 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2018-12-05

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) fix bpf uapi pointers for 32-bit architectures, from Daniel.

2) improve verifier ability to handle progs with a lot of branches, from Alexei.

3) strict btf checks, from Yonghong.

4) bpf_sk_lookup api cleanup, from Joe.

5) other misc fixes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-05 16:30:30 -08:00
Christophe Leroy
7c91efce16 powerpc/mm: dump block address translation on book3s/32
This patch adds a debugfs file to dump block address translation:

~# cat /sys/kernel/debug/powerpc/block_address_translation
---[ Instruction Block Address Translations ]---
0:         -
1:         -
2: 0xc0000000-0xcfffffff 0x00000000 Kernel EXEC coherent
3: 0xd0000000-0xdfffffff 0x10000000 Kernel EXEC coherent
4:         -
5:         -
6:         -
7:         -

---[ Data Block Address Translations ]---
0:         -
1:         -
2: 0xc0000000-0xcfffffff 0x00000000 Kernel RW coherent
3: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent
4:         -
5:         -
6:         -
7:         -

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:46:49 +11:00
Christophe Leroy
0261a508c9 powerpc/mm: dump segment registers on book3s/32
This patch creates a debugfs file to see content of
segment registers

  # cat /sys/kernel/debug/segment_registers
  ---[ User Segments ]---
  0x00000000-0x0fffffff Kern key 1 User key 1 VSID 0xade2b0
  0x10000000-0x1fffffff Kern key 1 User key 1 VSID 0xade3c1
  0x20000000-0x2fffffff Kern key 1 User key 1 VSID 0xade4d2
  0x30000000-0x3fffffff Kern key 1 User key 1 VSID 0xade5e3
  0x40000000-0x4fffffff Kern key 1 User key 1 VSID 0xade6f4
  0x50000000-0x5fffffff Kern key 1 User key 1 VSID 0xade805
  0x60000000-0x6fffffff Kern key 1 User key 1 VSID 0xade916
  0x70000000-0x7fffffff Kern key 1 User key 1 VSID 0xadea27
  0x80000000-0x8fffffff Kern key 1 User key 1 VSID 0xadeb38
  0x90000000-0x9fffffff Kern key 1 User key 1 VSID 0xadec49
  0xa0000000-0xafffffff Kern key 1 User key 1 VSID 0xaded5a
  0xb0000000-0xbfffffff Kern key 1 User key 1 VSID 0xadee6b

  ---[ Kernel Segments ]---
  0xc0000000-0xcfffffff Kern key 0 User key 1 VSID 0x000ccc
  0xd0000000-0xdfffffff Kern key 0 User key 1 VSID 0x000ddd
  0xe0000000-0xefffffff Kern key 0 User key 1 VSID 0x000eee
  0xf0000000-0xffffffff Kern key 0 User key 1 VSID 0x000fff

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Move it under /sys/kernel/debug/powerpc, make sr_init() __init]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:54 +11:00
Joel Stanley
b682c86924 powerpc/math-emu: Update macros from GCC
The add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros originate
from GCC's longlong.h which in turn was copied from GMP's longlong.h a
few decades ago.

This was found when compiling with clang:

   arch/powerpc/math-emu/fnmsub.c:46:2: error: invalid use of a cast in a
   inline asm context requiring an l-value: remove the cast or build with
   -fheinous-gnu-extensions
           FP_ADD_D(R, T, B);
           ^~~~~~~~~~~~~~~~~
   ...

   ./arch/powerpc/include/asm/sfp-machine.h:283:27: note: expanded from
   macro 'sub_ddmmss'
                  : "=r" ((USItype)(sh)),                                  \
                          ~~~~~~~~~~^~~

Segher points out: this was fixed in GCC over 16 years ago
( https://gcc.gnu.org/r56600 ), and in GMP (where it comes from)
presumably before that.

Update the add_ssaaaa, sub_ddmmss, umul_ppmm and udiv_qrnnd macros to
the latest GCC version in order to get rid of the invalid casts. These
were taken as-is from GCC's longlong in order to make future syncs
obvious. Other parts of sfp-machine.h were left as-is as the file
contains more features than present in longlong.h.

Link: https://github.com/ClangBuiltLinux/linux/issues/260
Signed-off-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Russell Currey
afa202b6bd powerpc/tools/checkpatch: Ignore DT_SPLIT_BINDING_PATCH
From what I've seen, every time this warning comes up it's bogus,
so let's ignore it.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
b14fc50266 powerpc/8xx: regroup TLB handler routines
As this is running with MMU off, the CPU only does speculative
fetch for code in the same page.

Following the significant size reduction of TLB handler routines,
the side handlers can be brought back close to the main part,
ie in the same page.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
74fabcadfd powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers
This patch reworks the TLB Miss handler in order to not use r12
register, hence avoiding having to save it into SPRN_SPRG_SCRATCH2.

In the DAR Fixup code we can now use SPRN_M_TW, freeing
SPRN_SPRG_SCRATCH2.

Then SPRN_SPRG_SCRATCH2 may be used for something else in the future.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
55c8fc3f49 powerpc/8xx: reintroduce 16K pages with HW assistance
Using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k pages PTE have to be spread every 128 bytes in the L2 table
- 8M pages PTE are at the address pointed by the L1 entry and each
8M page requires 2 identical entries in the PGD.

In order to use hardware assistance with 16K pages, this patch does
the following modifications:
- Make PGD size independent of the main page size
- In 16k pages mode, redefine pte_t as a struct with 4 elements,
and populate those 4 elements in __set_pte_at() and pte_update()
- Adapt the size of the hugepage tables.
- Define a PTE_FRAGMENT_NB so that a 16k page contains 4 page tables.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
3fb69c6a1a powerpc/8xx: Enable 512k hugepage support with HW assistance
For using 512k pages with hardware assistance, the PTEs have to be spread
every 128 bytes in the L2 table.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
22569b881d powerpc/8xx: Enable 8M hugepage support with HW assistance
HW assistance naturally supports 8M huge pages without
further modifications.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
6a8f911b50 powerpc/8xx: Use hardware assistance in TLB handlers
Today, on the 8xx the TLB handlers do SW tablewalk by doing all
the calculation in ASM, in order to match with the Linux page
table structure.

The 8xx offers hardware assistance which allows significant size
reduction of the TLB handlers, hence also reduces the time spent
in the handlers.

However, using this HW assistance implies some constraints on the
page table structure:
- Regardless of the main page size used (4k or 16k), the
level 1 table (PGD) contains 1024 entries and each PGD entry covers
a 4Mbytes area which is managed by a level 2 table (PTE) containing
also 1024 entries each describing a 4k page.
- 16k pages require 4 identical entries in the L2 table
- 512k pages PTE have to be spread every 128 bytes in the L2 table
- 8M pages PTE are at the address pointed by the L1 entry and each
8M page requires 2 identical entries in the PGD.

This patch modifies the TLB handlers to use HW assistance for 4K PAGES.

Before that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 80 ticks
- DTLB miss: 62 ticks
After that patch, the mean time spent in TLB miss handlers is:
- ITLB miss: 72 ticks
- DTLB miss: 54 ticks
So the improvement is 10% for ITLB and 13% for DTLB misses

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
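
The two-level layout that the hardware assistance imposes (1024 PGD entries of 4 Mbytes each, 1024 L2 entries of 4k each) can be sketched as plain index arithmetic on a 32-bit address. The function and macro names below are hypothetical, and the assumption that the 4 identical entries of a 16k page are consecutive and 4-aligned is mine, not stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

#define PGD_SHIFT	22	/* each PGD entry covers 4 Mbytes */
#define PTE_SHIFT	12	/* each L2 entry describes a 4k page */
#define PTRS_PER_TABLE	1024u	/* both levels hold 1024 entries */

/* Index of the level 1 (PGD) entry for a virtual address. */
uint32_t pgd_index(uint32_t va)
{
	uint32_t idx = va >> PGD_SHIFT;

	assert(idx < PTRS_PER_TABLE);
	return idx;
}

/* Index of the level 2 (PTE) entry for a virtual address. */
uint32_t pte_index(uint32_t va)
{
	return (va >> PTE_SHIFT) & (PTRS_PER_TABLE - 1);
}

/*
 * Assumption: a 16k page occupies 4 consecutive, 4-aligned L2 slots,
 * all holding the same entry. This returns the first slot of the group.
 */
uint32_t pte_group_base_16k(uint32_t va)
{
	return pte_index(va) & ~3u;
}
```

For the kernel address 0xc0000000 this gives PGD index 768 and L2 index 0. The two indices use 10 bits each and the page offset 12 bits, which together account for all 32 address bits.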
Christophe Leroy
5af543be14 powerpc/8xx: Temporarily disable 16k pages and hugepages
In preparation for making use of hardware assistance in TLB handlers,
this patch temporarily disables 16K pages and hugepages. The reason
is that when using HW assistance in 4K pages mode, the linux model
fits the HW model for 4K pages and 8M pages.

However for 16K pages and 512K mode some additional work is needed
to make the linux model fit the HW model.
The 8M pages will naturally come back when we switch to
HW assistance, without any additional handling.
In order to keep the following patch smaller, the current special
handling for 8M pages gets removed here as well.

Therefore the 4K pages mode will be implemented first and without
support for 512k hugepages. Then the 512k hugepages will be brought
back. And the 16K pages will be implemented in the following step.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
8cfe4f5242 powerpc/8xx: Move SW perf counters in first 32kb of memory
In order to simplify the time-critical exception handlers that update
the 8xx-specific SW perf counters, this patch moves the counters to
the beginning of memory. This is possible because .text is readable
and the counters are never modified outside of the handlers.

By doing this, we avoid having to set a second register with
the upper part of the address of the counters.
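
One plausible reading of the 32kb limit, stated here as an assumption
rather than taken from the patch: PowerPC load/store instructions carry
a sign-extended 16-bit displacement, so an address below 0x8000 can be
reached from a zero base without first loading the upper half of the
address into a register. The helpers below are hypothetical and only
model that range check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a 32-bit absolute address can be expressed as a sign-extended
 * 16-bit displacement from a zero base. */
static bool reachable_with_16bit_disp(uint32_t addr)
{
	return addr < 0x8000u || addr >= 0xffff8000u;
}

/* Hypothetical layout: 4-byte counters starting at 'base'. The assert
 * documents the requirement that every counter stays in the reachable
 * range. */
static uint32_t counter_addr(uint32_t base, uint32_t idx)
{
	uint32_t addr = base + idx * 4u;

	assert(reachable_with_16bit_disp(addr));
	return addr;
}
```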

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
32bff4b905 powerpc/mm: remove unnecessary test in pgtable_cache_init()
pgtable_cache_add() gracefully handles the case when a cache of that
size already exists by returning early with the following test:

	if (PGT_CACHE(shift))
		return; /* Already have a cache of this size */

It is therefore unnecessary to test for the existence of the cache beforehand.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
1e03c7e2ea powerpc/mm: fix a warning when a cache is common to PGD and hugepages
While implementing TLB miss HW assistance on the 8xx, the following
warning was encountered:

[  423.732965] WARNING: CPU: 0 PID: 345 at mm/slub.c:2412 ___slab_alloc.constprop.30+0x26c/0x46c
[  423.733033] CPU: 0 PID: 345 Comm: mmap Not tainted 4.18.0-rc8-00664-g2dfff9121c55 #671
[  423.733075] NIP:  c0108f90 LR: c0109ad0 CTR: 00000004
[  423.733121] REGS: c455bba0 TRAP: 0700   Not tainted  (4.18.0-rc8-00664-g2dfff9121c55)
[  423.733147] MSR:  00021032 <ME,IR,DR,RI>  CR: 24224848  XER: 20000000
[  423.733319]
[  423.733319] GPR00: c0109ad0 c455bc50 c4521910 c60053c0 007080c0 c0011b34 c7fa41e0 c455be30
[  423.733319] GPR08: 00000001 c00103a0 c7fa41e0 c49afcc4 24282842 10018840 c079b37c 00000040
[  423.733319] GPR16: 73f00000 00210d00 00000000 00000001 c455a000 00000100 00000200 c455a000
[  423.733319] GPR24: c60053c0 c0011b34 007080c0 c455a000 c455a000 c7fa41e0 00000000 00009032
[  423.734190] NIP [c0108f90] ___slab_alloc.constprop.30+0x26c/0x46c
[  423.734257] LR [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734283] Call Trace:
[  423.734326] [c455bc50] [00000100] 0x100 (unreliable)
[  423.734430] [c455bcc0] [c0109ad0] kmem_cache_alloc+0x210/0x23c
[  423.734543] [c455bcf0] [c0011b34] huge_pte_alloc+0xc0/0x1dc
[  423.734633] [c455bd20] [c01044dc] hugetlb_fault+0x408/0x48c
[  423.734720] [c455bdb0] [c0104b20] follow_hugetlb_page+0x14c/0x44c
[  423.734826] [c455be10] [c00e8e54] __get_user_pages+0x1c4/0x3dc
[  423.734919] [c455be80] [c00e9924] __mm_populate+0xac/0x140
[  423.735020] [c455bec0] [c00db14c] vm_mmap_pgoff+0xb4/0xb8
[  423.735127] [c455bf00] [c00f27c0] ksys_mmap_pgoff+0xcc/0x1fc
[  423.735222] [c455bf40] [c000e0f8] ret_from_syscall+0x0/0x38
[  423.735271] Instruction dump:
[  423.735321] 7cbf482e 38fd0008 7fa6eb78 7fc4f378 4bfff5dd 7fe3fb78 4bfffe24 81370010
[  423.735536] 71280004 41a2ff88 4840c571 4bffff80 <0fe00000> 4bfffeb8 81340010 712a0004
[  423.735757] ---[ end trace e9b222919a470790 ]---

This warning occurs when calling kmem_cache_zalloc() on a
cache having a constructor.

In this case it happens because the PGD cache and the 512k hugepte cache
are the same size (4k). While a cache with a constructor is created for
the PGD, hugepages create a cache without a constructor and use
kmem_cache_zalloc(). As both expect a cache of the same size,
the hugepages reuse the cache created for the PGD, hence the conflict.

In order to avoid this conflict, this patch:
- modifies pgtable_cache_add() so that a zeroising constructor is
added for any cache size.
- replaces calls to kmem_cache_zalloc() by kmem_cache_alloc()
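
The idea can be modelled in user space as follows. This is a toy
sketch, not the kernel slab API: struct toy_cache, toy_cache_add() and
toy_cache_alloc() are hypothetical names, and unlike the real
allocator the toy runs its constructor on every allocation. It only
shows that once every cache of a given size carries a zeroing
constructor, all users sharing that cache can call the plain
allocation function and still get zeroed memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SHIFT 16u

/* Toy stand-in for a slab cache: one object size, one constructor. */
struct toy_cache {
	size_t size;
	void (*ctor)(void *obj, size_t size);
};

static struct toy_cache *caches[MAX_SHIFT];

static void zero_ctor(void *obj, size_t size)
{
	memset(obj, 0, size);
}

/* Register a cache for tables of (sizeof(void *) << shift) bytes. */
static void toy_cache_add(unsigned int shift)
{
	struct toy_cache *c;

	assert(shift < MAX_SHIFT);
	if (caches[shift])
		return; /* Already have a cache of this size */

	c = malloc(sizeof(*c));
	assert(c != NULL);
	c->size = sizeof(void *) << shift;
	c->ctor = zero_ctor; /* always installed, whoever asks first */
	caches[shift] = c;
}

/* Plain allocation: the zeroing is done by the constructor. */
static void *toy_cache_alloc(const struct toy_cache *c)
{
	void *obj = malloc(c->size);

	if (obj)
		c->ctor(obj, c->size);
	return obj;
}
```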

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
0356656284 powerpc/mm: replace hugetlb_cache by PGT_CACHE(PTE_T_ORDER)
Instead of open-coding cache handling for the special case
of hugepage tables having a single pte_t element, this
patch makes use of the common pgtable_cache helpers.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
129dd323dd powerpc/mm: enable the use of page table cache of order 0
Hugepages use a cache of order 0. Let's allow page tables
of order 0 in the common part in order to avoid open coding
in hugetlb.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
32ea4c1499 powerpc/mm: Extend pte_fragment functionality to PPC32
In order to allow the 8xx to handle pte_fragments, this patch
extends the use of pte_fragments to PPC32 platforms.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
a74791dd98 powerpc/mm: add helpers to get/set mm.context->pte_frag
In order to handle pte_fragment functions with single fragment
without adding pte_frag in all mm_context_t, this patch creates
two helpers which do nothing on platforms using a single fragment.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
d09780f3a8 powerpc/mm: Move pgtable_t into platform headers
This patch moves pgtable_t into platform headers.

It gets rid of the CONFIG_PPC_64K_PAGES case for PPC64
as nohash/64 doesn't support CONFIG_PPC_64K_PAGES.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
994da93d19 powerpc/mm: move platform specific mmu-xxx.h in platform directories
The purpose of this patch is to move platform specific
mmu-xxx.h files in platform directories like pte-xxx.h files.

In the meantime this patch creates common nohash and
nohash/32 + nohash/64 mmu.h files for future common parts.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
2a146533bf powerpc/mm: Avoid useless lock with single page fragments
There is no point in taking the page table lock as pte_frag or
pmd_frag are always NULL when we have only one fragment.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
a95d133c86 powerpc/mm: Move pte_fragment_alloc() to a common location
In preparation for the next patch, which generalises the use of
pte_fragment_alloc() for all, this patch moves the related functions
to a place that is common to all subarches.

The 8xx will need that for supporting 16k pages, as in that mode
page tables still have a size of 4k.

Since pte_fragment with only one fragment is not different
from what is done in the general case, we can easily migrate all
subarches to pte fragments.

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
ddfc20a3b9 powerpc/8xx: Remove PTE_ATOMIC_UPDATES
commit 1bc54c0311 ("powerpc: rework 4xx PTE access and TLB miss")
introduced non atomic PTE updates and started the work of removing
PTE updates in TLB miss handlers, but kept PTE_ATOMIC_UPDATES for the
8xx with the following comment:
/* Until my rework is finished, 8xx still needs atomic PTE updates */

commit fe11dc3f96 ("powerpc/8xx: Update TLB asm so it behaves as
linux mm expects") removed all PTE updates done in TLB miss handlers

Therefore, atomic PTE updates are not needed anymore for the 8xx

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Christophe Leroy
a43ccc4bc4 powerpc/book3s32: Remove CONFIG_BOOKE dependent code
BOOK3S/32 cannot be BOOKE, so remove useless code

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Stephen Rothwell
8ad940217c powerpc: annotate implicit fall throughs
There is a plan to build the kernel with -Wimplicit-fallthrough and these
places in the code produced warnings, but because we build arch/powerpc
with -Werror, they became errors.  Fix them up.

This patch produces no change in behaviour, but should be reviewed in
case these are actually bugs rather than intentional fallthroughs.
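
As an illustration of what such an annotation looks like (a generic
sketch, not code from this patch): GCC at its default
-Wimplicit-fallthrough level accepts a comment such as
/* fall through */ placed directly before the next case label.
cleanup_steps() is a hypothetical function.

```c
#include <assert.h>

/* Hypothetical example: count how many steps run when entering at a
 * given stage. Each stage deliberately continues into the next one. */
static int cleanup_steps(int stage)
{
	int steps = 0;

	assert(stage >= 0);
	switch (stage) {
	case 2:
		steps++;
		/* fall through */
	case 1:
		steps++;
		/* fall through */
	case 0:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```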

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Breno Leitao
f91203e71c powerpc/mm: remove unused function prototype
Commit f384796c40 ("powerpc/mm: Add support for handling > 512TB address
in SLB miss") removed function slb_miss_bad_addr(struct pt_regs *regs), but
kept its declaration in the prototype file. This patch simply removes the
unused declaration.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Breno Leitao
8d4a862276 powerpc/xmon: Fix invocation inside lock region
Currently xmon needs to get devtree_lock (through rtas_token()) during its
invocation (at crash time). If there is a crash while devtree_lock is being
held, then xmon tries to get the lock but spins forever and never gets into
the interactive debugger, as in the following case:

	int *ptr = NULL;
	raw_spin_lock_irqsave(&devtree_lock, flags);
	*ptr = 0xdeadbeef;

This patch avoids calling rtas_token(), and thus trying to take the same
lock, at crash time. The new mechanism gets the token at
initialization time (xmon_init()) and just consumes it at crash time.

This allows xmon to be invoked regardless of whether devtree_lock
is held.
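
The approach can be modelled in user space as below. This is a toy
sketch under stated assumptions, not the xmon code: the lock is a plain
flag, lookup_token_locked() stands in for a lookup that needs the lock,
and all names and the token value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

static bool lock_held;         /* toy stand-in for devtree_lock */
static int cached_token = -1;  /* filled once at initialization */

/* Toy lookup that needs the lock. A real spinlock would spin forever
 * here if the lock were already held; the toy returns -1 instead. */
static int lookup_token_locked(void)
{
	if (lock_held)
		return -1;
	return 42; /* arbitrary token value for the sketch */
}

/* Initialization time: the lock is free, so the lookup is safe. */
static void debugger_init(void)
{
	assert(!lock_held);
	cached_token = lookup_token_locked();
}

/* Crash time: only the cached value is read, no lock is taken. */
static int debugger_token_at_crash(void)
{
	return cached_token;
}
```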

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-12-04 19:45:01 +11:00
Ingo Molnar
4bbfd7467c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

- Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

- Replace calls of RCU-bh and RCU-sched update-side functions
  to their vanilla RCU counterparts.  This series is a step
  towards complete removal of the RCU-bh and RCU-sched update-side
  functions.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- Documentation updates, including a number of flavor-consolidation
  updates from Joel Fernandes.

- Miscellaneous fixes.

- Automate generation of the initrd filesystem used for
  rcutorture testing.

- Convert spin_is_locked() assertions to instead use lockdep.

  ( Note that some of these conversions are going upstream via their
    respective maintainers. )

- SRCU updates, especially including a fix from Dennis Krein
  for a bag-on-head-class bug.

- RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-04 07:52:30 +01:00
Kees Cook
ea84b580b9 pstore: Convert buf_lock to semaphore
Instead of running with interrupts disabled, use a semaphore. This should
make it easier for backends that may need to sleep (e.g. EFI) when
performing a write:

|BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
|in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum
|Preemption disabled at:
|[<ffffffff99d60512>] pstore_dump+0x72/0x330
|CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G      D           4.20.0-rc3 #45
|Call Trace:
| dump_stack+0x4f/0x6a
| ___might_sleep.cold.91+0xd3/0xe4
| __might_sleep+0x50/0x90
| wait_for_completion+0x32/0x130
| virt_efi_query_variable_info+0x14e/0x160
| efi_query_variable_store+0x51/0x1a0
| efivar_entry_set_safe+0xa3/0x1b0
| efi_pstore_write+0x109/0x140
| pstore_dump+0x11c/0x330
| kmsg_dump+0xa4/0xd0
| oops_exit+0x22/0x30
...

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: 21b3ddd39f ("efi: Don't use spinlocks for efi vars")
Signed-off-by: Kees Cook <keescook@chromium.org>
2018-12-03 17:11:02 -08:00