The dcbtls instruction is able to lock data inside the L1 cache.
We don't want to give the guest actual access to hardware cache locks,
as that could influence other VMs on the same system. But we can tell
the guest that its locking attempt failed.
By implementing the instruction we at least don't give the guest a
program exception which it definitely does not expect.
Signed-off-by: Alexander Graf <agraf@suse.de>
The L1 instruction cache control register contains bits that indicate
that we're still handling a request. Mask those out when we set the SPR
so that a read doesn't assume we're still doing something.
Signed-off-by: Alexander Graf <agraf@suse.de>
* pci/misc:
PCI: Fix return value from pci_user_{read,write}_config_*()
PCI: Turn pcibios_penalize_isa_irq() into a weak function
PCI: Test for std config alias when testing extended config space
* pci/hotplug:
PCI: cpqphp: Fix possible null pointer dereference
NVMe: Implement PCIe reset notification callback
PCI: Notify driver before and after device reset
* pci/pci_is_bridge:
pcmcia: Use pci_is_bridge() to simplify code
PCI: pciehp: Use pci_is_bridge() to simplify code
PCI: acpiphp: Use pci_is_bridge() to simplify code
PCI: cpcihp: Use pci_is_bridge() to simplify code
PCI: shpchp: Use pci_is_bridge() to simplify code
PCI: rpaphp: Use pci_is_bridge() to simplify code
sparc/PCI: Use pci_is_bridge() to simplify code
powerpc/PCI: Use pci_is_bridge() to simplify code
ia64/PCI: Use pci_is_bridge() to simplify code
x86/PCI: Use pci_is_bridge() to simplify code
PCI: Use pci_is_bridge() to simplify code
PCI: Add new pci_is_bridge() interface
PCI: Rename pci_is_bridge() to pci_has_subordinate()
* pci/virtualization:
PCI: Introduce new device binding path using pci_dev.driver_override
Conflicts:
drivers/pci/pci-sysfs.c
Now that CONFIG_USB_DEBUG is gone, remove it from a number of defconfig
files that were enabling it.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The PPC fixes are important because they fix breakage that is new in 3.15.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJTdivEAAoJEBvWZb6bTYbyw3YQAIILnflhHNtklj1mfPnnibQf
c3BLCkJ0gtK6A0FO2aAHgSja0kpgbEEnSphE/A/cb0vkLon3n5O0pQoSKjGUUbBO
Mo0ndjzBYNmCP4MGxhkrg49VdqD40NaR0BjJAZudb4vUOw892WLFIJMIVmIqs9eG
8V/y6S7mPLmrooAKHZxXql9y30UC77T1VZ3r4pXwYgKtUT51BQfTyWiSfjQBa8yI
oGOSb8uqEC7YiOYPJYUNIMsyVqW4E6Qqs46rqtP4XZmSxzWXDzzgP4nQHHyJJCdZ
aBYkeG+sJZG7ZwleJLejAncjWUY9Oq9GkMYNj0cTAoP/zA6jBGAll96KGKRbes9z
bZUtCNL3ifLcgbIGeAxgjmYOq0XLGahHbqm9QISYW2XdRkBI+8EJs5FCP4YEHzZn
FSm3zcCQ+wtbqjBbZZcqqLa6A/CGzjyO26qz+BCxrZ0BQkQX/2am3UykQ0JWam3H
vX5ZM2ewJhs6SjFisPcswd20AN+SHjPyzPvErBLDfrqnAVbwj2ehgqyN2slVsqrj
UyGzeKCfJgA0TiEH/4K6j6hvQWynUU+/2JglIfGE6AXmWddazCzl/qx4LvuGKFoB
b8JSQ7YaHSsq/tHc8WhHkvcP0FSDZEiHcJN2iY1pwLKTSQp9JN3aPNruPKiO8dsW
N+LoHL5fFcDi6Uu6wS7w
=E2fU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Small fixes for x86, slightly larger fixes for PPC, and a forgotten
s390 patch. The PPC fixes are important because they fix breakage
that is new in 3.15"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: announce irqfd capability
KVM: x86: disable master clock if TSC is reset during suspend
KVM: vmx: disable APIC virtualization in nested guests
KVM guest: Make pv trampoline code executable
KVM: PPC: Book3S: ifdef on CONFIG_KVM_BOOK3S_32_HANDLER for 32bit
KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit
KVM: PPC: Book3S: HV: make _PAGE_NUMA take effect
If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
(ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
get the following messages during boot:
[ 0.089866] POWER8 performance monitor hardware support registered
[ 0.089985] power8-pmu: PMAO restore workaround active.
[ 5.095419] Processor 1 is stuck.
[ 10.097933] Processor 2 is stuck.
[ 15.100480] Processor 3 is stuck.
[ 20.102982] Processor 4 is stuck.
[ 25.105489] Processor 5 is stuck.
[ 30.108005] Processor 6 is stuck.
[ 35.110518] Processor 7 is stuck.
[ 40.113369] Processor 9 is stuck.
[ 45.115879] Processor 10 is stuck.
[ 50.118389] Processor 11 is stuck.
[ 55.120904] Processor 12 is stuck.
[ 60.123425] Processor 13 is stuck.
[ 65.125970] Processor 14 is stuck.
[ 70.128495] Processor 15 is stuck.
[ 75.131316] Processor 17 is stuck.
Note that only the sibling threads are stuck, while the primary threads (0, 8,
16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
that kexec tries to wakeup (bring online) the sibling threads of all the cores,
before performing kexec:
[ 9464.131231] Starting new kernel
[ 9464.148507] kexec: Waking offline cpu 1.
[ 9464.148552] kexec: Waking offline cpu 2.
[ 9464.148600] kexec: Waking offline cpu 3.
[ 9464.148636] kexec: Waking offline cpu 4.
[ 9464.148671] kexec: Waking offline cpu 5.
[ 9464.148708] kexec: Waking offline cpu 6.
[ 9464.148743] kexec: Waking offline cpu 7.
[ 9464.148779] kexec: Waking offline cpu 9.
[ 9464.148815] kexec: Waking offline cpu 10.
[ 9464.148851] kexec: Waking offline cpu 11.
[ 9464.148887] kexec: Waking offline cpu 12.
[ 9464.148922] kexec: Waking offline cpu 13.
[ 9464.148958] kexec: Waking offline cpu 14.
[ 9464.148994] kexec: Waking offline cpu 15.
[ 9464.149030] kexec: Waking offline cpu 17.
Instrumenting this piece of code revealed that the cpu_up() operation actually
fails with -EBUSY. Thus, only the primary threads of all the cores are online
during kexec, and hence this is a sure-shot receipe for disaster, as explained
in commit e8e5c2155b (powerpc/kexec: Fix orphaned offline CPUs across kexec),
as well as in the comment above wake_offline_cpus().
It turns out that cpu_up() was returning -EBUSY because the variable
'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
by migrate_to_reboot_cpu() inside kernel_kexec().
Now, migrate_to_reboot_cpu() was originally written with the assumption that
any further code will not need to perform CPU hotplug, since we are anyway in
the reboot path. However, kexec is clearly not such a case, since we depend on
onlining CPUs, atleast on powerpc.
So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
kexec path, to fix this regression in kexec on powerpc.
Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
can catch such issues more easily in the future.
Fixes: c97102ba96 (kexec: migrate to reboot cpu)
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With binutils 2.24, various 64 bit builds fail with relocation errors
such as
arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
against symbol `interrupt_base_book3e' defined in .text section
in arch/powerpc/kernel/built-in.o
arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
against symbol `interrupt_end_book3e' defined in .text section
in arch/powerpc/kernel/built-in.o
The assembler maintainer says:
I changed the ABI, something that had to be done but unfortunately
happens to break the booke kernel code. When building up a 64-bit
value with lis, ori, shl, oris, ori or similar sequences, you now
should use @high and @higha in place of @h and @ha. @h and @ha
(and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
now report overflow if the value is out of 32-bit signed range.
ie. @h and @ha assume you're building a 32-bit value. This is needed
to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
and @toc@ha expressions, and for consistency I did the same for all
other @h and @ha relocs.
Replacing @h with @high in one strategic location fixes the relocation
errors. This has to be done conditionally since the assembler either
supports @h or @high but not both.
Cc: <stable@vger.kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
is not used by some architectures. Make pcibios_penalize_isa_irq() a
__weak function to simplify the code. This removes the need for new
platforms to add stub implementations of pcibios_penalize_isa_irq().
[bhelgaas: changelog, comments]
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Use pci_is_bridge() to simplify code. No functional change.
Requires: 326c1cdae7 PCI: Rename pci_is_bridge() to pci_has_subordinate()
Requires: 1c86438c94 PCI: Add new pci_is_bridge() interface
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
There is now a way to ensure all platform devices get a unique name when
populated from the device tree, and the DCR_NATIVE code path is broken
anyway. PowerPC Cell (PS3) is the only platform that actually uses this
path. Most likely nobody will notice if it is killed. Remove the code
and associated ugly #ifdef.
The user-visible impact of this patch is that any DCR device on Cell
will get a new name in the /sys/devices hierarchy.
Signed-off-by: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This request includes a few bug fixes that really shouldn't wait for the next
release.
It fixes KVM on 32bit PowerPC when built as module. It also fixes the PV KVM
acceleration when NX gets honored by the host. Furthermore we fix transactional
memory support and numa support on HV KVM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJTcKFaAAoJECszeR4D/txg7qYP/RX3V32i2zQYH2NpjQrDCwtY
Wur+CQrn/VA6xhtTK1rT2zH5rNFLt6ClhtxCMkZFfBdUE4sHi3OTlEdcvXBZjbls
JqQ/7lBkUPN8pTpz2NHP9gvH7g6v07EruysRQNa/JZMzlwhpzWk8D7yXakaCPNY/
JZRgVTrfKnhQ8OtXt48Bp4EmEKllbNqi9kNN7dewD2dEb3fAco3Jpk6WoeG+1f0o
jv3NmeTsp87KaRpjvDzPb7iCe6PA7GVqvJIQpir3Rpk2Kpx0yj58AfacF+f72GOf
CPlJGepiumJCaANhV6dbvtS49vaiiAnSvbqCil2USNl0LIGWQXdSjs5lztEuiMyr
tAav0YSVpnIcw0HJxXug/M31VwfRjYCX3hnCCIOd3Xj2jgAqwD+Lo95uUrRGJ9TP
75zKh8E093tOXIC9CyMaiYajpFMUrCSMgnpJ+7fpeHiyigB6yc8juFxahIHsw8q1
NgHggroJm6QNIm8JSY/tG/YET4AT7H4ZetGP8MeeRUg0TpqQXvYpkMGB8YDouaBA
XzxjwyTq57BOYgLGExnwW3Jj0kbqVY+ts0aDGQVGrl5YFzooGqrQ61CRmwG5BvI8
sou3l6TJ2ng8qrc7Maw9MHca1QB3mtXD7I26T/QEfQm9NLRTTqJyaxH5J1q9siRI
PpHVE5FKnmWPNr8JlxtC
=t2S+
-----END PGP SIGNATURE-----
Merge tag 'signed-for-3.15' of git://github.com/agraf/linux-2.6 into kvm-master
Patch queue for 3.15 - 2014-05-12
This request includes a few bug fixes that really shouldn't wait for the next
release.
It fixes KVM on 32bit PowerPC when built as module. It also fixes the PV KVM
acceleration when NX gets honored by the host. Furthermore we fix transactional
memory support and numa support on HV KVM.
I am seeing an issue where a CPU running perf eventually hangs.
Traces show timer interrupts happening every 4 seconds even
when a userspace task is running on the CPU. /proc/timer_list
also shows pending hrtimers have not run in over an hour,
including the scheduler.
Looking closer, decrementers_next_tb is getting set to
0xffffffffffffffff, and at that point we will never take
a timer interrupt again.
In __timer_interrupt() we set decrementers_next_tb to
0xffffffffffffffff and rely on ->event_handler to update it:
*next_tb = ~(u64)0;
if (evt->event_handler)
evt->event_handler(evt);
In this case ->event_handler is hrtimer_interrupt. This will eventually
call back through the clockevents code with the next event to be
programmed:
static int decrementer_set_next_event(unsigned long evt,
struct clock_event_device *dev)
{
/* Don't adjust the decrementer if some irq work is pending */
if (test_irq_work_pending())
return 0;
__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
If irq work came in between these two points, we will return
before updating decrementers_next_tb and we never process a timer
interrupt again.
This looks to have been introduced by 0215f7d8c5 (powerpc: Fix races
with irq_work). Fix it by removing the early exit and relying on
code later on in the function to force an early decrementer:
/* We may have raced with new irq work */
if (test_irq_work_pending())
set_dec(1);
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Resetting root port has more stuff to do than that for PCIe switch
ports and we should have resetting root port done in firmware instead
of the kernel itself. The problem was introduced by commit 5b2e198e
("powerpc/powernv: Rework EEH reset").
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the devspec OF attribute to PCI common code's set of device attributes
since it's not architecture dependent. As a side effect microblaze and
powerpc no longer need to use pcibios_add_platform_entries().
[bhelgaas: fold in #include for compile error]
Link: https://lkml.kernel.org/r/alpine.LFD.2.11.1404141101500.1529@denkbrett
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now powerpc is the only user of struct boot_param_header and FDT defines,
so they can be moved into the powerpc architecture code.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>
With libfdt support, we can take advantage of helper accessors in libfdt
for accessing the FDT header data. This makes the code more readable and
makes the FDT blob structure more opaque to the kernel. This also
prepares for removing struct boot_param_header completely.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>
Move the /memreserve/ processing and dtb memory reservations into
early_init_fdt_scan_reserved_mem. This converts arm, arm64, and powerpc
as they are the only users of early_init_fdt_scan_reserved_mem.
memblock_reserve is safe to call on the same region twice, so the
reservation check for the dtb in powerpc 32-bit reservations is safe to
remove.
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>
Both powerpc and microblaze have the same FDT blob in debugfs feature.
Move this to common location and remove the powerpc and microblaze
implementations. This feature could become more useful when FDT
overlay support is added.
This changes the path of the blob from "$arch/flat-device-tree" to
"device-tree/flat-device-tree".
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>
Make of_get_flat_dt_prop arguments compatible with libfdt fdt_getprop
call in preparation to convert FDT code to use libfdt. Make the return
value const and the property length ptr type an int.
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michal Simek <michal.simek@xilinx.com>
Tested-by: Grant Likely <grant.likely@linaro.org>
Tested-by: Stephen Chivers <schivers@csc.com>
The call to early_init_fdt_scan_reserved_mem will be skipped if
reserved-ranges is not found. Move the call earlier so that it is called
unconditionally.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Tested-by: Stephen Chivers <schivers@csc.com>
Our PV guest patching code assembles chunks of instructions on the fly when it
encounters more complicated instructions to hijack. These instructions need
to live in a section that we don't mark as non-executable, as otherwise we
fault when jumping there.
Right now we put it into the .bss section where it automatically gets marked
as non-executable. Add a check to the NX setting function to ensure that we
leave these particular pages executable.
Signed-off-by: Alexander Graf <agraf@suse.de>
The book3s_32 target can get built as module which means we don't see the
config define for it in code. Instead, check on the bool define
CONFIG_KVM_BOOK3S_32_HANDLER whenever we want to know whether we're building
for a book3s_32 host.
This fixes running book3s_32 kvm as a module for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Testing by Michael Neuling revealed that commit e4e3812150 ("KVM:
PPC: Book3S HV: Add transactional memory support") is missing the code
that saves away the checkpointed state of the guest when switching to
the host. This adds that code, which was in earlier versions of the
patch but went missing somehow.
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Numa fault is a method which help to achieve auto numa balancing.
When such a page fault takes place, the page fault handler will check
whether the page is placed correctly. If not, migration should be
involved to cut down the distance between the cpu and pages.
A pte with _PAGE_NUMA help to implement numa fault. It means not to
allow the MMU to access the page directly. So a page fault is triggered
and numa fault handler gets the opportunity to run checker.
As for the access of MMU, we need special handling for the powernv's guest.
When we mark a pte with _PAGE_NUMA, we already call mmu_notifier to
invalidate it in guest's htab, but when we tried to re-insert them,
we firstly try to map it in real-mode. Only after this fails, we fallback
to virt mode, and most of important, we run numa fault handler in virt
mode. This patch guards the way of real-mode to ensure that if a pte is
marked with _PAGE_NUMA, it will NOT be mapped in real mode, instead, it will
be mapped in virt mode and have the opportunity to be checked with placement.
Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch fixes this section mismatch:
WARNING: vmlinux.o(.text+0x1efc4): Section mismatch in reference from
the function apm821xx_pciex_init_port_hw() to the function
.init.text:ppc4xx_pciex_wait_on_sdr.isra.9()
The function apm821xx_pciex_init_port_hw() references the function
__init ppc4xx_pciex_wait_on_sdr.isra.9(). This is often because
apm821xx_pciex_init_port_hw lacks a __init annotation or the
annotation of ppc4xx_pciex_wait_on_sdr.isra.9 is wrong.
apm821xx_pciex_init_port_hw is only referenced by a struct in
__initdata, so it should be safe to add __init to
apm821xx_pciex_init_port_hw.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the guest cedes the vcpu or the vcpu has no guest to
run it naps. Clear the runlatch bit of the vcpu before
napping to indicate an idle cpu.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The secondary threads in the core are kept offline before launching guests
in kvm on powerpc: "371fefd6f2dc4666:KVM: PPC: Allow book3s_hv guests to use
SMT processor modes."
Hence their runlatch bits are cleared. When the secondary threads are called
in to start a guest, their runlatch bits need to be set to indicate that they
are busy. The primary thread has its runlatch bit set though, but there is no
harm in setting this bit once again. Hence set the runlatch bit for all
threads before they start guest.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Up until now we have been setting the runlatch bits for a busy CPU and
clearing it when a CPU enters idle state. The runlatch bit has thus
been consistent with the utilization of a CPU as long as the CPU is online.
However when a CPU is hotplugged out the runlatch bit is not cleared. It
needs to be cleared to indicate an unused CPU. Hence this patch has the
runlatch bit cleared for an offline CPU just before entering an idle state
and sets it immediately after it exits the idle state.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
While testing memory hot-remove, I found following dead lock:
Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove file
"online" under dir memory499, in kernfs_drain(), at
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
Process #1120 is trying to online memory499 by
echo 1 > memory499/online
In .kernfs_fop_write, it uses kernfs_get_active() to increase
&kn->active, thus blocking process #1141. While itself is blocked later
when trying to acquire memory_hotplug_mutex, which is held by process
The backtrace of both processes are shown below:
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c000000000263ca4>] .online_pages+0x74/0x7b0
[<c00000000055b40c>] .memory_subsys_online+0x9c/0x150
[<c00000000053cbe8>] .device_online+0xb8/0x120
[<c00000000053cd04>] .online_store+0xb4/0xc0
[<c000000000538ce4>] .dev_attr_store+0x64/0xa0
[<c00000000030f4ec>] .sysfs_kf_write+0x7c/0xb0
[<c00000000030e574>] .kernfs_fop_write+0x154/0x1e0
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c00000000030be14>] .__kernfs_remove+0x204/0x300
[<c00000000030d428>] .kernfs_remove_by_name_ns+0x68/0xf0
[<c00000000030fb38>] .sysfs_remove_file_ns+0x38/0x60
[<c000000000539354>] .device_remove_attrs+0x54/0xc0
[<c000000000539fd8>] .device_del+0x158/0x250
[<c00000000053a104>] .device_unregister+0x34/0xa0
[<c00000000055bc14>] .unregister_memory_section+0x164/0x170
[<c00000000024ee18>] .__remove_pages+0x108/0x4c0
[<c00000000004b590>] .arch_remove_memory+0x60/0xc0
[<c00000000026446c>] .remove_memory+0x8c/0xe0
[<c00000000007f9f4>] .pseries_remove_memblock+0xd4/0x160
[<c00000000007fcfc>] .pseries_memory_notifier+0x27c/0x290
[<c0000000008ae6cc>] .notifier_call_chain+0x8c/0x100
[<c0000000000d858c>] .__blocking_notifier_call_chain+0x6c/0xe0
[<c00000000071ddec>] .of_property_notify+0x7c/0xc0
[<c00000000071ed3c>] .of_update_property+0x3c/0x1b0
[<c0000000000756cc>] .ofdt_write+0x3dc/0x740
[<c0000000002f60fc>] .proc_reg_write+0xac/0x110
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), which is also stated before function
remove_memory():
* NOTE: The caller must call lock_device_hotplug() to serialize hotplug
* and online/offline operations before this call, as required by
* try_offline_node().
*/
void __ref remove_memory(int nid, u64 start, u64 size)
With this lock held, the other process(#1120 above) trying to online the
memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally find No such device error.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
module_init should return 0 or a negative errno.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Bump the boot wrapper BOOT_COMMAND_LINE_SIZE to match the
kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
I've had a report that the current limit is too small for
an automated network based installer. Bump it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have two definitions of COMMAND_LINE_SIZE, one for the kernel
and one for the boot wrapper. I assume this is so the boot
wrapper can be self sufficient and not rely on kernel headers.
Having two defines with the same name is confusing, I just
updated the wrong one when trying to bump it.
Make the boot wrapper define unique by calling it
BOOT_COMMAND_LINE_SIZE.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The catalog version number was changed from a be32 (with proceeding
32bits of padding) to a be64, update the code to treat it as a be64
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
fixup for "powerpc/perf: Add support for the hv gpci (get performance
counter info) interface".
Makes the "not enabled" message less awful (and hidden unless
debugging).
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
fixup for "powerpc/perf: Add support for the hv 24x7 interface"
Makes the "not enabled" message less awful (and hides it in most cases).
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The if condition check was based on a draft ISA doc. Remove the same.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have two copies of code that creates an OPAL sg list. Consolidate
these into a common set of helpers and fix the endian issues.
The flash interface embedded a version number in the num_entries
field, whereas the dump interface did did not. Since versioning
wasn't added to the flash interface and it is impossible to add
this in a backwards compatible way, just remove it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fix little endian issues with the OPAL error log code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The bitmap in opal_poll_events and opal_handle_interrupt is
big endian, so we need to byteswap it on little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We had some duplication of the internal OPAL functions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Using size_t in our APIs is asking for trouble, especially
when some OPAL calls use size_t pointers.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PowerNV platform, we are holding an unnecessary refcount on a pci_dev, which
leads to the pci_dev is not destroyed when hotplugging a pci device.
This patch release the unnecessary refcount.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
With this patch I was able to update firmware on an LE kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We have a subtle race when sending CPUs back to OPAL on kexec.
We mark them as "in real mode" right before we send them down. Once
we've booted the new kernel, it might try to call opal_reinit_cpus()
to change endianness, and that requires all CPUs to be spinning inside
OPAL.
However there is no synchronization here and we've observed cases
where the returning CPUs hadn't established their new state inside
OPAL before opal_reinit_cpus() is called, causing it to fail.
The proper fix is to actually wait for them to go down all the way
from the kexec'ing kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The size of the sysparam sysfs files is determined from the device tree
at boot. However the buffer is hard coded to 64 bytes. If we encounter a
parameter that is larger than 64, or miss-parse the device tree, the
buffer will overflow when reading or writing to the parameter.
Check it at discovery time, and if the parameter is too large, do not
create a sysfs entry for it.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The sysparam code currently uses the userspace supplied number of
bytes when memcpy()ing in to a local 64-byte buffer.
Limit the maximum number of bytes by the size of the buffer.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The OPAL calls are returning int64_t values, which the sysparam code
stores in an int, and the sysfs callback returns ssize_t. Make code a
easier to read by consistently using ssize_t.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When a sysparam query in OPAL returned a negative value (error code),
sysfs would spew out a decent chunk of memory; almost 64K more than
expected. This was traced to a sign/unsigned mix up in the OPAL sysparam
sysfs code at sys_param_show.
The return value of sys_param_show is a ssize_t, calculated using
return ret ? ret : attr->param_size;
Alan Modra explains:
"attr->param_size" is an unsigned int, "ret" an int, so the overall
expression has type unsigned int. Result is that ret is cast to
unsigned int before being cast to ssize_t.
Instead of using the ternary operator, set ret to the param_size if an
error is not detected. The same bug exists in the sysfs write callback;
this patch fixes it in the same way.
A note on debugging this next time: on my system gcc will warn about
this if compiled with -Wsign-compare, which is not enabled by -Wall,
only -Wextra.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
commit 41dd03a9 may cause Oops in rtas_stop_self().
The reason is that the rtas_args was moved into stack space. For a box
with more that 4GB RAM, the stack could easily be outside 32bit range,
but RTAS is 32bit.
So the patch moves rtas_args away from stack by adding static before
it.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit aac416fc38 (lkdtm: flush icache and report actions) calls
flush_icache_range from a module. It's exported on most architectures
that implement it, but not on powerpc. This patch exports it to fix
the module link failure.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Powerpc allows reordering over its ll/sc implementation. Implement the
two new barriers as appropriate.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-gg2ffgq32sjgy9b8lj6m3hsc@git.kernel.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
3bc955987f ("powerpc/PCI: Use list_for_each_entry() for bus traversal")
caused a NULL pointer dereference because the loop body set the iterator to
NULL:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000041d78
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000041d78] .sys_pciconfig_iobase+0x68/0x1f0
LR [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
Call Trace:
[c0000003b4787db0] [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
[c0000003b4787e30] [c000000000009ed8] syscall_exit+0x0/0x98
Fix it by using a temporary variable for the iterator.
[bhelgaas: changelog, drop tmp_bus initialization]
Fixes: 3bc955987f powerpc/PCI: Use list_for_each_entry() for bus traversal
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
3bc955987f ("powerpc/PCI: Use list_for_each_entry() for bus traversal")
caused a NULL pointer dereference because the loop body set the iterator to
NULL:
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xc000000000041d78
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c000000000041d78] .sys_pciconfig_iobase+0x68/0x1f0
LR [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0
Call Trace:
[c0000003b4787db0] [c000000000041e0c] .sys_pciconfig_iobase+0xfc/0x1f0 (unreliable)
[c0000003b4787e30] [c000000000009ed8] syscall_exit+0x0/0x98
Fix it by using a temporary variable for the iterator.
[bhelgaas: changelog, drop tmp_bus initialization]
Fixes: 3bc955987f powerpc/PCI: Use list_for_each_entry() for bus traversal
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Commit 8f619b5429 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
Pull audit updates from Eric Paris.
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
- Fix for a recently introduced CPU hotplug regression in ARM KVM
from Ming Lei.
- Fixes for breakage in the at32ap, loongson2_cpufreq, and unicore32
cpufreq drivers introduced during the 3.14 cycle (-stable material)
from Chen Gang and Viresh Kumar.
- New powernv cpufreq driver from Vaidyanathan Srinivasan, with bits
from Gautham R Shenoy and Srivatsa S Bhat.
- Exynos cpufreq driver fix preventing it from being included into
multiplatform builds that aren't supported by it from Sachin Kamat.
- cpufreq cleanups related to the usage of the driver_data field in
struct cpufreq_frequency_table from Viresh Kumar.
- cpufreq ppc driver cleanup from Sachin Kamat.
- Intel BayTrail support for intel_idle and ACPI idle from Len Brown.
- Intel CPU model 54 (Atom N2000 series) support for intel_idle from
Jan Kiszka.
- intel_idle fix for Intel Ivy Town residency targets from Len Brown.
- turbostat updates (Intel Broadwell support and output cleanups)
from Len Brown.
- New cpuidle sysfs attribute for exporting C-states' target residency
information to user space from Daniel Lezcano.
- New kernel command line argument to prevent power domains enabled
by the bootloader from being turned off even if they are not in use
(for diagnostics purposes) from Tushar Behera.
- Fixes for wakeup sysfs attributes documentation from Geert Uytterhoeven.
- New ACPI video blacklist entry for ThinkPad Helix from Stephen Chandler
Paul.
- Assorted ACPI cleanups and a Kconfig help update from Jonghwan Choi,
Zhihui Zhang, Hanjun Guo.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTRxLAAAoJEILEb/54YlRxnCwP/16UwO/eVE8SIi0TqQboikFC
k8u7F3zgDYG+xPSzXlCR+J7thTxGueQlrb+aM18PYuMVgaw2rpy7U7SIqEk8s6oR
uFnzZCWKA5ZebbZn+NlodnQaJmbgJxwsVJDuuechUka/e67CaIc54JULi2ynZ0lz
Kg/nU3NJhu4S81cT5SOTkJ9xE63oxHcCwKbNqEmxn7x7ddFzGK/DThG67NMEnW1F
vHbBTSyI6vmXXg1f9aobUtuo3PfJkkx5jD+nR1H2e6wmB64tW7JPVKV3mi6LJfYM
ui/8/gNb3PUMHMX1QbL9EFbPxl9miQx2NJ7dgFKa1HZ/WPyiXpJjz7uGr9O3Fau3
cFVREdaW8p2TAYWOEgH8luohhdK0j8UEpR/sEm0TrTjsK8wqczVf/hz6RraVJZiN
ck6eVHjY6m3/bFQauZQ/r+DNeeNcdr+iLejgjbh/MXuF3j0kx+1dkKkzCEU2TgEZ
9etF0uzjlgyXySyxNKBeSW13+ssVA6kF5/BHns7LHoxTfGu7Y4oVaWUi+j74i66O
bc+2ileNal71mS4v9gomnj6Ffj8oH8KXFA7k0sEsAdwLZNgThB5bTppmY/U7Y5Ce
hTS81tcGe2vOVQzF9iFOF7LNKKussAVAtrgkkrA8lJLeOTfQbIo4+fMhORxf3X/p
3O7R/jc4cT+IXK8a2xRt
=hGKg
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI and power management fixes and updates from Rafael Wysocki:
"This is PM and ACPI material that has emerged over the last two weeks
and one fix for a CPU hotplug regression introduced by the recent CPU
hotplug notifiers registration series.
Included are intel_idle and turbostat updates from Len Brown (these
have been in linux-next for quite some time), a new cpufreq driver for
powernv (that might spend some more time in linux-next, but BenH was
asking me so nicely to push it for 3.15 that I couldn't resist), some
cpufreq fixes and cleanups (including fixes for some silly breakage in
a couple of cpufreq drivers introduced during the 3.14 cycle),
assorted ACPI cleanups, wakeup framework documentation fixes, a new
sysfs attribute for cpuidle and a new command line argument for power
domains diagnostics.
Specifics:
- Fix for a recently introduced CPU hotplug regression in ARM KVM
from Ming Lei.
- Fixes for breakage in the at32ap, loongson2_cpufreq, and unicore32
cpufreq drivers introduced during the 3.14 cycle (-stable material)
from Chen Gang and Viresh Kumar.
- New powernv cpufreq driver from Vaidyanathan Srinivasan, with bits
from Gautham R Shenoy and Srivatsa S Bhat.
- Exynos cpufreq driver fix preventing it from being included into
multiplatform builds that aren't supported by it from Sachin Kamat.
- cpufreq cleanups related to the usage of the driver_data field in
struct cpufreq_frequency_table from Viresh Kumar.
- cpufreq ppc driver cleanup from Sachin Kamat.
- Intel BayTrail support for intel_idle and ACPI idle from Len Brown.
- Intel CPU model 54 (Atom N2000 series) support for intel_idle from
Jan Kiszka.
- intel_idle fix for Intel Ivy Town residency targets from Len Brown.
- turbostat updates (Intel Broadwell support and output cleanups)
from Len Brown.
- New cpuidle sysfs attribute for exporting C-states' target
residency information to user space from Daniel Lezcano.
- New kernel command line argument to prevent power domains enabled
by the bootloader from being turned off even if they are not in use
(for diagnostics purposes) from Tushar Behera.
- Fixes for wakeup sysfs attributes documentation from Geert
Uytterhoeven.
- New ACPI video blacklist entry for ThinkPad Helix from Stephen
Chandler Paul.
- Assorted ACPI cleanups and a Kconfig help update from Jonghwan
Choi, Zhihui Zhang, Hanjun Guo"
* tag 'pm+acpi-3.15-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
ACPI: Update the ACPI spec information in Kconfig
arm, kvm: fix double lock on cpu_add_remove_lock
cpuidle: sysfs: Export target residency information
cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
cpufreq: create another field .flags in cpufreq_frequency_table
cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
cpufreq: don't print value of .driver_data from core
cpufreq: ia64: don't set .driver_data to index
cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
cpufreq: powernv: cpufreq driver for powernv platform
cpufreq: at32ap: don't declare local variable as static
cpufreq: loongson2_cpufreq: don't declare local variable as static
cpufreq: unicore32: fix typo issue for 'clk'
cpufreq: exynos: Disable on multiplatform build
PM / wakeup: Correct presence vs. emptiness of wakeup_* attributes
PM / domains: Add pd_ignore_unused to keep power domains enabled
ACPI / dock: Drop dock_device_ids[] table
ACPI / video: Favor native backlight interface for ThinkPad Helix
ACPI / thermal: Fix wrong variable usage in debug statement
...
Pull more powerpc updates from Ben Herrenschmidt:
"Here are a few more powerpc things for you.
So you'll find here the conversion of the two new firmware sysfs
interfaces to the new API for self-removing files that Greg and Tejun
introduced, so they can finally remove the old one.
I'm also reverting the hwmon driver for powernv. I shouldn't have
merged it, I got a bit carried away here. I hadn't realized it was
never CCed to the relevant maintainer(s) and list(s), and happens to
have some issues so I'm taking it out and it will come back via the
proper channels.
The rest is a bunch of LE fixes (argh, some of the new stuff was
broken on LE, I really need to start testing LE myself !) and various
random fixes here and there.
Finally one bit that's not strictly a fix, which is the HVC OPAL
change to "kick" the HVC thread when the firmware tells us there is
new incoming data. I don't feel like waiting for this one, it's
simple enough, and it makes a big difference in console responsiveness
which is good for my nerves"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (26 commits)
powerpc/powernv Adapt opal-elog and opal-dump to new sysfs_remove_file_self
Revert "powerpc/powernv: hwmon driver for power values, fan rpm and temperature"
power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.
arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c
powerpc/opal: Add missing include
powerpc: Convert last uses of __FUNCTION__ to __func__
powerpc: Add lq/stq emulation
powerpc/powernv: Add invalid OPAL call
powerpc/powernv: Add OPAL message log interface
powerpc/book3s: Fix mc_recoverable_range buffer overrun issue.
powerpc: Remove dead code in sycall entry
powerpc: Use of_node_init() for the fakenode in msi_bitmap.c
powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast()
powerpc/powernv: Fix endian issues with sensor code
powerpc/powernv: Fix endian issues with OPAL async code
tty/hvc_opal: Kick the HVC thread on OPAL console events
powerpc/powernv: Add opal_notifier_unregister() and export to modules
powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early
powerpc/ppc64: Gracefully handle early interrupts
...
We are currently using sysfs_schedule_callback() which is deprecated
and about to be removed. Switch to the new interface instead.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since v1:
Edited the comment according to Srivatsa's suggestion.
During the testing, we encounter below WARN followed by Oops:
WARNING: at kernel/sched/core.c:6218
...
NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
LR [c000000000101358] .build_sched_domains+0xec8/0x1200
PACATMSCRATCH [800000000000f032]
Call Trace:
[c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
...
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
PACATMSCRATCH [8000000000029032]
Call Trace:
[c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
[c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
[c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
[c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
[c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
...
This was caused by that 'sd->groups == NULL' after building groups, which
was caused by the empty 'sd->span'.
The cpu's domain contained nothing because the cpu was assigned to a wrong
node, due to the following unfortunate sequence of events:
1. The hypervisor sent a topology update to the guest OS, to notify changes
to the cpu-node mapping. However, the update was actually redundant - i.e.,
the "new" mapping was exactly the same as the old one.
2. Due to this, the 'updated_cpus' mask turned out to be empty after exiting
the 'for-loop' in arch_update_cpu_topology().
3. So we ended up calling stop-machine() with an empty cpumask list, which made
stop-machine internally elect cpumask_first(cpu_online_mask), i.e., CPU0 as
the cpu to run the payload (the update_cpu_topology() function).
4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
finds update->cpu as well as update->new_nid to be 0. In other words, we
end up assigning CPU0 (and eventually its siblings) to node 0, incorrectly.
Along with the following wrong updating, it causes the sched-domain rebuild
code to break and crash the system.
Fix this by skipping the topology update in cases where we find that
the topology has not actually changed in reality (ie., spurious updates).
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Robert Jennings <rcj@linux.vnet.ibm.com>
CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
CC: Alistair Popple <alistair@popple.id.au>
Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When building modules with a native le toolchain the linker will
generate R_PPC64_TOCSAVE relocations when it's safe to omit saving r2 on
a plt call. This isn't helpful in the conext of a kernel module and the
kernel will fail to load those modules with an error like:
nf_conntrack: Unknown ADD relocation: 109
This patch tells the linker to avoid createing R_PPC64_TOCSAVE
relocations allowing modules to load.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Here rcu_assign_pointer() is ensuring that the
initialization of a structure is carried out before storing a pointer
to that structure.
So, rcu_assign_pointer(p, NULL) can always safely be converted to
RCU_INIT_POINTER(p, NULL).
Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
next-20140324 currently fails compiling celleb_defconfig with:
arch/powerpc/include/asm/opal.h:894:42: error: 'struct notifier_block' declared inside parameter list [-Werror]
arch/powerpc/include/asm/opal.h:894:42: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
arch/powerpc/include/asm/opal.h:896:14: error: 'struct notifier_block' declared inside parameter list [-Werror]
This is due to a missing include which is added here.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Just about all of these have been converted to __func__,
so convert the last uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Recent CPUs support quad word load and store instructions. Add
support to the alignment handler for them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This call will not be understood by OPAL, and cause it to add an error
to it's log. Among other things, this is useful for testing the
behaviour of the log as it fills up.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
OPAL provides an in-memory circular buffer containing a message log
populated with various runtime messages produced by the firmware.
Provide a sysfs interface /sys/firmware/opal/msglog for userspace to
view the messages.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently we wrongly allocate mc_recoverable_range buffer (to hold
recoverable ranges) based on size of the property "mcheck-recoverable-ranges".
This results in allocating less memory to hold available recoverable range
entries from /proc/device-tree/ibm,opal/mcheck-recoverable-ranges.
This patch fixes this issue by allocating mc_recoverable_range buffer based
on number of entries of recoverable ranges instead of device property size.
Without this change we end up allocating less memory and run into memory
corruption issue.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In:
commit 742415d6b6
Author: Michael Neuling <mikey@neuling.org>
powerpc: Turn syscall handler into macros
We converted the syscall entry code onto macros, but in doing this we
introduced some cruft that's never run and should never have been added.
This removes that code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We need to handle numa pte via the slow path
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
One OPAL call and one device tree property needed byte swapping.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* pm-cpufreq:
cpufreq: ppc: Remove duplicate inclusion of fsl_soc.h
cpufreq: create another field .flags in cpufreq_frequency_table
cpufreq: use kzalloc() to allocate memory for cpufreq_frequency_table
cpufreq: don't print value of .driver_data from core
cpufreq: ia64: don't set .driver_data to index
cpufreq: powernv: Select CPUFreq related Kconfig options for powernv
cpufreq: powernv: Use cpufreq_frequency_table.driver_data to store pstate ids
cpufreq: powernv: cpufreq driver for powernv platform
cpufreq: at32ap: don't declare local variable as static
cpufreq: loongson2_cpufreq: don't declare local variable as static
cpufreq: unicore32: fix typo issue for 'clk'
cpufreq: exynos: Disable on multiplatform build
Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
MAINTAINERS: update Intel C600 SAS driver maintainers
fs/ufs: remove unused ufs_super_block_third pointer
fs/ufs: remove unused ufs_super_block_second pointer
fs/ufs: remove unused ufs_super_block_first pointer
fs/ufs/super.c: add __init to init_inodecache()
doc/kernel-parameters.txt: add early_ioremap_debug
arm64: add early_ioremap support
arm64: initialize pgprot info earlier in boot
x86: use generic early_ioremap
mm: create generic early_ioremap() support
x86/mm: sparse warning fix for early_memremap
lglock: map to spinlock when !CONFIG_SMP
percpu: add preemption checks to __this_cpu ops
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
slub: use raw_cpu_inc for incrementing statistics
net: replace __this_cpu_inc in route.c with raw_cpu_inc
modules: use raw_cpu_write for initialization of per cpu refcount.
mm: use raw_cpu ops for determining current NUMA node
percpu: add raw_cpu_ops
slub: fix leak of 'name' in sysfs_slab_add
...
Fix breakage which will be exposed by the patch "kconfig: make allnoconfig
disable options behind EMBEDDED and EXPERT".
arch/powerpc/kernel/mce.c, compiled in for PPC_BOOK3S_64, calls
functions only built when IRQ_WORK, so select it. Fixes the following
build error:
arch/powerpc/kernel/built-in.o: In function `.machine_check_queue_event':
(.text+0x11260): undefined reference to `.irq_work_queue'
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes an artificial RapidIO bus root device and establishes
actual device hierarchy by providing reference to real parent devices.
It also introduces device class for RapidIO controller devices (on-chip
or an eternal bridge, known as "mport").
Existing implementation was sufficient for SoC-based platforms that have
a single RapidIO controller. With introduction of devices using
multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme
is very limiting or does not work at all. The implemented changes allow
to properly reference platform's local RapidIO mport devices and provide
device details needed for upper layers.
This change to RapidIO device hierarchy does not break any known
existing kernel or user space interfaces.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The purpose of this single series of commits from Srivatsa S Bhat (with
a small piece from Gautham R Shenoy) touching multiple subsystems that use
CPU hotplug notifiers is to provide a way to register them that will not
lead to deadlocks with CPU online/offline operations as described in the
changelog of commit 93ae4f978c (CPU hotplug: Provide lockless versions
of callback registration functions).
The first three commits in the series introduce the API and document it
and the rest simply goes through the users of CPU hotplug notifiers and
converts them to using the new method.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTQow2AAoJEILEb/54YlRxW4QQAJlYRDUzwFJzJzYhltQYuVR+
4D74XMtvXgoJfg3cwdSWvMKKpJZnA9BVN0f7Hcx9wYmgdexYUuHeZJmMNyc3S2+g
KjKBIsugvgmZhHbbLd6TJ6GBbhGT5JLt9VmSfL9zIkveInU1YHFUUqL/mxdHm4J0
BSGKjk2rN3waRJgmY+xfliFLtQjDKFwJpMuvrgtoUyfas3f4sIV43UNbqdvA/weJ
rzedxXOlKH/id4b56lj/4iIzcoL3mwvJJ7r6n0CEMsKv87z09kqR0O+69Tsq/cgs
j17CsvoJOmZGk3QTeKVMQWBsvk6aPoDu3zK83gLbQMt+qjOpSTbJLz/3HZw4/TrW
ss4nuZne1DLMGS+6hoxYbTP+6Ni//Kn+l/LrHc5jb7m1X3lMO4W2aV3IROtIE1rv
lEP1IG01NU4u9YwkVj1dyhrkSp8tLPul4SrUK8W+oNweOC5crjJV7vJbIPJgmYiM
IZN55wln0yVRtR4TX+rmvN0PixsInE8MeaVCmReApyF9pdzul/StxlBze5BKLSJD
cqo1kNPpsmdxoDucqUpQ/gSvy+IOl2qnlisB5PpV93sk7De6TFDYrGHxjYIW7jMf
StXwdCDDQhzd2Q8Kfpp895A1dbIl8rKtwA6bTU2eX+BfMVFzuMdT44cvosx1+UdQ
sWl//rg76nb13dFjvF+q
=SW7Q
-----END PGP SIGNATURE-----
Merge tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
"The purpose of this single series of commits from Srivatsa S Bhat
(with a small piece from Gautham R Shenoy) touching multiple
subsystems that use CPU hotplug notifiers is to provide a way to
register them that will not lead to deadlocks with CPU online/offline
operations as described in the changelog of commit 93ae4f978c ("CPU
hotplug: Provide lockless versions of callback registration
functions").
The first three commits in the series introduce the API and document
it and the rest simply goes through the users of CPU hotplug notifiers
and converts them to using the new method"
* tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
net/iucv/iucv.c: Fix CPU hotplug callback registration
net/core/flow.c: Fix CPU hotplug callback registration
mm, zswap: Fix CPU hotplug callback registration
mm, vmstat: Fix CPU hotplug callback registration
profile: Fix CPU hotplug callback registration
trace, ring-buffer: Fix CPU hotplug callback registration
xen, balloon: Fix CPU hotplug callback registration
hwmon, via-cputemp: Fix CPU hotplug callback registration
hwmon, coretemp: Fix CPU hotplug callback registration
thermal, x86-pkg-temp: Fix CPU hotplug callback registration
octeon, watchdog: Fix CPU hotplug callback registration
oprofile, nmi-timer: Fix CPU hotplug callback registration
intel-idle: Fix CPU hotplug callback registration
clocksource, dummy-timer: Fix CPU hotplug callback registration
drivers/base/topology.c: Fix CPU hotplug callback registration
acpi-cpufreq: Fix CPU hotplug callback registration
zsmalloc: Fix CPU hotplug callback registration
scsi, fcoe: Fix CPU hotplug callback registration
scsi, bnx2fc: Fix CPU hotplug callback registration
scsi, bnx2i: Fix CPU hotplug callback registration
...
Enable CPUFreq for PowerNV. Select "performance", "powersave",
"userspace" and "ondemand" governors. Choose "ondemand" to be the
default governor.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Backend driver to dynamically set voltage and frequency on
IBM POWER non-virtualized platforms. Power management SPRs
are used to set the required PState.
This driver works in conjunction with cpufreq governors
like 'ondemand' to provide a demand based frequency and
voltage setting on IBM POWER non-virtualized platforms.
PState table is obtained from OPAL v3 firmware through device
tree.
powernv_cpufreq back-end driver would parse the relevant device-tree
nodes and initialise the cpufreq subsystem on powernv platform.
The code was originally written by svaidy@linux.vnet.ibm.com. Over
time it was modified to accomodate bug-fixes as well as updates to the
the cpu-freq core. Relevant portions of the change logs corresponding
to those modifications are noted below:
* The policy->cpus needs to be populated in a hotplug-invariant
manner instead of using cpu_sibling_mask() which varies with
cpu-hotplug. This is because the cpufreq core code copies this
content into policy->related_cpus mask which should not vary on
cpu-hotplug. [Authored by srivatsa.bhat@linux.vnet.ibm.com]
* Create a helper routine that can return the cpu-frequency for the
corresponding pstate_id. Also, cache the values of the pstate_max,
pstate_min and pstate_nominal and nr_pstates in a static structure
so that they can be reused in the future to perform any
validations. [Authored by ego@linux.vnet.ibm.com]
* Create a driver attribute named cpuinfo_nominal_freq which creates
a sysfs read-only file named cpuinfo_nominal_freq. Export the
frequency corresponding to the nominal_pstate through this
interface.
Nominal frequency is the highest non-turbo frequency for the
platform. This is generally used for setting governor policies
from user space for optimal energy efficiency. [Authored by
ego@linux.vnet.ibm.com]
* Implement a powernv_cpufreq_get(unsigned int cpu) method which will
return the current operating frequency. Export this via the sysfs
interface cpuinfo_cur_freq by setting powernv_cpufreq_driver.get to
powernv_cpufreq_get(). [Authored by ego@linux.vnet.ibm.com]
[Change log updated by ego@linux.vnet.ibm.com]
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
OPAL defines opal_msg as a big endian struct so we have to
byte swap it on little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
opal_notifier_register() is missing a pending "unregister" variant
and should be exposed to modules.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Turn them on at the same time as we allow MSR_IR/DR in the paca
kernel MSR, ie, after the MMU has been setup enough to be able
to handle relocated access to the linear mapping.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If we take an interrupt such as a trap caused by a BUG_ON before the
MMU has been setup, the interrupt handlers try to enable virutal mode
and cause a recursive crash, making the original problem very hard
to debug.
This fixes it by adjusting the "kernel_msr" value in the PACA so that
it only has MSR_IR and MSR_DR (translation for instruction and data)
set after the MMU has been initialized for the processor.
We may still not have a console yet but at least we don't get into
a recursive fault (and early debug console or memory dump via JTAG
of the kernel buffer *will* give us the proper error).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
All our cpu feature updates were done for every CPU in the device-tree,
thus overwriting the cputable bits over and over again. Instead do them
only for the boot CPU.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move the definition to setup-common.c and set the init value
to -1 on both 32 and 64-bit (it was 0 on 64-bit).
Additionally add a check to prom.c to garantee that the init
value has been udpated after the DT scan.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
For historical reasons that code was under #ifdef CONFIG_PPC_PSERIES
but it applies equally to all 64-bit platforms.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>