Commit Graph

3814 Commits

Author SHA1 Message Date
Gavin Shan
5459ae1431 powerpc/eeh: Use interruptible sleep in keehd
To replace down() with down_interrutible() to avoid following
warning:

[c00000007ba7b710] [c000000000014410] .__switch_to+0x1b0/0x380
[c00000007ba7b7c0] [c0000000007b408c] .__schedule+0x3ec/0x970
[c00000007ba7ba50] [c0000000007b1f24] .schedule_timeout+0x1a4/0x2b0
[c00000007ba7bb30] [c0000000007b34a4] .__down+0xa4/0x104
[c00000007ba7bbf0] [c0000000000b9230] .down+0x60/0x70
[c00000007ba7bc80] [c0000000000336d0] .eeh_event_handler+0x70/0x190
[c00000007ba7bd30] [c0000000000b1a58] .kthread+0xe8/0xf0
[c00000007ba7be30] [c00000000000a05c] .ret_from_kernel_thread+0x5c/0x8

This also avoids keeping the load average up while doing nothing.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-25 17:24:41 +10:00
Gavin Shan
ef6a285773 powerpc/eeh: Remove eeh_mutex
Originally, eeh_mutex was introduced to protect the PE hierarchy
tree and the attached EEH devices because EEH core was possiblly
running with multiple threads to access the PE hierarchy tree.
However, we now have only one kthread in EEH core. So we needn't
the eeh_mutex and just remove it.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-25 17:24:41 +10:00
Michael Neuling
540e07c67e powerpc/hw_brk: Fix clearing of extraneous IRQ
In 9422de3 "powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint
registers" we changed the way we mark extraneous irqs with this:

-	info->extraneous_interrupt = !((bp->attr.bp_addr <= dar) &&
-			(dar - bp->attr.bp_addr < bp->attr.bp_len));
+	if (!((bp->attr.bp_addr <= dar) &&
+	      (dar - bp->attr.bp_addr < bp->attr.bp_len)))
+		info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;

Unfortunately this is bogus as it never clears extraneous IRQ if it's already
set.

This correctly clears extraneous IRQ before possibly setting it.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-25 17:24:40 +10:00
Michael Neuling
b0b0aa9c7f powerpc/hw_brk: Fix setting of length for exact mode breakpoints
The smallest match region for both the DABR and DAWR is 8 bytes, so the
kernel needs to filter matches when users want to look at regions smaller than
this.

Currently we set the length of PPC_BREAKPOINT_MODE_EXACT breakpoints to 8.
This is wrong as in exact mode we should only match on 1 address, hence the
length should be 1.

This ensures that the kernel will filter out any exact mode hardware breakpoint
matches on any addresses other than the requested one.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-25 17:24:39 +10:00
Aneesh Kumar K.V
12bc9f6fc1 powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte
Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
document why we don't need to handle transparent hugepages at callsites.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-21 16:01:54 +10:00
Gavin Shan
b95cd2cd44 powerpc/eeh: Allow to check fenced PHB proactively
It's meaningless to handle frozen PE if we already had fenced PHB.
The patch intends to check the PHB state before checking PE. If the
PHB has been put into fenced state, we need take care of that firstly.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:53 +10:00
Gavin Shan
8a6b1bc70d powerpc/eeh: EEH core to handle special event
On PowerNV platform, the EEH event caused by interrupt won't have
binding PE. The patch enables EEH core to handle the special event.
To avoid the current logic we have, The eeh_handle_event() is renamed
to eeh_handle_normal_event(), and the eeh_handle_special_event() is
introduced. The function eeh_handle_event() dispatches to above two
functions according to the input parameter. Besides, new backend
"next_error" added to eeh_ops and it's expected to have following
return values:

        4 - Dead IOC           3 - Dead PHB
        2 - Fenced PHB         1 - Frozen PE
        0 - No error found

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:14 +10:00
Gavin Shan
4907581dc2 powerpc/eeh: Export confirm_error_lock
An EEH event is created and queued to the event queue for each
ingress EEH error. When there're mutiple EEH errors, we need serialize
the process to keep consistent PE state (flags). The spinlock
"confirm_error_lock" was introduced for the purpose. We'll inject
EEH event upon error reporting interrupts on PowerNV platform. So
we export the spinlock for that to use for consistent PE state.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:11 +10:00
Gavin Shan
9986659534 powerpc/eeh: Allow to purge EEH events
On PowerNV platform, we might run into the situation where subsequent
events are duplicated events of former one, which is being processed.
For the case, we need the function implemented by the patch to purge
EEH events accordingly.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:07 +10:00
Gavin Shan
5a71978e4b powerpc/eeh: Trace time on first error for PE
We're not expecting that one specific PE got frozen for over 5
times in last hour. Otherwise, the PE will be removed from the
system upon newly coming EEH errors. The patch introduces time
stamp to trace the first error on specific PE in last hour and
function to update that accordingly. Besides, the time stamp
is recovered during PE hotplug path as we did for frozen count.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:04 +10:00
Gavin Shan
c86085580d powerpc/eeh: Single kthread to handle events
We possiblly have multiple kthreads running for multiple EEH errors
(events) and use one spinlock to make the process of handling those
EEH events serialized. That's unnecessary and the patch creates only
one kthread, which is started during EEH core initialization time in
eeh_init(). A new semaphore introduced to count the number of existing
EEH events in the queue and the kthread waiting on the semaphore.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:01 +10:00
Gavin Shan
26a74850b3 powerpc/eeh: Delay EEH probe during hotplug
While doing EEH recovery, the PCI devices of the problematic PE
should be removed and then added to the system again. During the
so-called hotplug event, the PCI devices of the problematic PE
will be probed through early/late phase. We would delay EEH probe
on late point for PowerNV platform since the PCI device isn't
available in early phase.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:58 +10:00
Gavin Shan
326a98ea93 powerpc/eeh: Refactor eeh_reset_pe_once()
We shouldn't check that the returned PE status is exactly equal to
(EEH_STATE_MMIO_ACTIVE | EEH_STATE_DMA_ACTIVE) but instead only check
that they are both set.

[benh: changelog]
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:54 +10:00
Gavin Shan
21fd21f590 powerpc/eeh: EEH post initialization operation
The patch adds new EEH operation post_init. It's used to notify
the platform that EEH core has completed the EEH probe. By that,
PowerNV platform starts to use the services supplied by EEH
functionality.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:51 +10:00
Gavin Shan
51fb5f5632 powerpc/eeh: Make eeh_init() public
For EEH on PowerNV platform, we will do EEH probe based on the
real PCI devices. The PCI devices are available after PCI probe.
So we have to call eeh_init() explicitly on PowerNV platform
after PCI probe. The patch also does EEH probe for PowerNV platform
in eeh_init().

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:48 +10:00
Gavin Shan
8cdb283371 powerpc/eeh: Trace PCI bus from PE
There're several types of PEs can be supported for now: PHB, Bus
and Device dependent PE. For PCI bus dependent PE, tracing the
corresponding PCI bus from PE (struct eeh_pe) would make the code
more efficient. The patch also enables the retrieval of PCI bus based
on the PCI bus dependent PE.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:45 +10:00
Gavin Shan
0156680854 powerpc/eeh: Make eeh_pe_get() public
While processing EEH event interrupt from P7IOC, we need function
to retrieve the PE according to the indicated EEH device. The patch
makes function eeh_pe_get() public so that other source files can call
it for that purpose. Also, the patch fixes referring to wrong BDF
(Bus/Device/Function) address while searching PE in function
__eeh_pe_get().

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:41 +10:00
Gavin Shan
9ff67433ce powerpc/eeh: Make eeh_phb_pe_get() public
One of the possible cases indicated by P7IOC interrupt is fenced
PHB. For that case, we need fetch the PE corresponding to the PHB
and disable the PHB and all subordinate PCI buses/devices, recover
from the fenced state and eventually enable the whole PHB. We need
one function to fetch the PHB PE outside eeh_pe.c and the patch is
going to make eeh_phb_pe_get() public for that purpose.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:38 +10:00
Gavin Shan
317f06de78 powerpc/eeh: Move common part to kernel directory
The patch moves the common part of EEH core into arch/powerpc/kernel
directory so that we needn't PPC_PSERIES while compiling POWERNV
platform:

        * Move the EEH common part into arch/powerpc/kernel
        * Move the functions for PCI hotplug from pSeries platform to
          arch/powerpc/kernel/pci-hotplug.c
        * Move CONFIG_EEH from arch/powerpc/platforms/pseries/Kconfig to
          arch/powerpc/platforms/Kconfig
        * Adjust makefile accordingly

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:35 +10:00
Michael Neuling
87b4e5393a powerpc/tm: Fix return of active 64bit signals
Currently we only restore signals which are transactionally suspended but it's
possible that the transaction can be restored even when it's active.  Most
likely this will result in a transactional rollback by the hardware as the
transaction will have been doomed by an earlier treclaim.

The current code is a legacy of earlier kernel implementations which did
software rollback of active transactions in the kernel.  That code has now gone
but we didn't correctly fix up this part of the signals code which still makes
assumptions based on having software rollback.

This changes the signal return code to always restore both contexts on 64 bit
signal return.  It also ensures that the MSR TM bits are properly restored from
the signal context which they are not currently.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:28 +10:00
Michael Neuling
55e4341850 powerpc/tm: Fix return of 32bit rt signals to active transactions
Currently we only restore signals which are transactionally suspended but it's
possible that the transaction can be restored even when it's active.  Most
likely this will result in a transactional rollback by the hardware as the
transaction will have been doomed by an earlier treclaim.

The current code is a legacy of earlier kernel implementations which did
software rollback of active transactions in the kernel.  That code has now gone
but we didn't correctly fix up this part of the signals code which still makes
assumptions based on having software rollback.

This changes the signal return code to always restore both contexts on 32 bit
rt signal return.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:25 +10:00
Michael Neuling
2c27a18f87 powerpc/tm: Fix restoration of MSR on 32bit signal return
Currently we clear out the MSR TM bits on signal return assuming that the
signal should never return to an active transaction.

This is bogus as the user may do this.  It's most likely the transaction will
be doomed due to a treclaim but that's a problem for the HW not the kernel.

The current code is a legacy of earlier kernel implementations which did
software rollback of active transactions in the kernel.  That code has now gone
but we didn't correctly fix up this part of the signals code which still makes
the assumption that it must be returning to a suspended transaction.

This pulls out both MSR TM bits from the user supplied context rather than just
setting TM suspend.  We pull out only the bits needed to ensure the user can't
do anything dangerous to the MSR.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:22 +10:00
Michael Neuling
fee5545071 powerpc/tm: Fix 32 bit non-rt signals
Currently sys_sigreturn() is TM unaware.  Therefore, if we take a 32 bit signal
without SIGINFO (non RT) inside a transaction, on signal return we don't
restore the signal frame correctly.

This checks if the signal frame being restoring is an active transaction, and
if so, it copies the additional state to ptregs so it can be restored.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:18 +10:00
Michael Neuling
1d25f11fdb powerpc/tm: Fix writing top half of MSR on 32 bit signals
The MSR TM controls are in the top 32 bits of the MSR hence on 32 bit signals,
we stick the top half of the MSR in the checkpointed signal context so that the
user can access it.

Unfortunately, we don't currently write anything to the checkpointed signal
context when coming in a from a non transactional process and hence the top MSR
bits can contain junk.

This updates the 32 bit signal handling code to always write something to the
top MSR bits so that users know if the process is transactional or not and the
kernel can use it on signal return.

Signed-off-by: Michael Neuling <mikey@neuling.org>
cc: stable@vger.kernel.org (v3.9+)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:15 +10:00
Benjamin Herrenschmidt
968219fa33 powerpc/8xx: Remove 8xx specific "minimal FPU emulation"
This is duplicated code from math-emu and implements such a small
subset of the FPU (load/stores/fmr) that it's essentially pointless
nowdays.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:12 +10:00
Benjamin Herrenschmidt
4e63f8edfe powerpc/math-emu: Allow math-emu to be used for HW FPU
(Including 64-bit ones)

This allow SW emulation by the kernel of optional instructions
such as fsqrt which aren't implemented on some processors, and
thus fixes some Fedora 19 issues such as Anaconda since the
compiler is set to generate those by default on 64-bit.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:05:09 +10:00
Bharat Bhushan
13d543cd79 powerpc: Restore dbcr0 on user space exit
On BookE (Branch taken + Single Step) is as same as Branch Taken
on BookS and in Linux we simulate BookS behavior for BookE as well.
When doing so, in Branch taken handling we want to set DBCR0_IC but
we update the current->thread->dbcr0 and not DBCR0.

Now on 64bit the current->thread.dbcr0 (and other debug registers)
is synchronized ONLY on context switch flow. But after handling
Branch taken in debug exception if we return back to user space
without context switch then single stepping change (DBCR0_ICMP)
does not get written in h/w DBCR0 and Instruction Complete exception
does not happen.

This fixes using ptrace reliably on BookE-PowerPC

lmbench latency test (lat_syscall) Results are (they varies a little
on each run)

1) ./lat_syscall <action> /dev/shm/uImage

action:	Open	read	write	stat	fstat	null
Before:	3.8618	0.2017	0.2851	1.6789	0.2256	0.0856
After:	3.8580	0.2017	0.2851	1.6955	0.2255	0.0856

1) ./lat_syscall -P 2 -N 10 <action> /dev/shm/uImage
action:	Open	read	write	stat	fstat	null
Before:	4.1388	0.2238	0.3066	1.7106	0.2256	0.0856
After:	4.1413	0.2236	0.3062	1.7107	0.2256	0.0856

[ Slightly modified to avoid extra branch in the fast path
  on Book3S and fix build on all non-BookE 64-bit -- BenH
]

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:04:19 +10:00
Alexey Kardashevskiy
4e13c1ac6b powerpc/vfio: Enable on PowerNV platform
This initializes IOMMU groups based on the IOMMU configuration
discovered during the PCI scan on POWERNV (POWER non virtualized)
platform.  The IOMMU groups are to be used later by the VFIO driver,
which is used for PCI pass through.

It also implements an API for mapping/unmapping pages for
guest PCI drivers and providing DMA window properties.
This API is going to be used later by QEMU-VFIO to handle
h_put_tce hypercalls from the KVM guest.

The iommu_put_tce_user_mode() does only a single page mapping
as an API for adding many mappings at once is going to be
added later.

Although this driver has been tested only on the POWERNV
platform, it should work on any platform which supports
TCE tables.  As h_put_tce hypercall is received by the host
kernel and processed by the QEMU (what involves calling
the host kernel again), performance is not the best -
circa 220MB/s on 10Gb ethernet network.

To enable VFIO on POWER, enable SPAPR_TCE_IOMMU config
option and configure VFIO as required.

Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:14 +10:00
Alistair Popple
071df9422a powerpc: Add a configuration option for early BootX/OpenFirmware debug
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:12 +10:00
Jeremy Kerr
0962e8004e powerpc/prom: Scan reserved-ranges node for memory reservations
Based on benh's proposal at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2012-September/101237.html,
this change provides support for reserving memory from the
reserved-ranges node at the root of the device tree.

We just call memblock_reserve on these ranges for now.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:11 +10:00
Kevin Hao
3139b0a797 powerpc: Remove the unneeded trigger of decrementer interrupt in decrementer_check_overflow
Previously in order to handle the edge sensitive decrementers,
we choose to set the decrementer to 1 to trigger a decrementer
interrupt when re-enabling interrupts. But with the rework of the
lazy EE, we would replay the decrementer interrupt when re-enabling
interrupts if a decrementer interrupt occurs with irq soft-disabled.
So there is no need to trigger a decrementer interrupt in this case
any more.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:10 +10:00
Suzuki K. Poulose
35fd219a26 powerpc: Move the single step enable code to a generic path
This patch moves the single step enable code used by kprobe to a generic
routine header so that, it can be re-used by other code, in this case,
uprobes. No functional changes.

Signed-off-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:	Ananth N Mavinakaynahalli <ananth@in.ibm.com>
Cc:	Kumar Gala <galak@kernel.crashing.org>
Cc:	linuxppc-dev@ozlabs.org
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:09 +10:00
Suzuki K. Poulose
85f395c5b0 powerpc/kprobes: Do not disable External interrupts during single step
External/Decrement exceptions have lower priority than the Debug Exception.
So, we don't have to disable the External interrupts before a single step.
However, on BookE, Critical Input Exception(CE) has higher priority than a
Debug Exception. Hence we mask them.

Signed-off-by: 	Suzuki K. Poulose <suzuki@in.ibm.com>
Cc:		Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc:		Ananth N Mavinakaynahalli <ananth@in.ibm.com>
Cc:		Kumar Gala <galak@kernel.crashing.org>
Cc:		linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:09 +10:00
Benjamin Herrenschmidt
230b303479 powerpc: Fix missing/delayed calls to irq_work
When replaying interrupts (as a result of the interrupt occurring
while soft-disabled), in the case of the decrementer, we are exclusively
testing for a pending timer target. However we also use decrementer
interrupts to trigger the new "irq_work", which in this case would
be missed.

This change the logic to force a replay in both cases of a timer
boundary reached and a decrementer interrupt having actually occurred
while disabled. The former test is still useful to catch cases where
a CPU having been hard-disabled for a long time completely misses the
interrupt due to a decrementer rollover.

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
2013-06-15 12:33:30 +10:00
Paul Mackerras
bf593907f7 powerpc: Fix emulation of illegal instructions on PowerNV platform
Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr).  The emulation of unimplemented instructions is currently not
working on the PowerNV platform.  The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt.  This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt().  With this, old programs that use no-longer
implemented instructions such as dcba now work again.

CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-15 12:24:11 +10:00
Michael Ellerman
0e37739b1c powerpc: Fix stack overflow crash in resume_kernel when ftracing
It's possible for us to crash when running with ftrace enabled, eg:

  Bad kernel stack pointer bffffd12 at c00000000000a454
  cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
      pc: c00000000000a454: resume_kernel+0x34/0x60
      lr: c00000000000335c: performance_monitor_common+0x15c/0x180
      sp: bffffd12
     msr: 8000000000001032
     dar: bffffd12
   dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

  3:mon> t c0000002ecab0000
  [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
  [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
  [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
  [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
  [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
  [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
  [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
  [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
  [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
  [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
  [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

  ... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc0a
  "powerpc: Rework runlatch code" by myself --BenH
]

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-15 12:21:57 +10:00
Bjorn Helgaas
df58f46c0f Merge branch 'pci/jiang-bus-lock-v3' into next
* pci/jiang-bus-lock-v3:
  PCI: Return early on allocation failures to unindent mainline code
  PCI: Simplify IOV implementation and fix reference count races
  PCI: Drop redundant setting of bus->is_added in virtfn_add_bus()
  unicore32/PCI: Remove redundant call of pci_bus_add_devices()
  m68k/PCI: Remove redundant call of pci_bus_add_devices()
  PCI: Rename pci_release_bus_bridge_dev() to pci_release_host_bridge_dev()
  PCI: Fix refcount issue in pci_create_root_bus() error recovery path
  ia64/PCI: Clean up pci_scan_root_bus() usage
  PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)
  PCI: Introduce pci_alloc_dev(struct pci_bus*) to replace alloc_pci_dev()
  PCI: Introduce pci_bus_{get|put}() to manage PCI bus reference count

Conflicts:
	drivers/pci/probe.c
2013-06-14 17:47:46 -06:00
Rob Herring
c45640e4a9 ibmebus: convert of_platform_driver to platform_driver
ibmebus is the last remaining user of of_platform_driver and the
conversion to a regular platform driver is trivial.

Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
2013-06-12 12:37:26 +01:00
Michael Ellerman
b11ae95100 powerpc: Partial revert of "Context switch more PMU related SPRs"
In commit 59affcd I added context switching of more PMU SPRs, because
they are potentially exposed to userspace on Power8. However despite me
being a smart arse in the commit message it's actually not correct. In
particular it interacts badly with a global perf record.

We will have to do something more complicated, but that will have to
wait for 3.11.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:35 +10:00
Michael Neuling
82a9f16adc powerpc/hw_breakpoints: Add DABRX cpu feature to fix 32-bit regression
When introducing support for DABRX in 4474ef0, we broke older 32-bit CPUs
that don't have that register.

Some CPUs have a DABR but not DABRX.  Configuration are:
- No 32bit CPUs have DABRX but some have DABR.
- POWER4+ and below have the DABR but no DABRX.
- 970 and POWER5 and above have DABR and DABRX.
- POWER8 has DAWR, hence no DABRX.

This introduces CPU_FTR_DABRX and sets it on appropriate CPUs.  We use
the top 64 bits for CPU FTR bits since only 64 bit CPUs have this.

Processors that don't have the DABRX will still work as they will fall
back to software filtering these breakpoints via perf_exclude_event().

Signed-off-by: Michael Neuling <mikey@neuling.org>
Reported-by: "Gorelik, Jacob (335F)" <jacob.gorelik@jpl.nasa.gov>
cc: stable@vger.kernel.org (v3.9 only)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:29 +10:00
Michael Neuling
fb0fce3e55 powerpc/power8: Update denormalization handler
POWER8 can take a denormalisation exception on any VSX registers.

This does the extra 32 VSX registers we don't currently handle.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:26 +10:00
Michael Neuling
d7c67fb1cf powerpc/pseries: Simplify denormalization handler
The following simplifies the denorm code by using macros to generate the long
stream of almost identical instructions.

This patch results in no changes to the output binary, but removes a lot of
lines of code.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:22 +10:00
Michael Neuling
6a60f9e7d8 powerpc/power8: Fix oprofile and perf
In 2ac6f42 powerpc/cputable: Fix oprofile_cpu_type on power8
we broke all power8 hw events.

This reverts this change and uses oprofile_type instead. Perf now works
on POWER8 again and oprofile will revert to using timers on POWER8.

Kudos to mpe this fix.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:19 +10:00
Kevin Hao
c5df457ffe powerpc/pci: Check the bus address instead of resource address in pcibios_fixup_resources
If a BAR has the value of 0, we would assume that it is unset yet and
then mark the resource as unset and would reassign it later. But after
commit 6c5705fe (powerpc/PCI: get rid of device resource fixups)
the pcibios_fixup_resources is invoked after the bus address was
translated to linux resource. So the value of res->start is resource
address. And since the resource and bus address may be different, we
should translate it to the bus address before doing the check.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-10 08:36:13 +10:00
Gu Zheng
8b1fce04dc PCI: Convert alloc_pci_dev(void) to pci_alloc_dev(bus)
Use the new pci_alloc_dev(bus) to replace the existing using of
alloc_pci_dev(void).

[bhelgaas: drop pci_bus ref later in pci_release_dev()]
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Neela Syam Kolli <megaraidlinux@lsi.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
2013-06-05 13:49:36 -06:00
Will Schmidt
badec11b64 powerpc/cputable: Fix typo on P7+ cputable entry
Fix a typo in setting COMMON_USER2_POWER7 bits to .cpu_user_features2
cpu specs table.

Signed-off-by: Will Schmidt <will_schmidt@vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 09:30:03 +10:00
Kevin Hao
858957ab1e powerpc/pci: Remove the unused variables in pci_process_bridge_OF_ranges
The codes which ever used these two variables have gone. Throw away
them too.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:28 +10:00
Kevin Hao
2798389604 powerpc/pci: Remove the stale comments of pci_process_bridge_OF_ranges
These comments already don't apply to the current code. So just remove
them.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:28 +10:00
Priyanka Jain
f7b3367774 powerpc/32bit:Store temporary result in r0 instead of r8
Commit a9c4e541ea
"powerpc/kprobe: Complete kprobe and migrate exception frame"
introduced a regression:

While returning from exception handling in case of PREEMPT enabled,
_TIF_NEED_RESCHED bit is checked in TI_FLAGS (thread_info flag) of current
task. Only if this bit is set, it should continue with the process of
calling preempt_schedule_irq() to schedule highest priority task if
available.

Current code assumes that r8 contains TI_FLAGS and check this for
_TIF_NEED_RESCHED, but as r8 is modified in the code which executes before
this check, r8 no longer contains the expected TI_FLAGS information.

As a result check for comparison with _TIF_NEED_RESCHED was failing even if
NEED_RESCHED bit is set in the current thread_info flag. Due to this,
preempt_schedule_irq() and in turn scheduler was not getting called even if
highest priority task is ready for execution.

So, store temporary results in r0 instead of r8 to prevent r8 from getting
modified as subsequent code is dependent on its value.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
CC: <stable@vger.kernel.org> [v3.7+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:27 +10:00
Michael Neuling
a515348fc6 powerpc/pseries: Kill all prefetch streams on context switch
On context switch, we should have no prefetch streams leak from one
userspace process to another.  This frees up prefetch resources for the
next process.

Based on patch from Milton Miller.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:25 +10:00
Nishanth Aravamudan
2ac6f427ad powerpc/cputable: Fix oprofile_cpu_type on power8
Maynard informed me that neither the oprofile kernel module nor oprofile
userspace has been updated to support that "legacy" oprofile module
interface for power8, which is indicated by "ppc64/power8." This results
in no samples. The solution is to default to the "timer" type, instead.
The raw entry also should be updated, as "ppc64/ibm-compat-v1" indicates
to oprofile userspace to use "compatibility events" which are obsolete
in ISA 2.07.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:25 +10:00
Michael Neuling
2b3f8e87cf powerpc/tm: Fix userspace stack corruption on signal delivery for active transactions
When in an active transaction that takes a signal, we need to be careful with
the stack.  It's possible that the stack has moved back up after the tbegin.
The obvious case here is when the tbegin is called inside a function that
returns before a tend.  In this case, the stack is part of the checkpointed
transactional memory state.  If we write over this non transactionally or in
suspend, we are in trouble because if we get a tm abort, the program counter
and stack pointer will be back at the tbegin but our in memory stack won't be
valid anymore.

To avoid this, when taking a signal in an active transaction, we need to use
the stack pointer from the checkpointed state, rather than the speculated
state.  This ensures that the signal context (written tm suspended) will be
written below the stack required for the rollback.  The transaction is aborted
becuase of the treclaim, so any memory written between the tbegin and the
signal will be rolled back anyway.

For signals taken in non-TM or suspended mode, we use the
normal/non-checkpointed stack pointer.

Tested with 64 and 32 bit signals

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:23 +10:00
Michael Neuling
6ce6c629fd powerpc/tm: Abort on emulation and alignment faults
If we are emulating an instruction inside an active user transaction that
touches memory, the kernel can't emulate it as it operates in transactional
suspend context.  We need to abort these transactions and send them back to
userspace for the hardware to rollback.

We can service these if the user transaction is in suspend mode, since the
kernel will operate in the same suspend context.

This adds a check to all alignment faults and to specific instruction
emulations (only string instructions for now).  If the user process is in an
active (non-suspended) transaction, we abort the transaction go back to
userspace allowing the HW to roll back the transaction and tell the user of the
failure.  This also adds new tm abort cause codes to report the reason of the
persistent error to the user.

Crappy test case here http://neuling.org/devel/junkcode/aligntm.c

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: <stable@vger.kernel.org> # v3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01 08:29:22 +10:00
Benjamin Herrenschmidt
b72c1f6514 powerpc: Make radeon 32-bit MSI quirk work on powernv
This moves the quirk itself to pci_64.c as to get built on all ppc64
platforms (the only ones with a pci_dn), factors the two implementations
of get_pdn() into a single pci_get_dn() and use the quirk to do 32-bit
MSIs on IODA based powernv platforms.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:45 +10:00
Michael Ellerman
59affcd3e4 powerpc: Context switch more PMU related SPRs
In commit 9353374 "Context switch the new EBB SPRs" we added support for
context switching some new EBB SPRs. However despite four of us signing
off on that patch we missed some. To be fair these are not actually new
SPRs, but they are now potentially user accessible so need to be context
switched.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:45 +10:00
Benjamin Herrenschmidt
bee7dd9c5f powerpc/pci: Fix bogus message at boot about empty memory resources
The message is only meant to be displayed if resource 0 is empty,
but was displayed if any is.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:44 +10:00
Benjamin Herrenschmidt
8fc1f5d7ef powerpc: Fix TLB cleanup at boot on POWER8
The TLB has 512 congruence classes (2048 entries 4 way set associative)
while P7 had 128

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-24 18:13:44 +10:00
Bjorn Helgaas
c2defb5f78 powerpc/PCI: Use PCI_UNKNOWN for unknown power state
Previously we initialized dev->current_state to 4 (PCI_D3cold), but I think
we wanted PCI_UNKNOWN (5) here based on the comment and the fact that the
generic version of this code, pci_setup_device(), uses PCI_UNKNOWN.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-17 15:08:50 -06:00
Linus Torvalds
a2c7a54fcc Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Benjamin Herrenschmidt:
 "This is mostly bug fixes (some of them regressions, some of them I
  deemed worth merging now) along with some patches from Li Zhong
  hooking up the new context tracking stuff (for the new full NO_HZ)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
  powerpc: Set show_unhandled_signals to 1 by default
  powerpc/perf: Fix setting of "to" addresses for BHRB
  powerpc/pmu: Fix order of interpreting BHRB target entries
  powerpc/perf: Move BHRB code into CONFIG_PPC64 region
  powerpc: select HAVE_CONTEXT_TRACKING for pSeries
  powerpc: Use the new schedule_user API on userspace preemption
  powerpc: Exit user context on notify resume
  powerpc: Exception hooks for context tracking subsystem
  powerpc: Syscall hooks for context tracking subsystem
  powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
  powerpc: Fix irq_set_affinity() return values
  powerpc: Provide __bswapdi2
  powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3
  powerpc/powernv: Detect OPAL v3 API version
  powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
  powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS
  powerpc: Bring all threads online prior to migration/hibernation
  powerpc/rtas_flash: Fix validate_flash buffer overflow issue
  powerpc/kexec: Fix kexec when using VMX optimised memcpy
  powerpc: Fix build errors STRICT_MM_TYPECHECKS
  ...
2013-05-14 07:43:11 -07:00
Benjamin Herrenschmidt
e34166ad63 powerpc: Set show_unhandled_signals to 1 by default
Just like other architectures

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 18:01:04 +10:00
Li Zhong
5d1c574511 powerpc: Use the new schedule_user API on userspace preemption
This patch corresponds to
[PATCH] x86: Use the new schedule_user API on userspace preemption
  commit 0430499ce9

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:20 +10:00
Li Zhong
106ed886ab powerpc: Exit user context on notify resume
This patch allows RCU usage in do_notify_resume, e.g. signal handling.
It corresponds to
[PATCH] x86: Exit RCU extended QS on notify resume
  commit edf55fda35

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:20 +10:00
Li Zhong
ba12eedee3 powerpc: Exception hooks for context tracking subsystem
This is the exception hooks for context tracking subsystem, including
data access, program check, single step, instruction breakpoint, machine check,
alignment, fp unavailable, altivec assist, unknown exception, whose handlers
might use RCU.

This patch corresponds to
[PATCH] x86: Exception hooks for userspace RCU extended QS
  commit 6ba3c97a38

But after the exception handling moved to generic code, and some changes in
following two commits:
56dd9470d7
  context_tracking: Move exception handling to generic code
6c1e0256fa
  context_tracking: Restore correct previous context state on exception exit

it is able for exception hooks to use the generic code above instead of a
redundant arch implementation.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
Li Zhong
22ecbe8dce powerpc: Syscall hooks for context tracking subsystem
This is the syscall slow path hooks for context tracking subsystem,
corresponding to
[PATCH] x86: Syscall hooks for userspace RCU extended QS
  commit bf5a3c13b9

TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there
is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is
better for it to be in the same 16 bits with others in the group, so in the
asm code, andi. with this group could work.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
Scott Wood
6cecf76b47 powerpc/booke64: Fix kernel hangs at kernel_dbg_exc
MSR_DE is not cleared on entry to the kernel, and we don't clear it
explicitly outside of debug code.  If we have MSR_DE set in
prime_debug_regs(), and the new thread has events enabled in DBCR0
(e.g.  ICMP is set in thread->dbsr0, even though it was cleared in the
real DBCR0 when the thread got scheduled out), we'll end up taking a
debug exception in the kernel when DBCR0 is loaded.  DSRR0 will not
point to an exception vector, and the kernel ends up hanging at
kernel_dbg_exc.  Fix this by always clearing MSR_DE when we load new
debug state.

Another observed source of kernel_dbg_exc hangs is with the branch
taken event.  If this event is active, but we take a non-debug trap
(e.g. a TLB miss or an asynchronous interrupt) before the next branch.
We end up taking a branch-taken debug exception on the initial branch
instruction of the exception vector, but because the debug exception is
DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even
checking the DSRR0 address.  Fix this by checking for DBSR_BT as well
as DBSR_IC, which is what 32-bit does and what the comments suggest was
intended in the 64-bit code as well.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:19 +10:00
David Woodhouse
ca9d7aea59 powerpc: Provide __bswapdi2
Some versions of GCC apparently expect this to be provided by libgcc.

Updates from Mikey to fix 32 bit version and adding "r" to registers.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 16:00:17 +10:00
Li Zhong
af945cf4bf powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again
Saw this warning again, and this time from the ret_from_fork path.

It seems we could clear the back chain earlier in copy_thread(), which
could cover both path, and also fix potential lockdep usage in
schedule_tail(), or exception occurred before we clear the back chain.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:35 +10:00
Robert Jennings
120496ac2d powerpc: Bring all threads online prior to migration/hibernation
This patch brings online all threads which are present but not online
prior to migration/hibernation.  After migration/hibernation those
threads are taken back offline.

During migration/hibernation all online CPUs must call H_JOIN, this is
required by the hypervisor.  Without this patch, threads that are offline
(H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be
deadlocked (all threads either JOIN'd or CEDE'd).

Cc: <stable@kernel.org>
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:29 +10:00
Vasant Hegde
a94a14720e powerpc/rtas_flash: Fix validate_flash buffer overflow issue
ibm,validate-flash-image RTAS call output buffer contains 150 - 200
bytes of data on latest system. Presently we have output
buffer size as 64 bytes and we use sprintf to copy data from
RTAS buffer to local buffer. This causes kernel oops (see below
call trace).

This patch increases local buffer size to 256 and also uses
snprintf instead of sprintf to copy data from RTAS buffer.

Kernel call trace :
-------------------
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in: nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod ipv6 ipv6_lib usb_storage ehea(X) sr_mod qlge ses cdrom enclosure st be2net sg ext3 jbd mbcache usbhid hid ohci_hcd ehci_hcd usbcore qla2xxx usb_common sd_mod crc_t10dif scsi_dh_hp_sw scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh lpfc scsi_transport_fc scsi_tgt ipr(X) libata scsi_mod
Supported: Yes
NIP: 4520323031333130 LR: 4520323031333130 CTR: 0000000000000000
REGS: c0000001b91779b0 TRAP: 0400   Tainted: G            X  (3.0.13-0.27-ppc64)
MSR: 8000000040009032 <EE,ME,IR,DR>  CR: 44022488  XER: 20000018
TASK = c0000001bca1aba0[4736] 'cat' THREAD: c0000001b9174000 CPU: 36
GPR00: 4520323031333130 c0000001b9177c30 c000000000f87c98 000000000000009b
GPR04: c0000001b9177c4a 000000000000000b 3520323031333130 2032303133313031
GPR08: 3133313031350a4d 000000000000009b 0000000000000000 c0000000003664a4
GPR12: 0000000022022448 c000000003ee6c00 0000000000000002 00000000100e8a90
GPR16: 00000000100cb9d8 0000000010093370 000000001001d310 0000000000000000
GPR20: 0000000000008000 00000000100fae60 000000000000005e 0000000000000000
GPR24: 0000000010129350 46573738302e3030 2046573738302e30 300a4d4720323031
GPR28: 333130313520554e 4b4e4f574e0a4d47 2032303133313031 3520323031333130
NIP [4520323031333130] 0x4520323031333130
LR [4520323031333130] 0x4520323031333130
Call Trace:
[c0000001b9177c30] [4520323031333130] 0x4520323031333130 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:26 +10:00
Anton Blanchard
79c66ce8f6 powerpc/kexec: Fix kexec when using VMX optimised memcpy
commit b3f271e86e (powerpc: POWER7 optimised memcpy using VMX and
enhanced prefetch) uses VMX when it is safe to do so (ie not in
interrupt). It also looks at the task struct to decide if we have to
save the current tasks' VMX state.

kexec calls memcpy() at a point where the task struct may have been
overwritten by the new kexec segments. If it has been overwritten
then when memcpy -> enable_altivec looks up current->thread.regs->msr
we get a cryptic oops or lockup.

I also notice we aren't initialising thread_info->cpu, which means
smp_processor_id is broken. Fix that too.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@vger.kernel.org> # 3.6+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:23 +10:00
Aneesh Kumar K.V
83d5e64b7e powerpc: Fix build errors STRICT_MM_TYPECHECKS
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-14 14:36:20 +10:00
Linus Torvalds
c4cc75c332 Merge git://git.infradead.org/users/eparis/audit
Pull audit changes from Eric Paris:
 "Al used to send pull requests every couple of years but he told me to
  just start pushing them to you directly.

  Our touching outside of core audit code is pretty straight forward.  A
  couple of interface changes which hit net/.  A simple argument bug
  calling audit functions in namei.c and the removal of some assembly
  branch prediction code on ppc"

* git://git.infradead.org/users/eparis/audit: (31 commits)
  audit: fix message spacing printing auid
  Revert "audit: move kaudit thread start from auditd registration to kaudit init"
  audit: vfs: fix audit_inode call in O_CREAT case of do_last
  audit: Make testing for a valid loginuid explicit.
  audit: fix event coverage of AUDIT_ANOM_LINK
  audit: use spin_lock in audit_receive_msg to process tty logging
  audit: do not needlessly take a lock in tty_audit_exit
  audit: do not needlessly take a spinlock in copy_signal
  audit: add an option to control logging of passwords with pam_tty_audit
  audit: use spin_lock_irqsave/restore in audit tty code
  helper for some session id stuff
  audit: use a consistent audit helper to log lsm information
  audit: push loginuid and sessionid processing down
  audit: stop pushing loginid, uid, sessionid as arguments
  audit: remove the old depricated kernel interface
  audit: make validity checking generic
  audit: allow checking the type of audit message in the user filter
  audit: fix build break when AUDIT_DEBUG == 2
  audit: remove duplicate export of audit_enabled
  Audit: do not print error when LSMs disabled
  ...
2013-05-11 14:29:11 -07:00
Linus Torvalds
3644bc2ec7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull stray syscall bits from Al Viro:
 "Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
  unicore32: just use mmap_pgoff()...
  unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
  x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
2013-05-10 09:21:05 -07:00
Al Viro
91c2e0bcae unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-09 13:46:38 -04:00
Alistair Popple
30650239ad powerpc: Add an in memory udbg console
This patch adds a new udbg early debug console which utilises
statically defined input and output buffers stored within the kernel
BSS. It is primarily designed to assist with bring up of new hardware
which may not have a working console but which has a method of
reading/writing kernel memory.

This version incorporates comments made by Ben H (thanks!).

Changes from v1:
	- Add memory barriers.
	- Ensure updating of read/write positions is atomic.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-08 06:36:49 +10:00
Benjamin Herrenschmidt
d5dae72130 powerpc/topology: Fix spurr attribute permission
We are registering the attribute with permission 0600 but it
doesn't have a store callback, which causes WARN_ON's during
boot. Fix the permission.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 15:02:40 +10:00
Benjamin Herrenschmidt
3fd47f063b powerpc/pci: Support per-aperture memory offset
The PCI core supports an offset per aperture nowadays but our arch
code still has a single offset per host bridge representing the
difference betwen CPU memory addresses and PCI MMIO addresses.

This is a problem as new machines and hypervisor versions are
coming out where the 64-bit windows will have a different offset
(basically mapped 1:1) from the 32-bit windows.

This fixes it by using separate offsets. In the long run, we probably
want to get rid of that intermediary struct pci_controller and have
those directly stored into the pci_host_bridge as they are parsed
but this will be a more invasive change.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 13:40:40 +10:00
Benjamin Herrenschmidt
a0b8e76fac powerpc/pci: Don't add bogus empty resources to PHBs
When converting to use the new pci_add_resource_offset() we didn't
properly account for empty resources (0 flags) and add those bogons
to the PHBs. The result is some annoying messages in the log.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:41 +10:00
Nishanth Aravamudan
748645bf06 powerpc/cputable: Advertise support for ISEL/HTM/DSCR/TAR on POWER8
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:40 +10:00
Nishanth Aravamudan
6c1bf48275 powerpc/cputable: Advertise ISEL support on appropriate embedded processors
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:39 +10:00
Nishanth Aravamudan
4a1ae4f3c5 powerpc/cputable: Advertise DSCR support on P7/P7+
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:39 +10:00
Kleber Sacilotto de Souza
d82fb31abc powerpc/pseries: Perform proper max_bus_speed detection
On pseries machines the detection for max_bus_speed should be done
through an OpenFirmware property. This patch adds a function to perform
this detection and a hook to perform dynamic adding of the function only
for pseries. This is done by overwriting the weak
pcibios_root_bridge_prepare function which is called by
pci_create_root_bus().

From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:38 +10:00
Anton Blanchard
73d2fb758e powerpc: Emulate non privileged DSCR read and write
POWER8 allows read and write of the DSCR in userspace. We added
kernel emulation so applications could always use the instructions
regardless of the CPU type.

Unfortunately there are two SPRs for the DSCR and we only added
emulation for the privileged one. Add code to match the non
privileged one.

A simple test was created to verify the fix:

http://ozlabs.org/~anton/junkcode/user_dscr_test.c

Without the patch we get a SIGILL and it passes with the patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-06 09:25:35 +10:00
Linus Torvalds
01227a889e Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Gleb Natapov:
 "Highlights of the updates are:

  general:
   - new emulated device API
   - legacy device assignment is now optional
   - irqfd interface is more generic and can be shared between arches

  x86:
   - VMCS shadow support and other nested VMX improvements
   - APIC virtualization and Posted Interrupt hardware support
   - Optimize mmio spte zapping

  ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

  ARM:
   - reworking of Hyp idmaps

  s390:
   - ioeventfd for virtio-ccw

  And many other bug fixes, cleanups and improvements"

* tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  kvm: Add compat_ioctl for device control API
  KVM: x86: Account for failing enable_irq_window for NMI window request
  KVM: PPC: Book3S: Add API for in-kernel XICS emulation
  kvm/ppc/mpic: fix missing unlock in set_base_addr()
  kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
  kvm/ppc/mpic: remove users
  kvm/ppc/mpic: fix mmio region lists when multiple guests used
  kvm/ppc/mpic: remove default routes from documentation
  kvm: KVM_CAP_IOMMU only available with device assignment
  ARM: KVM: iterate over all CPUs for CPU compatibility check
  KVM: ARM: Fix spelling in error message
  ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
  KVM: ARM: Fix API documentation for ONE_REG encoding
  ARM: KVM: promote vfp_host pointer to generic host cpu context
  ARM: KVM: add architecture specific hook for capabilities
  ARM: KVM: perform HYP initilization for hotplugged CPUs
  ARM: KVM: switch to a dual-step HYP init code
  ARM: KVM: rework HYP page table freeing
  ARM: KVM: enforce maximum size for identity mapped code
  ARM: KVM: move to a KVM provided HYP idmap
  ...
2013-05-05 14:47:31 -07:00
Linus Torvalds
5a148af669 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc update from Benjamin Herrenschmidt:
 "The main highlights this time around are:

   - A pile of addition POWER8 bits and nits, such as updated
     performance counter support (Michael Ellerman), new branch history
     buffer support (Anshuman Khandual), base support for the new PCI
     host bridge when not using the hypervisor (Gavin Shan) and other
     random related bits and fixes from various contributors.

   - Some rework of our page table format by Aneesh Kumar which fixes a
     thing or two and paves the way for THP support.  THP itself will
     not make it this time around however.

   - More Freescale updates, including Altivec support on the new e6500
     cores, new PCI controller support, and a pile of new boards support
     and updates.

   - The usual batch of trivial cleanups & fixes"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  powerpc: Fix build error for book3e
  powerpc: Context switch the new EBB SPRs
  powerpc: Turn on the EBB H/FSCR bits
  powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
  powerpc: Setup BHRB instructions facility in HFSCR for POWER8
  powerpc: Fix interrupt range check on debug exception
  powerpc: Update tlbie/tlbiel as per ISA doc
  powerpc: Print page size info during boot
  powerpc: print both base and actual page size on hash failure
  powerpc: Fix hpte_decode to use the correct decoding for page sizes
  powerpc: Decode the pte-lp-encoding bits correctly.
  powerpc: Use encode avpn where we need only avpn values
  powerpc: Reduce PTE table memory wastage
  powerpc: Move the pte free routines from common header
  powerpc: Reduce the PTE_INDEX_SIZE
  powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format
  powerpc: New hugepage directory format
  powerpc: Don't truncate pgd_index wrongly
  powerpc: Don't hard code the size of pte page
  powerpc: Save DAR and DSISR in pt_regs on MCE
  ...
2013-05-02 10:16:16 -07:00
Linus Torvalds
20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Michael Ellerman
9353374b8e powerpc: Context switch the new EBB SPRs
This context switches the new Event Based Branching (EBB) SPRs.  The three new
SPRs are:
  - Event Based Branch Handler Register (EBBHR)
  - Event Based Branch Return Register (EBBRR)
  - Branch Event Status and Control Register (BESCR)

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Matt Evans <matt@ozlabs.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:37:36 +10:00
Michael Neuling
1ddf499e1a powerpc: Turn on the EBB H/FSCR bits
This turns Event Based Branching (EBB) on in the Hypervisor Facility Status and
Control Register (HFSCR) and Facility Status and Control Register (FSCR).

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:36:55 +10:00
Michael Ellerman
1de2bd4e0c powerpc: Replace CPU_FTR_BCTAR with CPU_FTR_ARCH_207S
We are getting low on cpu feature bits. So rather than add a separate bit for
every new Power8 feature, add a bit for arch 2.07 server catagory and use that
instead.

Hijack the value we had for BCTAR, but swap the value with CFAR so that all the
ARCH defines are together.

Note we don't touch CPU_FTR_TM, because it is conditionally enabled if
the kernel is built with TM support.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:35:16 +10:00
Anshuman Khandual
53b56ca019 powerpc: Setup BHRB instructions facility in HFSCR for POWER8
Make BHRB instructions available in problem and privileged states.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:35:15 +10:00
Bharat Bhushan
fc2a6cfe05 powerpc: Fix interrupt range check on debug exception
We do not want to take single step and branch-taken debug exception
in kernel exception code. But the address range check was not covering
all kernel exception handlers address range.

With this patch we defined the interrupt_end label which defines the
end on kernel exception code. So now we check interrupt_base to
interrupt_end range for not handling debug exception in kernel
exception entry.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-05-02 10:31:01 +10:00
David Howells
e8eeded3c5 ppc: Clean up rtas_flash driver somewhat
Clean up some of the problems with the rtas_flash driver:

 (1) It shouldn't fiddle with the internals of the procfs filesystem (altering
     pde->count).

 (2) If pid namespaces are in effect, then you can get multiple inodes
     connected to a single pde, thereby rendering the pde->count > 2 test
     useless.

 (3) The pde->count fudging doesn't work for forked, dup'd or cloned file
     descriptors, so add static mutexes and use them to wrap access to the
     driver through read, write and release methods.

 (4) The driver can only handle one device, so allocate most of the data
     previously attached to the pde->data as static variables instead (though
     allocate the validation data buffer with kmalloc).

 (5) We don't need to save the pde pointers as long as we have the filenames
     available for removal.

 (6) Don't try to multiplex what the update file read method does based on the
     filename.  Instead provide separate file ops and split the function.

Whilst we're at it, tabulate the procfile information and loop through it when
creating or destroying them rather than manually coding each one.

[Folded fixes from Vasant Hegde <hegdevasant@linux.vnet.ibm.com>]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
cc: Paul Mackerras <paulus@samba.org>
cc: Anton Blanchard <anton@samba.org>
cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:45 -04:00
David Howells
271a15eabe proc: Supply PDE attribute setting accessor functions
Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-01 17:29:18 -04:00
Linus Torvalds
08d7676083 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull compat cleanup from Al Viro:
 "Mostly about syscall wrappers this time; there will be another pile
  with patches in the same general area from various people, but I'd
  rather push those after both that and vfs.git pile are in."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
  syscalls.h: slightly reduce the jungles of macros
  get rid of union semop in sys_semctl(2) arguments
  make do_mremap() static
  sparc: no need to sign-extend in sync_file_range() wrapper
  ppc compat wrappers for add_key(2) and request_key(2) are pointless
  x86: trim sys_ia32.h
  x86: sys32_kill and sys32_mprotect are pointless
  get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
  merge compat sys_ipc instances
  consolidate compat lookup_dcookie()
  convert vmsplice to COMPAT_SYSCALL_DEFINE
  switch getrusage() to COMPAT_SYSCALL_DEFINE
  switch epoll_pwait to COMPAT_SYSCALL_DEFINE
  convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
  switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
  make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect
  make HAVE_SYSCALL_WRAPPERS unconditional
  consolidate cond_syscall and SYSCALL_ALIAS declarations
  teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long
  get rid of duplicate logics in __SC_....[1-6] definitions
2013-05-01 07:21:43 -07:00
Tejun Heo
a43cb95d54 dump_stack: unify debug information printed by show_regs()
show_regs() is inherently arch-dependent but it does make sense to print
generic debug information and some archs already do albeit in slightly
different forms.  This patch introduces a generic function to print debug
information from show_regs() so that different archs print out the same
information and it's much easier to modify what's printed.

show_regs_print_info() prints out the same debug info as dump_stack()
does plus task and thread_info pointers.

* Archs which didn't print debug info now do.

  alpha, arc, blackfin, c6x, cris, frv, h8300, hexagon, ia64, m32r,
  metag, microblaze, mn10300, openrisc, parisc, score, sh64, sparc,
  um, xtensa

* Already prints debug info.  Replaced with show_regs_print_info().
  The printed information is superset of what used to be there.

  arm, arm64, avr32, mips, powerpc, sh32, tile, unicore32, x86

* s390 is special in that it used to print arch-specific information
  along with generic debug info.  Heiko and Martin think that the
  arch-specific extra isn't worth keeping s390 specfic implementation.
  Converted to use the generic version.

Note that now all archs print the debug info before actual register
dumps.

An example BUG() dump follows.

 kernel BUG at /work/os/work/kernel/workqueue.c:4841!
 invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #7
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
 task: ffff88007c85e040 ti: ffff88007c860000 task.ti: ffff88007c860000
 RIP: 0010:[<ffffffff8234a07e>]  [<ffffffff8234a07e>] init_workqueues+0x4/0x6
 RSP: 0000:ffff88007c861ec8  EFLAGS: 00010246
 RAX: ffff88007c861fd8 RBX: ffffffff824466a8 RCX: 0000000000000001
 RDX: 0000000000000046 RSI: 0000000000000001 RDI: ffffffff8234a07a
 RBP: ffff88007c861ec8 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff8234a07a
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: ffff88015f7ff000 CR3: 00000000021f1000 CR4: 00000000000007f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Stack:
  ffff88007c861ef8 ffffffff81000312 ffffffff824466a8 ffff88007c85e650
  0000000000000003 0000000000000000 ffff88007c861f38 ffffffff82335e5d
  ffff88007c862080 ffffffff8223d8c0 ffff88007c862080 ffffffff81c47760
 Call Trace:
  [<ffffffff81000312>] do_one_initcall+0x122/0x170
  [<ffffffff82335e5d>] kernel_init_freeable+0x9b/0x1c8
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  [<ffffffff81c4776e>] kernel_init+0xe/0xf0
  [<ffffffff81c6be9c>] ret_from_fork+0x7c/0xb0
  [<ffffffff81c47760>] ? rest_init+0x140/0x140
  ...

v2: Typo fix in x86-32.

v3: CPU number dropped from show_regs_print_info() as
    dump_stack_print_info() has been updated to print it.  s390
    specific implementation dropped as requested by s390 maintainers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>		[tile bits]
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Tejun Heo
196779b9b4 dump_stack: consolidate dump_stack() implementations and unify their behaviors
Both dump_stack() and show_stack() are currently implemented by each
architecture.  show_stack(NULL, NULL) dumps the backtrace for the
current task as does dump_stack().  On some archs, dump_stack() prints
extra information - pid, utsname and so on - in addition to the
backtrace while the two are identical on other archs.

The usages in arch-independent code of the two functions indicate
show_stack(NULL, NULL) should print out bare backtrace while
dump_stack() is used for debugging purposes when something went wrong,
so it does make sense to print additional information on the task which
triggered dump_stack().

There's no reason to require archs to implement two separate but mostly
identical functions.  It leads to unnecessary subtle information.

This patch expands the dummy fallback dump_stack() implementation in
lib/dump_stack.c such that it prints out debug information (taken from
x86) and invokes show_stack(NULL, NULL) and drops arch-specific
dump_stack() implementations in all archs except blackfin.  Blackfin's
dump_stack() does something wonky that I don't understand.

Debug information can be printed separately by calling
dump_stack_print_info() so that arch-specific dump_stack()
implementation can still emit the same debug information.  This is used
in blackfin.

This patch brings the following behavior changes.

* On some archs, an extra level in backtrace for show_stack() could be
  printed.  This is because the top frame was determined in
  dump_stack() on those archs while generic dump_stack() can't do that
  reliably.  It can be compensated by inlining dump_stack() but not
  sure whether that'd be necessary.

* Most archs didn't use to print debug info on dump_stack().  They do
  now.

An example WARN dump follows.

 WARNING: at kernel/workqueue.c:4841 init_workqueues+0x35/0x505()
 Hardware name: empty
 Modules linked in:
 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.9.0-rc1-work+ #9
  0000000000000009 ffff88007c861e08 ffffffff81c614dc ffff88007c861e48
  ffffffff8108f50f ffffffff82228240 0000000000000040 ffffffff8234a03c
  0000000000000000 0000000000000000 0000000000000000 ffff88007c861e58
 Call Trace:
  [<ffffffff81c614dc>] dump_stack+0x19/0x1b
  [<ffffffff8108f50f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8108f56a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8234a071>] init_workqueues+0x35/0x505
  ...

v2: CPU number added to the generic debug info as requested by s390
    folks and dropped the s390 specific dump_stack().  This loses %ksp
    from the debug message which the maintainers think isn't important
    enough to keep the s390-specific dump_stack() implementation.

    dump_stack_print_info() is moved to kernel/printk.c from
    lib/dump_stack.c.  Because linkage is per objecct file,
    dump_stack_print_info() living in the same lib file as generic
    dump_stack() means that archs which implement custom dump_stack()
    - at this point, only blackfin - can't use dump_stack_print_info()
    as that will bring in the generic version of dump_stack() too.  v1
    The v1 patch broke build on blackfin due to this issue.  The build
    breakage was reported by Fengguang Wu.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org>		[hexagon bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:02 -07:00
Linus Torvalds
5d434fcb25 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual stuff, mostly comment fixes, typo fixes, printk fixes and small
  code cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (45 commits)
  mm: Convert print_symbol to %pSR
  gfs2: Convert print_symbol to %pSR
  m32r: Convert print_symbol to %pSR
  iostats.txt: add easy-to-find description for field 6
  x86 cmpxchg.h: fix wrong comment
  treewide: Fix typo in printk and comments
  doc: devicetree: Fix various typos
  docbook: fix 8250 naming in device-drivers
  pata_pdc2027x: Fix compiler warning
  treewide: Fix typo in printks
  mei: Fix comments in drivers/misc/mei
  treewide: Fix typos in kernel messages
  pm44xx: Fix comment for "CONFIG_CPU_IDLE"
  doc: Fix typo "CONFIG_CGROUP_CGROUP_MEMCG_SWAP"
  mmzone: correct "pags" to "pages" in comment.
  kernel-parameters: remove outdated 'noresidual' parameter
  Remove spurious _H suffixes from ifdef comments
  sound: Remove stray pluses from Kconfig file
  radio-shark: Fix printk "CONFIG_LED_CLASS"
  doc: put proper reference to CONFIG_MODULE_SIG_ENFORCE
  ...
2013-04-30 09:36:50 -07:00
Linus Torvalds
8700c95adb Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP/hotplug changes from Ingo Molnar:
 "This is a pretty large, multi-arch series unifying and generalizing
  the various disjunct pieces of idle routines that architectures have
  historically copied from each other and have grown in random, wildly
  inconsistent and sometimes buggy directions:

   101 files changed, 455 insertions(+), 1328 deletions(-)

  this went through a number of review and test iterations before it was
  committed, it was tested on various architectures, was exposed to
  linux-next for quite some time - nevertheless it might cause problems
  on architectures that don't read the mailing lists and don't regularly
  test linux-next.

  This cat herding excercise was motivated by the -rt kernel, and was
  brought to you by Thomas "the Whip" Gleixner."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  idle: Remove GENERIC_IDLE_LOOP config switch
  um: Use generic idle loop
  ia64: Make sure interrupts enabled when we "safe_halt()"
  sparc: Use generic idle loop
  idle: Remove unused ARCH_HAS_DEFAULT_IDLE
  bfin: Fix typo in arch_cpu_idle()
  xtensa: Use generic idle loop
  x86: Use generic idle loop
  unicore: Use generic idle loop
  tile: Use generic idle loop
  tile: Enter idle with preemption disabled
  sh: Use generic idle loop
  score: Use generic idle loop
  s390: Use generic idle loop
  powerpc: Use generic idle loop
  parisc: Use generic idle loop
  openrisc: Use generic idle loop
  mn10300: Use generic idle loop
  mips: Use generic idle loop
  microblaze: Use generic idle loop
  ...
2013-04-30 07:50:17 -07:00
Linus Torvalds
e0972916e8 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Features:

   - Add "uretprobes" - an optimization to uprobes, like kretprobes are
     an optimization to kprobes.  "perf probe -x file sym%return" now
     works like kretprobes.  By Oleg Nesterov.

   - Introduce per core aggregation in 'perf stat', from Stephane
     Eranian.

   - Add memory profiling via PEBS, from Stephane Eranian.

   - Event group view for 'annotate' in --stdio, --tui and --gtk, from
     Namhyung Kim.

   - Add support for AMD NB and L2I "uncore" counters, by Jacob Shin.

   - Add Ivy Bridge-EP uncore support, by Zheng Yan

   - IBM zEnterprise EC12 oprofile support patchlet from Robert Richter.

   - Add perf test entries for checking breakpoint overflow signal
     handler issues, from Jiri Olsa.

   - Add perf test entry for for checking number of EXIT events, from
     Namhyung Kim.

   - Add perf test entries for checking --cpu in record and stat, from
     Jiri Olsa.

   - Introduce perf stat --repeat forever, from Frederik Deweerdt.

   - Add --no-demangle to report/top, from Namhyung Kim.

   - PowerPC fixes plus a couple of cleanups/optimizations in uprobes
     and trace_uprobes, by Oleg Nesterov.

  Various fixes and refactorings:

   - Fix dependency of the python binding wrt libtraceevent, from
     Naohiro Aota.

   - Simplify some perf_evlist methods and to allow 'stat' to share code
     with 'record' and 'trace', by Arnaldo Carvalho de Melo.

   - Remove dead code in related to libtraceevent integration, from
     Namhyung Kim.

   - Revert "perf sched: Handle PERF_RECORD_EXIT events" to get 'perf
     sched lat' back working, by Arnaldo Carvalho de Melo

   - We don't use Newt anymore, just plain libslang, by Arnaldo Carvalho
     de Melo.

   - Kill a bunch of die() calls, from Namhyung Kim.

   - Fix build on non-glibc systems due to libio.h absence, from Cody P
     Schafer.

   - Remove some perf_session and tracing dead code, from David Ahern.

   - Honor parallel jobs, fix from Borislav Petkov

   - Introduce tools/lib/lk library, initially just removing duplication
     among tools/perf and tools/vm.  from Borislav Petkov

  ... and many more I missed to list, see the shortlog and git log for
  more details."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (136 commits)
  perf/x86/intel/P4: Robistify P4 PMU types
  perf/x86/amd: Fix AMD NB and L2I "uncore" support
  perf/x86/amd: Remove old-style NB counter support from perf_event_amd.c
  perf/x86: Check all MSRs before passing hw check
  perf/x86/amd: Add support for AMD NB and L2I "uncore" counters
  perf/x86/intel: Add Ivy Bridge-EP uncore support
  perf/x86/intel: Fix SNB-EP CBO and PCU uncore PMU filter management
  perf/x86: Avoid kfree() in CPU_{STARTING,DYING}
  uprobes/perf: Avoid perf_trace_buf_prepare/submit if ->perf_events is empty
  uprobes/tracing: Don't pass addr=ip to perf_trace_buf_submit()
  uprobes/tracing: Change create_trace_uprobe() to support uretprobes
  uprobes/tracing: Make seq_printf() code uretprobe-friendly
  uprobes/tracing: Make register_uprobe_event() paths uretprobe-friendly
  uprobes/tracing: Make uprobe_{trace,perf}_print() uretprobe-friendly
  uprobes/tracing: Introduce is_ret_probe() and uretprobe_dispatcher()
  uprobes/tracing: Introduce uprobe_{trace,perf}_print() helpers
  uprobes/tracing: Generalize struct uprobe_trace_entry_head
  uprobes/tracing: Kill the pointless local_save_flags/preempt_count calls
  uprobes/tracing: Kill the pointless seq_print_ip_sym() call
  uprobes/tracing: Kill the pointless task_pt_regs() calls
  ...
2013-04-30 07:41:01 -07:00
Aneesh Kumar K.V
5c1f6ee9a3 powerpc: Reduce PTE table memory wastage
We allocate one page for the last level of linux page table. With THP and
large page size of 16MB, that would mean we are wasting large part
of that page. To map 16MB area, we only need a PTE space of 2K with 64K
page size. This patch reduce the space wastage by sharing the page
allocated for the last level of linux page table with multiple pmd
entries. We call these smaller chunks PTE page fragments and allocated
page, PTE page.

In order to support systems which doesn't have 64K HPTE support, we also
add another 2K to PTE page fragment. The second half of the PTE fragments
is used for storing slot and secondary bit information of an HPTE. With this
we now have a 4K PTE fragment.

We use a simple approach to share the PTE page. On allocation, we bump the
PTE page refcount to 16 and share the PTE page with the next 16 pte alloc
request. This should help in the node locality of the PTE page fragment,
assuming that the immediate pte alloc request will mostly come from the
same NUMA node. We don't try to reuse the freed PTE page fragment. Hence
we could be waisting some space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 16:00:07 +10:00
Aneesh Kumar K.V
ce54152f42 powerpc: Save DAR and DSISR in pt_regs on MCE
We were not saving DAR and DSISR on MCE. Save then and also print the values
along with exception details in xmon.

Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:42 +10:00
Kevin Hao
177c19237b powerpc/booke: Remove obsolete macro FINISH_EXCEPTION
This is stale and not used by anyone now.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:32 +10:00
Vasant Hegde
fb4696c395 powerpc/rtas_flash: Fix bad memory access
We use kmem_cache_alloc() to allocate memory to hold the new firmware
which will be flashed. kmem_cache_alloc() calls rtas_block_ctor() to
set memory to NULL. But these constructor is called only for newly
allocated slabs.

If we run below command multiple time without rebooting, allocator may
allocate memory from the area which was free'd by kmem_cache_free and
it will not call constructor. In this situation we may hit kernel oops.

dd if=<fw image> of=/proc/ppc64/rtas/firmware_flash bs=4096

oops message:
-------------
[ 1602.399755] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1602.399772] SMP NR_CPUS=1024 NUMA pSeries
[ 1602.399779] Modules linked in: rtas_flash nfsd lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod sg ipv6 ses enclosure ehea ehci_pci ohci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw scsi_dh_rdac scsi_dh ipr libata scsi_mod
[ 1602.399817] NIP: d00000000a170b9c LR: d00000000a170b64 CTR: c00000000079cd58
[ 1602.399823] REGS: c0000003b9937930 TRAP: 0300   Not tainted  (3.9.0-rc4-0.27-ppc64)
[ 1602.399828] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 22000428  XER: 20000000
[ 1602.399841] SOFTE: 1
[ 1602.399844] CFAR: c000000000005f24
[ 1602.399848] DAR: 8c2625a820631fef, DSISR: 40000000
[ 1602.399852] TASK = c0000003b4520760[3655] 'dd' THREAD: c0000003b9934000 CPU: 3
GPR00: 8c2625a820631fe7 c0000003b9937bb0 d00000000a179f28 d00000000a171f08
GPR04: 0000000010040000 0000000000001000 c0000003b9937df0 c0000003b5fb2080
GPR08: c0000003b58f7200 d00000000a179f28 c0000003b40058d4 c00000000079cd58
GPR12: d00000000a171450 c000000007f40900 0000000000000005 0000000010178d20
GPR16: 00000000100cb9d8 000000000000001d 0000000000000000 000000001003ffff
GPR20: 0000000000000001 0000000000000000 00003fffa0b50d30 000000001001f010
GPR24: 0000000010020888 0000000010040000 d00000000a171f08 d00000000a172808
GPR28: 0000000000001000 0000000010040000 c0000003b4005880 8c2625a820631fe7
[ 1602.399924] NIP [d00000000a170b9c] .rtas_flash_write+0x7c/0x1e8 [rtas_flash]
[ 1602.399930] LR [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash]
[ 1602.399934] Call Trace:
[ 1602.399939] [c0000003b9937bb0] [d00000000a170b64] .rtas_flash_write+0x44/0x1e8 [rtas_flash] (unreliable)
[ 1602.399948] [c0000003b9937c60] [c000000000282830] .proc_reg_write+0x90/0xe0
[ 1602.399955] [c0000003b9937ce0] [c0000000001ff374] .vfs_write+0x114/0x238
[ 1602.399961] [c0000003b9937d80] [c0000000001ff5d8] .SyS_write+0x70/0xe8
[ 1602.399968] [c0000003b9937e30] [c000000000009cdc] syscall_exit+0x0/0xa0
[ 1602.399973] Instruction dump:
[ 1602.399977] eb698010 801b0028 2f80dcd6 419e00a4 2fbc0000 419e009c ebfb0030 2fbf0000
[ 1602.399989] 409e0010 480000d8 60000000 7c1f0378 <e81f0008> 2fa00000 409efff4 e81f0000
[ 1602.400012] ---[ end trace b4136d115dc31dac ]---
[ 1602.402178]
[ 1602.402185] Sending IPI to other CPUs
[ 1602.403329] IPI complete

This patch uses kmem_cache_zalloc() instead of kmem_cache_alloc() to
allocate memory, which makes sure memory is set to 0 before using.
Also removes rtas_block_ctor(), which is no longer required.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-30 15:59:28 +10:00
Thomas Gleixner
d0380e6c3c early_printk: consolidate random copies of identical code
The early console implementations are the same all over the place.  Move
the print function to kernel/printk and get rid of the copies.

[akpm@linux-foundation.org: arch/mips/kernel/early_printk.c needs kernel.h for va_list]
[paul.gortmaker@windriver.com: sh4: make the bios early console support depend on EARLY_PRINTK]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Richard Weinberger <richard@nod.at>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:13 -07:00
Benjamin Herrenschmidt
bc23100a0d Merge remote-tracking branch 'kumar/next' into next
From Kumar Gala:
<<
Add support for T4 and B4 SoC families from Freescale, e6500 altivec
support, some various board fixes and other minor cleanups.
>>
2013-04-30 11:10:09 +10:00
Jiang Liu
5d585e5c48 mm/ppc: use common help functions to free reserved pages
Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:30 -07:00
Benjamin Herrenschmidt
54695c3088 KVM: PPC: Book3S HV: Speed up wakeups of CPUs on HV KVM
Currently, we wake up a CPU by sending a host IPI with
smp_send_reschedule() to thread 0 of that core, which will take all
threads out of the guest, and cause them to re-evaluate their
interrupt status on the way back in.

This adds a mechanism to differentiate real host IPIs from IPIs sent
by KVM for guest threads to poke each other, in order to target the
guest threads precisely when possible and avoid that global switch of
the core to host state.

We then use this new facility in the in-kernel XICS code.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:31 +02:00
Paul Mackerras
c35635efdc KVM: PPC: Book3S HV: Report VPA and DTL modifications in dirty map
At present, the KVM_GET_DIRTY_LOG ioctl doesn't report modifications
done by the host to the virtual processor areas (VPAs) and dispatch
trace logs (DTLs) registered by the guest.  This is because those
modifications are done either in real mode or in the host kernel
context, and in neither case does the access go through the guest's
HPT, and thus no change (C) bit gets set in the guest's HPT.

However, the changes done by the host do need to be tracked so that
the modified pages get transferred when doing live migration.  In
order to track these modifications, this adds a dirty flag to the
struct representing the VPA/DTL areas, and arranges to set the flag
when the VPA/DTL gets modified by the host.  Then, when we are
collecting the dirty log, we also check the dirty flags for the
VPA and DTL for each vcpu and set the relevant bit in the dirty log
if necessary.  Doing this also means we now need to keep track of
the guest physical address of the VPA/DTL areas.

So as not to lose track of modifications to a VPA/DTL area when it gets
unregistered, or when a new area gets registered in its place, we need
to transfer the dirty state to the rmap chain.  This adds code to
kvmppc_unpin_guest_page() to do that if the area was dirty.  To simplify
that code, we now require that all VPA, DTL and SLB shadow buffer areas
fit within a single host page.  Guests already comply with this
requirement because pHyp requires that these areas not cross a 4k
boundary.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
2013-04-26 20:27:13 +02:00
Michael Ellerman
240686c136 powerpc: Initialise PMU related regs on Power8
For both HV and guest kernels, intialise PMU regs to something sane.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:11:06 +10:00
Paul Mackerras
a485c70989 powerpc: Fix "attempt to move .org backwards" error
Building a 64-bit powerpc kernel with PR KVM enabled currently gives
this error:

  AS      arch/powerpc/kernel/head_64.o
arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards
make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1

This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into
33 instructions, but we only have space for 32 at the decrementer
interrupt vector (from 0x900 to 0x980).

In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we
currently have two instances of the HMT_MEDIUM macro, which has the
effect of setting the SMT thread priority to medium.  One is the
first instruction, and is overwritten by a no-op on processors where
we save the PPR (processor priority register), that is, POWER7 or
later.  The other is after we have saved the PPR.

In order to reduce the code at 0x900 by one instruction, we omit the
first HMT_MEDIUM.  On processors without SMT this will have no effect
since HMT_MEDIUM is a no-op there.  On POWER5 and RS64 machines this
will mean that the first few instructions take a little longer in the
case where a decrementer interrupt occurs when the hardware thread is
running at low SMT priority.  On POWER6 and later machines, the
hardware automatically boosts the thread priority when a decrementer
interrupt is taken if the thread priority was below medium, so this
change won't make any difference.

The alternative would be to branch out of line after saving the CFAR.
However, that would incur an extra overhead on all processors, whereas
the approach adopted here only adds overhead on older threaded processors.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:27 +10:00
Nathan Fontenot
e04fa61214 powerpc/pseries: Add /proc interface to control topology updates
There are instances in which we do not want topology updates to occur.
In order to allow this a /proc interface (/proc/powerpc/topology_updates)
is introduced so that topology updates can be enabled and disabled.

This patch also adds a prrn_is_enabled() call so that PRRN events are
handled in the kernel only if topology updating is enabled.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:26 +10:00
Nathan Fontenot
1b1218d32e powerpc/pseries: Enable PRRN handling
The Linux kernel and platform firmware negotiate their mutual support
of the PRRN option via the ibm,client-architecture-support interface.
This patch simply sets the appropriate fields in the client architecture
vector to indicate Linux support for PRRN and will allow the firmware to
report PRRN events via the RTAS event-scan mechanism.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:26 +10:00
Nathan Fontenot
f0ff7eb483 powerpc/pseries: Update firmware_has_feature() to check architecture vector 5 bits
The firmware_has_feature() function makes it easy to check for supported
features of the hypervisor. This patch extends the capability of
firmware_has_feature() to include checking for specified bits
in vector 5 of the architecture vector as reported in the device tree.

As part of this the #defines used for the architecture vector are re-defined
such that each option has the index into vector 5 and the feature bit encoded
into it. This makes checking for architecture bits when initiating data
for firmware_has_feature much easier.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:22 +10:00
Nathan Fontenot
530b5e1475 powerpc/pseries: Move architecture vector definitions to prom.h
As part of handling of PRRN events we need to check vector 5 of the
architecture vector bits reported in the device tree to ensure PRRN event
handling is enabled. To do this firmware_has_feature() is updated (in a
subsequent patch) to make this check vector 5 bits. To avoid having to
re-define bits in the architecture vector the bit definitions are moved
to prom.h.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:21 +10:00
Jesse Larrew
49c68a8518 powerpc/pseries: Add PRRN RTAS event handler
A PRRN event is signaled via the RTAS event-scan mechanism, which
returns a Hot Plug Event message "fixed part" indicating "Platform
Resource Reassignment". In response to the Hot Plug Event message,
we must call ibm,update-nodes to determine which resources were
reassigned and then ibm,update-properties to obtain the new affinity
information about those resources.

The PRRN event-scan RTAS message contains only the "fixed part" with
the "Type" field set to the value 160 and no Extended Event Log. The
four-byte Extended Event Log Length field is re-purposed (since no
Extended Event Log message is included) to pass the "scope" parameter
that causes the ibm,update-nodes to return the nodes affected by the
specific resource reassignment.

This patch adds a handler for RTAS events. The function
pseries_devicetree_update() (from mobility.c) is used to make the
ibm,update-nodes/ibm,update-properties RTAS calls. Updating the NUMA maps
(handled by a subsequent patch) will require significant processing,
so pseries_devicetree_update() is called from an asynchronous workqueue
to allow event processing to continue.

PRRN RTAS events on pseries systems are rare events that have to be
initiated from the HMC console for the system by an IBM tech. This allows
us to assume that these events are widely spaced. Additionally, all work
on the queue is flushed before handling any new work to ensure we only have
one event in flight being handled at a time.

Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:20 +10:00
Michael Neuling
3e96ca7f00 powerpc: Fix hardware IRQs with MMU on exceptions when HV=0
POWER8 allows us to take interrupts with the MMU on.  This gives us a
second set of vectors offset at 0x4000.

Unfortunately when coping these vectors we missed checking for MSR HV
for hardware interrupts (0x500).  This results in us trying to use
HSRR0/1 when HV=0, rather than SRR0/1 on HW IRQs

The below fixes this to check CPU_FTR_HVMODE when patching the code at
0x4500.

Also we remove the check for CPU_FTR_ARCH_206 since relocation on IRQs
are only available in arch 2.07 and beyond.

Thanks to benh for helping find this.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:18 +10:00
Michael Neuling
8c2a381734 powerpc/power8: Fix secondary CPUs hanging on boot for HV=0
In __restore_cpu_power8 we determine if we are HV and if not, we return
before setting HV only resources.

Unfortunately we forgot to restore the link register from r11 before
returning.

This will happen on boot and with secondary CPUs not coming online.

This adds the missing link register restore.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:17 +10:00
Michael Neuling
29ce3c5073 powerpc: Add isync to copy_and_flush
In __after_prom_start we copy the kernel down to zero in two calls to
copy_and_flush.  After the first call (copy from 0 to copy_to_here:)
we jump to the newly copied code soon after.

Unfortunately there's no isync between the copy of this code and the
jump to it.  Hence it's possible that stale instructions could still be
in the icache or pipeline before we branch to it.

We've seen this on real machines and it's results in no console output
after:
  calling quiesce...
  returning from prom_init

The below adds an isync to ensure that the copy and flushing has
completed before any branching to the new instructions occurs.

Signed-off-by: Michael Neuling <mikey@neuling.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-26 16:08:17 +10:00
Benjamin Herrenschmidt
234d15def9 Merge remote-tracking branch 'origin/master' into next
Merge upstream to get the audit fixes
2013-04-24 14:43:36 +10:00
Vasant Hegde
fd2d37c854 powerpc/rtas_flash: New return code to indicate FW entitlement expiry
Add new return code to rtas_flash to indicate firmware entitlement
expiry. Strictly we don't need this update. But to keep it in sync
with PAPR, this was added.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 14:22:31 +10:00
Vasant Hegde
f51005d7a4 powerpc/rtas_flash: Update return token comments
Add proper comment to ibm,validate-flash-image RTAS call
update result tokens.

Note: Only comment section is modified, no code change.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-24 14:22:31 +10:00
Benjamin Herrenschmidt
973e2abd15 Merge remote-tracking branch 'mpe/master' into next
Merge the tentative powerpc-next branch maintained by Michael
while I was on vacation
2013-04-24 14:11:20 +10:00
Chen Gang
5676005acf powerpc/pseries/lparcfg: Fix possible overflow are more than 1026
need set '\0' for 'local_buffer'.

SPLPAR_MAXLENGTH is 1026, RTAS_DATA_BUF_SIZE is 4096. so the contents of
rtas_data_buf may truncated in memcpy.

if contents are really truncated.
  the splpar_strlen is more than 1026. the next while loop checking will
  not find the end of buffer. that will cause memory access violation.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23 16:39:35 +10:00
Oleg Nesterov
28d170abad ptrace/powerpc: Don't flush_ptrace_hw_breakpoint() on fork()
arch_dup_task_struct() does flush_ptrace_hw_breakpoint(src), this
destroys the parent's breakpoints for no reason. We should clear
child->thread.ptrace_bps[] copied by dup_task_struct() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23 16:05:06 +10:00
Adhemerval Zanella
fcb41a2030 powerpc: Add VDSO version of time
On 04/18/2013 07:38 PM, Anton Blanchard wrote:
> Since you are only reading one long you shouldn't need to check the
> update count and loop, you will always see a consistent value. The
> system call version of time() just does an unprotected load for example.

Fixed.

> With the above change and with Michael's comments covered (decent
> changelog entry and Signed-off-by):
>
> Acked-by: Anton Blanchard <anton@samba.org>

Thanks for the review, below the updated patch:

From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>

This patch implement the time syscall as vDSO. The performance speedups
are:

Baseline PPC32: 380 nsec
Baseline PPC64: 350 nsec
vdso PPC32:      20 nsec
vsdo PPC64:      20 nsec

Tested on 64 bit build with both 32 bit and 64 bit userland.

Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-04-23 16:05:05 +10:00
Brian King
c2e1d84523 powerpc: Set default VGA device
Add a PCI quirk for VGA devices on Power to set the default VGA device.
Ensures a default VGA is always set if a graphics adapter is present,
even if firmware did not initialize it. If more than one graphics
adapter is present, ensure the one initialized by firmware is set
as the default VGA device. This ensures that X autoconfiguration
will work.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:58 +10:00
Yuanquan Chen
37f02195be powerpc/pci: fix PCI-e devices rescan issue on powerpc platform
Powerpc initializes the DMA and IRQ information in pci_scan_child_bus()->
pcibios_fixup_bus()->pcibios_setup_bus_devices(). But for the devices
which are hotpluged, bus->is added has been set for the first scan of the
PCI-e bus, so the initialization code won't be called. Then the hotpluged
devices' driver will fail to load.

For example :
The PCI-e device 0001:03:00.0 is the Intel PCI-e e1000e network card, remove
it from the system:

    # echo 1 > /sys/bus/pci/devices/0001\:03\:00.0/remove
    # e1000e 0001:03:00.0 eth0: removed PHC

Rescan it from it's bus:

    # echo 1 > /sys/bus/pci/devices/0001\:02\:00.0/rescan
    ...
    e1000e 0001:03:00.0: Disabling ASPM L0s L1
    e1000e 0001:03:00.0: No usable DMA configuration, aborting
    e1000e: probe of 0001:03:00.0 failed with error -5

So we move the DMA & IRQ initialization code from pcibios_setup_devices() and
construct a new function pcibios_enable_device. We call this function in
pcibios_enable_device, which will be called by PCI-e rescan code. At the
meanwhile, we avoid the the impact on cardbus. I also validate this patch with
silicon's PCIe-sata which encounters the IRQ issue.

Signed-off-by: Yuanquan Chen <Yuanquan.Chen@freescale.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:57 +10:00
Michael Neuling
517b731477 powerpc/ptrace: Add DAWR debug feature info for userspace
This adds new debug feature information so that the DAWR can be
identified by userspace tools like GDB.

Unfortunately the DAWR doesn't sit nicely into the current description
that ptrace provides to userspace via struct ppc_debug_info.  It doesn't
allow for specifying that only some ranges are possible or even the end
alignment constraints (DAWR only allows 512 byte wide ranges which can't
cross a 512 byte boundary).

After talking to Edjunior Machado (GDB ppc developer), it was decided
this was the best approach.  Just mark it as debug feature DAWR and
tools like GDB can internally decide the constraints.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:55 +10:00
Ian Munsie
a6a058e52a powerpc: Add accounting for Doorbell interrupts
This patch adds a new line to /proc/interrupts to account for the
doorbell interrupts that each hardware thread has received. The total
interrupt count in /proc/stat will now also include doorbells.

 # cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 16:        551       1267        281        175      XICS Level     IPI
LOC:       2037       1503       1688       1625   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
CNT:          0          0          0          0   Performance monitoring interrupts
MCE:          0          0          0          0   Machine check exceptions
DBL:         42        550         20         91   Doorbell interrupts

Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 15:59:55 +10:00
Michael Neuling
2a3563b023 powerpc: Setup in HFSCR for POWER8
Setup the HFSCR (Hypervisor Facility Status and Control Register) for POWER8
when running HV=1.  The HFSCR is the same as the FSCR except it's for
hypervisors.  It controls the available of various facilities in OS and
userspace levels.  It also indicates the cause of a hypervisor facility
unavailable interrupt (although we are not using this here).

This patch sets the facilities Linux knows about incase the firmware doesn't.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:59 +10:00
Alexey Kardashevskiy
ee4a391661 powerpc: fixing ptrace_get_reg to return an error
Currently ptrace_get_reg returns error as a value
what make impossible to tell whether it is a correct value or error code.

The patch adds a parameter which points to the real return data and
returns an error code.

As get_user_msr() never fails and it is used in multiple places so it has not
been changed by this patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:57 +10:00
Zhang Yanfei
c843be8a54 powerpc: remove cast for kmalloc/kzalloc return value
remove cast for kmalloc/kzalloc return value.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:56 +10:00
Alex Grad
c0b52c143e powerpc/kgdb: Removed kmalloc returned value cast
Signed-off-by: Alex Grad <alex.grad@gmail.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:56 +10:00
Paul Bolle
933ee7119f powerpc: remove PReP platform
PPC_PREP is marked as BROKEN since v2.6.15. Remove all PReP specific
code now.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:53 +10:00
Paul Bolle
9850baed30 powerpc: remove dead CONFIG_HVC_SCOM code
Commit c1fb6816fb ("powerpc: Add
relocation on exception vector handlers") added two lines of code that
depend on the macro CONFIG_HVC_SCOM. That macro doesn't exist. Perhaps
it was intended to use CONFIG_PPC_SCOM here. But since
"maintence_interrupt" is a typo and there's nothing in arch/powerpc that
looks like maintenance_interrupt it seems best to just delete these
lines.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:52 +10:00
Adrian-Leonard Radu
09652b00cd powerpc: Use PTR_RET instead of IS_ERR/PTR_ERR
Signed-off-by: Adrian-Leonard Radu <ady8radu@gmail.com>
Acked-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:48 +10:00
Gavin Shan
db38f290ca powerpc/kernel: Cleanup on rtas_pci.c
It's minor cleanup so that the function names comply with the
coding style.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 13:03:48 +10:00
Vasant Hegde
ad18a364f1 powerpc/rtas_flash: Free kmem upon module exit
Memory allocated to rtas_firmware_flash_list in rtas_flash_write
is not freed during module exit. We hit below call trace if we
unload rtas_flash module after loading new firmware image and
before rebooting the system.

Call trace:
----------
Feb  6 08:42:10 eagle3 kernel: kmem_cache_destroy rtas_flash_cache: Slab cache still has objects
Feb  6 08:42:10 eagle3 kernel: Call Trace:
Feb  6 08:42:10 eagle3 kernel: [c00000001c303b40] [c000000000014940] .show_stack+0x70/0x1c0 (unreliable)
Feb  6 08:42:10 eagle3 kernel: [c00000001c303bf0] [c000000000199bec] .kmem_cache_destroy+0x15c/0x170
Feb  6 08:42:10 eagle3 kernel: [c00000001c303c90] [d000000006fa1208] .rtas_flash_cleanup+0x3c/0x80 [rtas_flash]
Feb  6 08:42:10 eagle3 kernel: [c00000001c303d20] [c0000000000f8970] .SyS_delete_module+0x1d0/0x2e0
Feb  6 08:42:10 eagle3 kernel: [c00000001c303e30] [c000000000009954] syscall_exit+0x0/0x94

This patch frees rtas_firmware_flash_list during module exit.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
2013-04-18 11:52:58 +10:00
Ingo Molnar
b5210b2a34 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
Pull uprobes updates from Oleg Nesterov:

 - "uretprobes" - an optimization to uprobes, like kretprobes are an optimization
   to kprobes. "perf probe -x file sym%return" now works like kretprobes.

 - PowerPC fixes plus a couple of cleanups/optimizations in uprobes and trace_uprobes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-04-16 11:04:10 +02:00
Kevin Hao
d8b9229240 powerpc: add a missing label in resume_kernel
A label 0 was missed in the patch a9c4e541 (powerpc/kprobe: Complete
kprobe and migrate exception frame). This will cause the kernel
branch to an undetermined address if there really has a conflict when
updating the thread flags.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
Acked-By: Tiejun Chen <tiejun.chen@windriver.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-15 17:29:48 +10:00
Alistair Popple
05e38e5d5d powerpc: Fix audit crash due to save/restore PPR changes
The current mainline crashes when hitting userspace with the following:

kernel BUG at kernel/auditsc.c:1769!
cpu 0x1: Vector: 700 (Program Check) at [c000000023883a60]
    pc: c0000000001047a8: .__audit_syscall_entry+0x38/0x130
    lr: c00000000000ed64: .do_syscall_trace_enter+0xc4/0x270
    sp: c000000023883ce0
   msr: 8000000000029032
  current = 0xc000000023800000
  paca    = 0xc00000000f080380   softe: 0        irq_happened: 0x01
    pid   = 1629, comm = start_udev
kernel BUG at kernel/auditsc.c:1769!
enter ? for help
[c000000023883d80] c00000000000ed64 .do_syscall_trace_enter+0xc4/0x270
[c000000023883e30] c000000000009b08 syscall_dotrace+0xc/0x38
 --- Exception: c00 (System Call) at 0000008010ec50dc

Bisecting found the following patch caused it:

commit 44e9309f1f
Author: Haren Myneni <haren@linux.vnet.ibm.com>
powerpc: Implement PPR save/restore

It was found this patch corrupted r9 when calling
SET_DEFAULT_THREAD_PPR()

Using r10 as a scratch register instead of r9 solved the problem.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-15 17:29:45 +10:00
Anton Arapov
f15706b79d uretprobes/powerpc: Hijack return address
Hijack the return address and replace it with a trampoline address.
PowerPC implementation.

Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-04-13 15:31:56 +02:00
Anton Blanchard
2540334adc powerpc: Remove static branch prediction in 64bit traced syscall path
Some distros enable auditing by default which forces us through the
syscall trace path. Remove the static branch prediction in our 64bit
syscall handler and let the hardware do the prediction.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
2013-04-10 12:49:20 -04:00
Michael Neuling
f110c0c192 powerpc: fix compiling CONFIG_PPC_TRANSACTIONAL_MEM when CONFIG_ALTIVEC=n
We can't compile a kernel with CONFIG_ALTIVEC=n when
CONFIG_PPC_TRANSACTIONAL_MEM=y.  We currently get:

arch/powerpc/kernel/tm.S:320: Error: unsupported relocation against THREAD_VSCR
arch/powerpc/kernel/tm.S:323: Error: unsupported relocation against THREAD_VR0
arch/powerpc/kernel/tm.S:323: Error: unsupported relocation against THREAD_VR0
etc.

The below fixes this with a sprinkling of #ifdefs.

This was found by mpe with kisskb:
  http://kisskb.ellerman.id.au/kisskb/buildresult/8539442/

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-04-10 08:14:39 +10:00
Al Viro
b177a29251 lparcfg: don't bother saving pointer to proc_dir_entry
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:34 -04:00
Al Viro
d9dda78bad procfs: new helper - PDE_DATA(inode)
The only part of proc_dir_entry the code outside of fs/proc
really cares about is PDE(inode)->data.  Provide a helper
for that; static inline for now, eventually will be moved
to fs/proc, along with the knowledge of struct proc_dir_entry
layout.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-04-09 14:13:32 -04:00
Thomas Gleixner
799fef0612 powerpc: Use generic idle loop
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Magnus Damm <magnus.damm@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20130321215235.026838003@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-04-08 17:39:27 +02:00
Ananth N Mavinakayanahalli
3c9eb54f71 uprobes/powerpc: Remove additional trap instruction check
prepare_uprobe() already checks if the underlying unstruction
(on file) is a trap variant. We don't need to check this again.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-04-04 13:57:04 +02:00
Ananth N Mavinakayanahalli
ab07e807be uprobes/powerpc: Teach uprobes to ignore gdb breakpoints
Powerpc has many trap variants that could be used by entities like gdb.
Currently, running gdb on a program being traced by uprobes causes an
endless loop since uprobes doesn't understand that the trap was inserted
by some other entity and a SIGTRAP needs to be delivered.

Teach uprobes to ignore breakpoints that do not belong to it.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-04-04 13:57:04 +02:00
Stuart Yoder
f9294e989f powerpc: define the conditions where the ePAPR idle hcall can be supported
For 32-bit, CONFIG_EPAPR_PARAVIRT pulls in both epapr_paravirt.c
and epapr_hcalls.c which contains the 32-bit paravirt idle loop.

For 64-bit, the paravirt idle loop is in idle_book3e.S and that
source file is included only if CONFIG_PPC_BOOK3E_64 defined.

This patch makes that dependency for 64-bit explicit.

Fixes these build errors:

arch/powerpc/kernel/built-in.o: In function `restore_pblist_ptr':
ftrace.c:(.toc+0xdc0): undefined reference to `epapr_ev_idle_start'
ftrace.c:(.toc+0xdd0): undefined reference to `epapr_ev_idle'

Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2013-03-26 08:47:27 +11:00