Commit Graph

497 Commits

Author SHA1 Message Date
David S. Miller
0dd5b7b09e sparc64: Fix physical memory management regressions with large max_phys_bits.
If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables only
can cover up to 43 bits.

Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.

Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.

The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.

The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.

The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.

These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image.  Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.

So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.

We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.

On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-05 16:53:39 -07:00
David S. Miller
8c82dc0e88 sparc64: Adjust KTSB assembler to support larger physical addresses.
As currently coded the KTSB accesses in the kernel only support up to
47 bits of physical addressing.

Adjust the instruction and patching sequence in order to support
arbitrary 64 bits addresses.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-05 16:53:39 -07:00
David S. Miller
4397bed080 sparc64: Define VA hole at run time, rather than at compile time.
Now that we use 4-level page tables, we can provide up to 53-bits of
virtual address space to the user.

Adjust the VA hole based upon the capabilities of the cpu type probed.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-05 16:53:39 -07:00
David S. Miller
ac55c76814 sparc64: Switch to 4-level page tables.
This has become necessary with chips that support more than 43-bits
of physical addressing.

Based almost entirely upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2014-10-05 16:53:38 -07:00
David S. Miller
473ad7f4fb sparc64: Fix reversed start/end in flush_tlb_kernel_range()
When we have to split up a flush request into multiple pieces
(in order to avoid the firmware range) we don't specify the
arguments in the right order for the second piece.

Fix the order, or else we get hangs as the code tries to
flush "a lot" of entries and we get lockups like this:

[ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
[ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
[ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
[ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
[ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
[ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
[ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
[ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
[ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
[ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
[ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
[ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
[ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
[ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
[ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
[ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
[ 4423.234193] Call Trace:
[ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
[ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
[ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
[ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
[ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
[ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
[ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
[ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
[ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
[ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
[ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
[ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
[ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c

Fixes: 4ca9a23765 ("sparc64: Guard against flushing openfirmware mappings.")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 21:05:14 -07:00
bob picco
7c21d533ab sparc64: mem boot option correction
The "mem" boot option can result in many unexpected consequences. This patch
attempts to prevent boot hangs which have been experienced on T4-4 and T5-8.
Basically the boot loader allocates vmlinuz and initrd higher in available
OBP physical memory. For example, on a 2Tb T5-8 it isn't possible to boot
with mem=20G.

The patch utilizes memblock to avoid reserved regions and trim memory which
is only free. Other improvements are possible for a multi-node machine.

This is a snippet of the boot log with mem=20G on T5-8 with the patch applied:
MEMBLOCK configuration:	<- before memory reduction
 memory size = 0x1ffad6ce000 reserved size = 0xa1adf44
 memory.cnt  = 0xb
 memory[0x0]    [0x00000030400000-0x00003fdde47fff], 0x3fada48000 bytes
 memory[0x1]    [0x00003fdde4e000-0x00003fdde4ffff], 0x2000 bytes
 memory[0x2]    [0x00080000000000-0x00083fffffffff], 0x4000000000 bytes
 memory[0x3]    [0x00100000000000-0x00103fffffffff], 0x4000000000 bytes
 memory[0x4]    [0x00180000000000-0x00183fffffffff], 0x4000000000 bytes
 memory[0x5]    [0x00200000000000-0x00203fffffffff], 0x4000000000 bytes
 memory[0x6]    [0x00280000000000-0x00283fffffffff], 0x4000000000 bytes
 memory[0x7]    [0x00300000000000-0x00303fffffffff], 0x4000000000 bytes
 memory[0x8]    [0x00380000000000-0x00383fffc71fff], 0x3fffc72000 bytes
 memory[0x9]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
 memory[0xa]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
 reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
...
MEMBLOCK configuration:	<- after reduction of memory
 memory size = 0x50a1adf44 reserved size = 0xa1adf44
 memory.cnt  = 0x4
 memory[0x0]    [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
 memory[0x1]    [0x00380004000000-0x0038050d01d74a], 0x50901d74b bytes
 memory[0x2]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
 memory[0x3]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
 reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
...
Early memory node ranges
  node   7: [mem 0x380000000000-0x38000117dfff]
  node   7: [mem 0x380004000000-0x380f0d01bfff]
  node   7: [mem 0x383fffc92000-0x383fffca1fff]
  node   7: [mem 0x383fffcb4000-0x383fffcb5fff]
Could not find start_pfn for node 0
Could not find start_pfn for node 1
Could not find start_pfn for node 2
Could not find start_pfn for node 3
Could not find start_pfn for node 4
Could not find start_pfn for node 5
Could not find start_pfn for node 6
.

The patch was tested on T4-1, T5-8 and Jalap?no.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 18:23:11 -07:00
bob picco
3dee9df548 sparc64: find_node adjustment
We have seen an issue with guest boot into LDOM that causes early boot failures
because of no matching rules for node identitity of the memory. I analyzed this
on my T4 and concluded there might not be a solution. I saw the issue in
mainline too when booting into the control/primary domain - with guests
configured.  Note, this could be a firmware bug on some older machines.

I'll provide a full explanation of the issues below. Should we not find a
matching BEST latency group for a real address (RA) then we will assume node 0.
On the T4-2 here with the information provided I can't see an alternative.

Technically the LDOM shown below should match the MBLOCK to the
favorable latency group. However other factors must be considered too. Were
the memory controllers configured "fine" grained interleave or "coarse"
grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
node?

There has to be at least one Machine Description (MD) "group" and hence one
NUMA node. The group can have one or more latency groups (lg) - more than one
memory controller. The current code chooses the smallest latency as the most
favorable per group. The latency and lg information is in MLGROUP below.
MBLOCK is the base and size of the RAs for the machine as fetched from OBP
/memory "available" property. My machine has one MBLOCK but more would be
possible - with holes?

For a T4-2 the following information has been gathered:
with LDOM guest
MEMBLOCK configuration:
 memory size = 0x27f870000
 memory.cnt  = 0x3
 memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
 memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
 memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
 reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
MBLOCK[0]: base[20000000] size[280000000] offset[0]
(note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
(note: (RA + offset) & mask = val is the formula to detect a match for the
memory controller. should there be no match for find_node node, a return
value of -1 resulted for the node - BAD)

There is one group. It has these forward links
MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
(note: "val" is the best lg's (smallest latency) "match")

no LDOM guest - bare metal
MEMBLOCK configuration:
 memory size = 0xfdf2d0000
 memory.cnt  = 0x3
 memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
 memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
 memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
 reserved.cnt  = 0x2
 reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
 reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
MBLOCK[0]: base[20000000] size[fe0000000] offset[0]

there are two groups
group node[16d5]
MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
group node[171d]
MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
(note: for this two "group" bare metal machine, 1/2 memory is in group one's
lg and 1/2 memory is in group two's lg).

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 17:55:09 -07:00
bob picco
4ccb927289 sparc64: sun4v TLB error power off events
We've witnessed a few TLB events causing the machine to power off because
of prom_halt. In one case it was some nfs related area during rmmod. Another
was an mmapper of /dev/mem. A more recent one is an ITLB issue with
a bad pagesize which could be a hardware bug. Bugs happen but we should
attempt to not power off the machine and/or hang it when possible.

This is a DTLB error from an mmapper of /dev/mem:
[root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
SUN4V-DTLB: TPC<0xfffff80100903e6c>
SUN4V-DTLB: O7[fffff801081979d0]
SUN4V-DTLB: O7<0xfffff801081979d0>
SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
.

This is recent mainline for ITLB:
[ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
[ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
[ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
[ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
.

Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
prom_halt() and drop us to OF command prompt "ok". This isn't the case for
LDOMs and the machine powers off.

For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().

The logic of this patch was partially inspired by David Miller's feedback.

Power off of large sparc64 machines is painful. Plus die_if_kernel provides
more context. A reset sequence isn't a brief period on large sparc64 but
better than power-off/power-on sequence.

Cc: sparclinux@vger.kernel.org
Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-16 17:46:44 -07:00
Christoph Lameter
494fc42170 sparc: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e.  using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: sparclinux@vger.kernel.org
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:55 -04:00
David S. Miller
5b6ff9df05 sparc64: Fix up merge thinko.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 19:09:19 -07:00
David S. Miller
e9011d0866 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Conflicts:
	arch/sparc/mm/init_64.c

Conflict was simple non-overlapping additions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-05 18:57:18 -07:00
David S. Miller
4ca9a23765 sparc64: Guard against flushing openfirmware mappings.
Based almost entirely upon a patch by Christopher Alexander Tobias
Schulze.

In commit db64fe0225 ("mm: rewrite vmap
layer") lazy VMAP tlb flushing was added to the vmalloc layer.  This
causes problems on sparc64.

Sparc64 has two VMAP mapped regions and they are not contiguous with
eachother.  First we have the malloc mapping area, then another
unrelated region, then the vmalloc region.

This "another unrelated region" is where the firmware is mapped.

If the lazy TLB flushing logic in the vmalloc code triggers after
we've had both a module unload and a vfree or similar, it will pass an
address range that goes from somewhere inside the malloc region to
somewhere inside the vmalloc region, and thus covering the
openfirmware area entirely.

The sparc64 kernel learns about openfirmware's dynamic mappings in
this region early in the boot, and then services TLB misses in this
area.  But openfirmware has some locked TLB entries which are not
mentioned in those dynamic mappings and we should thus not disturb
them.

These huge lazy TLB flush ranges causes those openfirmware locked TLB
entries to be removed, resulting in all kinds of problems including
hard hangs and crashes during reboot/reset.

Besides causing problems like this, such huge TLB flush ranges are
also incredibly inefficient.  A plea has been made with the author of
the VMAP lazy TLB flushing code, but for now we'll put a safety guard
into our flush_tlb_kernel_range() implementation.

Since the implementation has become non-trivial, stop defining it as a
macro and instead make it a function in a C source file.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04 20:16:00 -07:00
David S. Miller
18f3813252 sparc64: Do not insert non-valid PTEs into the TSB hash table.
The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
only be called when the PTE being installed will be accessible by the user.

This is not true for code paths originating from remove_migration_pte().

There are dire consequences for placing a non-valid PTE into the TSB.  The TLB
miss frramework assumes thatwhen a TSB entry matches we can just load it into
the TLB and return from the TLB miss trap.

So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
and over, never satisfying the miss.

Just exit early from update_mmu_cache() and friends in this situation.

Based upon a report and patch from Christopher Alexander Tobias Schulze.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-04 16:34:01 -07:00
bob picco
f6d4fb5cc0 sparc64 - add mem to iomem resource
This patch adds sparc RAM to /proc/iomem. It also identifies the
code, data and bss regions of the kernel.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21 21:37:05 -07:00
Linus Torvalds
c4222e4635 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc fixes from David Miller:
 "Sparc sparse fixes from Sam Ravnborg"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (67 commits)
  sparc64: fix sparse warnings in int_64.c
  sparc64: fix sparse warning in ftrace.c
  sparc64: fix sparse warning in kprobes.c
  sparc64: fix sparse warning in kgdb_64.c
  sparc64: fix sparse warnings in compat_audit.c
  sparc64: fix sparse warnings in init_64.c
  sparc64: fix sparse warnings in aes_glue.c
  sparc: fix sparse warnings in smp_32.c + smp_64.c
  sparc64: fix sparse warnings in perf_event.c
  sparc64: fix sparse warnings in kprobes.c
  sparc64: fix sparse warning in tsb.c
  sparc64: clean up compat_sigset_t.seta handling
  sparc64: fix sparse "Should it be static?" warnings in signal32.c
  sparc64: fix sparse warnings in sys_sparc32.c
  sparc64: fix sparse warning in pci.c
  sparc64: fix sparse warnings in smp_64.c
  sparc64: fix sparse warning in prom_64.c
  sparc64: fix sparse warning in btext.c
  sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
  sparc64: fix sparse warning in process_64.c
  ...

Conflicts:
	arch/sparc/include/asm/pgtable_64.h
2014-06-19 07:50:07 -10:00
Naoya Horiguchi
c177c81e09 hugetlb: restrict hugepage_migration_support() to x86_64
Currently hugepage migration is available for all archs which support
pmd-level hugepage, but testing is done only for x86_64 and there're
bugs for other archs.  So to avoid breaking such archs, this patch
limits the availability strictly to x86_64 until developers of other
archs get interested in enabling this feature.

Simply disabling hugepage migration on non-x86_64 archs is not enough to
fix the reported problem where sys_move_pages() hits the BUG_ON() in
follow_page(FOLL_GET), so let's fix this by checking if hugepage
migration is supported in vma_migratable().

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-04 16:53:51 -07:00
Sam Ravnborg
48d372164d sparc64: fix sparse warnings in int_64.c
Fix following warnings:
init_64.c:798:5: warning: symbol 'numa_cpu_lookup_table' was not declared. Should it be static?
init_64.c:799:11: warning: symbol 'numa_cpumask_lookup_table' was not declared. Should it be static?

The warnings were present with an allnoconfig
Fix so the variables are only declared if CONFIG_NEED_MULTIPLE_NODES is defined.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:35 -07:00
Sam Ravnborg
59dec13b27 sparc64: fix sparse warnings in init_64.c
Fix following warnings:
init_64.c:191:10: warning: symbol 'dcpage_flushes' was not declared. Should it be static?
init_64.c:193:10: warning: symbol 'dcpage_flushes_xcall' was not declared. Should it be static?

Add extern declaration to asm/setup.h and drop local declaration in smp_64.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:34 -07:00
Sam Ravnborg
8c7260c0d9 sparc64: fix sparse warning in tsb.c
Fix following warning:
tsb.c:290:5: warning: symbol 'sysctl_tsb_ratio' was not declared. Should it be static?

Add extern declaration in asm/setup.h and remove local declaration
in kernel/sysctl.c

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:32 -07:00
Sam Ravnborg
8df52620e6 sparc64: fix sparse warnings in sys_sparc_64.c + unaligned_64.c
Fix following warnings:
kernel/sys_sparc_64.c:643:17: warning: symbol 'sys_kern_features' was not declared. Should it be static?
kernel/unaligned_64.c:297:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static?
kernel/unaligned_64.c:387:5: warning: symbol 'handle_popc' was not declared. Should it be static?
kernel/unaligned_64.c:428:5: warning: symbol 'handle_ldf_stq' was not declared. Should it be static?
kernel/unaligned_64.c:553:6: warning: symbol 'handle_ld_nf' was not declared. Should it be static?
kernel/unaligned_64.c:579:6: warning: symbol 'handle_lddfmna' was not declared. Should it be static?
kernel/unaligned_64.c:643:6: warning: symbol 'handle_stdfmna' was not declared. Should it be static?

Functions that are only used in kernel/ - add prototypes in kernel.h
Functions used outside kernel/ - add prototype in asm/setup.h
Removed local prototypes

One of the local prototypes had wrong signature (return void - not int).

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:30 -07:00
Sam Ravnborg
2e74a74f27 sparc: drop use of extern for prototypes in arch/sparc/*
Drop the remaining uses of extern for prototypes in .h files
in the sparc specific part of the kernel tree.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:29 -07:00
Sam Ravnborg
1918660b90 sparc32: fix sparse warning in io-unit.c
Fix following warning:
io-unit.c:56:13: warning: incorrect type in assignment (different address spaces)

The page table for the io unit resides in __iomem.
Fix up all users of the io unit page table.
Introduce sbus helers for all read/write operations.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:26 -07:00
Sam Ravnborg
f977ea49ae sparc32: fix sparse warning in iommu.c
Fix following warning:
iommu.c:69:21: warning: incorrect type in assignment (different address spaces)

iommu_struct.regs is __iomem - fix up all users.
Introduce sbus operations for all read/write operations.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 19:01:26 -07:00
David S. Miller
b18eb2d779 sparc64: Fix huge TSB mapping on pre-UltraSPARC-III cpus.
Access to the TSB hash tables during TLB misses requires that there be
an atomic 128-bit quad load available so that we fetch a matching TAG
and DATA field at the same time.

On cpus prior to UltraSPARC-III only virtual address based quad loads
are available.  UltraSPARC-III and later provide physical address
based variants which are easier to use.

When we only have virtual address based quad loads available this
means that we have to lock the TSB into the TLB at a fixed virtual
address on each cpu when it runs that process.  We can't just access
the PAGE_OFFSET based aliased mapping of these TSBs because we cannot
take a recursive TLB miss inside of the TLB miss handler without
risking running out of hardware trap levels (some trap combinations
can be deep, such as those generated by register window spill and fill
traps).

Without huge pages it's working perfectly fine, but when the huge TSB
got added another chunk of fixed virtual address space was not
allocated for this second TSB mapping.

So we were mapping both the 8K and 4MB TSBs to the same exact virtual
address, causing multiple TLB matches which gives undefined behavior.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-08 14:59:07 -07:00
David S. Miller
e5c460f46a sparc64: Don't bark so loudly about 32-bit tasks generating 64-bit fault addresses.
This was found using Dave Jone's trinity tool.

When a user process which is 32-bit performs a load or a store, the
cpu chops off the top 32-bits of the effective address before
translating it.

This is because we run 32-bit tasks with the PSTATE_AM (address
masking) bit set.

We can't run the kernel with that bit set, so when the kernel accesses
userspace no address masking occurs.

Since a 32-bit process will have no mappings in that region we will
properly fault, so we don't try to handle this using access_ok(),
which can safely just be a NOP on sparc64.

Real faults from 32-bit processes should never generate such addresses
so a bug check was added long ago, and it barks in the logs if this
happens.

But it also barks when a kernel user access causes this condition, and
that _can_ happen.  For example, if a pointer passed into a system call
is "0xfffffffc" and the kernel access 4 bytes offset from that pointer.

Just handle such faults normally via the exception entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-06 21:27:37 -07:00
David S. Miller
0eef331a3d sparc64: Use 'ILOG2_4MB' instead of constant '22'.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:52:50 -07:00
David S. Miller
70ffc6ebae sparc64: Fix top-level fault handling bugs.
Make get_user_insn() able to cope with huge PMDs.

Next, make do_fault_siginfo() more robust when get_user_insn() can't
actually fetch the instruction.  In particular, use the MMU announced
fault address when that happens, instead of calling
compute_effective_address() and computing garbage.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:41:19 -07:00
David S. Miller
04df419de3 sparc64: Fix bugs in get_user_pages_fast() wrt. THP.
The large PMD path needs to check _PAGE_VALID not _PAGE_PRESENT, to
decide if it needs to bail and return 0.

pmd_large() should therefore just check _PAGE_PMD_HUGE.

Calls to gup_huge_pmd() are guarded with a check of pmd_large(), so we
just need to add a valid bit check.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:32:37 -07:00
David S. Miller
51e5ef1bb7 sparc64: Fix huge PMD invalidation.
On sparc64 "present" and "valid" are seperate PTE bits, this allows us to
naturally distinguish between the user explicitly asking for PROT_NONE
with mprotect() and other situations.

However we weren't handling this properly in the huge PMD paths.

First of all, the page table walker in the TSB miss path only checks
for _PAGE_PMD_HUGE.  So the generic pmdp_invalidate() would clear
_PAGE_PRESENT but the TLB miss paths would still load it into the TLB
as a valid huge PMD.

Fix this by clearing the valid bit in pmdp_invalidate(), and also
checking the valid bit in USER_PGTABLE_CHECK_PMD_HUGE using "brgez"
since _PAGE_VALID is bit 63 in both the sun4u and sun4v pte layouts.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:31:52 -07:00
David S. Miller
5b1e94fa43 sparc64: Fix executable bit testing in set_pmd_at() paths.
This code was mistakenly using the exec bit from the PMD in all
cases, even when the PMD isn't a huge PMD.

If it's not a huge PMD, test the exec bit in the individual ptes down
in tlb_batch_pmd_scan().

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-03 22:30:36 -07:00
Sam Ravnborg
9edfae3f69 sparc32: fix sparse warnings in unaligned_32.c
Fix following warnings:
unaligned_32.c:146:15: warning: symbol 'safe_compute_effective_address' was not declared. Should it be static?
unaligned_32.c:235:17: warning: symbol 'kernel_unaligned_trap' was not declared. Should it be static?
unaligned_32.c:319:17: warning: symbol 'user_unaligned_trap' was not declared. Should it be static?

Add proper declarations in kernel.h + setup.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:26 -04:00
Sam Ravnborg
8885ec7ca9 sparc32: fix sparse warning in devices.c
Fix following warning:
devices.c:114:13: warning: symbol 'device_scan' was not declared. Should it be static?

Add prototype to asm/setup.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:26 -04:00
Sam Ravnborg
d191723fee sparc32: fix sparse warnings in setup_32.c
Fix following warnings:
setup_32.c:106:15: warning: symbol 'cmdline_memory_size' was not declared. Should it be static?
setup_32.c:270:16: warning: symbol 'fake_swapper_regs' was not declared. Should it be static?
setup_32.c:368:55: warning: Using plain integer as NULL pointer

Add missing declaration of cmdline_memory_size and remove the local one in init_32.c
fake_swapper_regs was only used locally - so defined static.
When replacing 0 with NULL also add a few spaces around operators

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:25 -04:00
Sam Ravnborg
a2b0aa9463 sparc32: fix sparse "Should it be static?" in mm/
Fix following warnings:
srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static?
iommu.c:430:13: warning: symbol 'ld_mmu_iommu' was not declared. Should it be static?
leon_mm.c:21:5: warning: symbol 'srmmu_swprobe_trace' was not declared. Should it be static?

Add proper prototypes or define static to fix them.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:25 -04:00
Sam Ravnborg
e8c29c839b sparc32: fix sparse warnings in srmmu.c
Fix following warnings:
srmmu.c:78:5: warning: symbol 'flush_page_for_dma_global' was not declared. Should it be static?
srmmu.c:85:5: warning: symbol 'viking_mxcc_present' was not declared. Should it be static?
srmmu.c:103:6: warning: symbol 'srmmu_nocache_bitmap' was not declared. Should it be static?
srmmu.c:176:24: warning: Using plain integer as NULL pointer
srmmu.c:731:46: warning: Using plain integer as NULL pointer
srmmu.c:731:46: warning: Using plain integer as NULL pointer
srmmu.c:731:46: warning: Using plain integer as NULL pointer
srmmu.c:870:13: warning: symbol 'srmmu_paging_init' was not declared. Should it be static?

Add proper prototypes in mm_32.h and drop local prototype in init_32.c
Replace 0 with NULL

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:25 -04:00
Sam Ravnborg
4c9660f796 sparc32: fix sparse warning in init_32.c
Fix following warning:
init_32.c:112:22: warning: symbol 'bootmem_init' was not declared. Should it be static?

Fix by adding a proper prototype in pgtable_32.h and drop
the local prototype in srmmu.c

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:24 -04:00
Sam Ravnborg
e1b2f13488 sparc32: fix sparse warning in fault_32.c
Fix following warning:
fault_32.c:38:24: error: symbol 'unhandled_fault' redeclared with different type - different modifiers

When this warning was fixed several new warnings popped up - fix them too.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:24 -04:00
Sam Ravnborg
ddb7417ea9 sparc32: rename mm/srmmu.h to mm/mm_32.h
This file will be used for more than just srmmu stuff, so the old name was misleading.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-29 01:12:24 -04:00
Doug Wilson
151b628f10 sparc64:tsb.c:use array size macro rather than number
This is a small patch which uses ARRAY_SIZE macro
    rather than a number to make code readability better.

Signed-off-by: Doug Wilson <doug.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17 15:55:49 -04:00
Paul Gortmaker
a56b072fa3 sparc32: make copy_to/from_user_page() usable from modular code
While copy_to/from_user_page() users are uncommon, there is one in
drivers/staging/lustre/lustre/libcfs/linux/linux-curproc.c which leads
to the following:

ERROR: "sparc32_cachetlb_ops" [drivers/staging/lustre/lustre/libcfs/libcfs.ko] undefined!

during routine allmodconfig build coverage.  The reason this happens
is as follows:

In arch/sparc/include/asm/cacheflush_32.h we have:

 #define flush_cache_page(vma,addr,pfn) \
        sparc32_cachetlb_ops->cache_page(vma, addr)

 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
        do {                                                    \
                flush_cache_page(vma, vaddr, page_to_pfn(page));\
                memcpy(dst, src, len);                          \
        } while (0)
 #define copy_from_user_page(vma, page, vaddr, dst, src, len) \
        do {                                                    \
                flush_cache_page(vma, vaddr, page_to_pfn(page));\
                memcpy(dst, src, len);                          \
        } while (0)

However, sparc32_cachetlb_ops isn't exported and hence the error.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 19:49:48 -05:00
Paul Gortmaker
8b2abcbc5e sparc: delete non-required instances of include <linux/init.h>
None of these files are actually using any __init type directives
and hence don't need to include <linux/init.h>.  Most are just a
left over from __devinit and __cpuinit removal, or simply due to
code getting copied from one driver to the next.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-28 23:38:23 -08:00
Tang Chen
e7e8de5918 memblock: make memblock_set_node() support different memblock_type
[sfr@canb.auug.org.au: fix powerpc build]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Chen Tang <imtangchen@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Liu Jiang <jiang.liu@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-21 16:19:44 -08:00
Stephen Rothwell
6a328f3fe0 sparc64: merge fix
After merging the final tree, today's linux-next build (sparc64 defconfig)
failed like this:

arch/sparc/mm/init_64.c: In function 'pte_alloc_one':
arch/sparc/mm/init_64.c:2568:9: error: unused variable 'pte' [-Werror=unused-variable]

Caused by the merge between commit 37b3a8ff3e ("sparc64: Move from 4MB
to 8MB huge pages") and commit 1ae9ae5f7d ("sparc: handle
pgtable_page_ctor() fail") (I had the following merge fix in linux-next,
but it didn't seem to propagate upstream - may have forgotten to point it
out :-().

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-18 15:08:52 -08:00
Linus Torvalds
1b2722752f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc update from David Miller:

 1) Implement support for up to 47-bit physical addresses on sparc64.

 2) Support HAVE_CONTEXT_TRACKING on sparc64, from Kirill Tkhai.

 3) Fix Simba bridge window calculations, from Kjetil Oftedal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  sparc64: Implement HAVE_CONTEXT_TRACKING
  sparc64: Add self-IPI support for smp_send_reschedule()
  sparc: PCI: Fix incorrect address calculation of PCI Bridge windows on Simba-bridges
  sparc64: Encode huge PMDs using PTE encoding.
  sparc64: Move to 64-bit PGDs and PMDs.
  sparc64: Move from 4MB to 8MB huge pages.
  sparc64: Make PAGE_OFFSET variable.
  sparc64: Fix inconsistent max-physical-address defines.
  sparc64: Document the shift counts used to validate linear kernel addresses.
  sparc64: Define PAGE_OFFSET in terms of physical address bits.
  sparc64: Use PAGE_OFFSET instead of a magic constant.
  sparc64: Clean up 64-bit mmap exclusion defines.
2013-11-15 14:16:30 +09:00
Kirill A. Shutemov
1ae9ae5f7d sparc: handle pgtable_page_ctor() fail
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:19 +09:00
Kirill A. Shutemov
c389a250ab mm, thp: do not access mm->pmd_huge_pte directly
Currently mm->pmd_huge_pte protected by page table lock.  It will not
work with split lock.  We have to have per-pmd pmd_huge_pte for proper
access serialization.

For now, let's just introduce wrapper to access mm->pmd_huge_pte.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Alex Thorlton <athorlton@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:14 +09:00
Kirill Tkhai
812cb83a56 sparc64: Implement HAVE_CONTEXT_TRACKING
Mark the places when the system are in user or are in kernel.
This is used to make full dynticks system (tickless) --
CONFIG_NO_HZ_FULL dependence.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 14:57:21 -08:00
David S. Miller
a7b9403f0e sparc64: Encode huge PMDs using PTE encoding.
Now that we have 64-bits for PMDs we can stop using special encodings
for the huge PMD values, and just put real PTEs in there.

We allocate a _PAGE_PMD_HUGE bit to distinguish between plain PMDs and
huge ones.  It is the same for both 4U and 4V PTE layouts.

We also use _PAGE_SPECIAL to indicate the splitting state, since a
huge PMD cannot also be special.

All of the PMD --> PTE translation code disappears, and most of the
huge PMD bit modifications and tests just degenerate into the PTE
operations.  In particular USER_PGTABLE_CHECK_PMD_HUGE becomes
trivial.

As a side effect, normal PMDs don't shift the physical address around.
This also speeds up the page table walks in the TLB miss paths since
they don't have to do the shifts any more.

Another non-trivial aspect is that pte_modify() has to be changed
to preserve the _PAGE_PMD_HUGE bits as well as the page size field
of the pte.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-13 12:33:08 -08:00
David S. Miller
2b77933c28 sparc64: Move to 64-bit PGDs and PMDs.
To make the page tables compact, we were using 32-bit PGDs and PMDs.
We only had to support <= 43 bits of physical addresses so this was
quite feasible.

In order to support larger physical addresses we have to move to
64-bit PGDs and PMDs.

Most of the changes are straight-forward:

1) {pgd,pmd}_t --> unsigned long

2) Anything that tries to use plain "unsigned int" types with pgd/pmd
   values needs to be adjusted.  In particular things like "0U" become
   "0UL".

3) {PGDIR,PMD}_BITS decrease by one.

4) In the assembler page table walkers, use "ldxa" instead of "lduwa"
   and adjust the low bit masks to clear out the low 3 bits instead of
   just the low 2 bits during pgd/pmd address formation.

Also, use PTRS_PER_PGD and PTRS_PER_PMD in the sizing of the
swapper_{pg_dir,low_pmd_dir} arrays.

This patch does not try to take advantage of having 64-bits in the
PMDs to simplify the hugepage code, that will come in a subsequent
change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12 15:22:35 -08:00
David S. Miller
37b3a8ff3e sparc64: Move from 4MB to 8MB huge pages.
The impetus for this is that we would like to move to 64-bit PMDs and
PGDs, but that would result in only supporting a 42-bit address space
with the current page table layout.  It'd be nice to support at least
43-bits.

The reason we'd end up with only 42-bits after making PMDs and PGDs
64-bit is that we only use half-page sized PTE tables in order to make
PMDs line up to 4MB, the hardware huge page size we use.

So what we do here is we make huge pages 8MB, and fabricate them using
4MB hw TLB entries.

Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
places that really need to operate on hardware 4MB pages.

Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.

This makes the pgtable cache completely unused, so remove the code
managing it and the state used in mm_context_t.  Now we have less
spinlocks taken in the page table allocation path.

The technique we use to fabricate the 8MB pages is to transfer bit 22
from the missing virtual address into the PTEs physical address field.
That takes care of the transparent huge pages case.

For hugetlb, we fill things in at the PTE level and that code already
puts the sub huge page physical bits into the PTEs, based upon the
offset, so there is nothing special we need to do.  It all just works
out.

So, a small amount of complexity in the THP case, but this code is
about to get much simpler when we move the 64-bit PMDs as we can move
away from the fancy 32-bit huge PMD encoding and just put a real PTE
value in there.

With bug fixes and help from Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-12 15:22:34 -08:00
David S. Miller
b2d4383480 sparc64: Make PAGE_OFFSET variable.
Choose PAGE_OFFSET dynamically based upon cpu type.

Original UltraSPARC-I (spitfire) chips only supported a 44-bit
virtual address space.

Newer chips (T4 and later) support 52-bit virtual addresses
and up to 47-bits of physical memory space.

Therefore we have to adjust PAGE_SIZE dynamically based upon
the capabilities of the chip.

Note that this change alone does not allow us to support > 43-bit
physical memory, to do that we need to re-arrange our page table
support.  The current encodings of the pmd_t and pgd_t pointers
restricts us to "32 + 11" == 43 bits.

This change can waste quite a bit of memory for the various tables.
In particular, a future change should work to size and allocate
kern_linear_bitmap[] and sparc64_valid_addr_bitmap[] dynamically.
This isn't easy as we really cannot take a TLB miss when accessing
kern_linear_bitmap[].  We'd have to lock it into the TLB or similar.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:34 -08:00
David S. Miller
bb7b435388 sparc64: Document the shift counts used to validate linear kernel addresses.
This way we can see exactly what they are derived from, and in particular
how they would change if we were to use a different PAGE_OFFSET value.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:34 -08:00
David S. Miller
922631b988 sparc64: Use PAGE_OFFSET instead of a magic constant.
This pertains to all of the computations of the kernel fast
TLB miss xor values.

Based upon a patch by Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:33 -08:00
David S. Miller
c920745e69 sparc64: Clean up 64-bit mmap exclusion defines.
Older UltraSPARC chips had an address space hole due to the MMU only
supporting 44-bit virtual addresses.

The top end of this hole also has the same value as the current
definition of PAGE_OFFSET, so this can be confusing.

Consolidate the defines for the userspace mmap exclusion range into
page_64.h and use them in sys_sparc_64.c and hugetlbpage.c

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2013-11-12 15:22:33 -08:00
Johannes Weiner
759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Naoya Horiguchi
83467efbdb mm: migrate: check movability of hugepage in unmap_and_move_huge_page()
Currently hugepage migration works well only for pmd-based hugepages
(mainly due to lack of testing,) so we had better not enable migration of
other levels of hugepages until we are ready for it.

Some users of hugepage migration (mbind, move_pages, and migrate_pages) do
page table walk and check pud/pmd_huge() there, so they are safe.  But the
other users (softoffline and memory hotremove) don't do this, so without
this patch they can try to migrate unexpected types of hugepages.

To prevent this, we introduce hugepage_migration_support() as an
architecture dependent check of whether hugepage are implemented on a pmd
basis or not.  And on some architecture multiple sizes of hugepages are
available, so hugepage_migration_support() also checks hugepage size.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:57:49 -07:00
Paul Gortmaker
2066aadd53 sparc: delete __cpuinit/__CPUINIT usage from all users
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/sparc uses of the __cpuinit macros from
C files and removes __CPUINIT from assembly files.  Note that even
though arch/sparc/kernel/trampoline_64.S has instances of ".previous"
in it, they are all paired off against explicit ".section" directives,
and not implicitly paired with __CPUINIT (unlike mips and arm were).

[1] https://lkml.org/lkml/2013/5/20/589

Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:52 -04:00
Olivier DANET
961246b4ed [PATCH] sparc32: vm_area_struct access for old Sun SPARCs.
Commit e4c6bfd2d7 ("mm: rearrange
vm_area_struct for fewer cache misses") changed the layout of the
vm_area_struct structure, it broke several SPARC32 assembly routines
which used numerical constants for accessing the vm_mm field.

This patch defines the VMA_VM_MM constant to replace the immediate values.

Signed-off-by: Olivier DANET <odanet@caramail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-10 13:56:10 -07:00
Linus Torvalds
65b97fb730 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Ben Herrenschmidt:
 "This is the powerpc changes for the 3.11 merge window.  In addition to
  the usual bug fixes and small updates, the main highlights are:

   - Support for transparent huge pages by Aneesh Kumar for 64-bit
     server processors.  This allows the use of 16M pages as transparent
     huge pages on kernels compiled with a 64K base page size.

   - Base VFIO support for KVM on power by Alexey Kardashevskiy

   - Wiring up of our nvram to the pstore infrastructure, including
     putting compressed oopses in there by Aruna Balakrishnaiah

   - Move, rework and improve our "EEH" (basically PCI error handling
     and recovery) infrastructure.  It is no longer specific to pseries
     but is now usable by the new "powernv" platform as well (no
     hypervisor) by Gavin Shan.

   - I fixed some bugs in our math-emu instruction decoding and made it
     usable to emulate some optional FP instructions on processors with
     hard FP that lack them (such as fsqrt on Freescale embedded
     processors).

   - Support for Power8 "Event Based Branch" facility by Michael
     Ellerman.  This facility allows what is basically "userspace
     interrupts" for performance monitor events.

   - A bunch of Transactional Memory vs.  Signals bug fixes and HW
     breakpoint/watchpoint fixes by Michael Neuling.

  And more ...  I appologize in advance if I've failed to highlight
  something that somebody deemed worth it."

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
  pstore: Add hsize argument in write_buf call of pstore_ftrace_call
  powerpc/fsl: add MPIC timer wakeup support
  powerpc/mpic: create mpic subsystem object
  powerpc/mpic: add global timer support
  powerpc/mpic: add irq_set_wake support
  powerpc/85xx: enable coreint for all the 64bit boards
  powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
  powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
  powerpc/mpic: Add get_version API both for internal and external use
  powerpc: Handle both new style and old style reserve maps
  powerpc/hw_brk: Fix off by one error when validating DAWR region end
  powerpc/pseries: Support compression of oops text via pstore
  powerpc/pseries: Re-organise the oops compression code
  pstore: Pass header size in the pstore write callback
  powerpc/powernv: Fix iommu initialization again
  powerpc/pseries: Inform the hypervisor we are using EBB regs
  powerpc/perf: Add power8 EBB support
  powerpc/perf: Core EBB support for 64-bit book3s
  powerpc/perf: Drop MMCRA from thread_struct
  powerpc/perf: Don't enable if we have zero events
  ...
2013-07-04 10:29:23 -07:00
Jiang Liu
dceccbe920 mm/SPARC: prepare for removing num_physpages and simplify mem_init()
Prepare for removing num_physpages and simplify mem_init().

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:37 -07:00
Jiang Liu
0c98853473 mm: concentrate modification of totalram_pages into the mm core
Concentrate code to modify totalram_pages into the mm core, so the arch
memory initialized code doesn't need to take care of it.  With these
changes applied, only following functions from mm core modify global
variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
free_all_bootmem_node(), adjust_managed_page_count().

With this patch applied, it will be much more easier for us to keep
totalram_pages and zone->managed_pages in consistence.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:33 -07:00
Jiang Liu
11199692d8 mm: change signature of free_reserved_area() to fix building warnings
Change signature of free_reserved_area() according to Russell King's
suggestion to fix following build warnings:

  arch/arm/mm/init.c: In function 'mem_init':
  arch/arm/mm/init.c:603:2: warning: passing argument 1 of 'free_reserved_area' makes integer from pointer without a cast [enabled by default]
    free_reserved_area(__va(PHYS_PFN_OFFSET), swapper_pg_dir, 0, NULL);
    ^
  In file included from include/linux/mman.h:4:0,
                   from arch/arm/mm/init.c:15:
  include/linux/mm.h:1301:22: note: expected 'long unsigned int' but argument is of type 'void *'
   extern unsigned long free_reserved_area(unsigned long start, unsigned long end,

   mm/page_alloc.c: In function 'free_reserved_area':
>> mm/page_alloc.c:5134:3: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default]
   In file included from arch/mips/include/asm/page.h:49:0,
                    from include/linux/mmzone.h:20,
                    from include/linux/gfp.h:4,
                    from include/linux/mm.h:8,
                    from mm/page_alloc.c:18:
   arch/mips/include/asm/io.h:119:29: note: expected 'const volatile void *' but argument is of type 'long unsigned int'
   mm/page_alloc.c: In function 'free_area_init_nodes':
   mm/page_alloc.c:5030:34: warning: array subscript is below array bounds [-Warray-bounds]

Also address some minor code review comments.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <sworddragon2@aol.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Michel Lespinasse <walken@google.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03 16:07:32 -07:00
Benjamin Herrenschmidt
24a72acac1 Linux 3.10
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJR0K2gAAoJEHm+PkMAQRiGWsEH+gMZSN1qRm34hZ82q1Tx7HvL
 Eb/Gsl3Qw/7G2TlTqgjBUs36IdqV9O2cui/aa3/TfXvdvrx+0GlhRkEwQPc+ygcO
 Mvoyoke4tT4+4jVFdCg1J8avREsa28/6oaHs0ZZxuVmJBBLTJH7aXaNsGn6eU1q9
 9+p798MQis6naIiPC63somlZcCIiBhsuWCPWpEfLMn8G1HWAFTM3xXIbNBqe/brS
 bmIOfhomlIZ5dcdaXGvjtP3+KJhkNDwhkPC4tVYu8JqqgSlrE+a+EGyEuuGqKk10
 U+swiqyuD31uBI9ga54u/2FzSqDiAu6YOcMXevjo/m3g9XLdYbYLvN+nvN8alCQ=
 =Ob6Z
 -----END PGP SIGNATURE-----

Merge tag 'v3.10' into next

Merge 3.10 in order to get some of the last minute powerpc
changes, resolve conflicts and add additional fixes on top
of them.
2013-07-01 17:57:25 +10:00
Aneesh Kumar K.V
6b0b50b061 mm/THP: add pmd args to pgtable deposit and withdraw APIs
This will be later used by powerpc THP support.  In powerpc we want to use
pgtable for storing the hash index values.  So instead of adding them to
mm_context list, we would like to store them in the second half of pmd

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 16:55:07 +10:00
Dave Kleikamp
23a01138ef sparc: tsb must be flushed before tlb
This fixes a race where a cpu may re-load a tlb from a stale tsb right
after it has been flushed by a remote function call.

I still see some instability when stressing the system with parallel
kernel builds while creating memory pressure by writing to
/proc/sys/vm/nr_hugepages, but this patch improves the stability
significantly.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Acked-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 02:10:30 -07:00
bob picco
771a37ff4d sparc64 address-congruence property
The Machine Description (MD) property "address-congruence-offset" is
optional. According to the MD specification the value is assumed 0UL when
not present. This caused early boot failure on T5.

Signed-off-by: Bob Picco <bob.picco@oracle.com>
CC: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-06-19 02:10:30 -07:00
Jiang Liu
70affe4520 mm/SPARC: use common help functions to free reserved pages
Use common help functions to free reserved pages.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 18:38:26 -07:00
David S. Miller
048c9acca9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Merge sparc bug fixes that didn't make it into v3.9 into
sparc-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-05-04 18:34:13 -07:00
Johannes Weiner
0aad818b2d sparse-vmemmap: specify vmemmap population range in bytes
The sparse code, when asking the architecture to populate the vmemmap,
specifies the section range as a starting page and a number of pages.

This is an awkward interface, because none of the arch-specific code
actually thinks of the range in terms of 'struct page' units and always
translates it to bytes first.

In addition, later patches mix huge page and regular page backing for
the vmemmap.  For this, they need to call vmemmap_populate_basepages()
on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
are not necessarily multiples of the 'struct page' size and so this unit
is too coarse.

Just translate the section range into bytes once in the generic sparse
code, then pass byte ranges down the stack.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: David S. Miller <davem@davemloft.net>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:35 -07:00
Jiang Liu
32e1a10927 mm/SPARC: use free_highmem_page() to free highmem pages into buddy system
Use helper function free_highmem_page() to free highmem pages into
the buddy system.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 15:54:32 -07:00
David S. Miller
f0af97070a sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching.
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-24 16:52:18 -07:00
David S. Miller
f36391d279 sparc64: Fix race in TLB batch processing.
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchonize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls.  If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
   implementations.

2) Do TLB batch cross calls instead via:

	smp_call_function_many()
		tlb_pending_func()
			__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

	a) Add 'active' member to struct tlb_batch
	b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
	c) Set 'active' in arch_enter_lazy_mmu_mode()
	d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
	e) Check 'active' in tlb_batch_add_one() and do a synchronous
           flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

	a) Implement __flush_tlb_page and per-cpu variants, patch
	   as needed.
	b) Likewise for xcall_flush_tlb_page.
	c) Implement smp_flush_tlb_page() to invoke the cross-call.
	d) Wire up global_flush_tlb_page() to the right routine based
           upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
   3 batch flushes have only a single entry in them.

   The batch flush waiting is very expensive, both because of the poll
   on sibling cpu completeion, as well as because passing the tlb batch
   pointer to the sibling cpus invokes a shared memory dereference.

   Therefore, in flush_tlb_pending(), if there is only one entry in
   the batch perform a completely asynchronous global_flush_tlb_page()
   instead.

Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2013-04-19 17:26:26 -04:00
Kirill Tkhai
07df841877 sparc64: Do not save/restore interrupts in get_new_mmu_context()
get_new_mmu_context() is always called with interrupts disabled.
So it's possible to do this micro optimization.

(Also fix the comment to switch_mm, which is called in both cases)

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-08 22:50:47 -04:00
Akinobu Mita
9a0ac1b6af sparc/iommu: fix typo s/265KB/256KB/
IOMMU_NPTES is 64K PTEs, so the size is 256KB (= 64K * sizeof(iopte_t))

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-31 19:29:12 -04:00
Akinobu Mita
54df2db36c sparc/srmmu: clear trailing edge of bitmap properly
srmmu_nocache_bitmap is cleared by bit_map_init().  But bit_map_init()
attempts to clear by memset(), so it can't clear the trailing edge of
bitmap properly on big-endian architecture if the number of bits is not
a multiple of BITS_PER_LONG.

Actually, the number of bits in srmmu_nocache_bitmap is not always
a multiple of BITS_PER_LONG.  It is calculated as below:

        bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT;

srmmu_nocache_size is decided proportionally by the amount of system RAM
and it is rounded to a multiple of PAGE_SIZE.  SRMMU_NOCACHE_BITMAP_SHIFT
is defined as (PAGE_SHIFT - 4).  So it can only be said that bitmap_bits
is a multiple of 16.

This fixes the problem by using bitmap_clear() instead of memset()
in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct
size at bitmap allocation time.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-31 19:29:12 -04:00
Tkhai Kirill
ce835e513a sparc64: Do not change num_physpages during initmem freeing
Common hibernation code looks at num_physpages during suspend and restore.
Restore is able to be called from initcall, which is before initmem freeing.
This case leads to restore fail.

Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
CC: David Miller <davem@davemloft.net>
CC: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 11:06:54 -07:00
Shaohua Li
ec8acf20af swap: add per-partition lock for swapfile
swap_lock is heavily contended when I test swap to 3 fast SSD (even
slightly slower than swap to 2 such SSD).  The main contention comes
from swap_info_get().  This patch tries to fix the gap with adding a new
per-partition lock.

Global data like nr_swapfiles, total_swap_pages, least_priority and
swap_list are still protected by swap_lock.

nr_swap_pages is an atomic now, it can be changed without swap_lock.  In
theory, it's possible get_swap_page() finds no swap pages but actually
there are free swap pages.  But sounds not a big problem.

Accessing partition specific data (like scan_swap_map and so on) is only
protected by swap_info_struct.lock.

Changing swap_info_struct.flags need hold swap_lock and
swap_info_struct.lock, because scan_scan_map() will check it.  read the
flags is ok with either the locks hold.

If both swap_lock and swap_info_struct.lock must be hold, we always hold
the former first to avoid deadlock.

swap_entry_free() can change swap_list.  To delete that code, we add a
new highest_priority_index.  Whenever get_swap_page() is called, we
check it.  If it's valid, we use it.

It's a pity get_swap_page() still holds swap_lock().  But in practice,
swap_lock() isn't heavily contended in my test with this patch (or I can
say there are other much more heavier bottlenecks like TLB flush).  And
BTW, looks get_swap_page() doesn't really need the lock.  We never free
swap_info[] and we check SWAP_WRITEOK flag.  The only risk without the
lock is we could swapout to some low priority swap, but we can quickly
recover after several rounds of swap, so sounds not a big deal to me.
But I'd prefer to fix this if it's a real problem.

"swap: make each swap partition have one address_space" improved the
swapout speed from 1.7G/s to 2G/s.  This patch further improves the
speed to 2.3G/s, so around 15% improvement.  It's a multi-process test,
so TLB flush isn't the biggest bottleneck before the patches.

[arnd@arndb.de: fix it for nommu]
[hughd@google.com: add missing unlock]
[minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:17 -08:00
Tang Chen
0197518cd3 memory-hotplug: remove memmap of sparse-vmemmap
Introduce a new API vmemmap_free() to free and remove vmemmap
pagetables.  Since pagetable implements are different, each architecture
has to provide its own version of vmemmap_free(), just like
vmemmap_populate().

Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc.

[mhocko@suse.cz: fix implicit declaration of remove_pagetable]
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Wu Jianguo <wujianguo@huawei.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu
46723bfa54 memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap
For removing memmap region of sparse-vmemmap which is allocated bootmem,
memmap region of sparse-vmemmap needs to be registered by
get_page_bootmem().  So the patch searches pages of virtual mapping and
registers the pages by get_page_bootmem().

NOTE: register_page_bootmem_memmap() is not implemented for ia64,
      ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
      and revert register_page_bootmem_info_node() when platform doesn't
      support it.

      It's implemented by adding a new Kconfig option named
      CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
      by memory-hotplug feature fully supported archs(currently only on
      x86_64).

      Since we have 2 config options called MEMORY_HOTPLUG and
      MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
      and codes in function register_page_bootmem_info_node() are only
      used for collecting infomation for hot-remove, so reside it under
      MEMORY_HOTREMOVE.

      Besides page_isolation.c selected by MEMORY_ISOLATION under
      MEMORY_HOTPLUG is also such case, move it too.

[mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
[linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
[mhocko@suse.cz: remove the arch specific functions without any implementation]
[linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
[rientjes@google.com: fix defined but not used warning]
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: Wu Jianguo <wujianguo@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:12 -08:00
Linus Torvalds
2ef14f465b Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Peter Anvin:
 "This is a huge set of several partly interrelated (and concurrently
  developed) changes, which is why the branch history is messier than
  one would like.

  The *really* big items are two humonguous patchsets mostly developed
  by Yinghai Lu at my request, which completely revamps the way we
  create initial page tables.  In particular, rather than estimating how
  much memory we will need for page tables and then build them into that
  memory -- a calculation that has shown to be incredibly fragile -- we
  now build them (on 64 bits) with the aid of a "pseudo-linear mode" --
  a #PF handler which creates temporary page tables on demand.

  This has several advantages:

  1. It makes it much easier to support things that need access to data
     very early (a followon patchset uses this to load microcode way
     early in the kernel startup).

  2. It allows the kernel and all the kernel data objects to be invoked
     from above the 4 GB limit.  This allows kdump to work on very large
     systems.

  3. It greatly reduces the difference between Xen and native (Xen's
     equivalent of the #PF handler are the temporary page tables created
     by the domain builder), eliminating a bunch of fragile hooks.

  The patch series also gets us a bit closer to W^X.

  Additional work in this pull is the 64-bit get_user() work which you
  were also involved with, and a bunch of cleanups/speedups to
  __phys_addr()/__pa()."

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits)
  x86, mm: Move reserving low memory later in initialization
  x86, doc: Clarify the use of asm("%edx") in uaccess.h
  x86, mm: Redesign get_user with a __builtin_choose_expr hack
  x86: Be consistent with data size in getuser.S
  x86, mm: Use a bitfield to mask nuisance get_user() warnings
  x86/kvm: Fix compile warning in kvm_register_steal_time()
  x86-32: Add support for 64bit get_user()
  x86-32, mm: Remove reference to alloc_remap()
  x86-32, mm: Remove reference to resume_map_numa_kva()
  x86-32, mm: Rip out x86_32 NUMA remapping code
  x86/numa: Use __pa_nodebug() instead
  x86: Don't panic if can not alloc buffer for swiotlb
  mm: Add alloc_bootmem_low_pages_nopanic()
  x86, 64bit, mm: hibernate use generic mapping_init
  x86, 64bit, mm: Mark data/bss/brk to nx
  x86: Merge early kernel reserve for 32bit and 64bit
  x86: Add Crash kernel low reservation
  x86, kdump: Remove crashkernel range find limit for 64bit
  memblock: Add memblock_mem_size()
  x86, boot: Not need to check setup_header version for setup_data
  ...
2013-02-21 18:06:55 -08:00
David S. Miller
0fbebed682 sparc64: Fix tsb_grow() in atomic context.
If our first THP installation for an MM is via the set_pmd_at() done
during khugepaged's collapsing we'll end up in tsb_grow() trying to do
a GFP_KERNEL allocation with several locks held.

Simply using GFP_ATOMIC in this situation is not the best option
because we really can't have this fail, so we'd really like to keep
this an order 0 GFP_KERNEL allocation if possible.

Also, doing the TSB allocation from khugepaged is a really bad idea
because we'll allocate it potentially from the wrong NUMA node in that
context.

So what we do is defer the hugepage TSB allocation until the first TLB
miss we take on a hugepage.  This is slightly tricky because we have
to handle two unusual cases:

1) Taking the first hugepage TLB miss in the window trap handler.
   We'll call the winfix_trampoline when that is detected.

2) An initial TSB allocation via TLB miss races with a hugetlb
   fault on another cpu running the same MM.  We handle this by
   unconditionally loading the TSB we see into the current cpu
   even if it's non-NULL at hugetlb_setup time.

Reported-by: Meelis Roos <mroos@ut.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20 09:46:08 -08:00
David S. Miller
bcd896bae0 sparc64: Handle hugepage TSB being NULL.
Accomodate the possibility that the TSB might be NULL at
the point that update_mmu_cache() is invoked.  This is
necessary because we will sometimes need to defer the TSB
allocation to the first fault that happens in the 'mm'.

Seperate out the hugepage PTE test into a seperate function
so that the logic is clearer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20 09:46:08 -08:00
David S. Miller
a55ee1ff75 sparc64: Fix gfp_flags setting in tsb_grow().
We should "|= more_flags" rather than "= more_flags".

Reported-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-20 09:45:53 -08:00
David S. Miller
89a77915e0 sparc64: Fix get_user_pages_fast() wrt. THP.
Mostly mirrors the s390 logic, as unlike x86 we don't need the
SetPageReferenced() bits.

On sparc64 we also lack a user/privileged bit in the huge PMDs.

In order to make this work for THP and non-THP builds, some header
file adjustments were necessary.  Namely, provide the PMD_HUGE_* bit
defines and the pmd_large() inline unconditionally rather than
protected by TRANSPARENT_HUGEPAGE.

Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-13 12:22:14 -08:00
H. Peter Anvin
de65d816aa Merge remote-tracking branch 'origin/x86/boot' into x86/mm2
Coming patches to x86/mm2 require the changes and advanced baseline in
x86/boot.

Resolved Conflicts:
	arch/x86/kernel/setup.c
	mm/nobootmem.c

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29 15:10:15 -08:00
Greg Kroah-Hartman
7c9503b838 SPARC: drivers: remove __dev* attributes.
CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-03 15:57:04 -08:00
Linus Torvalds
9977d9b379 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild
2012-12-12 12:22:13 -08:00
Michel Lespinasse
2aea28b975 mm: use vm_unmapped_area() in hugetlbfs on sparc64 architecture
Update the sparc64 hugetlb_get_unmapped_area function to make use of
vm_unmapped_area() instead of implementing a brute force search.

Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-11 17:22:26 -08:00
Yinghai Lu
961f8fa04b sparc, mm: Remove calling of free_all_bootmem_node()
Now NO_BOOTMEM version free_all_bootmem_node() does not really
do free_bootmem at all, and it only call
register_page_bootmem_info_node instead.

That is confusing, try to kill that free_all_bootmem_node().

Before that, this patch will remove calling of free_all_bootmem_node()

We add register_page_bootmem_info() to call register_page_bootmem_info_node
directly.

Also could use free_all_bootmem() for numa case, and it is just
the same as free_low_memory_core_early().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-45-git-send-email-yinghai@kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: sparclinux@vger.kernel.org
Acked-by: "David S. Miller" <davem@davemloft.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2012-11-17 11:59:48 -08:00
Al Viro
d05f06e60d Merge branch 'arch-frv' into no-rebases 2012-11-16 22:27:58 -05:00
David S. Miller
916ca14aaf sparc64: Add global PMU register dumping via sysrq.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-16 09:34:01 -07:00
Al Viro
dff933da76 sparc64: clear syscall_noerror on the entry to syscall, not on the exit
Move that sucker to just before TI_FPDEPTH and replace stb with sth in
etrap_save().  Take current_ds to its old place, so that we don't push
wsaved into TI_... flags.  That allows to lose clearing syscall_noerror
on return from syscall.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-14 19:26:52 -04:00
David S. Miller
f88620b9c5 sparc64: Fix deficiencies in sun4v error reporting.
Missing error types, attributes, and report fields.  Pad out
to 64-bytes.

Make string reporting cleaner and easier to extend in the future using
"const char *" arrays that index by either bit position, or absolute
field value.

Report the raw 64-byte error report as a sequence of u64s before the
annotated version.

Only report fields which are valid, given the context and the
attribute bits which are set.

For shutdown requests, use the local copy of the error report not the
one we just freed up back to the queue.  Also, use orderly_poweroff()
just like the Domain Services shutdown request code does.

If the real-address reported is "-1" (unknown) try to disassemble the
instruction to report the effective address of the access.  Only do
this in privileged mode.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-10 17:19:32 -07:00
David Miller
9e695d2ecc sparc64: Support transparent huge pages.
This is relatively easy since PMD's now cover exactly 4MB of memory.

Our PMD entries are 32-bits each, so we use a special encoding.  The
lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
because sparc64's page tables are purely software entities so we can use
whatever encoding scheme we want.  We just have to make the TLB miss
assembler page table walkers aware of the layout.

set_pmd_at() works much like set_pte_at() but it has to operate in two
page from a table of non-huge PTEs, so we have to queue up TLB flushes
based upon what mappings are valid in the PTE table.  In the second regime
we are going from huge-page to non-huge-page, and in that case we need
only queue up a single TLB flush to push out the huge page mapping.

We still have 5 bits remaining in the huge PMD encoding so we can very
likely support any new pieces of THP state tracking that might get added
in the future.

With lots of help from Johannes Weiner.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:06 +09:00
David Miller
c460bec78d sparc64: Eliminate PTE table memory wastage.
We've split up the PTE tables so that they take up half a page instead of
a full page.  This is in order to facilitate transparent huge page
support, which works much better if our PMDs cover 4MB instead of 8MB.

What we do is have a one-behind cache for PTE table allocations in the
mm struct.

This logic triggers only on allocations.  For example, we don't try to
keep track of free'd up page table blocks in the style that the s390 port
does.

There were only two slightly annoying aspects to this change:

1) Changing pgtable_t to be a "pte_t *".  There's all of this special
   logic in the TLB free paths that needed adjustments, as did the
   PMD populate interfaces.

2) init_new_context() needs to zap the pointer, since the mm struct
   just gets copied from the parent on fork.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:05 +09:00
David Miller
15b9350a17 sparc64: Only support 4MB huge pages and 8KB base pages.
Narrowing the scope of the page size configurations will make the
transparent hugepage changes much simpler.

In the end what we really want to do is have the kernel support multiple
huge page sizes and use whatever is appropriate as the context dictactes.

Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:04 +09:00
Shaohua Li
45cac65b0f readahead: fault retry breaks mmap file read random detection
.fault now can retry.  The retry can break state machine of .fault.  In
filemap_fault, if page is miss, ra->mmap_miss is increased.  In the second
try, since the page is in page cache now, ra->mmap_miss is decreased.  And
these are done in one fault, so we can't detect random mmap file access.

Add a new flag to indicate .fault is tried once.  In the second try, skip
ra->mmap_miss decreasing.  The filemap_fault state machine is ok with it.

I only tested x86, didn't test other archs, but looks the change for other
archs is obvious, but who knows :)

Signed-off-by: Shaohua Li <shaohua.li@fusionio.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:47 +09:00
Akinobu Mita
5da444aae5 sparc: fix format string argument for prom_printf()
prom_printf() takes printf style arguments.  Specifing GCC's format
attribute reveals that there are several wrong usages of prom_printf().

This fixes those wrong format strings and arguments, and also leaves
format attributes in order to detect similar mistakes at compile time.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: sparclinux@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-02 23:20:34 -04:00
David S. Miller
c69ad0a3f7 sparc64: Use cpu_pgsz_mask for linear kernel mapping config.
This required a little bit of reordering of how we set up the memory
management early on.

We now only know the final values of kern_linear_pte_xor[] after we
take over the trap table and start processing TLB misses ourselves.

So once we fill those values in we re-clear the kernel's 4M TSB and
flush the TLBs.  That way if we find we support larger than 4M pages
we won't have any stale smaller page size entries in the TSB.

SUN4U Panther support for larger page sizes should now be extremely
trivial but I have no hardware on which to test it and I believe
that some of the sun4u TLB miss assembler needs to be audited first
to make sure it really can handle larger than 4M PTEs properly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-06 20:35:36 -07:00
David S. Miller
ce33fdc52a sparc64: Probe cpu page size support more portably.
On sun4v, interrogate the machine description.  This code is extremely
defensive in nature, and a lot of the checks can probably be removed.

On sun4u things are a lot simpler.  There are the page sizes all chips
support, and then Panther adds 32MB and 256MB pages.

Report the probed value in /proc/cpuinfo

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-06 19:01:25 -07:00
David S. Miller
4f93d21d25 sparc64: Support 2GB and 16GB page sizes for kernel linear mappings.
SPARC-T4 supports 2GB pages.

So convert kpte_linear_bitmap into an array of 2-bit values which
index into kern_linear_pte_xor.

Now kern_linear_pte_xor is used for 4 page size aligned regions,
4MB, 256MB, 2GB, and 16GB respectively.

Enabling 2GB pages is currently hardcoded using a check against
sun4v_chip_type.  In the future this will be done more cleanly
by interrogating the machine description which is the correct
way to determine this kind of thing.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-06 18:13:58 -07:00
David S. Miller
2856cc2e4d sparc64: Be less verbose during vmemmap population.
On a 2-node machine with 256GB of ram we get 512 lines of
console output, which is just too much.

This mimicks Yinghai Lu's x86 commit c2b91e2eec
(x86_64/mm: check and print vmemmap allocation continuous) except that
we aren't ever going to get contiguous block pointers in between calls
so just print when the virtual address or node changes.

This decreases the output by an order of 16.

Also demote this to KERN_DEBUG.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 00:37:29 -07:00
Sam Ravnborg
a0ce3ba03f sparc32: delete dead code in show_mem()
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:17 -07:00
Sam Ravnborg
9a4d5b93cb sparc32: move kmap_init() to highmem.c
Try to keep highmem support in a more central place.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:17 -07:00
Sam Ravnborg
d8a1b2b94c sparc32: move probe_memory() to srmmu.c
Only one user so move it to the file using it.
It had nothing to do in fault_32.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:17 -07:00
Sam Ravnborg
b585e8551b sparc32: centralize all mmu context handling in srmmu.c
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:16 -07:00
Sam Ravnborg
59b00c792f sparc32: drop quicklist
The quicklist stuff is not used anymore - drop it.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:16 -07:00
Sam Ravnborg
cc52aea9dc sparc32: drop sparc model check in paging_init
We already check the model in head_32.S so no need to
repeat the check here

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:16 -07:00
Sam Ravnborg
c966a337fa sparc32: drop sparc_unmapped_base
The base is always the same so no need to use a variable for this.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:16 -07:00
Sam Ravnborg
d884297aca sparc32,leon: drop leon_init()
This function was only used to set of_pdt_build_more to leon_node_init().
But the leon_node_init() was a nop as prom_amba_init was never assigned.

Cc: Daniel Hellstrom <daniel@gaisler.com>
Cc: Konrad Eisele <konrad@gaisler.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:16 -07:00
Sam Ravnborg
1b6d06d820 sparc32: drop fixmap.h
sparc32 does not support fixmaps - so do not pretend so by
having the fixmap.h file.
Move relevant parts to vaddrs.h.

I looked at simplifying this even more but failed to understand
the reasoning behind the extra guard page involved and due to
missing testing possibilities only the trivial conversion was done.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:16 -07:00
Sam Ravnborg
5bbeed12bd sparc32: drop unused kmap_atomic_to_page
No users left of this function - drop it.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:15 -07:00
Sam Ravnborg
7cdfbc74c8 sparc32: beautify srmmu_inherit_prom_mappings()
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:15 -07:00
Sam Ravnborg
f71a2aacc6 sparc32: use void * in nocache get/free
This allowed to us to kill a lot of casts,
with no loss of readability in any places

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:15 -07:00
Sam Ravnborg
605ae96240 sparc32: fix coding-style in srmmu.c
Fix the most annoying issues that distracts me:
- whitespace
- missing space after "if" and "while"
- spaces around operators
and similar simple things.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:15 -07:00
Sam Ravnborg
4a049b0341 sparc32: sort includes in srmmu.c
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:15 -07:00
Sam Ravnborg
32442467ed sparc32: define a few srmmu functions __init
They are only used during early init so lets get rid of them
after init to save some RAM.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 16:46:15 -07:00
Sam Ravnborg
805918f80f sparc32: srmmu_probe now knows about leon too
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Daniel Hellstrom <daniel@gaisler.com>
Cc: Konrad Eisele <konrad@gaisler.com>
2012-05-27 23:52:51 -07:00
Sam Ravnborg
6729cf7967 sparc32: introduce run-time patching of srmmu access functions
LEON uses a different ASI than SUN for MMUREGS

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Daniel Hellstrom <daniel@gaisler.com>
Cc: Konrad Eisele <konrad@gaisler.com>
2012-05-27 23:52:49 -07:00
Sam Ravnborg
3107948848 sparc32,leon: always include leon_smp + leon_mm in build
Fix-up leon specific assembler to use ASI_LEON_MMUREGS

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Daniel Hellstrom <daniel@gaisler.com>
Cc: Konrad Eisele <konrad@gaisler.com>
2012-05-27 23:52:47 -07:00
Sam Ravnborg
29af0ebaa2 sparc32: use the common implementation of alloc_thread_info_node()
With sun4c removed we can fall-back to the common implementation.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-22 12:02:56 -07:00
Linus Torvalds
9daeaa3705 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:

1) Kill off support for sun4c and Cypress sun4m chips.

   And as a result we were able to also kill off that ugly btfixup thing
   that required multi-stage links of the final vmlinux image in the
   Kbuild system.  This should make the kbuild maintainers really happy.

   Thanks a lot to Sam Ravnborg for his tireless efforts to get this
   going.

2) Convert sparc64 to nobootmem.  I suspect now with sparc32 being a lot
   cleaner, it should be able to fall in line and modernize in this area
   too.

3) Make sparc32 use generic clockevents, from Tkhai Kirill.

[ I fixed up the BPF rules, and tried to clean up the build rules too.
  But I don't have - or want - a sparc cross-build environment, so the
  BPF rule bug and the related build cleanup was all done with just a
  bare "make -n" pseudo-test.      - Linus ]

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (110 commits)
  sparc32: use flushi when run-time patching in per_cpu_patch
  sparc32: fix cpuid_patch run-time patching
  sparc32: drop unused inline functions in srmmu.c
  sparc32: drop unused functions in pgtsrmmu.h
  sparc32,leon: move leon mmu functions to leon_mm.c
  sparc32,leon: remove duplicate definitions in leon.h
  sparc32,leon: remove duplicate UART register definitions
  sparc32,leon: move leon ASI definitions to asi.h
  sparc32: move trap table to a separate file
  sparc64: renamed ttable.S to ttable_64.S
  sparc32: Remove asm/sysen.h header.
  sparc32: Delete asm/smpprim.h
  sparc32: Remove unused empty_bad_page{,_table} declarations.
  sparc32: Kill boot_cpu_id4
  sparc32: Move GET_PROCESSOR*_ID() out of asm/asmmacro.h
  sparc32: Remove completely unused code from asm/cache.h
  sparc32: Add ucmpdi2.o to obj-y instead of lib-y.
  sparc32: add ucmpdi2
  sparc: introduce arch/sparc/Kbuild
  sparc: remove obsolete documentation
  ...
2012-05-21 10:32:01 -07:00
Sam Ravnborg
8578149904 sparc32: drop unused inline functions in srmmu.c
When decelared inline the compiler does not warn
about unused functions.
But they are not used so drop them.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19 23:27:40 -07:00
Sam Ravnborg
3d5f7d37c8 sparc32: drop unused functions in pgtsrmmu.h
One function was only used by leon - move it to a leon specific file.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19 23:27:39 -07:00
Sam Ravnborg
accf032cfa sparc32,leon: move leon mmu functions to leon_mm.c
We already have a leaon specific file - so
keep all the laon stuff in one place.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Konrad Eisele <konrad@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19 23:27:38 -07:00
Sam Ravnborg
70168dfa1c sparc32: cleanup mm/fault_32.c
- remove unused variables
- fix coding style issues that hurts my eyes

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 10:43:48 -07:00
David S. Miller
c7020eb466 sparc32: Remove cypress cpu support.
It's the one aberration in v8, the only cpu that
didn't actually have hardware multiply and divide
instructions.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
2012-05-15 10:22:00 -07:00
Sam Ravnborg
50544bce4c sparc32: remove runtime btfix support
- remove all uses of btfixup header
- remove the btfixup header
- remove the btfixup code

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-14 14:05:09 -07:00
David S. Miller
5d83d66635 sparc32: Move cache and TLB flushes over to method ops.
This eliminated most of the remaining users of btfixup.

There are some complications because of the special cases we
have for sun4d, leon, and some flavors of viking.

It was found that there are no cases where a flush_page_for_dma
method was not hooked up to something, so the "noflush" iommu
methods were removed.

Add some documentation to the viking_sun4d_smp_ops to describe exactly
the hardware bug which causes us to need special TLB flushing on
sun4d.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 20:49:31 -07:00
David S. Miller
b25e74b1be sparc32: Remove unused declarations in srmmu.c
Uses of these went away with the sun4c removal.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 15:27:09 -07:00
David S. Miller
d894d964ff sparc32: Convert mmu_* interfaces from btfixup to method ops.
This set of changes displays one major danger of btfixup, interface
signatures are not always type checked fully.  As seen here the iounit
variant of the map_dma_area routine had an incorrect type for one of
it's arguments.

It turns out to be harmless in this case, but just imagine trying to
debug something involving this kind of problem.  No thanks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 13:57:05 -07:00
David S. Miller
679bea5e43 sparc: Kill mmu_{un,}lockarea().
These were used on sun4c during floppy data transfers since on that
chip we had to lock the cpu mappings into the TLB because we cannot
take a TLB miss during the assembler floppy interrupt handler that
does the data transfer.

That is no longer necessary since we've removed sun4c support, thus
this stuff can disappear completely.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 13:23:16 -07:00
David S. Miller
f613914efc sparc32: Un-btfixup update_mmu_cache().
The magic Swift SRMMU code in question has not been enabled for
something on the order of a decade, and it as well as it's comment
is there in the history in case we ever need it again.

Therefore all implementations are NOPs and we can kill this stuff
off.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 13:16:39 -07:00
David S. Miller
73c1377da9 sparc32: Kill btfixup for xchg()'s 'swap' instruction.
We always have this instruction available, so no need to use
btfixup for it any more.

This also eradicates the whole of atomic_32.S and thus the
__atomic_begin and __atomic_end symbols completely.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 13:07:16 -07:00
Sam Ravnborg
fb6f66f405 sparc32: drop btfixup in page_32.h
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 12:51:55 -07:00
Sam Ravnborg
b796c6da51 sparc32: drop btfixup in mmu_context_32.h
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 12:51:53 -07:00
Sam Ravnborg
9701b264d3 sparc32: drop btfixup in pgtable_32.h
Only one function left using btfixup.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 12:51:53 -07:00
Sam Ravnborg
642ea3ed9c sparc32: drop btfixup in pgalloc_32.h
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-13 12:51:52 -07:00
David S. Miller
301d5bbb52 sparc32: Un-btfixup more PTE constants and PTE ops.
pte_{filei,wrprotecti,mkcleani,mkoldi}
pte_{mkwrite,mkdirty,mkyoung}

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 13:54:58 -07:00
David S. Miller
f755f77a3a sparc32: Un-btfixup pte_{write,dirty,young}i
And we can certainly get rid of the const function attributes, there
is no way that's needed any longer and no other arch uses this kind
of annotation here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 13:48:10 -07:00
David S. Miller
62875cff73 sparc32: Un-btfixup set_pte, pte_present, mk_pte{_phys,_io}().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 13:39:23 -07:00
Sam Ravnborg
a3c5c6637b sparc32: drop loadmmu
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 13:30:49 -07:00
David S. Miller
f167edaee0 sparc32: Un-btfixup pmd_{bad,present}().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 13:30:28 -07:00
David S. Miller
7d9fa4aa3d sparc32: Un-btfixup pgd_{none,bad,present}.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 13:13:16 -07:00
David S. Miller
6439d1c693 sparc32: Un-btfixup PAGE_{NONE,COPY,READONLY,SHARED,KERNEL}.
That lets us also get rid of the run-time initialization of
protection_map[] and all the ugly module workarounds for
PAGE_KERNEL and PAGE_SHARED to deal with the fact that we
can't do btfixups for modular code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 12:52:47 -07:00
David S. Miller
3d8273675d sparc32: Un-btfixup pmd_page and pte_pfn.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 12:33:08 -07:00
David S. Miller
a46d6056f6 sparc32: Un-btfixup {pte,pmd,pgd}_clear().
Also we can remove BTFIXUPCALL_SWAPO0G0 as that is no longer
used.

This was rather amusing, we were setting the btfixup vectors
based upon cpu type but all to the same exact generic srmmu
routines.

Furthermore, we were inconsistently marking the fixup as
either BTFIXUPCALL_SWAPO0G0 or BTFIXUPCALL_NORM.

What a mess, glad we could untangle this stuff.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 12:26:47 -07:00
David S. Miller
3d386c0ef6 sparc32: Un-btfixup PGDIR_{SHIFT,SIZE,MASK} {USER_,}PTRS_PER_{PGD,PMD}
Only one set of values exist, the SRMMU ones.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 12:02:02 -07:00
Sam Ravnborg
3774348770 sparc32: drop btfixup for check_pgt_cache
It is a noop for srmmu - so use a define as sparc64 does.
And drop all sparc callers - no need to confuse our-self
be calling a noop function.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 11:32:04 -07:00
Sam Ravnborg
34d4accfe0 sparc32: drop btfixup for switch_mm
This revealed that the implementation of switch_mm
had a bogus extra argument.
No harm as said argument was never used - but confusing.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 11:32:03 -07:00
David S. Miller
ee906c9e0b sparc32: Trivial removal of sun4c references in comments.
I left some around, like the ones in the openprom headers, since
we need to think about which pieces of those datastructures and
code we can completely toss now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-12 00:35:45 -07:00
David S. Miller
c1e3cb54f2 sparc32: Remove sparc_lvl15_nmi().
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 21:27:04 -07:00
David S. Miller
716a5d73a7 sparc32: Kill asm/vac-ops.h
All sun4/sun4c stuff and unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 21:07:50 -07:00
Sam Ravnborg
afaedde7c9 sparc32: use inline versions of pgprot_noncached, pte_to_pgoff and pgoff_to_pte
We no longer have different versions of these so use a few simple
static inline functions.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:29:10 -07:00
Sam Ravnborg
e7b7e0c356 sparc32: drop btfixup for alloc_thread_info_node/free_thread_info
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:29:09 -07:00
Sam Ravnborg
ef136bc91e sparc32: drop sun4c user stack checking routine
With this we no longer do any run-time patchings of traps.
So drop the function + macro to support this.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:52 -07:00
Sam Ravnborg
e098ff92f6 sparc32: drop sun4c stack checking routine
And drop run-time patching too.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:51 -07:00
Sam Ravnborg
054768a132 sparc32: drop sun4c window overflow stack checking routine
Also drop run-time patching for srmmu

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:50 -07:00
Sam Ravnborg
28de2f7339 sparc32: drop sun4c specific stack validation
This allows us to kill run-time patching for this function too

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:50 -07:00
Sam Ravnborg
582a0baee5 sparc32: remove all uses of ARCH_SUN4C
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:48 -07:00
Sam Ravnborg
306f123162 sparc32: remove sun4c traps
We used to runtime patch the trap table for srmmu.
With the removal of sun4c support this is no longer required.

With the sun4c trap removed we can remove all the referenced
trap handling which is sun4c specific.
This also allows us to get rid of the nosun4c.c file that
contained only dummy functions/data.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:46 -07:00
Sam Ravnborg
e7eaf5b8ab sparc32: remove calls to sun4c dummy mm inits functions
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:45 -07:00
Sam Ravnborg
2c1cfb2db6 sparc32: drop sun4c support
Machines with sun4c support are very rare these days, and noone
is using them for any practical purposes.
The sun4c support has been know broken for quite some time too.

So rather than trying to keep it up-to-date, lets get rid of it.
This allows us to do some very welcome cleanup of sparc32 support.

Updated the former sun4c specifc nmi (which was also used
for sun4m UP) to be a generic UP NMI.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-11 19:27:44 -07:00
David S. Miller
a5a737e090 sparc64: Do not clobber %g2 in xcall_fetch_glob_regs().
%g2 is meant to hold the CPUID number throughout this routine, since
at the very beginning, and at the very end, we use %g2 to calculate
indexes into per-cpu arrays.

However we erroneously clobber it in order to hold the %cwp register
value mid-stream.

Fix this code to use %g3 for the %cwp read and related calulcations
instead.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-10 11:00:46 -07:00
Paul Gortmaker
aa6f079075 sparc: fix build fail in mm/init_64.c when NEED_MULTIPLE_NODES is off
Commit 625d693e97 (linux-next)

    "sparc64: Convert over to NO_BOOTMEM."

causes the following compile failure for sparc64 allnoconfig:

  arch/sparc/mm/init_64.c:822:16: error: unused variable 'paddr'
  arch/sparc/mm/init_64.c:1759:7: error: unused variable 'node'
  arch/sparc/mm/init_64.c:809:12: error: 'memblock_nid_range' defined but not used

The paddr decl can easily be shuffled within the ifdef.  The
memblock_nid_range is just a stub function for when NEED_MULTIPLE_NODES
is off, but the only caller is within a NEED_MULTIPLE_NODES enabled
section, so we can simply delete it.

The unused "node" is slightly more interesting.  In the case of
"# CONFIG_NEED_MULTIPLE_NODES is not set" we no longer get the
definition of:

 #define NODE_DATA(nid)          (node_data[nid])

from arch/sparc/include/asm/mmzone.h - but instead we get:

 #define NODE_DATA(nid)          (&contig_page_data)

from include/linux/mmzone.h -- and since the arg is ignored,
the thing really is unused.  Rather than put in a confusing
looking __maybe_unused, simply splitting the declaration
from the assignment seemed to me to be the least offensive.

Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-09 19:58:07 -07:00
David S. Miller
799d40cc10 sparc64: Do not set max_mapnr.
There is no need, since nothing relevant to sparc64 makes
use of this value.

Noticed by Sam Ravnborg.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 10:55:08 -07:00
David S. Miller
5ed56f1ad9 sparc64: Use node local allocations for IRQ stacks.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 20:50:34 -07:00
David S. Miller
625d693e97 sparc64: Convert over to NO_BOOTMEM.
With help from Sam Ravnborg.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-26 20:02:23 -07:00
David S. Miller
3423166fdb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc 2012-04-13 13:32:07 -07:00
Kautuk Consul
c29554f53e sparc/mm/fault_32.c: Port OOM changes to do_sparc_fault
Commit d065bd810b
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to 32-bit sparc.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 15:42:24 -07:00
Kautuk Consul
7358e51082 sparc/mm/fault_64.c: Port OOM changes to do_sparc64_fault
Commit d065bd810b
(mm: retry page fault when blocking on disk transfer) and
commit 37b23e0525
(x86,mm: make pagefault killable)

The above commits introduced changes into the x86 pagefault handler
for making the page fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial
during OOM killer invocation.

Port these changes to 64-bit sparc.

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 15:42:24 -07:00
David Howells
49a7f04a4b Move all declarations of free_initmem() to linux/mm.h
Move all declarations of free_initmem() to linux/mm.h so that there's only one
and it's used by everything.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-c6x-dev@linux-c6x.org
cc: microblaze-uclinux@itee.uq.edu.au
cc: linux-sh@vger.kernel.org
cc: sparclinux@vger.kernel.org
cc: x86@kernel.org
cc: linux-mm@kvack.org
2012-03-28 18:30:03 +01:00
David Howells
d550bbd40c Disintegrate asm/system.h for Sparc
Disintegrate asm/system.h for Sparc.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: sparclinux@vger.kernel.org
2012-03-28 18:30:03 +01:00
Cong Wang
a24401bcf4 highmem: kill all __kmap_atomic()
[swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:30 +08:00
Joe Perches
e9b57cca3d sparc: Use vsprintf extention %pf with builtin_return_address
Emit the function name not the address when possible.

builtin_return_address() gives an address.  When building
a kernel with CONFIG_KALLSYMS, emit the actual function
name not the address.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-28 16:08:02 -05:00
Ingo Molnar
45aa0663cc Merge branch 'memblock-kill-early_node_map' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock 2011-12-20 12:14:26 +01:00
David S. Miller
b1f44e13a5 sparc32: Be less strict in matching %lo part of relocation.
The "(insn & 0x01800000) != 0x01800000" test matches 'restore'
but that is a legitimate place to see the %lo() part of a 32-bit
symbol relocation, particularly in tail calls.

Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
2011-12-14 10:57:28 -08:00
Tejun Heo
2a4814df54 sparc: Use HAVE_MEMBLOCK_NODE_MAP
sparc doesn't access early_node_map[] directly and enabling
HAVE_MEMBLOCK_NODE_MAP is trivial - replacing add_active_range() calls
with memblock_set_node() and selecting HAVE_MEMBLOCK_NODE_MAP is
enough.

-v2: Use select in Kconfig instead as suggested by Sam Ravnborg.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: sparclinux@vger.kernel.org
2011-12-08 10:22:08 -08:00
Tejun Heo
1aadc0560f memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
The only function of memblock_analyze() is now allowing resize of
memblock region arrays.  Rename it to memblock_allow_resize() and
update its users.

* The following users remain the same other than renaming.

  arm/mm/init.c::arm_memblock_init()
  microblaze/kernel/prom.c::early_init_devtree()
  powerpc/kernel/prom.c::early_init_devtree()
  openrisc/kernel/prom.c::early_init_devtree()
  sh/mm/init.c::paging_init()
  sparc/mm/init_64.c::paging_init()
  unicore32/mm/init.c::uc32_memblock_init()

* In the following users, analyze was used to update total size which
  is no longer necessary.

  powerpc/kernel/machine_kexec.c::reserve_crashkernel()
  powerpc/kernel/prom.c::early_init_devtree()
  powerpc/mm/init_32.c::MMU_init()
  powerpc/mm/tlb_nohash.c::__early_init_mmu()  
  powerpc/platforms/ps3/mm.c::ps3_mm_add_memory()
  powerpc/platforms/embedded6xx/wii.c::wii_memory_fixups()
  sh/kernel/machine_kexec.c::reserve_crashkernel()

* x86/kernel/e820.c::memblock_x86_fill() was directly setting
  memblock_can_resize before populating memblock and calling analyze
  afterwards.  Call memblock_allow_resize() before start populating.

memblock_can_resize is now static inside memblock.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:08 -08:00
Tejun Heo
fe091c208a memblock: Kill memblock_init()
memblock_init() initializes arrays for regions and memblock itself;
however, all these can be done with struct initializers and
memblock_init() can be removed.  This patch kills memblock_init() and
initializes memblock with struct initializer.

The only difference is that the first dummy entries don't have .nid
set to MAX_NUMNODES initially.  This doesn't cause any behavior
difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2011-12-08 10:22:07 -08:00
Tejun Heo
d4bbf7e775 Merge branch 'master' into x86/memblock
Conflicts & resolutions:

* arch/x86/xen/setup.c

	dc91c728fd "xen: allow extra memory to be in multiple regions"
	24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

	conflicted on xen_add_extra_mem() updates.  The resolution is
	trivial as the latter just want to replace
	memblock_x86_reserve_range() with memblock_reserve().

* drivers/pci/intel-iommu.c

	166e9278a3 "x86/ia64: intel-iommu: move to drivers/iommu/"
	5dfe8660a3 "bootmem: Replace work_with_active_regions() with..."

	conflicted as the former moved the file under drivers/iommu/.
	Resolved by applying the chnages from the latter on the moved
	file.

* mm/Kconfig

	6661672053 "memblock: add NO_BOOTMEM config symbol"
	c378ddd53f "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

	conflicted trivially.  Both added config options.  Just
	letting both add their own options resolves the conflict.

* mm/memblock.c

	d1f0ece6cd "mm/memblock.c: small function definition fixes"
	ed7b56a799 "memblock: Remove memblock_memory_can_coalesce()"

	confliected.  The former updates function removed by the
	latter.  Resolution is trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
2011-11-28 09:46:22 -08:00
David S. Miller
3e37fd3153 sparc: Kill custom io_remap_pfn_range().
To handle the large physical addresses, just make a simple wrapper
around remap_pfn_range() like MIPS does.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-17 18:17:59 -08:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Andrea Arcangeli
b35a35b556 thp: share get_huge_page_tail()
This avoids duplicating the function in every arch gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Andrea Arcangeli
e0d85a366c sparc: gup_pte_range() support THP based tail recounting
Up to this point the code assumed old refcounting for hugepages (pre-thp).
 This updates the code directly to the thp mapcount tail page refcounting.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-02 16:06:58 -07:00
Paul Gortmaker
f060ac5460 sparc: Add module.h to files previously implicitly using it.
The file mm/extable.c needs module.h for within_module_init(), and
also for search_exception_tables [which arguably could be living somewhere
more appropriate than module.h] - eventually causing this:

arch/sparc/mm/extable.c: In function 'trim_init_extable':
arch/sparc/mm/extable.c:74: error: dereferencing pointer to incomplete type
arch/sparc/mm/extable.c:75: error: dereferencing pointer to incomplete type
arch/sparc/mm/extable.c:77: error: implicit declaration of function 'within_module_init'
arch/sparc/mm/extable.c:77: error: dereferencing pointer to incomplete type
arch/sparc/mm/extable.c:78: error: dereferencing pointer to incomplete type
arch/sparc/mm/extable.c:80: error: dereferencing pointer to incomplete type
arch/sparc/mm/extable.c: In function 'search_extables_range':
arch/sparc/mm/extable.c:93: error: implicit declaration of function 'search_exception_tables'

The other instances are more straight forward uses of things
like MODULE_* and module_*

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:54 -04:00
Paul Gortmaker
cdd0b0ac12 sparc: remove several unnecessary module.h include instances
Building an allyesconfig doesn't reveal a hidden need
for any of these.  Since module.h brings in the whole kitchen
sink, it just needlessly adds 30k+ lines to the cpp burden.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:54 -04:00
Paul Gortmaker
7b64db608a sparc: add export.h to arch/sparc files as required
These files are only exporting symbols, so they don't need
the full module.h header file.  Previously they were getting
access to EXPORT_SYMBOL implicitly via overuse of module.h
from within other .h files, but that is being cleaned up.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:52 -04:00
Daniel Hellstrom
f22ed71cd6 sparc32,leon: SRMMU MMU Table probe fix
The LEON MMU Model (SRMMU) does not implement MMu Table probing
in hardware, instead it is implemented in software. However the
software implementation does not return the PTE as it should which
always results in INVALID entires and the PROM mappings are not
inherited as they should during startup. The following patch
removes the masking of the PTE.

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-07 12:08:01 -07:00
David S. Miller
f4142cba4e sparc64: Force the execute bit in OpenFirmware's translation entries.
In the OF 'translations' property, the template TTEs in the mappings
never specify the executable bit.  This is the case even though some
of these mappings are for OF's code segment.

Therefore, we need to force the execute bit on in every mapping.

This problem can only really trigger on Niagara/sun4v machines and the
history behind this is a little complicated.

Previous to sun4v, the sun4u TTE entries lacked a hardware execute
permission bit.  So OF didn't have to ever worry about setting
anything to handle executable pages.  Any valid TTE loaded into the
I-TLB would be respected by the chip.

But sun4v Niagara chips have a real hardware enforced executable bit
in their TTEs.  So it has to be set or else the I-TLB throws an
instruction access exception with type code 6 (protection violation).

We've been extremely fortunate to not get bitten by this in the past.

The best I can tell is that the OF's mappings for it's executable code
were mapped using permanent locked mappings on sun4v in the past.
Therefore, the fact that we didn't have the exec bit set in the OF
translations we would use did not matter in practice.

Thanks to Greg Onufer for helping me track this down.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-29 12:18:59 -07:00
David S. Miller
0785a8e87b sparc: Fix build with DEBUG_PAGEALLOC enabled.
arch/sparc/mm/init_64.c:1622:22: error: unused variable '__swapper_4m_tsb_phys_patch_end' [-Werror=unused-variable]
arch/sparc/mm/init_64.c:1621:22: error: unused variable '__swapper_4m_tsb_phys_patch' [-Werror=unused-variable]

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-06 05:26:35 -07:00
David S. Miller
9076d0e7e0 sparc: Access kernel TSB using physical addressing when possible.
On sun4v this is basically required since we point the hypervisor and
the TSB walking hardware at these tables using physical addressing
too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-05 00:53:57 -07:00
David S. Miller
df077ac468 sparc64: implement get_user_pages_fast()
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:10 -07:00
David S. Miller
4dedbf8dc1 sparc64: kill page table quicklists
With the recent mmu_gather changes that included generic RCU freeing of
page-tables, it is now quite straightforward to implement gup_fast() on
sparc64.

This patch:

Remove the page table quicklists.  They are pointless and make it harder
to use RCU page table freeing and share code with other architectures.

BTW, this is the second time this has happened, see commit 3c93646524
("[SPARC64]: Kill pgtable quicklists and use SLAB.")

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-25 20:57:09 -07:00
Linus Torvalds
4d4abdcb1d Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits)
  perf: Remove the nmi parameter from the oprofile_perf backend
  x86, perf: Make copy_from_user_nmi() a library function
  perf: Remove perf_event_attr::type check
  x86, perf: P4 PMU - Fix typos in comments and style cleanup
  perf tools: Make test use the preset debugfs path
  perf tools: Add automated tests for events parsing
  perf tools: De-opt the parse_events function
  perf script: Fix display of IP address for non-callchain path
  perf tools: Fix endian conversion reading event attr from file header
  perf tools: Add missing 'node' alias to the hw_cache[] array
  perf probe: Support adding probes on offline kernel modules
  perf probe: Add probed module in front of function
  perf probe: Introduce debuginfo to encapsulate dwarf information
  perf-probe: Move dwarf library routines to dwarf-aux.{c, h}
  perf probe: Remove redundant dwarf functions
  perf probe: Move strtailcmp to string.c
  perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END
  tracing/kprobe: Update symbol reference when loading module
  tracing/kprobes: Support module init function probing
  kprobes: Return -ENOENT if probe point doesn't exist
  ...
2011-07-22 16:44:39 -07:00
Tejun Heo
f9b18db3b1 memblock: Don't allow archs to override memblock_nid_range()
memblock_nid_range() is used to implement memblock_[try_]alloc_nid().
The generic version determines the range by walking early_node_map
with for_each_mem_pfn_range().  The generic version is defined __weak
to allow arch override.

Currently, only sparc overrides it; however, with the previous update
to the generic implementation, there isn't much to be gained with arch
override.  Sparc would behave exactly the same with the generic
implementation.

This patch disallows arch override for memblock_nid_range() and make
both generic and sparc versions static.

sparc is only compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1310460395-30913-6-git-send-email-tj@kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2011-07-14 11:45:33 -07:00
Matthias Rosenfelder
6d999da4d2 sparc32,leon: Added __init declaration to leon_flush_needed()
The function leon_flush_needed() is called only during bootup from another
__init function. Therefore, we can also add __init to leon_flush_needed().

Signed-off-by: Matthias Rosenfelder <rosenfelder.lkml@googlemail.com>
Acked-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-06 08:01:52 -07:00
Peter Zijlstra
a8b0ca17b8 perf: Remove the nmi parameter from the swevent and overflow interface
The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

  - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
    the PMI-tail (ARM etc.)
  - tracepoint: nmi=0; since tracepoint could be from NMI context.
  - software: nmi=[0,1]; some, like the schedule thing cannot
    perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01 11:06:35 +02:00
Joe Perches
6cb79b3f3b sparc: Remove unnecessary semicolons
Semicolons are not necessary after switch/while/for/if braces
so remove them.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-07 16:06:34 -07:00
Peter Zijlstra
1c39517696 mm: now that all old mmu_gather code is gone, remove the storage
Fold all the mmu_gather rework patches into one for submission

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:16 -07:00