Commit Graph

1442 Commits

Author SHA1 Message Date
Aneesh Kumar K.V
e568006b9d powerpc/mm/hash: Don't add memory coherence if cache inhibited is set
H_ENTER hcall handling in qemu had assumptions that a cache inhibited
hpte entry won't have memory conference set. Also older kernel
mentioned that some version of pHyp required this (the code removed
by the below commit says:

    /* Make pHyp happy */
    if ((rflags & _PAGE_NO_CACHE) && !(rflags & _PAGE_WRITETHRU))
            hpte_r &= ~HPTE_R_M;

But with older kernel we had some inconsistent memory conherence
mapping. We always enabled memory conherence in the page fault path and
removed memory conherence is _PAGE_NO_CACHE was set when we mapped the
page via htab_bolt_mapping. The commit mentioned below tried to
consolidate that by always enabling memory conherence. But as mentioned
above that breaks Qemu H_ENTER handling.

This patch update this such that we enable memory conherence only if
cache inhibited is not set and bring fault handling, lpar and bolt
mapping in sync.

Fixes: commit 30bda41aba4e("powerpc/mm: Drop WIMG in favour of new constant")
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-17 19:47:51 +10:00
Oliver O'Halloran
3079abe555 powerpc/mm: Ensure "special" zones are empty
The mm zone mechanism was traditionally used by arch specific code to
partition memory into allocation zones. However there are several zones
that are managed by the mm subsystem rather than the architecture. Most
architectures set the max PFN of these special zones to zero, however on
powerpc we set them to ~0ul. This, in conjunction with a bug in
free_area_init_nodes() results in all of system memory being placed in
ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel
memory allocations so this will cause a kernel panic at boot. Given the
planned addition of more mm managed zones (ZONE_CMA) we should aim to be
consistent with every other architecture and set the max PFN for these
zones to zero.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-16 16:03:21 +10:00
Bharata B Rao
45b64ee649 powerpc/numa: Fix multiple bugs in memory_hotplug_max()
memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum
addressable memory by referring to ibm,dyanamic-memory property. There
are three problems with the current approach:

1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
  all the LMBs of the guest, but that is not true for PowerKVM which
  populates only DR LMBs (LMBs that can be hotplugged/removed) in that
  property.
2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
  at the max possible address. Since ibm,dynamic-memory doesn't include
  RMA LMBs, the address thus obtained will be less than the actual max
  address. For example, if max possible memory size is 32G, with lmb-size
  of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
  which won't be present here).  hot_add_drconf_memory_max() would then
  return the max addressable memory as 127 * 256MB = 31.75GB, the max
  address should have been 32G which is what ibm,lrdr-capacity shows.
3 In PowerKVM, there can be a gap between the end of boot time RAM and
  beginning of hotplug RAM area. So just multiplying lmb-count with
  lmb-size will not provide the correct max possible address for PowerKVM.

This patch fixes 1 by using ibm,lrdr-capacity property to return the max
addressable memory whenever the property is present. Then it fixes 2 & 3
by fetching the address of the last LMB in ibm,dynamic-memory property.

Fixes: cd34206e94 ("powerpc: Add memory_hotplug_max()")
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-14 16:06:13 +10:00
Bharata B Rao
e70bd3ae91 powerpc/numa: Fix whitespace in hot_add_drconf_memory_max()
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-14 16:06:05 +10:00
Michael Ellerman
027dfac694 powerpc: Various typo fixes
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-14 13:58:26 +10:00
Aneesh Kumar K.V
8550e2fa34 powerpc/mm/hash: Use the correct PPP mask when updating HPTE
With commit e58e87adc8 "powerpc/mm: Update _PAGE_KERNEL_RO" we now
use all the three PPP bits. The top bit is now used to have a PPP value
of 0b110 which will be mapped to kernel read only. When updating the
hpte entry use right mask such that we update the 63rd bit (top 'P' bit)
too.

Prior to e58e87adc8 we didn't support KERNEL_RO at all (it was ==
KERNEL_RW), so this isn't a regression as such.

Fixes: e58e87adc8 ("powerpc/mm: Update _PAGE_KERNEL_RO")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-14 13:54:51 +10:00
Aneesh Kumar K.V
a145abf12c powerpc/mm/radix: Flush page walk cache when freeing page table
Even though a tlb_flush() does a flush with invalidate all cache,
we can end up doing an RCU page table free before calling tlb_flush().
That means we can have page walk cache entries even after we free the
page table pages. This can result in us doing wrong page table walk.

Avoid this by doing pwc flush on every page table free. We can't batch
the pwc flush, because the rcu call back function where we free the
page table pages doesn't have information of the mmu gather. Thus we
have to do a pwc on every page table page freed.

Note: I also removed the dummy tlb_flush_pgtable call functions for
hash 32.

Fixes: 1a472c9dba ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-10 16:14:52 +10:00
Aneesh Kumar K.V
36194812a4 powerpc/mm/radix: Update to tlb functions ric argument
Radix invalidate control (RIC) is used to control which cache to flush
using tlb instructions. When doing a PID flush, we currently flush
everything including page walk cache. For address range flush, we flush
only the TLB. In the next patch, we add support for flushing only the
page walk cache.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-10 16:14:43 +10:00
Aneesh Kumar K.V
3b6d1eb7ea powerpc/mm/hash: Compute the segment size correctly for ISA 3.0
PowerISA 3.0 encodes the segment size in the second half of hash page
table entry. Update hpte_decode() accordingly.

Fixes: 50de596de8 ("powerpc/mm/hash: Add support for Power9 Hash")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-08 14:36:22 +10:00
Aneesh Kumar K.V
9690c15742 powerpc/mm/radix: Fix always false comparison against MMU_NO_CONTEXT
In some of the radix TLB flush routines, we use a local to store the
mm->context.id, AKA the PID.

Currently we use an int, but the PID is unsigned long, so large values
of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which
means all our comparisons against that value can never be true.

This means we'll issue TLB flushes when we shouldn't on radix enabled
machines.

Fix it by using an unsigned long for the local. Discovered by Coverity.

Fixes: 1a472c9dba ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-08 13:56:53 +10:00
Aneesh Kumar K.V
157d4d0620 powerpc/mm/radix: Add missing tlb flush
This should not have any impact on hash, because hash does tlb
invalidate with every pte update and we don't implement
flush_tlb_* functions for hash. With radix we should make an explicit
call to flush tlb outside pte update.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Aneesh Kumar K.V
dc47c0c1f8 powerpc/mm/hash: Fix the reference bit update when handling hash fault
When we converted the asm routines to C functions, we missed updating
HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower
bits from pte via.

andi.	r3,r30,0x1fe		/* Get basic set of flags */

We also update the code such that we won't update the Change bit ('C'
bit) always. This was added by commit c5cf0e30bf ("powerpc: Fix
buglet with MMU hash management").

With hash64, we need to make sure that hardware doesn't do a pte update
directly. This is because we do end up with entries in TLB with no hash
page table entry. This happens because when we find a hash bucket full,
we "evict" a more/less random entry from it. When we do that we don't
invalidate the TLB (hpte_remove) because we assume the old translation
is still technically "valid". For more info look at commit
0608d692463("powerpc/mm: Always invalidate tlb on hpte invalidate and
update").

Thus it's critical that valid hash PTEs always have reference bit set
and writeable ones have change bit set. We do this by hashing a
non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and
thus R) when hashing anything else in. Any attempt by Linux at clearing
those bits also removes the corresponding hash entry.

Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always.
We don't really need to do that because we never map a RW pte entry
without setting 'C' bit. On READ fault on a RW pte entry, we still map
it READ only, hence a store update in the page will still cause a hash
pte fault.

This patch reverts the part of commit c5cf0e30bf ("[PATCH] powerpc:
Fix buglet with MMU hash management") and retain the updatepp part.

- If we hit the updatepp path on native, the old code without that
  commit, would fail to set C bcause native_hpte_updatepp()
  was implemented to filter the same bits as H_PROTECT and not let C
  through thus we would "upgrade" a RO HPTE to RW without setting C
  thus causing the bug. So the real fix in that commit was the change
  to native_hpte_updatepp

Fixes: 89ff725051 ("powerpc/mm: Convert __hash_page_64K to C")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Aneesh Kumar K.V
d6c886006c powerpc/mm/radix: Update LPCR only if it is powernv
LPCR cannot be updated when running in guest mode.

Fixes: 2bfd65e45e ("powerpc/mm/radix: Add radix callbacks for early init routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-01 13:47:34 +10:00
Linus Torvalds
c04a588029 powerpc updates for 4.7
Highlights:
  - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V
  - Live patching support for ppc64le (also merged via livepatching.git)
 
 Various cleanups & minor fixes from:
  - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
    Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart
    Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael
    Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta,
    Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin
    Rothberg, Vipin K Parashar.
 
 General:
  - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot
  - Fix branching to OOL handlers in relocatable kernel from Hari Bathini
  - Add support for userspace Power9 copy/paste from Chris Smart
  - Always use STRICT_MM_TYPECHECKS from Michael Ellerman
  - Add mask of possible MMU features from Michael Ellerman
 
 PCI:
  - Enable pass through of NVLink to guests from Alexey Kardashevskiy
  - Cleanups in preparation for powernv PCI hotplug from Gavin Shan
  - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan
  - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan
  - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G. Piccoli
  - Remove the dependency on EEH struct in DDW mechanism from Guilherme G. Piccoli
 
 selftests:
  - Test cp_abort during context switch from Chris Smart
  - Add several tests for transactional memory support from Rashmica Gupta
 
 perf:
  - Add support for sampling interrupt register state from Anju T
  - Add support for unwinding perf-stackdump from Chandan Kumar
 
 cxl:
  - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud
  - Allow initialization on timebase sync failures from Frederic Barrat
  - Increase timeout for detection of AFU mmio hang from Frederic Barrat
  - Handle num_of_processes larger than can fit in the SPA from Ian Munsie
  - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie
  - Add kernel API to allow a context to operate with relocate disabled from Ian Munsie
  - Check periodically the coherent platform function's state from Christophe Lombard
 
 Freescale:
  - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum
    workaround, and a kconfig dependency fix."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXPsGzAAoJEFHr6jzI4aWAVoAP/iKdrDe0eYHlVAE9SqnbsiZs
 lgDxdsC8P3fsmP1G9o/HkKhC82zHl/La8Ztz8dtqa+LkSzbfliWP1ztJsI7GsBFo
 tyCKzWnX9Rwvd3meHu/o/SQ29TNLm/PbPyyRqpj5QPbJ8XCXkAXR7ZZZqjvcMsJW
 /AgIr7Cgf53tl9oZzzl/c7CnNHhMq+NBdA71vhWtUx+T97wfJEGyKW6HhZyHDbEU
 iAki7fu77ZpEqC/Fh9swf0dCGBJ+a132NoMVo0AdV7EQLznUYlQpQEqa+1PyHZOP
 /ArOzf2mDg6m3PfCo1eiB07v8PnVZ3llEUbVAJNg3GUxbE4SHrqq/kwm0iElm3p/
 DvFxerCwdX9vmskJX4wDs+pSZRabXYj9XVMptsgFzA4joWrqqb7mBHqaort88YcY
 YSljEt1bHyXmiJ+dBya40qARsWUkCVN7ZgEzdxckq0KI3w7g2tqpqIbO2lClWT6t
 B3GpqQ4jp34+d1M14FB91fIGK7tMvOhSInE0Mv9+tPvRsepXqiiU/SwdAtRlr3m2
 zs/K+4FYcVjJ3Rmpgc+tI38PbZxHe212I35YN6L1LP+4ZfAtzz0NyKdooTIBtkbO
 19pX4WbBjKq8zK+YutrySncBIrbnI6VjW51vtRhgVKZliPFO/6zKagyU6FbxM+E5
 udQES+t3F/9gvtxgxtDe
 =YvyQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights:
   - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V
   - Live patching support for ppc64le (also merged via livepatching.git)

  Various cleanups & minor fixes from:
   - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
     Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie,
     Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring,
     Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras,
     Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung
     Bauermann, Valentin Rothberg, Vipin K Parashar.

  General:
   - Update LMB associativity index during DLPAR add/remove from Nathan
     Fontenot
   - Fix branching to OOL handlers in relocatable kernel from Hari Bathini
   - Add support for userspace Power9 copy/paste from Chris Smart
   - Always use STRICT_MM_TYPECHECKS from Michael Ellerman
   - Add mask of possible MMU features from Michael Ellerman

  PCI:
   - Enable pass through of NVLink to guests from Alexey Kardashevskiy
   - Cleanups in preparation for powernv PCI hotplug from Gavin Shan
   - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan
   - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan
   - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
     from Guilherme G Piccoli
   - Remove the dependency on EEH struct in DDW mechanism from Guilherme
     G Piccoli

  selftests:
   - Test cp_abort during context switch from Chris Smart
   - Add several tests for transactional memory support from Rashmica
     Gupta

  perf:
   - Add support for sampling interrupt register state from Anju T
   - Add support for unwinding perf-stackdump from Chandan Kumar

  cxl:
   - Configure the PSL for two CAPI ports on POWER8NVL from Philippe
     Bergheaud
   - Allow initialization on timebase sync failures from Frederic Barrat
   - Increase timeout for detection of AFU mmio hang from Frederic
     Barrat
   - Handle num_of_processes larger than can fit in the SPA from Ian
     Munsie
   - Ensure PSL interrupt is configured for contexts with no AFU IRQs
     from Ian Munsie
   - Add kernel API to allow a context to operate with relocate disabled
     from Ian Munsie
   - Check periodically the coherent platform function's state from
     Christophe Lombard

  Freescale:
   - Updates from Scott: "Contains 86xx fixes, minor device tree fixes,
     an erratum workaround, and a kconfig dependency fix."

* tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits)
  powerpc/86xx: Fix PCI interrupt map definition
  powerpc/86xx: Move pci1 definition to the include file
  powerpc/fsl: Fix build of the dtb embedded kernel images
  powerpc/fsl: Fix rcpm compatible string
  powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC
  powerpc/fsl-pci: Add a workaround for PCI 5 errata
  powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb
  powerpc/powernv/npu: Add PE to PHB's list
  powerpc/powernv: Fix insufficient memory allocation
  powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism
  Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell"
  powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner()
  powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover()
  powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover()
  powerpc/eeh: Don't report error in eeh_pe_reset_and_recover()
  Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()"
  powerpc/powernv/npu: Enable NVLink pass through
  powerpc/powernv/npu: Rework TCE Kill handling
  powerpc/powernv/npu: Add set/unset window helpers
  powerpc/powernv/ioda2: Export debug helper pe_level_printk()
  ...
2016-05-20 10:12:41 -07:00
Vaishali Thakkar
71bf79cc3f powerpc: mm: use hugetlb_bad_size()
Update setup_hugepagesz() to call hugetlb_bad_size() when unsupported
hugepage size is found.

Signed-off-by: Vaishali Thakkar <vaishali.thakkar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Gavin Shan
171cb719da powerpc/mm: Improve readability of update_mmu_cache()
The function is used to update the MMU with software PTE. It can
be called by data access exception handler (0x300) or instruction
access exception handler (0x400). If the function is called by
0x400 handler, the local variable @access is set to _PAGE_EXEC
to indicate the software PTE should have that flag set. When the
function is called by 0x300 handler, @access is set to zero.

This improves the readability of the function by replacing if
statements with switch. No logical changes introduced.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:09 +10:00
Oliver O'Halloran
dd0b52c47a powerpc/mm: define TOP_ZONE as a constant
The zone that contains the top of memory will be either ZONE_NORMAL
or ZONE_HIGHMEM depending on the kernel config. There are two functions
that require this information and both of them use an #ifdef to set
a local variable (top_zone). This is a little silly so lets just make it
a constant.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Cc: linux-mm@kvack.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:08 +10:00
Michael Ellerman
aac55d7573 powerpc/mm/hash64: Fix subpage protection with 4K HPTE config
With Linux page size of 64K and hardware only supporting 4K HPTE, if we
use subpage protection, we always fail for the subpage 0 as shown
below (using the selftest subpage_prot test):

  520175565:  (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass !
  4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass !
  4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass !
  4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass !

This is because hash preload wrongly inserts the HPTE entry for subpage
0 without looking at the subpage protection information.

Fix it by teaching should_hash_preload() not to preload if we have
subpage protection configured for that range.

It appears this has been broken since it was introduced in 2008.

Fixes: fa28237cfc ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:05 +10:00
Michael Ellerman
8bbc9b7b00 powerpc/mm/hash64: Factor out hash preload psize check
Currently we have a check in hash_preload() against the psize, which is
only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand
this check in a subsequent patch, so factor it out to allow that. As a
bonus it removes the #ifdef in the C code.

Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES
block because it would require a forward declaration.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:04 +10:00
Aneesh Kumar K.V
62ccf5bf1f powerpc/mm/slice: Remove slice_mm_new_context()
The usage in mm mmu_context_nohash.c is bogus, because we set the
context.id value to MMU_NO_CONTEXT 4 lines previously in the same
function, meaning slice_mm_new_context() will always be true.

The book3s 64 usage was removed in the previous commit. So remove it as
unused.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:54:00 +10:00
Aneesh Kumar K.V
2d566537dd powerpc/mm/subpage: Initialise user psize correctly
As part of the radix support we switched Book3s64 to use a value of ~0
for MMU_NO_CONTEXT. That is because id 0 is special on radix.

However that broke the logic in init_new_context(). The code there needs
to differentiate between a newly allocated context and one inherited via
fork. Previously it worked because a newly allocated context has an id
of zero (because it was just memset() to zero), which used to match
MMU_NO_CONTEXT, and therefore slice_mm_new_context() did the right
thing.

Instead check against a context.id value of zero instead of using
slice_mm_new_context().

Without this patch we never call slice_set_user_psize(), and end up with
a slice psize value of zero and we always end up using 4K HPTE.

Fixes: 1a472c9dba ("powerpc/mm/radix: Add tlbflush routines")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:59 +10:00
Aneesh Kumar K.V
bde3eb6222 powerpc/mm/radix: Add radix THP callbacks
The deposited pgtable_t is a pte fragment hence we cannot use page->lru
for linking then together. We use the first two 64 bits for pte fragment
as list_head type to link all deposited fragments together. On withdraw
we properly zero then out.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:57 +10:00
Aneesh Kumar K.V
3df33f12be powerpc/mm/thp: Abstraction for THP functions
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:57 +10:00
Aneesh Kumar K.V
6a1ea36260 powerpc/mm: THP is only available on hash64 as of now
Only code movement in this patch. No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:56 +10:00
Aneesh Kumar K.V
484837601d powerpc/mm: Add radix support for hugetlb
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:55 +10:00
Aneesh Kumar K.V
2f5f0dfd1e powerpc/mm: Fix vma_mmu_pagesize() for radix
Radix doesn't use the slice framework to find the page size. Hence use
vma to find the page size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:54 +10:00
Aneesh Kumar K.V
5ed7ecd08a powerpc/mm: pte_frag abstraction
In this patch we make the number of pte fragments per level 4 page table
page a variable. Radix level 4 table size is 256 bytes and hence we can
have 256 fragments per level 4 page. We don't update the fragment count
in this patch. We need to do performance measurements to find the right
value for fragment count.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:54 +10:00
Aneesh Kumar K.V
a3dece6d69 powerpc/radix: Update MMU cache
With radix there is no MMU cache. Hence we don't need to do anything in
update_mmu_cache().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:53 +10:00
Aneesh Kumar K.V
d6a9996e84 powerpc/mm: vmalloc abstraction in preparation for radix
The vmalloc range differs between hash and radix config. Hence make
VMALLOC_START and related constants a variable which will be runtime
initialized depending on whether hash or radix mode is active.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix missing init of ioremap_bot in pgtable_64.c for ppc64e]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:53 +10:00
Aneesh Kumar K.V
4dfb88ca9b powerpc/mm: Update pte filter for radix
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:52 +10:00
Aneesh Kumar K.V
a2f41eb992 powerpc/mm: Add radix pgalloc details
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:51 +10:00
Aneesh Kumar K.V
934828edfa powerpc/mm: Make 4K and 64K use pte_t for pgtable_t
This patch switches 4K Linux page size config to use pte_t * type
instead of struct page * for pgtable_t. This simplifies the code a lot
and helps in consolidating both 64K and 4K page allocator routines. The
changes should not have any impact, because we already store physical
address in the upper level page table tree and that implies we already
do struct page * to physical address conversion.

One change to note here is we move the pgtable_page_dtor() call for
nohash to pte_fragment_free_mm(). The nohash related change is due to
the related changes in pgtable_64.c.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:51 +10:00
Aneesh Kumar K.V
74701d5947 powerpc/mm: Rename function to indicate we are allocating fragments
Only code cleanup. No functionality change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:50 +10:00
Aneesh Kumar K.V
b5dcc60969 powerpc/mm/radix: Update PTCR on secondary CPUs
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:48 +10:00
Aneesh Kumar K.V
7a0eedeedd powerpc/mm/radix: Pick the address layout for radix config
Hash needs special get_unmapped_area() handling because of limitations
around base page size, so we have to set HAVE_ARCH_UNMAPPED_AREA.

With radix we don't have such restrictions, so we could use the generic
code. But because we've set HAVE_ARCH_UNMAPPED_AREA (for hash), we have
to re-implement the same logic as the generic code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:47 +10:00
Aneesh Kumar K.V
177ba7c647 powerpc/mm/radix: Limit paca allocation in radix
On return from RTAS we access the paca variables and we have 64 bit
disabled. This requires us to limit paca in 32 bit range.

Fix this by setting ppc64_rma_size to first_memblock_size/1G range.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:47 +10:00
Aneesh Kumar K.V
764041e0f4 powerpc/mm/radix: Add checks in slice code to catch radix usage
Radix doesn't need slice support. Catch incorrect usage of slice code
when radix is enabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-11 21:53:46 +10:00
Aneesh Kumar K.V
1a472c9dba powerpc/mm/radix: Add tlbflush routines
Core kernel doesn't track the page size of the VA range that we are
invalidating. Hence we end up flushing TLB for the entire mm here. Later
patches will improve this.

We also don't flush page walk cache separetly instead use RIC=2 when
flushing TLB, because we do a MMU gather flush after freeing page table.

MMU_NO_CONTEXT is updated for hash.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:09 +10:00
Aneesh Kumar K.V
676012a66f powerpc/mm: Hash abstraction for tlbflush routines
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:08 +10:00
Aneesh Kumar K.V
f5df4b4be9 powerpc/mm: Rename mmu_context_hash64.c to mmu_context_book3s64.c
This file now contains both hash and radix specific code. Rename it to
indicate this better.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:07 +10:00
Aneesh Kumar K.V
7e381c0ff6 powerpc/mm/radix: Add mmu context handling callback for radix
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:05 +10:00
Aneesh Kumar K.V
d2adba3fd1 powerpc/mm: Abstraction for switch_mmu_context()
How we switch MMU context differs between hash and radix. For hash we
need to switch the SLB details and for radix we need to switch the PID
SPR.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:04 +10:00
Aneesh Kumar K.V
d9225ad923 powerpc/mm/radix: Add radix callbacks for vmemmap and map_kernel page()
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:03 +10:00
Aneesh Kumar K.V
31a14fae92 powerpc/mm: Abstraction for vmemmap and map_kernel_page()
For hash we create vmemmap mapping using bolted hash page table entries.
For radix we fill the radix page table. The next patch will add the
radix details for creating vmemmap mappings.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:02 +10:00
Aneesh Kumar K.V
2bfd65e45e powerpc/mm/radix: Add radix callbacks for early init routines
This adds routines for early setup for radix. We use device tree
property "ibm,processor-radix-AP-encodings" to find supported page
sizes. If we don't find the above we consider 64K and 4K as supported
page sizes.

We do map vmemap using 2M page size if we can. The linear mapping is
done such that we use required page size for that range. For example
memory of 3.5G is mapped such that we use 1G mapping till 3G range and
use 2M mapping for the rest.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:33:00 +10:00
Aneesh Kumar K.V
756d08d1ba powerpc/mm: Abstract early MMU init in preparation for radix
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:58 +10:00
Aneesh Kumar K.V
dd1842a2a4 powerpc/mm: Make page table size a variable
Radix and hash MMU models support different page table sizes. Make
the #defines a variable so that existing code can work with variable
sizes.

Slice related code is only used by hash, so use hash constants there. We
will replicate some of the boundary conditions with resepct to TASK_SIZE
using radix values too. Right now we do boundary condition check using
hash constants.

Swapper pgdir size is initialized in asm code. We select the max pgd
size to keep it simple. For now we select hash pgdir. When adding radix
we will switch that to radix pgdir which is 64K.

BUILD_BUG_ON check which is removed is already done in hugepage_init()
using MAYBE_BUILD_BUG_ON().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:48 +10:00
Aneesh Kumar K.V
945537df7a powerpc/mm/book3s: Rename hash specific PTE bits to carry H_ prefix
This helps to make following hash only pte bits easier.

We have kept _PAGE_CHG_MASK, _HPAGE_CHG_MASK and _PAGE_PROT_BITS as it
is in this patch eventhough they use hash specific bits. Using them in
radix as it is should be ok, because with radix we expect those bit
positions to be zero.

Only renames in this patch, no change in functionality.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:43 +10:00
Aneesh Kumar K.V
eee24b5aaf powerpc/mm: Move hash and no hash code to separate files
This patch reduces the number of #ifdefs in C code and will also help in
adding radix changes later. Only code movement in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Propagate copyrights and update GPL text]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:42 +10:00
Aneesh Kumar K.V
50de596de8 powerpc/mm/hash: Add support for Power9 Hash
PowerISA 3.0 adds a parition table indexed by LPID. Parition table
allows us to specify the MMU model that will be used for guest and host
translation.

This patch adds support with SLB based hash model (UPRT = 0). What is
required with this model is to support the new hash page table entry
format and also setup partition table such that we use hash table for
address translation.

We don't have segment table support yet.

In order to make sure we don't load KVM module on Power9 (since we don't
have kvm support yet) this patch also disables KVM on Power9.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:40 +10:00
Aneesh Kumar K.V
ff844b741e powerpc/mm: Use generic version of pmdp_clear_flush_young()
The radix variant is going to require a flush_pmd_tlb_range(). With
flush_pmd_tlb_range() added, pmdp_clear_flush_young() is the same as the
generic version. So drop the powerpc specific variant.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:35 +10:00
Aneesh Kumar K.V
30bda41aba powerpc/mm: Drop WIMG in favour of new constants
PowerISA 3.0 introduces two pte bits with the below meaning for radix:
  00 -> Normal Memory
  01 -> Strong Access Order (SAO)
  10 -> Non idempotent I/O (Cache inhibited and guarded)
  11 -> Tolerant I/O (Cache inhibited)

We drop the existing WIMG bits in the Linux page table in favour of the
above constants. We loose _PAGE_WRITETHRU with this conversion. We only
use writethru via pgprot_cached_wthru() which is used by
fbdev/controlfb.c which is Apple control display and also PPC32.

With respect to _PAGE_COHERENCE, we have been marking hpte always
coherent for some time now. htab_convert_pte_flags() always added
HPTE_R_M.

NOTE: KVM changes need closer review.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:33 +10:00
Aneesh Kumar K.V
72176dd0ad powerpc/mm: Use a helper for finding pte bits mapping I/O area
Use a helper instead of open coding with constants. A later patch will
drop the WIMG bits and use PowerISA 3.0 defines.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:32 +10:00
Aneesh Kumar K.V
e58e87adc8 powerpc/mm: Update _PAGE_KERNEL_RO
PS3 had used a PPP bit hack to implement a read only mapping in the
kernel area. Since we are bolting the ioremap area, it used the pte
flags _PAGE_PRESENT | _PAGE_USER to get a PPP value of 0x3 there by
resulting in a read only mapping. This means the area can be accessed by
user space, but kernel will never return such an address to user space.

But we can do better by implementing a read only kernel mapping using
PPP bits 0b110.

This also allows us to do read only kernel mapping for radix in later
patches.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:30 +10:00
Aneesh Kumar K.V
96270b1fc2 powerpc/mm: Remove RPN_SHIFT and RPN_SIZE
PTE_RPN_SHIFT is actually page size dependent. Even though PowerISA 3.0
expects only the lower 12 bits to be zero, we will always find the pages
to be PAGE_SHIFT aligned. In case of hash config, this also allows us to
use the additional 3 bits to track pte specific information. We need
to make sure we use these bits only for hash specific pte flags.

For both 4K and 64K config, pte now can hold 57 bits address.

Inorder to keep things simpler, drop PTE_RPN_SHIFT and PTE_RPN_SIZE and
specify the 57 bit detail explicitly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:29 +10:00
Aneesh Kumar K.V
ac29c64089 powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED
_PAGE_PRIVILEGED means the page can be accessed only by the kernel. This
is done to keep pte bits similar to PowerISA 3.0 Radix PTE format. User
pages are now marked by clearing _PAGE_PRIVILEGED bit.

Previously we allowed the kernel to have a privileged page in the lower
address range (USER_REGION). With this patch such access is denied.

We also prevent a kernel access to a non-privileged page in higher
address range (ie, REGION_ID != 0).

Both the above access scenarios should never happen.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:26 +10:00
Michael Ellerman
7e1e63c5e9 powerpc/mm: Convert pte_user() to static inline
In a subsequent patch we want to add a second definition of pte_user().
Before we do that, make the signature clear, ie. it takes a pte_t and
returns bool.

We move it up inside the existing #ifndef __ASSEMBLY__ block, but
otherwise it's a straight conversion.

Convert the call in settlbcam(), which passes an unsigned long, to pass
a pte_t.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:24 +10:00
Aneesh Kumar K.V
73a1441a9b powerpc/mm/subpage: Clear RWX bit to indicate no access
Subpage protection used to depend on the _PAGE_USER bit to implement no
access mode. This patch switches that to use _PAGE_RWX. We clear Read,
Write and Execute access from the pte instead of clearing _PAGE_USER
now. This was done so that we can switch to _PAGE_PRIVILEGED in a later
patch.

subpage_protection() returns pte bits that need to be cleared. Instead
of updating the interface to handle no-access in a separate way, it
appears simpler to clear RWX acecss to indicate no access.

We still don't insert hash ptes for no access implied by !_PAGE_RWX.
Hence we should not get PROT_FAULT with change.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:23 +10:00
Aneesh Kumar K.V
c7d54842de powerpc/mm: Use _PAGE_READ to indicate Read access
This splits the _PAGE_RW bit into _PAGE_READ and _PAGE_WRITE. It also
removes the dependency on _PAGE_USER for implying read only. Few things
to note here is that, we have read implied with write and execute
permission. Hence we should always find _PAGE_READ set on hash pte
fault.

We still can't switch PROT_NONE to !(_PAGE_RWX). Auto numa depends on
marking a prot none pte _PAGE_WRITE. (For more details look at
b191f9b106 "mm: numa: preserve PTE write permissions across a NUMA
hinting fault")

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:21 +10:00
Aneesh Kumar K.V
5dc1ef858c powerpc/mm: Use big endian Linux page tables for book3s 64
Traditionally Power server machines have used the Hashed Page Table MMU
mode. In this mode Linux manages its own tree of nested page tables,
aka. "the Linux page tables", which are not used by the hardware
directly, and software loads translations into the hash page table for
use by the hardware.

Power ISA 3.0 defines a new MMU mode, known as Radix Tree Translation,
where the hardware can directly operate on the Linux page tables.
However the hardware requires that the page tables be in big endian
format.

To accommodate this, switch the pgtable types to __be64 and add
appropriate endian conversions.

Because we will be supporting a single kernel binary that boots using
either radix or hash mode, we always store the Linux page tables big
endian, even in hash mode where they are not actually used by the
hardware.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix sparse errors, flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:18 +10:00
Michael Ellerman
3910a7f485 powerpc/mm: Add pte_xchg() helper
We have five locations in 64-bit hash MMU code that do a cmpxchg() of a
PTE. Currently doing it inline OK, but in a future patch we will be
converting the PTEs to __be64 in some configs. In that case we will need
casts at every cmpxchg() site in order to keep sparse happy.

So move the logic into a helper, this is a reasonably nice cleanup on
its own.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:16 +10:00
Aneesh Kumar K.V
4bece39b50 powerpc/mm: Drop PTE_ATOMIC_UPDATES from pmd_hugepage_update()
pmd_hugepage_update() is inside #ifdef CONFIG_TRANSPARENT_HUGEPAGE. THP
can only be enabled if PPC_BOOK3S_64=y && PPC_64K_PAGES=y, aka. hash64.

On hash64 we always define PTE_ATOMIC_UPDATES to 1, meaning the #ifdef
in pmd_hugepage_update() is unnecessary, so drop it.

That is also the only use of PTE_ATOMIC_UPDATES in any of the hash code,
meaning we no longer need to #define it at all in the hash headers.

Note it's still #defined and used in the nohash code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-05-01 18:32:15 +10:00
Daniel Axtens
7f92bc5694 powerpc: sparse: Include headers for __weak symbols
Sometimes when sparse warns about undefined symbols, it isn't
because they should have 'static' added, it's because they're
overriding __weak symbols defined elsewhere, and the header has
been missed.

Fix a few of them by adding appropriate headers.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-12 21:05:19 +10:00
Michael Ellerman
1f4c66e805 powerpc/mm: Remove long disabled SLB code
We have a bunch of SLB related code in the tree which is there to handle
dynamic VSIDs - but currently it's all disabled at compile time. The
comments say "Keep that around for when we re-implement dynamic VSIDs".

But that was over 10 years ago (commit 3c726f8dee ("[PATCH] ppc64:
support 64k pages")). The chance that it would still work unchanged is
minimal, and in the meantime it's confusing to folks browsing/grepping
the code. If we ever want to re-instate it, it's in the git history.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
2016-04-11 20:30:40 +10:00
Sebastian Siewior
08a5bb2921 powerpc/mm: Fixup preempt underflow with huge pages
hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
accessing the variable did not migrate from one CPU to another and soon
this was noticed by Tiejun Chen in 94b09d7554 ("powerpc/hugetlb:
Replace __get_cpu_var with get_cpu_var"). So we had it fixed.

Christoph Lameter was doing his __get_cpu_var() replaces and forgot
PowerPC. Then he noticed this and sent his fixed up batch again which
got applied as 69111bac42 ("powerpc: Replace __get_cpu_var uses").

The careful reader will noticed one little detail: get_cpu_var() got
replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
a preempt_enable() and nothing that does preempt_disable() so we
underflow the preempt counter.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-29 12:08:07 +11:00
Linus Torvalds
d5e2d00898 powerpc updates for 4.6
Highlights:
  - Restructure Linux PTE on Book3S/64 to Radix format from Paul Mackerras
  - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh Kumar K.V
  - Add POWER9 cputable entry from Michael Neuling
  - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
  - Add support for new ftrace ABI on ppc64le from Torsten Duwe
 
 Various cleanups & minor fixes from:
  - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy, Cyril
    Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell Currey,
    Sukadev Bhattiprolu, Suraj Jitindar Singh.
 
 General:
  - atomics: Allow architectures to define their own __atomic_op_* helpers from
    Boqun Feng
  - Implement atomic{, 64}_*_return_* variants and acquire/release/relaxed
    variants for (cmp)xchg from Boqun Feng
  - Add powernv_defconfig from Jeremy Kerr
  - Fix BUG_ON() reporting in real mode from Balbir Singh
  - Add xmon command to dump OPAL msglog from Andrew Donnellan
  - Add xmon command to dump process/task similar to ps(1) from Douglas Miller
  - Clean up memory hotplug failure paths from David Gibson
 
 pci/eeh:
  - Redesign SR-IOV on PowerNV to give absolute isolation between VFs from Wei
    Yang.
  - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
  - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
  - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
  - MAINTAINERS: Update EEH details and maintainership from Russell Currey
 
 cxl:
  - Support added to the CXL driver for running on both bare-metal and
    hypervisor systems, from Christophe Lombard and Frederic Barrat.
  - Ignore probes for virtual afu pci devices from Vaibhav Jain
 
 perf:
  - Export Power8 generic and cache events to sysfs from Sukadev Bhattiprolu
  - hv-24x7: Fix usage with chip events, display change in counter values,
    display domain indices in sysfs, eliminate domain suffix in event names,
    from Sukadev Bhattiprolu
 
 Freescale:
  - Updates from Scott: "Highlights include 8xx optimizations, 32-bit checksum
    optimizations, 86xx consolidation, e5500/e6500 cpu hotplug, more fman and
    other dt bits, and minor fixes/cleanup."
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW69OrAAoJEFHr6jzI4aWAe5EQAJw/hE6WBQc6a7Tj70AnXOqR
 qk/m5pZjuTwQxfBteIvHR1pE5eXdlvtAjcD254LVkFkAbIn19W/h2k0VX/nlee7P
 n/VRHRifjtGmukqHrPYJJ7ua9mNlY7pxh3leGSixBFASnSWqMxNNNziNQtSTcuCs
 TjHiw6NkZ/kzeunA4bAfE4yHVUZjmL74oiS9JbLyaVHqoW4fqWLlh26AKo2yYMZI
 qPicBBG4HBi3FGvoexnKxlJNdcV4HO7LzDjJmCSfUKYCJi+Pw19T5qmhso0q0qVz
 vHg/A8HNeG4Hn83pNVmLeQSAIQRZ3DvTtcLgbjPo+TVwm/hzrRRBWipTeOVbkLW8
 2bcOXT4t7LWUq15EAJ1LYgYZGzcLrfRfUeOcuQ1TWd3+PcfY9pE7FmizsxAAfaVe
 E9j9mpz4XnIqBtWkFHneTIHkQ5OWptyKuZJEaYH0nut4VsP0k8NarkseafGqBPu7
 5eG83gbiQbCVixfOgblV9eocJ29JcwpjPAY4CZSGJimShg909FV7WRgZgJkKWrbK
 dBRco8Jcp4VglGfo2qymv7Uj4KwQoypBREOhiKUvrAsVlDxPfx+bcskhjGu9xGDC
 xs/+nme0/lKa/wg5K4C3mQ1GAlkMWHI0ojhJjsyODbetup5UbkEu03wjAaTdO9dT
 Y6ptGm0rYAJluPNlziFj
 =qkAt
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "This was delayed a day or two by some build-breakage on old toolchains
  which we've now fixed.

  There's two PCI commits both acked by Bjorn.

  There's one commit to mm/hugepage.c which is (co)authored by Kirill.

  Highlights:
   - Restructure Linux PTE on Book3S/64 to Radix format from Paul
     Mackerras
   - Book3s 64 MMU cleanup in preparation for Radix MMU from Aneesh
     Kumar K.V
   - Add POWER9 cputable entry from Michael Neuling
   - FPU/Altivec/VSX save/restore optimisations from Cyril Bur
   - Add support for new ftrace ABI on ppc64le from Torsten Duwe

  Various cleanups & minor fixes from:
   - Adam Buchbinder, Andrew Donnellan, Balbir Singh, Christophe Leroy,
     Cyril Bur, Luis Henriques, Madhavan Srinivasan, Pan Xinhui, Russell
     Currey, Sukadev Bhattiprolu, Suraj Jitindar Singh.

  General:
   - atomics: Allow architectures to define their own __atomic_op_*
     helpers from Boqun Feng
   - Implement atomic{, 64}_*_return_* variants and acquire/release/
     relaxed variants for (cmp)xchg from Boqun Feng
   - Add powernv_defconfig from Jeremy Kerr
   - Fix BUG_ON() reporting in real mode from Balbir Singh
   - Add xmon command to dump OPAL msglog from Andrew Donnellan
   - Add xmon command to dump process/task similar to ps(1) from Douglas
     Miller
   - Clean up memory hotplug failure paths from David Gibson

  pci/eeh:
   - Redesign SR-IOV on PowerNV to give absolute isolation between VFs
     from Wei Yang.
   - EEH Support for SRIOV VFs from Wei Yang and Gavin Shan.
   - PCI/IOV: Rename and export virtfn_{add, remove} from Wei Yang
   - PCI: Add pcibios_bus_add_device() weak function from Wei Yang
   - MAINTAINERS: Update EEH details and maintainership from Russell
     Currey

  cxl:
   - Support added to the CXL driver for running on both bare-metal and
     hypervisor systems, from Christophe Lombard and Frederic Barrat.
   - Ignore probes for virtual afu pci devices from Vaibhav Jain

  perf:
   - Export Power8 generic and cache events to sysfs from Sukadev
     Bhattiprolu
   - hv-24x7: Fix usage with chip events, display change in counter
     values, display domain indices in sysfs, eliminate domain suffix in
     event names, from Sukadev Bhattiprolu

  Freescale:
   - Updates from Scott: "Highlights include 8xx optimizations, 32-bit
     checksum optimizations, 86xx consolidation, e5500/e6500 cpu
     hotplug, more fman and other dt bits, and minor fixes/cleanup"

* tag 'powerpc-4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
  powerpc: Fix unrecoverable SLB miss during restore_math()
  powerpc/8xx: Fix do_mtspr_cpu6() build on older compilers
  powerpc/rcpm: Fix build break when SMP=n
  powerpc/book3e-64: Use hardcoded mttmr opcode
  powerpc/fsl/dts: Add "jedec,spi-nor" flash compatible
  powerpc/T104xRDB: add tdm riser card node to device tree
  powerpc32: PAGE_EXEC required for inittext
  powerpc/mpc85xx: Add pcsphy nodes to FManV3 device tree
  powerpc/mpc85xx: Add MDIO bus muxing support to the board device tree(s)
  powerpc/86xx: Introduce and use common dtsi
  powerpc/86xx: Update device tree
  powerpc/86xx: Move dts files to fsl directory
  powerpc/86xx: Switch to kconfig fragments approach
  powerpc/86xx: Update defconfigs
  powerpc/86xx: Consolidate common platform code
  powerpc32: Remove one insn in mulhdu
  powerpc32: small optimisation in flush_icache_range()
  powerpc: Simplify test in __dma_sync()
  powerpc32: move xxxxx_dcache_range() functions inline
  powerpc32: Remove clear_pages() and define clear_page() inline
  ...
2016-03-19 15:38:41 -07:00
Joonsoo Kim
fe896d1878 mm: introduce page reference manipulation functions
The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count.  Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it.  Then, it is hard to find actual
reason of CMA allocation failure.  CMA allocation should be guaranteed
to succeed so finding offending place is really important.

In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function.  This is preparation step to
add tracepoint to each page reference manipulation function.  With this
facility, we can easily find reason of CMA allocation failure.  There is
no functional change in this patch.

In addition, this patch also converts reference read sites.  It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Joonsoo Kim
e7df0d88c4 powerpc: query dynamic DEBUG_PAGEALLOC setting
We can disable debug_pagealloc processing even if the code is compiled
with CONFIG_DEBUG_PAGEALLOC.  This patch changes the code to query
whether it is enabled or not in runtime.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-17 15:09:34 -07:00
Linus Torvalds
10dc374766 One of the largest releases for KVM... Hardly any generic improvement,
but lots of architecture-specific changes.
 
 * ARM:
 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - various optimizations to the vgic save/restore code.
 
 * PPC:
 - enabled KVM-VFIO integration ("VFIO device")
 - optimizations to speed up IPIs between vcpus
 - in-kernel handling of IOMMU hypercalls
 - support for dynamic DMA windows (DDW).
 
 * s390:
 - provide the floating point registers via sync regs;
 - separated instruction vs. data accesses
 - dirty log improvements for huge guests
 - bugfixes and documentation improvements.
 
 * x86:
 - Hyper-V VMBus hypercall userspace exit
 - alternative implementation of lowest-priority interrupts using vector
 hashing (for better VT-d posted interrupt support)
 - fixed guest debugging with nested virtualizations
 - improved interrupt tracking in the in-kernel IOAPIC
 - generic infrastructure for tracking writes to guest memory---currently
 its only use is to speedup the legacy shadow paging (pre-EPT) case, but
 in the future it will be used for virtual GPUs as well
 - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
 yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
 oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
 tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
 Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
 WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
 =OUZZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One of the largest releases for KVM...  Hardly any generic
  changes, but lots of architecture-specific updates.

  ARM:
   - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
   - PMU support for guests
   - 32bit world switch rewritten in C
   - various optimizations to the vgic save/restore code.

  PPC:
   - enabled KVM-VFIO integration ("VFIO device")
   - optimizations to speed up IPIs between vcpus
   - in-kernel handling of IOMMU hypercalls
   - support for dynamic DMA windows (DDW).

  s390:
   - provide the floating point registers via sync regs;
   - separated instruction vs.  data accesses
   - dirty log improvements for huge guests
   - bugfixes and documentation improvements.

  x86:
   - Hyper-V VMBus hypercall userspace exit
   - alternative implementation of lowest-priority interrupts using
     vector hashing (for better VT-d posted interrupt support)
   - fixed guest debugging with nested virtualizations
   - improved interrupt tracking in the in-kernel IOAPIC
   - generic infrastructure for tracking writes to guest
     memory - currently its only use is to speedup the legacy shadow
     paging (pre-EPT) case, but in the future it will be used for
     virtual GPUs as well
   - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
  KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
  KVM: x86: disable MPX if host did not enable MPX XSAVE features
  arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
  arm64: KVM: vgic-v3: Reset LRs at boot time
  arm64: KVM: vgic-v3: Do not save an LR known to be empty
  arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
  arm64: KVM: vgic-v3: Avoid accessing ICH registers
  KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
  KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
  KVM: arm/arm64: vgic-v2: Reset LRs at boot time
  KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
  KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
  KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
  KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
  KVM: s390: allocate only one DMA page per VM
  KVM: s390: enable STFLE interpretation only if enabled for the guest
  KVM: s390: wake up when the VCPU cpu timer expires
  KVM: s390: step the VCPU timer while in enabled wait
  KVM: s390: protect VCPU cpu timer with a seqcount
  KVM: s390: step VCPU cpu timer during kvm_run ioctl
  ...
2016-03-16 09:55:35 -07:00
Linus Torvalds
d37a14bb5f Merge branch 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ram resource handling changes from Ingo Molnar:
 "Core kernel resource handling changes to support NVDIMM error
  injection.

  This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
  for System RAM while keeping the current IORESOURCE_MEM type bit set
  for all memory-mapped ranges (including System RAM) for backward
  compatibility.

  With this resource flag it no longer takes a strcmp() loop through the
  resource tree to find "System RAM" resources.

  The new resource type is then used to extend ACPI/APEI error injection
  facility to also support NVDIMM"

* 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ACPI/EINJ: Allow memory error injection to NVDIMM
  resource: Kill walk_iomem_res()
  x86/kexec: Remove walk_iomem_res() call with GART type
  x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
  resource: Add walk_iomem_res_desc()
  memremap: Change region_intersects() to take @flags and @desc
  arm/samsung: Change s3c_pm_run_res() to use System RAM type
  resource: Change walk_system_ram() to use System RAM type
  drivers: Initialize resource entry to zero
  xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
  kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
  arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
  ia64: Set System RAM type and descriptor
  x86/e820: Set System RAM type and descriptor
  resource: Add I/O resource descriptor
  resource: Handle resource flags properly
  resource: Add System RAM resource type
2016-03-14 15:15:51 -07:00
Christophe Leroy
060ef9d89d powerpc32: PAGE_EXEC required for inittext
PAGE_EXEC is required for inittext, otherwise CONFIG_DEBUG_PAGEALLOC
ends up with an Oops

[    0.000000] Inode-cache hash table entries: 8192 (order: 1, 32768 bytes)
[    0.000000] Sorting __ex_table...
[    0.000000] bootmem::free_all_bootmem_core nid=0 start=0 end=2000
[    0.000000] Unable to handle kernel paging request for instruction fetch
[    0.000000] Faulting instruction address: 0xc045b970
[    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.000000] PREEMPT DEBUG_PAGEALLOC CMPC885
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.25-local-dirty #1673
[    0.000000] task: c04d83d0 ti: c04f8000 task.ti: c04f8000
[    0.000000] NIP: c045b970 LR: c045b970 CTR: 0000000a
[    0.000000] REGS: c04f9ea0 TRAP: 0400   Not tainted  (3.18.25-local-dirty)
[    0.000000] MSR: 08001032 <ME,IR,DR,RI>  CR: 39955d35  XER: a000ff40
[    0.000000]
GPR00: c045b970 c04f9f50 c04d83d0 00000000 ffffffff c04dcdf4 00000048 c04f6b10
GPR08: c04f6ab0 00000001 c0563488 c04f6ab0 c04f8000 00000000 00000000 b6db6db7
GPR16: 00003474 00000180 00002000 c7fec000 00000000 000003ff 00000176 c0415014
GPR24: c0471018 c0414ee8 c05304e8 c03aeaac c0510000 c0471018 c0471010 00000000
[    0.000000] NIP [c045b970] free_all_bootmem+0x164/0x228
[    0.000000] LR [c045b970] free_all_bootmem+0x164/0x228
[    0.000000] Call Trace:
[    0.000000] [c04f9f50] [c045b970] free_all_bootmem+0x164/0x228 (unreliable)
[    0.000000] [c04f9fa0] [c0454044] mem_init+0x3c/0xd0
[    0.000000] [c04f9fb0] [c045080c] start_kernel+0x1f4/0x390
[    0.000000] [c04f9ff0] [c0002214] start_here+0x38/0x98
[    0.000000] Instruction dump:
[    0.000000] 2f150000 7f968840 72a90001 3ad60001 56b5f87e 419a0028 419e0024 41a20018
[    0.000000] 807cc20c 38800000 7c638214 4bffd2f5 <3a940001> 3a100024 4bffffc8 7e368b78
[    0.000000] ---[ end trace dc8fa200cb88537f ]---

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 20:04:32 -06:00
Christophe Leroy
8478d7f091 powerpc: Simplify test in __dma_sync()
This simplification helps the compiler. We now have only one test
instead of two, so it reduces the number of branches.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:20:12 -06:00
Christophe Leroy
766d45cbee powerpc/8xx: rewrite flush_instruction_cache() in C
On PPC8xx, flushing instruction cache is performed by writing
in register SPRN_IC_CST. This registers suffers CPU6 ERRATA.
The patch rewrites the fonction in C so that CPU6 ERRATA will
be handled transparently

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:20:11 -06:00
Christophe Leroy
a7761fe489 powerpc/8xx: rewrite set_context() in C
There is no real need to have set_context() in assembly.
Now that we have mtspr() handling CPU6 ERRATA directly, we
can rewrite set_context() in C language for easier maintenance.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:20:11 -06:00
Christophe Leroy
e974cd4be0 powerpc32: remove ioremap_base
ioremap_base is not initialised and is nowhere used so remove it

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:18:02 -06:00
Christophe Leroy
c562eb06d5 powerpc32: Remove useless/wrong MMU:setio progress message
Commit 7711684947 ("[POWERPC] Remove unused machine call outs")
removed the call to setup_io_mappings(), so remove the associated
progress line message

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:18:02 -06:00
Christophe Leroy
3084cdb7cd powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together
x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
purpose, and are never defined at the same time.
So rename them x_block_mapped() and define them in the relevant
places

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:18:02 -06:00
Christophe Leroy
516d91893b powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c
Now we have a 8xx specific .c file for that so put it in there
as other powerpc variants do

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:18:01 -06:00
Christophe Leroy
a372acfac5 powerpc/8xx: Map linear kernel RAM with 8M pages
On a live running system (VoIP gateway for Air Trafic Control), over
a 10 minutes period (with 277s idle), we get 87 millions DTLB misses
and approximatly 35 secondes are spent in DTLB handler.
This represents 5.8% of the overall time and even 10.8% of the
non-idle time.
Among those 87 millions DTLB misses, 15% are on user addresses and
85% are on kernel addresses. And within the kernel addresses, 93%
are on addresses from the linear address space and only 7% are on
addresses from the virtual address space.

MPC8xx has no BATs but it has 8Mb page size. This patch implements
mapping of kernel RAM using 8Mb pages, on the same model as what is
done on the 40x.

In 4k pages mode, each PGD entry maps a 4Mb area: we map every two
entries to the same 8Mb physical page. In each second entry, we add
4Mb to the page physical address to ease life of the FixupDAR
routine. This is just ignored by HW.

In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry
will point to the first page of the area. The DTLB handler adds
the 3 bits from EPN to map the correct page.

With this patch applied, we now get only 13 millions TLB misses
during the 10 minutes period. The idle time has increased to 313s
and the overall time spent in DTLB miss handler is 6.3s, which
represents 1% of the overall time and 2.2% of non-idle time.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-11 17:18:01 -06:00
Paolo Bonzini
ab92f30875 KVM/ARM updates for 4.6
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - Various optimizations to the vgic save/restore code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW36xjAAoJECPQ0LrRPXpDGQkQAMDppzcTOixT3e8VPdHAX09a
 Z5PO0gyTMVV7Jyz5Ul3pedPJA2GSK9mxOCwqvIFbdxLAR6ZB00juO5FrTHkSdI91
 1XLPj4bKoMWcVvhL/g5A4Glp/pVMW1k/9Yq8zZAtYlsLRlqG5rLOutSadcqHcYaJ
 cTD/pFf7b2oPtkTPyoFml75KgHBT/8uvAvFDOWA66Id2z6T11+PsBT/6XnGDiwKg
 tpGTNzx3kPIKIzOAOHqVW6UBxFOeabebXLT8wUz3VwNn/UbG6gkumMNApMAyF2q1
 zU0nAh8+7Ek6Dr4OFWE6BfW6sgg/l7i1lA8XoAmqG7ZTrSptCc59fvaZJxPruG+Q
 dMsU6QgR77JJjbZTinf9a1jReZ/liZrx2gZXedVKdILrjmDSq0UnGcxjUOEDZOGy
 2/dbrlJhv+LhpcJtuPpxPCfoqbW5L0ynzmuYuXRdRz3lTHiOWIRx5gugrhO+wH4D
 4gvZhbw3XCiYfpYHYhl8A1EH5kanKgdXDocz9yIm7mZm89gngufF/HkeXS3ZU25T
 yThyBGulGjqN4FCdgf1HolkTfFjnfSx4qJovJ58eHga+HNLXRkTecZZcbFy2OOHv
 8Bx0PIlwj4RgSaRLWQUudAhdhKS2g22DKDDljxFwhkMPNghvqkYMJCRDKLu6GBXQ
 4YsLKM+TaShHFjSpx+ao
 =rpvb
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.6

- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code

Conflicts:
	include/uapi/linux/kvm.h
2016-03-09 11:50:42 +01:00
Linus Torvalds
b8155fe1b2 powerpc fixes for 4.5 #4
- cxl: Fix PSL timebase synchronization detection from Frederic Barrat
  - Fix oops when destroying hw_breakpoint event from Ravi Bangoria
  - Avoid lbarx on e5500 from Scott Wood
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIbBAABAgAGBQJW21wIAAoJEFHr6jzI4aWAiTQP+PWe24xwWmC/9FWBhT5rmHbs
 CO82Wy2X3gjMsQoXGqe48ohfToJAWEt5yXS+H/PKpoTYJ/yFcih2ufRw55Y6xzHs
 M98c4Rt4mAbGwkmwWTBjSc1zP+ReWeMNwRE69j6N6Xl+fDqGg6tMKrcysh6wdDLB
 ZhLeHZbVEGhIlXuxMjGyPrMxSrvcEZMMVrxp8eTycC6+wBHB4w4WzOpFx87WM0fn
 3g1Q5WWzRjW2iCeWa6tJLwIIzEACrFs8nfJU08mJU1n33YYVbpu+2klG+5ppfGAL
 JiuKt9/ZoWSAMwNU7yEwFdgjtnOzywTnt6kAUWFdMKLV7DktJ+sf0ucZF9oBg87e
 ermsXt+Jy591NXgoif5gFMELWouC+Ti9pnqxdOTawYC8TKFm0OGExJFZQMGmzBw1
 9ZpO/2pl0bsuNZyClspuFyVRrNb1sxddisSyKlxW+0Vnz1P432fRZZSz2gHLvClG
 RjQ+nsKZYj6vH0wfA+eG9+/a5iYjbfBKRNSjrO1g8pbc3ELeG5C2kXDePKnkt8+J
 jpvTLHxaRp44wNPcizPrEthrdlKxE8uvTUosRuSMSJu74SK7t8+9kCvgDnHYne51
 f/2Ko7CzAV2SfdKB+nWI2oJxfYpPlxqORZkYGkQZIOD5uQ8pvJ4yCNzy43XqFHFE
 DyJOYeTxVQmfMT6jkrg=
 =ZcDj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 - cxl: Fix PSL timebase synchronization detection from Frederic Barrat
 - Fix oops when destroying hw_breakpoint event from Ravi Bangoria
 - Avoid lbarx on e5500 from Scott Wood

* tag 'powerpc-4.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/fsl-book3e: Avoid lbarx on e5500
  powerpc/hw_breakpoint: Fix oops when destroying hw_breakpoint event
  cxl: Fix PSL timebase synchronization detection
2016-03-06 11:08:06 -08:00
chenhui zhao
ebb9d30a6a powerpc/mm: any thread in one core can be the first to setup TLB1
On e6500, in the case of cpu hotplug, either thread in one core
may be the first thread initilzing the TLB1. The subsequent threads
must not setup it again.

The code is derived from the comment of Scott Wood.

Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-04 23:44:02 -06:00
Ingo Molnar
bc94b99636 Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into core/resources, to resolve conflict

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-04 12:12:08 +01:00
Scott Wood
37c5e942bb powerpc/fsl-book3e: Avoid lbarx on e5500
lbarx/stbcx. are implemented on e6500, but not on e5500.
Likewise, SMT is on e6500, but not on e5500.

So, avoid executing an unimplemented instruction by only locking
when needed (i.e. in the presence of SMT).

Signed-off-by: Scott Wood <oss@buserror.net>
2016-03-03 23:43:05 -06:00
Aneesh Kumar K.V
c367a44133 powerpc/mm: add _PAGE_HASHPTE similar to 4K hash
We don't need to update linux page table entry with _PAGE_HASHPTE early
in hash pte fault. A parallel pte update will loop via _PAGE_BUSY
and look at _PAGE_HASHPTE for a required hpte flush only if
_PAGE_BUSY is cleared. That ensures a pte update will wait for a
parallel hpte insert to finish before looking at _PAGE_HASHPTE bit.

To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K.

commit 41743a4e34 ("powerpc: Free a PTE bit on ppc64 with 64K pages")
did similar change for 64K config

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03 21:18:29 +11:00
Aneesh Kumar K.V
e9a681478c powerp/mm: Update code comments
We are updating pte in those functions.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03 21:18:29 +11:00
Kirill A. Shutemov
ff20c2e0ac mm: Some arch may want to use HPAGE_PMD related values as variables
With next generation power processor, we are having a new mmu model
[1] that require us to maintain a different linux page table format.

Inorder to support both current and future ppc64 systems with a single
kernel we need to make sure kernel can select between different page
table format at runtime. With the new MMU (radix MMU) added, we will
have two different pmd hugepage size 16MB for hash model and 2MB for
Radix model. Hence make HPAGE_PMD related values as a variable.

Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a
followup patch.

[1] http://ibm.biz/power-isa3 (Needs registration).

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03 21:18:29 +11:00
Aneesh Kumar K.V
368ced78e6 powerpc/mm: Switch book3s 64 with 64K page size to 4 level page table
This is needed so that we can support both hash and radix page table
using single kernel. Radix kernel uses a 4 level table.

We now use physical address in upper page table tree levels. Even though
they are aligned to their size, for the masked bits we use the
bit positions as per PowerISA 3.0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-03 21:18:28 +11:00
David Gibson
5c3c7ede2b powerpc/mm: Split hash page table sizing heuristic into a helper
htab_get_table_size() either retrieve the size of the hash page table (HPT)
from the device tree - if the HPT size is determined by firmware - or
uses a heuristic to determine a good size based on RAM size if the kernel
is responsible for allocating the HPT.

To support a PAPR extension allowing resizing of the HPT, we're going to
want the memory size -> HPT size logic elsewhere, so split it out into a
helper function.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-02 09:06:16 +11:00
David Gibson
1dace6c665 powerpc/mm: Clean up memory hotplug failure paths
This makes a number of cleanups to handling of mapping failures during
memory hotplug on Power:

For errors creating the linear mapping for the hot-added region:
  * This is now reported with EFAULT which is more appropriate than the
    previous EINVAL (the failure is unlikely to be related to the
    function's parameters)
  * An error in this path now prints a warning message, rather than just
    silently failing to add the extra memory.
  * Previously a failure here could result in the region being partially
    mapped.  We now clean up any partial mapping before failing.

For errors creating the vmemmap for the hot-added region:
   * This is now reported with EFAULT instead of causing a BUG() - this
     could happen for external reason (e.g. full hash table) so it's better
     to handle this non-fatally
   * An error message is also printed, so the failure won't be silent
   * As above a failure could cause a partially mapped region, we now
     clean this up. [mpe: move htab_remove_mapping() out of #ifdef
     CONFIG_MEMORY_HOTPLUG to enable this]

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01 22:04:18 +11:00
David Gibson
27828f98a0 powerpc/mm: Handle removing maybe-present bolted HPTEs
At the moment the hpte_removebolted callback in ppc_md returns void and
will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
place.  This is awkward for the case of cleaning up a mapping which was
partially made before failing.

So, we add a return value to hpte_removebolted, and have it return ENOENT
in the case that the HPTE to remove didn't exist in the first place.

In the (sole) caller, we propagate errors in hpte_removebolted to its
caller to handle.  However, we handle ENOENT specially, continuing to
complete the unmapping over the specified range before returning the error
to the caller.

This means that htab_remove_mapping() will work sanely on a partially
present mapping, removing any HPTEs which are present, while also returning
ENOENT to its caller in case it's important there.

There are two callers of htab_remove_mapping():
   - In remove_section_mapping() we already WARN_ON() any error return,
     which is reasonable - in this case the mapping should be fully
     present
   - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
     just a WARN_ON() in the case of ENOENT, since failing to remove a
     mapping that wasn't there in the first place probably shouldn't be
     fatal.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01 22:04:18 +11:00
David Gibson
abd0a0e791 powerpc/mm: Clean up error handling for htab_remove_mapping
Currently, the only error that htab_remove_mapping() can report is -EINVAL,
if removal of bolted HPTEs isn't implemeted for this platform.  We make
a few clean ups to the handling of this:

 * EINVAL isn't really the right code - there's nothing wrong with the
   function's arguments - use ENODEV instead
 * We were also printing a warning message, but that's a decision better
   left up to the callers, so remove it
 * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
   error, making the warning message redundant, so no change is needed
   there.
 * The other caller is remove_section_mapping().  This is called in the
   memory hot remove path at a point after vmemmap_remove_mapping() so
   if hpte_removebolted isn't implemented, we'd expect to have already
   BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
   really shouldn't be happening.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01 22:04:17 +11:00
Adam Buchbinder
446957ba51 powerpc: Fix misspellings in comments.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-03-01 19:27:20 +11:00
Paul Mackerras
a9d4996df1 powerpc/mm/book3s-64: Move HPTE-related bits in PTE to upper end
This moves the _PAGE_HASHPTE, _PAGE_F_GIX and _PAGE_F_SECOND fields in
the Linux PTE on 64-bit Book 3S systems to the most significant byte.
Of the 5 bits, one is a software-use bit and the other four are
reserved bit positions in the PowerISA v3.0 radix PTE format.
Using these bits is OK because these bits are all to do with tracking
the HPTE(s) associated with the Linux PTE, and therefore won't be
needed in radix mode.  This frees up bit positions in the lower two
bytes.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29 20:34:39 +11:00
Paul Mackerras
849f86a630 powerpc/mm/book3s-64: Move _PAGE_PRESENT to the most significant bit
This changes _PAGE_PRESENT for 64-bit Book 3S processors from 0x2 to
0x8000_0000_0000_0000, because that is where PowerISA v3.0 CPUs in
radix mode will expect to find it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29 20:34:39 +11:00
Paul Mackerras
c61a884312 powerpc/mm/book3s-64: Use physical addresses in upper page table tree levels
This changes the Linux page tables to store physical addresses
rather than kernel virtual addresses in the upper levels of the
tree (pgd, pud and pmd) for 64-bit Book 3S machines.

This also changes the hugepd pointers used to implement hugepages
when the base page size is 4k to store physical addresses rather than
virtual addresses (again just for 64-bit Book3S machines).

This frees up some high order bits, and will be needed with
PowerISA v3.0 machines which read the page table tree in hardware
in radix mode.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-29 20:34:34 +11:00
Daniel Cashman
5ef11c35ce mm: ASLR: use get_random_long()
Replace calls to get_random_int() followed by a cast to (unsigned long)
with calls to get_random_long().  Also address shifting bug which, in
case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.

Signed-off-by: Daniel Cashman <dcashman@android.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Mark Salyzyn <salyzyn@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-27 10:28:52 -08:00
Paul Mackerras
f1a9ae034a powerpc/mm/book3s-64: Free up 7 high-order bits in the Linux PTE
This frees up bits 57-63 in the Linux PTE on 64-bit Book 3S machines.
In the 4k page case, this is done just by reducing the size of the
RPN field to 39 bits, giving 51-bit real addresses.  In the 64k page
case, we had 10 unused bits in the middle of the PTE, so this moves
the RPN field down 10 bits to make use of those unused bits.  This
means the RPN field is now 3 bits larger at 37 bits, giving 53-bit
real addresses in the normal case, or 49-bit real addresses for the
special 4k PFN case.

We are doing this in order to be able to move some other PTE bits
into the positions where PowerISA V3.0 processors will expect to
find them in radix-tree mode.  Ultimately we will be able to move
the RPN field to lower bit positions and make it larger.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-27 21:06:57 +11:00
Paul Mackerras
1ec3f93710 powerpc/mm/book3s-64: Clean up some obsolete or misleading comments
No code changes.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-27 21:06:57 +11:00
Aneesh Kumar K.V
9ab3ac233a powerpc/mm/hash: Clear the invalid slot information correctly
We can get a hash pte fault with 4k base page size and find the pte
already inserted with 64K base page size. In that case we need to clear
the existing slot information from the old pte. Fix this correctly

With THP, we also clear the slot information with respect to all
the 64K hash pte mapping that 16MB page. They are all invalid
now. This make sure we don't find the slot valid when we fault with
4k base page size. Finding the slot valid should not result in any wrong
behavior because we do check again in hash page table for the validity.
But we can avoid that check completely.

Fixes: a43c0eb836 ("powerpc/mm: Convert 4k hash insert to C")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-22 19:27:39 +11:00
Alexey Kardashevskiy
e9ab1a1caf powerpc: Make vmalloc_to_phys() public
This makes vmalloc_to_phys() public as there will be another user
(KVM in-kernel VFIO acceleration) for it soon. As this new user
can be compiled as a module, this exports the symbol.

As a little optimization, this changes the helper to call
vmalloc_to_pfn() instead of vmalloc_to_page() as the size of the
struct page may not be power-of-two aligned which will make gcc use
multiply instructions instead of shifts.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2016-02-16 13:44:26 +11:00
Aneesh Kumar K.V
c777e2a8b6 powerpc/mm: Fix Multi hit ERAT cause by recent THP update
With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensure that low level hash fault handling
will skip this huge pte and we will handle them at upper levels.

Recent change to pmd splitting changed the above in order to handle the
race between pmd split and exit_mmap. The race is explained below.

Consider following race:

		CPU0				CPU1
shrink_page_list()
  add_to_swap()
    split_huge_page_to_list()
      __split_huge_pmd_locked()
        pmdp_huge_clear_flush_notify()
	// pmd_none() == true
					exit_mmap()
					  unmap_vmas()
					    zap_pmd_range()
					      // no action on pmd since pmd_none() == true
	pmd_populate()

As result the THP will not be freed. The leak is detected by check_mm():

	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512

The above required us to not mark pmd none during a pmd split.

The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
level fault handling code skip this pte. At higher level we do take ptl
lock. That should serialze us against the pmd split. Once the lock is
acquired we do check the pmd again using pmd_same. That should always
return false for us and hence we should retry the access. We do the
pmd_same check in all case after taking plt with
THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed)

Also make sure we wait for irq disable section in other cpus to finish
before flipping a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irq disable to get
a stable pte_t pointer. A parallel thp split need to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq disable section to finish.

Fixes: eef1b3ba05 ("thp: implement split_huge_pmd()")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-15 21:10:04 +11:00
Toshi Kani
35d98e93fe arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
"System RAM", "Kernel code", "Kernel data", and "Kernel bss".

Note that:

 - IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
   IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
   as (IORESOURCE_MEM|IORESOURCE_SYSRAM).

 - Some archs do not set 'flags' for children nodes, such as
   "Kernel code".  This patch does not change 'flags' in this
   case.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 09:49:57 +01:00
Vasant Hegde
e256caa7d0 powerpc/mm: Allow user space to map rtas_rmo_buf
With commit 90a545e9 (restrict /dev/mem to idle io memory ranges) mapping
rtas_rmo_buf from user space is failing. Hence we are not able to make
RTAS syscall.

This patch calls page_is_rtas_user_buf before calling iomem_is_exclusive
in  devmem_is_allowed(). This will allow user space to map rtas_rmo_buf
and we are able to make RTAS syscall.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-25 16:31:13 +11:00
Kirill A. Shutemov
7aa9a23c69 powerpc, thp: remove infrastructure for handling splitting PMDs
With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Kirill A. Shutemov
ddc58f27f9 mm: drop tail page refcounting
Tail page refcounting is utterly complicated and painful to support.

It uses ->_mapcount on tail pages to store how many times this page is
pinned.  get_page() bumps ->_mapcount on tail page in addition to
->_count on head.  This information is required by split_huge_page() to
be able to distribute pins from head of compound page to tails during
the split.

We will need ->_mapcount to account PTE mappings of subpages of the
compound page.  We eliminate need in current meaning of ->_mapcount in
tail pages by forbidding split entirely if the page is pinned.

The only user of tail page refcounting is THP which is marked BROKEN for
now.

Let's drop all this mess.  It makes get_page() and put_page() much
simpler.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Kirill A. Shutemov
78ddc53473 thp: rename split_huge_page_pmd() to split_huge_pmd()
We are going to decouple splitting THP PMD from splitting underlying
compound page.

This patch renames split_huge_page_pmd*() functions to split_huge_pmd*()
to reflect the fact that it doesn't imply page splitting, only PMD.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Michael Ellerman
be6bfc29bc Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include moving QE code out of arch/powerpc (to be shared with
arm), device tree updates, and minor fixes."
2016-01-14 09:55:01 +11:00
Michael Ellerman
c33e54fafa powerpc: Fix build break due to paca mm_context_t changes
Commit 2fc251a8dd ("powerpc: Copy only required pieces of the
mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y
and CONFIG_PPC_MM_SLICES=n.

That only happens for a kernel built with 4K pages and HUGETLB disabled,
which is why we missed it.

Fix it by adding a mm_ctx_user_psize member to the paca and populating
it in the appropriate places.

Fixes: 2fc251a8dd ("powerpc: Copy only required pieces of the mm_context_t to the paca")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-01-09 08:28:44 +11:00
Michael Neuling
2fc251a8dd powerpc: Copy only required pieces of the mm_context_t to the paca
Currently we copy the whole mm_context_t to the paca but only access a
few bits of it.  This is wasteful of space paca and also takes quite
some time in the hot path of context switching.

This patch pulls in only the required bits from the mm_context_t to
the paca and on context switch, copies only those.

Benchmarking this (On top of Anton's recent MSR context switching
changes [1]) using processes and yield shows an improvement of almost
3% on POWER8:

  http://ozlabs.org/~anton/junkcode/context_switch2.c
  ./context_switch2 --test=yield --process 0 0

1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html

Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Rename paca fields to be mm_ctx_foo rather than context_foo]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:14 +11:00
Scott Wood
2749539b4b powerpc/e6500: add locking to hugetlb
e6500 has threads but does not have TLB write conditional.  Thus,
the hugetlb code needs to take the same lock that the normal TLB miss
handlers take, to ensure that the tlbsx and tlbwe are atomic.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-12-22 18:23:22 -06:00
Michael Neuling
c395465da6 powerpc: Add function to copy mm_context_t to the paca
This adds a function to copy the mm->context to the paca.  This is
only a basic conversion for now but will be used more extensively in
the next patch.

This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's
not used elsewhere.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-19 22:13:12 +11:00
Aneesh Kumar K.V
4dcbd88eb6 powerpc/mm: Don't open code pgtable_t size
The slot information of base page size hash pte is stored in the
pgtable_t w.r.t transparent hugepage. We need to make sure we don't
index beyond pgtable_t size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:17 +11:00
Aneesh Kumar K.V
62607bc64c powerpc/mm: Don't hardcode page table size
pte and pmd table size are dependent on config items. Don't
hard code the same. This make sure we use the right value
when masking pmd entries and also while checking pmd_bad

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:15 +11:00
Aneesh Kumar K.V
6a119eae94 powerpc/mm: Add a _PAGE_PTE bit
For a pte entry we will have _PAGE_PTE set. Our pte page
address have a minimum alignment requirement of HUGEPD_SHIFT_MASK + 1.
We use the lower 7 bits to indicate hugepd. ie.

For pmd and pgd we can find:
1) _PAGE_PTE set pte -> indicate PTE
2) bits [2..6] non zero -> indicate hugepd.
   They also encode the size. We skip bit 1 (_PAGE_PRESENT).
3) othewise pointer to next table.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:14 +11:00
Aneesh Kumar K.V
e34aa03ca4 powerpc/mm: Move THP headers around
We support THP only with book3s_64 and 64K page size. Move
THP details to hash64-64k.h to clarify the same.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:14 +11:00
Aneesh Kumar K.V
26a344aea4 powerpc/mm: Move hugetlb related headers
W.r.t hugetlb, we support two format for pmd. With book3s_64 and
64K linux page size, we can have pte at the pmd level. Hence we
don't need to support hugepd there. For everything else hugepd
is supported and pmd_huge is (0).

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:13 +11:00
Aneesh Kumar K.V
40e8550afc powerpc/mm: Move WIMG update to helper.
Only difference here is, we apply the WIMG mapping early, so rflags
passed to updatepp will also be changed.

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:13 +11:00
Aneesh Kumar K.V
c6a3c495f0 powerpc/mm: Add helper for converting pte bit to hpte bits
Instead of open coding it in multiple code paths, export the helper
and add more documentation. Also make sure we don't make assumption
regarding pte bit position

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:12 +11:00
Aneesh Kumar K.V
a43c0eb836 powerpc/mm: Convert 4k insert from asm to C
This is similar to 64K insert. May be we want to consolidate

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:12 +11:00
Aneesh Kumar K.V
89ff725051 powerpc/mm: Convert __hash_page_64K to C
Convert from asm to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:11 +11:00
Aneesh Kumar K.V
506b863c68 powerpc/mm: Remove pte_val usage for the second half of pgtable_t
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:10 +11:00
Aneesh Kumar K.V
bf680d5160 powerpc/mm: Don't track subpage valid bit in pte_t
This free up 11 bits in pte_t. In the later patch we also change
the pte_t format so that we can start supporting migration pte
at pmd level. We now track 4k subpage valid bit as below

If we have _PAGE_COMBO set, we override the _PAGE_F_GIX_SHIFT
and _PAGE_F_SECOND. Together we have 4 bits, each of them
used to indicate whether any of the 4 4k subpage in that group
is valid. ie,

[ group 1 bit ]   [ group 2 bit ]  ..... [ group 4 ]
[ subpage 1 - 4]  [ subpage 5- 8]  ..... [ subpage 13 - 16]

We still track each 4k subpage slot number and secondary hash
information in the second half of pgtable_t. Removing the subpage
tracking have some significant overhead on aim9 and ebizzy benchmark and
to support THP with 4K subpage, we do need a pgtable_t of 4096 bytes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:10 +11:00
Aneesh Kumar K.V
106713a145 powerpc/mm: Remove the dependency on pte bit position in asm code
We should not expect pte bit position in asm code. Simply
by moving part of that to C

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:10 +11:00
Aneesh Kumar K.V
91f1da9979 powerpc/mm: Convert 4k hash insert to C
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:09 +11:00
Aneesh Kumar K.V
f281b5d50c powerpc/mm: Don't use pmd_val, pud_val and pgd_val as lvalue
We convert them static inline function here as we did with pte_val in
the previous patch

Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:07 +11:00
Aneesh Kumar K.V
0863d7f213 powerpc/mm: Fix infinite loop in hash fault with 4K page size
This is the same bug we fixed as part of 09567e7fd4
("powerpc/mm: Check paca psize is up to date for huge mappings"). Please
check that for details. The difference here is that faults were
happening on a 4K page at an address previously mapped by hugetlb.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-14 15:19:04 +11:00
Linus Torvalds
2f4bf528ec powerpc updates for 4.4
- Kconfig: remove BE-only platforms from LE kernel build from Boqun Feng
  - Refresh ps3_defconfig from Geoff Levand
  - Emit GNU & SysV hashes for the vdso from Michael Ellerman
  - Define an enum for the bolted SLB indexes from Anshuman Khandual
  - Use a local to avoid multiple calls to get_slb_shadow() from Michael Ellerman
  - Add gettimeofday() benchmark from Michael Neuling
  - Avoid link stack corruption in __get_datapage() from Michael Neuling
  - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar K.V
  - Add ppc64le_defconfig from Michael Ellerman
  - pseries: extract of_helpers module from Andy Shevchenko
  - Correct string length in pseries_of_derive_parent() from Nathan Fontenot
  - Free the MSI bitmap if it was slab allocated from Denis Kirjanov
  - Shorten irq_chip name for the SIU from Christophe Leroy
  - Wait 1s for secondaries to enter OPAL during kexec from Samuel Mendoza-Jonas
  - Fix _ALIGN_* errors due to type difference. from Aneesh Kumar K.V
  - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from Colin Ian King
  - Disable hugepd for 64K page size. from Aneesh Kumar K.V
  - Differentiate between hugetlb and THP during page walk from Aneesh Kumar K.V
  - Make PCI non-optional for pseries from Michael Ellerman
  - Individual System V IPC system calls from Sam bobroff
  - Add selftest of unmuxed IPC calls from Michael Ellerman
  - discard .exit.data at runtime from Stephen Rothwell
  - Delete old orphaned PrPMC 280/2800 DTS and boot file. from Paul Gortmaker
  - Use of_get_next_parent to simplify code from Christophe Jaillet
  - Paginate some xmon output from Sam bobroff
  - Add some more elements to the xmon PACA dump from Michael Ellerman
  - Allow the tm-syscall selftest to build with old headers from Michael Ellerman
  - Run EBB selftests only on POWER8 from Denis Kirjanov
  - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael Ellerman
  - Avoid reference to potentially freed memory in prom.c from Christophe Jaillet
  - Quieten boot wrapper output with run_cmd from Geoff Levand
  - EEH fixes and cleanups from Gavin Shan
  - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
  - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael Ellerman
  - Fix section mismatch warning in msi_bitmap_alloc() from Denis Kirjanov
  - Fix ps3-lpm white space from Rudhresh Kumar J
  - Fix ps3-vuart null dereference from Colin King
  - nvram: Add missing kfree in error path from Christophe Jaillet
  - nvram: Fix function name in some errors messages. from Christophe Jaillet
  - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro Koskinen
  - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
  - cxl: Free virtual PHB when removing from Andrew Donnellan
  - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from Michael Ellerman
  - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building with O= from Michael Ellerman
 
  - Freescale updates from Scott: Highlights include 64-bit book3e kexec/kdump
    support, a rework of the qoriq clock driver, device tree changes including
    qoriq fman nodes, support for a new 85xx board, and some fixes.
 
  - MPC5xxx updates from Anatolij: Highlights include a driver for MPC512x
    LocalPlus Bus FIFO with its device tree binding documentation, mpc512x
    device tree updates and some minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWPEZgAAoJEFHr6jzI4aWANjYQAKX2Q/95hqKfCuF5FBcUmtMC
 Pu/Nff027MVzxZ2ApDcvvLGps5Nz2bn3nIhc9zjkXc5E8DuL6X3Yl8ce7qyNcc3g
 cJJ8RvtUo6J1OMWetXFehtPYniAAwKMhZYKnj0+WnLr2SyH/Vhl3ehDkFbGyPtuH
 r+2E7krFjfVgU+bzciIFnOaDekFuFN/pXWMb6e6zQyBJe9N8ZIp96uouGCebKVd0
 VDLItzdaKErT8JFfbymMPvZm3V0rMVx4WWu3kAbQX8LrD5a18NF1zrjAOHRXc61n
 kkk8/DPuNOon1PbXXyiS5BcFyZRe+KE3VBnoW5sOMqMIRg5WdO1oU3e2pEfXMO8+
 leXYwFLXiKzUZuOgQG2QiUhrzD2yC1o6/TJWATv0dSl9AwrecgPX+Vj6X357slAf
 A9E3eMy5tgnpndBWZmvZS3W7YDKH+NkeZ+Q40+NErAlqr++ErrTcKVndk5vWlYTT
 7mMZeTXagX66al/k5ATKqwB7iUSpnYHSAa9fcUYPSM2FnXsDxPyeJGkBbcoOmkGj
 QrpgNYOvJaUJd076goZCV39v0c1xpfV9/9kyVch8HUadf6JcjpVZwYnbGw2qlJjh
 ZanuBG2VOeSwaKQqXiRBSBetnpAg8CVpFjDmX9wOBfSek2wxEJqDX/vQExdbIDQQ
 pUs7vnUxLzhmW/x+ygOI
 =YwcM
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Kconfig: remove BE-only platforms from LE kernel build from Boqun
   Feng
 - Refresh ps3_defconfig from Geoff Levand
 - Emit GNU & SysV hashes for the vdso from Michael Ellerman
 - Define an enum for the bolted SLB indexes from Anshuman Khandual
 - Use a local to avoid multiple calls to get_slb_shadow() from Michael
   Ellerman
 - Add gettimeofday() benchmark from Michael Neuling
 - Avoid link stack corruption in __get_datapage() from Michael Neuling
 - Add virt_to_pfn and use this instead of opencoding from Aneesh Kumar
   K.V
 - Add ppc64le_defconfig from Michael Ellerman
 - pseries: extract of_helpers module from Andy Shevchenko
 - Correct string length in pseries_of_derive_parent() from Nathan
   Fontenot
 - Free the MSI bitmap if it was slab allocated from Denis Kirjanov
 - Shorten irq_chip name for the SIU from Christophe Leroy
 - Wait 1s for secondaries to enter OPAL during kexec from Samuel
   Mendoza-Jonas
 - Fix _ALIGN_* errors due to type difference, from Aneesh Kumar K.V
 - powerpc/pseries/hvcserver: don't memset pi_buff if it is null from
   Colin Ian King
 - Disable hugepd for 64K page size, from Aneesh Kumar K.V
 - Differentiate between hugetlb and THP during page walk from Aneesh
   Kumar K.V
 - Make PCI non-optional for pseries from Michael Ellerman
 - Individual System V IPC system calls from Sam bobroff
 - Add selftest of unmuxed IPC calls from Michael Ellerman
 - discard .exit.data at runtime from Stephen Rothwell
 - Delete old orphaned PrPMC 280/2800 DTS and boot file, from Paul
   Gortmaker
 - Use of_get_next_parent to simplify code from Christophe Jaillet
 - Paginate some xmon output from Sam bobroff
 - Add some more elements to the xmon PACA dump from Michael Ellerman
 - Allow the tm-syscall selftest to build with old headers from Michael
   Ellerman
 - Run EBB selftests only on POWER8 from Denis Kirjanov
 - Drop CONFIG_TUNE_CELL in favour of CONFIG_CELL_CPU from Michael
   Ellerman
 - Avoid reference to potentially freed memory in prom.c from Christophe
   Jaillet
 - Quieten boot wrapper output with run_cmd from Geoff Levand
 - EEH fixes and cleanups from Gavin Shan
 - Fix recursive fenced PHB on Broadcom shiner adapter from Gavin Shan
 - Use of_get_next_parent() in of_get_ibm_chip_id() from Michael
   Ellerman
 - Fix section mismatch warning in msi_bitmap_alloc() from Denis
   Kirjanov
 - Fix ps3-lpm white space from Rudhresh Kumar J
 - Fix ps3-vuart null dereference from Colin King
 - nvram: Add missing kfree in error path from Christophe Jaillet
 - nvram: Fix function name in some errors messages, from Christophe
   Jaillet
 - drivers/macintosh: adb: fix misleading Kconfig help text from Aaro
   Koskinen
 - agp/uninorth: fix a memleak in create_gatt_table from Denis Kirjanov
 - cxl: Free virtual PHB when removing from Andrew Donnellan
 - scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target from
   Michael Ellerman
 - scripts/kconfig/Makefile: Fix KBUILD_DEFCONFIG check when building
   with O= from Michael Ellerman
 - Freescale updates from Scott: Highlights include 64-bit book3e
   kexec/kdump support, a rework of the qoriq clock driver, device tree
   changes including qoriq fman nodes, support for a new 85xx board, and
   some fixes.
 - MPC5xxx updates from Anatolij: Highlights include a driver for
   MPC512x LocalPlus Bus FIFO with its device tree binding
   documentation, mpc512x device tree updates and some minor fixes.

* tag 'powerpc-4.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (106 commits)
  powerpc/msi: Fix section mismatch warning in msi_bitmap_alloc()
  powerpc/prom: Use of_get_next_parent() in of_get_ibm_chip_id()
  powerpc/pseries: Correct string length in pseries_of_derive_parent()
  powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
  powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)
  powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan
  powerpc/fsl: Add #clock-cells and clockgen label to clockgen nodes
  powerpc: handle error case in cpm_muram_alloc()
  powerpc: mpic: use IRQCHIP_SKIP_SET_WAKE instead of redundant mpic_irq_set_wake
  powerpc/book3e-64: Enable kexec
  powerpc/book3e-64/kexec: Set "r4 = 0" when entering spinloop
  powerpc/booke: Only use VIRT_PHYS_OFFSET on booke32
  powerpc/book3e-64/kexec: Enable SMP release
  powerpc/book3e-64/kexec: create an identity TLB mapping
  powerpc/book3e-64: Don't limit paca to 256 MiB
  powerpc/book3e/kdump: Enable crash_kexec_wait_realmode
  powerpc/book3e: support CONFIG_RELOCATABLE
  powerpc/booke64: Fix args to copy_and_flush
  powerpc/book3e-64: rename interrupt_end_book3e with __end_interrupts
  powerpc/e6500: kexec: Handle hardware threads
  ...
2015-11-05 23:38:43 -08:00
Raghavendra K T
c118baf802 arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodes
With the setup_nr_nodes(), we have already initialized
node_possible_map.  So it is safe to use for_each_node here.

There are many places in the kernel that use hardcoded 'for' loop with
nr_node_ids, because all other architectures have numa nodes populated
serially.  That should be reason we had maintained the same for
powerpc.

But, since sparse numa node ids possible on powerpc, we unnecessarily
allocate memory for non existent numa nodes.

For e.g., on a system with 0,1,16,17 as numa nodes nr_node_ids=18 and
we allocate memory for nodes 2-14.  This patch we allocate memory for
only existing numa nodes.

The patch is boot tested on a 4 node tuleta, confirming with printks
that it works as expected.

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-05 19:34:48 -08:00
Kevin Hao
e1f580e8ce powerpc/e6500: hw tablewalk: make sure we invalidate and write to the same tlb entry
In order to workaround Erratum A-008139, we have to invalidate the
tlb entry with tlbilx before overwriting. Due to the performance
consideration, we don't add any memory barrier when acquire/release
the tcd lock. This means the two load instructions for esel_next do
have the possibility to return different value. This is definitely
not acceptable due to the Erratum A-008139. We have two options to
fix this issue:
  a) Add memory barrier when acquire/release tcd lock to order the
     load/store to esel_next.
  b) Just make sure to invalidate and write to the same tlb entry and
     tolerate the race that we may get the wrong value and overwrite
     the tlb entry just updated by the other thread.

We observe better performance using option b. So reserve an additional
register to save the value of the esel_next.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:14:40 -05:00
Scott Wood
eba5de8dc1 powerpc/fsl-booke-64: Don't limit ppc64_rma_size to one TLB entry
This is required for kdump to work when loaded at at an address that
does not fall within the first TLB entry -- which can easily happen
because while the lower limit is enforced via reserved memory, which
doesn't affect how much is mapped, the upper limit is enforced via a
different mechanism that does.  Thus, more TLB entries are needed than
would normally be used, as the total memory to be mapped might not be a
power of two.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-27 18:13:22 -05:00
Scott Wood
d9e1831a42 powerpc/85xx: Load all early TLB entries at once
Use an AS=1 trampoline TLB entry to allow all normal TLB1 entries to
be loaded at once.  This avoids the need to keep the translation that
code is executing from in the same TLB entry in the final TLB
configuration as during early boot, which in turn is helpful for
relocatable kernels (e.g. kdump) where the kernel is not running from
what would be the first TLB entry.

On e6500, we limit map_mem_in_cams() to the primary hwthread of a
core (the boot cpu is always considered primary, as a kdump kernel
can be entered on any cpu).  Each TLB only needs to be set up once,
and when we do, we don't want another thread to be running when we
create a temporary trampoline TLB1 entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-10-22 22:50:46 -05:00
Christophe Jaillet
1def37586f powerpc/numa: Use of_get_next_parent to simplify code
of_get_next_parent can be used to simplify the while() loop and
avoid the need of a temp variable.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-15 20:32:00 +11:00
Aneesh Kumar K.V
891121e6c0 powerpc/mm: Differentiate between hugetlb and THP during page walk
We need to properly identify whether a hugepage is an explicit or
a transparent hugepage in follow_huge_addr(). We used to depend
on hugepage shift argument to do that. But in some case that can
result in wrong results. For ex:

On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
We do prevent reusing the pfn page via the usage of
kick_all_cpus_sync(). But that happens after we updated the pte to 0.
Hence in follow_huge_addr() we can find hugepage shift set, but transparent
huge page check fail for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit
691e95fd73
("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
follow_page_mask occasionally.

In the long term, we may want to switch ppc64 64k page size config to
enable CONFIG_ARCH_WANT_GENERAL_HUGETLB

Reported-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12 15:30:09 +11:00
Aneesh Kumar K.V
ec2640b114 powerpc/mm: Disable hugepd for 64K page size.
After commit e2b3d202d1
("powerpc: Switch 16GB and 16MB explicit hugepages to a
different page table format"), we don't need to support
is_hugepd() for 64K page size.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-12 15:29:59 +11:00
Cyril Bur
fdf880a608 powerpc: Fix checkstop in native_hpte_clear() with lockdep
native_hpte_clear() is called in real mode from two places:
- Early in boot during htab initialisation if firmware assisted dump is
  active.
- Late in the kexec path.

In both contexts there is no need to disable interrupts are they are
already disabled. Furthermore, locking around the tlbie() is only required
for pre POWER5 hardware.

On POWER5 or newer hardware concurrent tlbie()s work as expected and on pre
POWER5 hardware concurrent tlbie()s could result in deadlock. This code
would only be executed at crashdump time, during which all bets are off,
concurrent tlbie()s are unlikely and taking locks is unsafe therefore the
best course of action is to simply do nothing. Concurrent tlbie()s are not
possible in the first case as secondary CPUs have not come up yet.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09 08:01:38 +11:00
Michael Ellerman
26cd835ef8 powerpc/slb: Use a local to avoid multiple calls to get_slb_shadow()
For no reason other than it looks ugly.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:01 +10:00
Anshuman Khandual
1d15010c34 powerpc/slb: Define an enum for the bolted indexes
This patch defines macros for the three bolted SLB indexes we use.
Switch the functions that take the indexes as an argument to use the
enum.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-01 16:52:01 +10:00
Linus Torvalds
f240bdd2a5 powerpc fixes for 4.3
- Fix 32-bit TCE table init in kdump kernel from Nish
  - Fix kdump with non-power-of-2 crashkernel= from Nish
  - Abort cxl_pci_enable_device_hook() if PCI channel is offline from Andrew
  - Fix to release DRC when configure_connector() fails from Bharata
  - Wire up sys_userfaultfd()
  - Fix race condition in tearing down MSI interrupts from Paul
  - Fix unbalanced pci_dev_get() in cxl_probe() from Daniel
  - Fix cxl build failure due to -Wunused-variable gcc behaviour change from Ian
  - Tell the toolchain to use ABI v2 when building an LE boot wrapper from Benh
  - Fix THP to recompute hash value after a failed update from Aneesh
  - 32-bit memcpy/memset: only use dcbz once cache is enabled from Christophe
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV+h5EAAoJEFHr6jzI4aWADboP/3jc6vUtOmFZ1GBGwXrftOc0
 PGtkTtYnGbaNsp+ZBl39Y+CFsSfhFVaDgylm/8G5NMKZaSBVdcFJwfU1w6Ymn49R
 nZkAT5PC9KgT5RTuRTZ3DO/Y2RC9vg2T9pXjEn8NGYcV8GgUkc3dZAn48S3AFgnV
 4jQI5sbxvwU12XkCUn+DkETh13g3gLYtRxwehBu/S/ovED5iNHKJwnXRzxyAA969
 dARNriSeyLBVMLamJ+rJB1S5hVTZTMbughFVVFbgriyIGuC/C1g9b9GN86dCGS6w
 T6VrKveK/iVCLUB16KV8+inbfvUrXItOxhGJWPHw9uAJGLZTz5G+yLHRRPX8onyC
 pgDesJDDpP/P7sAnKto3tF1Vzi7lwVtVPC1dT1Fc9VAWJGPYC/d16EKGIpNqqlnc
 mAIJ7wcI5c/HxvqXR2rdRV6fMer+aY7utwMsh4o/gDs7ArQUcuCrOKSW0jvHGmyr
 MumARXnUGDwPnGD8IfYI2vDOvwisv9g6XACwsM+pi499SfowaiLuK3utsMccagGZ
 INFtaqS7gpcj8TTu3kymw1TbKW/tqG9T81RRZ0rFH1q3aSfvwQ9QvsXctSeqOl/n
 lkxR3Mk0CT0oupXNKV6pjqsRwLroA0AF5tGxuw4ap1Rt4i8G7WHaPgRp7SPZxtkR
 qVBryfK5bKqYtWp4ztVK
 =Dgn5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix 32-bit TCE table init in kdump kernel from Nish

 - Fix kdump with non-power-of-2 crashkernel= from Nish

 - Abort cxl_pci_enable_device_hook() if PCI channel is offline from
   Andrew

 - Fix to release DRC when configure_connector() fails from Bharata

 - Wire up sys_userfaultfd()

 - Fix race condition in tearing down MSI interrupts from Paul

 - Fix unbalanced pci_dev_get() in cxl_probe() from Daniel

 - Fix cxl build failure due to -Wunused-variable gcc behaviour change
   from Ian

 - Tell the toolchain to use ABI v2 when building an LE boot wrapper
   from Benh

 - Fix THP to recompute hash value after a failed update from Aneesh

 - 32-bit memcpy/memset: only use dcbz once cache is enabled from
   Christophe

* tag 'powerpc-4.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc32: memset: only use dcbz once cache is enabled
  powerpc32: memcpy: only use dcbz once cache is enabled
  powerpc/mm: Recompute hash value after a failed update
  powerpc/boot: Specify ABI v2 when building an LE boot wrapper
  cxl: Fix build failure due to -Wunused-variable behaviour change
  cxl: Fix unbalanced pci_dev_get in cxl_probe
  powerpc/MSI: Fix race condition in tearing down MSI interrupts
  powerpc: Wire up sys_userfaultfd()
  powerpc/pseries: Release DRC when configure_connector fails
  cxl: abort cxl_pci_enable_device_hook() if PCI channel is offline
  powerpc/powernv/pci-ioda: fix kdump with non-power-of-2 crashkernel=
  powerpc/powernv/pci-ioda: fix 32-bit TCE table init in kdump kernel
2015-09-18 08:01:06 -07:00
Aneesh Kumar K.V
36b35d5d80 powerpc/mm: Recompute hash value after a failed update
If we had secondary hash flag set, we ended up modifying hash value in
the updatepp code path. Hence with a failed updatepp we will be using
a wrong hash value for the following hash insert. Fix this by
recomputing hash before insert.

Without this patch we can end up with using wrong slot number in linux
pte. That can result in us missing an hash pte update or invalidate
which can cause memory corruption or even machine check.

Fixes: 6d492ecc64 ("powerpc/THP: Add code to handle HPTE faults for hugepages")
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-09-16 22:06:03 +10:00
Linus Torvalds
12f03ee606 libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.  This facility is used by the pmem driver to
    enable pfn_to_page() operations on the page frames returned by DAX
    ('direct_access' in 'struct block_device_operations'). For now, the
    'memmap' allocation for these "device" pages comes from "System
    RAM".  Support for allocating the memmap from device memory will
    arrive in a later kernel.
 
 2/ Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt().  memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects.  The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.  Completion of
    the conversion is targeted for v4.4.
 
 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.
 
 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.
 
 5/ Miscellaneous updates and fixes to libnvdimm including support
    for issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
 OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
 nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
 NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
 zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
 sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
 bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
 dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
 slsw6DkrWT60CRE42nbK
 =o57/
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to ->direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
2015-09-08 14:35:59 -07:00
Dan Williams
033fbae988 mm: ZONE_DEVICE for "device memory"
While pmem is usable as a block device or via DAX mappings to userspace
there are several usage scenarios that can not target pmem due to its
lack of struct page coverage. In preparation for "hot plugging" pmem
into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
separately from the ones that are subject to standard page allocations.
Importantly "device memory" can be removed at will by userspace
unbinding the driver of the device.

Having a separate zone prevents allocation and otherwise marks these
pages that are distinct from typical uniform memory.  Device memory has
different lifetime and performance characteristics than RAM.  However,
since we have run out of ZONES_SHIFT bits this functionality currently
depends on sacrificing ZONE_DMA.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jerome Glisse <j.glisse@gmail.com>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27 19:40:58 -04:00
Michael Ellerman
9698351565 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 32-bit memcpy/memset optimizations, checksum
optimizations, 85xx config fragments and updates, device tree updates,
e6500 fixes for non-SMP, and misc cleanup and minor fixes."
2015-08-27 20:13:12 +10:00
Nikunj A Dadhania
1d805440a3 powerpc/numa: initialize distance lookup table from drconf path
In some situations, a NUMA guest that supports
ibm,dynamic-memory-reconfiguration node will end up having flat NUMA
distances between nodes. This is because of two problems in the
current code.

1) Different representations of associativity lists.

   There is an assumption about the associativity list in
   initialize_distance_lookup_table(). Associativity list has two forms:

   a) [cpu,memory]@x/ibm,associativity has following
      format:
           <N> <N integers>

   b) ibm,dynamic-reconfiguration-memory/ibm,associativity-lookup-arrays

           <M> <N> <M associativity lists each having N integers>
           M = the number of associativity lists
           N = the number of entries per associativity list

   Fix initialize_distance_lookup_table() so that it does not assume
   "case a". And update the caller to skip the length field before
   sending the associativity list.

2) Distance table not getting updated from drconf path.

   Node distance table will not get initialized in certain cases as
   ibm,dynamic-reconfiguration-memory path does not initialize the
   lookup table.

   Call initialize_distance_lookup_table() from drconf path with
   appropriate associativity list.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-18 19:34:41 +10:00
Michael Ellerman
73b341efda powerpc/mm: Drop CONFIG_PPC_HAS_HASH_64K
The relation between CONFIG_PPC_HAS_HASH_64K and CONFIG_PPC_64K_PAGES is
painfully complicated.

But if we rearrange it enough we can see that PPC_HAS_HASH_64K
essentially depends on PPC_STD_MMU_64 && PPC_64K_PAGES.

We can then notice that PPC_HAS_HASH_64K is used in files that are only
built for PPC_STD_MMU_64, meaning it's equivalent to PPC_64K_PAGES.

So replace all uses and drop it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2015-08-18 19:32:10 +10:00
Michael Ellerman
f444f1f898 powerpc/cell: Drop support for 64K local store on 4K kernels
Back in the olden days we added support for using 64K pages to map the
SPU (Synergistic Processing Unit) local store on Cell, when the main
kernel was using 4K pages.

This was useful at the time because distros were using 4K pages, but
using 64K pages on the SPUs could reduce TLB pressure there.

However these days the number of Cell users is approaching zero, and
supporting this option adds unpleasant complexity to the memory
management code.

So drop the option, CONFIG_SPU_FS_64K_LS, and all related code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
2015-08-18 19:29:49 +10:00
Kevin Hao
69399ee9cb powerpc/e6500: hw tablewalk: optimize a bit for tcd lock acquiring codes
It makes no sense to put the instructions for calculating the lock
value (cpu number + 1) and the clearing of eq bit of cr1 in lbarx/stbcx
loop. And when the lock is acquired by the other thread, the current
lock value has no chance to equal with the lock value used by current
cpu. So we can skip the comparing for these two lock values in the
lbz/bne loop.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-17 18:53:47 -05:00
Anshuman Khandual
79d0be7407 powerpc/slb: Add documentation on runtime patching of SLB encoding
This patch adds some documentation to patch_slb_encoding() explaining
how it works.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Update change log and mention the signedness of the immediate]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 15:04:52 +10:00
Anshuman Khandual
2be682af48 powerpc/slb: Rename all the 'slot' occurrences to 'entry'
The SLB code uses 'slot' and 'entry' interchangeably, change it to always
use 'entry'.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 14:50:12 +10:00
Anshuman Khandual
752b8adec4 powerpc/slb: Remove a duplicate extern variable
This patch just removes one redundant entry for one extern variable
'slb_compare_rr_to_size' from the scope. This patch does not change
any functionality.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-12 14:50:12 +10:00
Scott Wood
c60232029a powerpc/fsl: Force coherent memory on e500mc derivatives
In CoreNet systems it is not allowed to mix M and non-M mappings to the
same memory, and coherent DMA accesses are considered to be M mappings
for this purpose.  Ignoring this has been observed to cause hard
lockups in non-SMP kernels on e6500.

Furthermore, e6500 implements the LRAT (logical to real address table)
which allows KVM guests to control the WIMGE bits.  This means that
KVM cannot force the M bit on the way it usually does, so the guest had
better set it itself.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 23:00:01 -05:00
Scott Wood
0d61f0b3e2 powerpc/booke64: Move mb() to __set_pte_at() with kernel-addr test
map_kernel() doesn't catch all places that create kernel PTEs.  In
particular, vmalloc() calls set_pte_at() directly.  This causes a
crash when booting a non-SMP kernel on e6500.

Move the sync to __set_pte(), to be executed only for kernel addresses.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 23:00:01 -05:00
Scott Wood
2f7d2b74a9 powerpc/mm: Don't call __flush_dcache_icache_phys() with PA>VA
__flush_dcache_icache_phys() requires the ability to access the
memory with the MMU disabled, which means that on a 32-bit system
any memory above 4 GiB is inaccessible.  In particular, mpc86xx is
32-bit and can have more than 4 GiB of RAM.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-08-07 22:59:20 -05:00
Anton Blanchard
eab861a7a5 powerpc: Add plain English description for alignment exception oopses
If we take an alignment exception which we cannot fix, the oops
currently prints:

Unable to handle kernel paging request for unknown fault

Lets print something more useful:

Unable to handle kernel paging request for unaligned access at address 0xc0000000f77bba8f

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-06 20:24:35 +10:00
Linus Torvalds
9d90f03531 Replace module_init with appropriate alternate initcall in non modules.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVkO6nAAoJEOvOhAQsB9HWpHMP/Aknc+lmX2dZeIn96gdkP+UK
 1qL24C5oq2sm/9yTZLdoXbyApLaaTbAJHS9O4kolaOU6uOs3JrgtXqL1697PVp1R
 qV4f4DOzXmmEHaE2oO21afAri3tXIVQNqA2NQl2TmKfwz0Atu01Vj5RJPu/ZOBPl
 dONXcFnE6nO2p7AEFRP/GfDZwkng4xALyZPhwL7tJDAeGaBpqG/n2hCuq+Szn9g8
 wjTFACBdad/mRrYsL6YsWZ1e+LKI8vsArQbdPTam+jPaEUlK7yjFReFKCJVzL2JP
 xfQoTcCgFztzTUV0JTGR9sqeYA3WH9AkJOFDxNE/eIili4xiTh789WbEpHLVECSX
 1LsW025I3DkRWBPT4L+9ZP805ha71kNXDFc5N3XJkzrCYaFvD2BgsUzxi6FXj7aC
 9lEVKt6xO04FFG5SwTKnO0f8PEhPemZH3BDnVvjBDWQYLjUcPSNz7bfyHUhif0G5
 ulOGVB0ncJJF9iP8PyZs1RA/F8kKxXWnhYMIHzvl0f0vLUA7rAKsACnhBgq8s9ZQ
 uM5YjzU91Z/4pe5C2E5MmQIZ84b79ZPsee1lF0GJdjK5W3PDvnCjIdXfQ5M/f3S8
 76cssXWNhS78/P+19YqirLeb0u7Zw0jf73m9t9ywRgcByWfY5ZUDm0DFpQnWKkoR
 QY/aFO/yHKTO3VHj8Ril
 =KDJO
 -----END PGP SIGNATURE-----

Merge tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull module_init replacement part two from Paul Gortmaker:
 "Replace module_init with appropriate alternate initcall in non
  modules.

  This series converts non-modular code that is using the module_init()
  call to hook itself into the system to instead use one of our
  alternate priority initcalls.

  Unlike the previous series that used device_initcall and hence was a
  runtime no-op, these commits change to one of the alternate initcalls,
  because (a) we have them and (b) it seems like the right thing to do.

  For example, it would seem logical to use arch_initcall for arch
  specific setup code and fs_initcall for filesystem setup code.

  This does mean however, that changes in the init ordering will be
  taking place, and so there is a small risk that some kind of implicit
  init ordering issue may lie uncovered.  But I think it is still better
  to give these ones sensible priorities than to just assign them all to
  device_initcall in order to exactly preserve the old ordering.

  Thad said, we have already made similar changes in core kernel code in
  commit c96d6660dc ("kernel: audit/fix non-modular users of
  module_init in core code") without any regressions reported, so this
  type of change isn't without precedent.  It has also got the same
  local testing and linux-next coverage as all the other pull requests
  that I'm sending for this merge window have got.

  Once again, there is an unused module_exit function removal that shows
  up as an outlier upon casual inspection of the diffstat"

* tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
  x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
  mm/page_owner.c: use late_initcall to hook in enabling
  lib/list_sort: use late_initcall to hook in self tests
  arm: use subsys_initcall in non-modular pl320 IPC code
  powerpc: don't use module_init for non-modular core hugetlb code
  powerpc: use subsys_initcall for Freescale Local Bus
  x86: don't use module_init for non-modular core bootflag code
  netfilter: don't use module_init/exit in core IPV4 code
  fs/notify: don't use module_init for non-modular inotify_user code
  mm: replace module_init usages with subsys_initcall in nommu.c
2015-07-02 10:36:29 -07:00
Linus Torvalds
8d7804a2f0 Driver core patches for 4.2-rc1
Here is the driver core / firmware changes for 4.2-rc1.
 
 A number of small changes all over the place in the driver core, and in
 the firmware subsystem.  Nothing really major, full details in the
 shortlog.  Some of it is a bit of churn, given that the platform driver
 probing changes was found to not work well, so they were reverted.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlWNoCQACgkQMUfUDdst+ym4JACdFrrXoMt2pb8nl5gMidGyM9/D
 jg8AnRgdW8ArDA/xOarULd/X43eA3J3C
 =Al2B
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the driver core / firmware changes for 4.2-rc1.

  A number of small changes all over the place in the driver core, and
  in the firmware subsystem.  Nothing really major, full details in the
  shortlog.  Some of it is a bit of churn, given that the platform
  driver probing changes was found to not work well, so they were
  reverted.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
  Revert "base/platform: Only insert MEM and IO resources"
  Revert "base/platform: Continue on insert_resource() error"
  Revert "of/platform: Use platform_device interface"
  Revert "base/platform: Remove code duplication"
  firmware: add missing kfree for work on async call
  fs: sysfs: don't pass count == 0 to bin file readers
  base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
  base/platform: Remove code duplication
  of/platform: Use platform_device interface
  base/platform: Continue on insert_resource() error
  base/platform: Only insert MEM and IO resources
  firmware: use const for remaining firmware names
  firmware: fix possible use after free on name on asynchronous request
  firmware: check for file truncation on direct firmware loading
  firmware: fix __getname() missing failure check
  drivers: of/base: move of_init to driver_init
  drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
  sysfs: disambiguate between "error code" and "failure" in comments
  driver-core: fix build for !CONFIG_MODULES
  driver-core: make __device_attach() static
  ...
2015-06-26 15:07:37 -07:00
Aneesh Kumar K.V
8809aa2d28 mm: clarify that the function operates on hugepage pte
We have confusing functions to clear pmd, pmd_clear_* and pmd_clear.  Add
_huge_ to pmdp_clear functions so that we are clear that they operate on
hugepage pte.

We don't bother about other functions like pmdp_set_wrprotect,
pmdp_clear_flush_young, because they operate on PTE bits and hence
indicate they are operating on hugepage ptes

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00
Aneesh Kumar K.V
f28b6ff8c3 powerpc/mm: use generic version of pmdp_clear_flush()
Also move the pmd_trans_huge check to generic code.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00
Aneesh Kumar K.V
15a25b2ead mm/thp: split out pmd collapse flush into separate functions
Architectures like ppc64 [1] need to do special things while clearing pmd
before a collapse.  For them this operation is largely different from a
normal hugepage pte clear.  Hence add a separate function to clear pmd
before collapse.  After this patch pmdp_* functions operate only on
hugepage pte, and not on regular pmd_t values pointing to page table.

[1] ppc64 needs to invalidate all the normal page pte mappings we already
have inserted in the hardware hash page table.  But before doing that we
need to make sure there are no parallel hash page table insert going on.
So we need to do a kick_all_cpus_sync() before flushing the older hash
table entries.  By moving this to a separate function we capture these
details and mention how it is different from a hugepage pte clear.

This patch is a cleanup and only does code movement for clarity.  There
should not be any change in functionality.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:44 -07:00
Zhang Zhen
e81f2d2237 mm/hugetlb: reduce arch dependent code about huge_pmd_unshare
Currently we have many duplicates in definitions of huge_pmd_unshare.  In
all architectures this function just returns 0 when
CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N.

This patch puts the default implementation in mm/hugetlb.c and lets these
architectures use the common code.

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Rientjes <rientjes@google.com>
Cc: James Yang <James.Yang@freescale.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-24 17:49:41 -07:00
Linus Torvalds
08d183e3c1 powerpc updates for 4.2
- Disable the 32-bit vdso when building LE, so we can build with a 64-bit only
    toolchain.
  - EEH fixes from Gavin & Richard.
  - Enable the sys_kcmp syscall from Laurent.
  - Sysfs control for fastsleep workaround from Shreyas.
  - Expose OPAL events as an irq chip by Alistair.
  - MSI ops moved to pci_controller_ops by Daniel.
  - Fix for kernel to userspace backtraces for perf from Anton.
  - Merge pseries and pseries_le defconfigs from Cyril.
  - CXL in-kernel API from Mikey.
  - OPAL prd driver from Jeremy.
  - Fix for DSCR handling & tests from Anshuman.
  - Powernv flash mtd driver from Cyril.
  - Dynamic DMA Window support on powernv from Alexey.
  - LLVM clang fixes & workarounds from Anton.
  - Reworked version of the patch to abort syscalls when transactional.
  - Fix the swap encoding to support 4TB, from Aneesh.
  - Various fixes as usual.
  - Freescale updates from Scott: Highlights include more 8xx optimizations, an
    e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and
    various fixes and cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJViSZqAAoJEFHr6jzI4aWAA7kQAKq3+pejfo2rY7alpKJyeVao
 vlaIEaDNOTh+ctcmu3MFF9Jy6fai8gNZziRXU5JRmE5RW4GVBN4KZiqXRbkVjdBK
 uG9sCX7Y58VRsS2vnGBYLsamfTMgjaXeDvgunQHVLiechJnrDr0RHEK90F3LSi73
 Axp6l8XIG63a3zFZmkhzANMCme2lm5+MWmGlSjUUNi5F+viQUgJc5iiO8xrVUgM5
 RpNlV2NJSqFiU+gMQWJ226V85UIniouq4j+qtyUcu8/m9BberyolXVU0GPlPFdsx
 r/Qh9uCJyZaUdSB5hzomQZj50IsSz6J6nEuJTeGRoVZOmeI8Dnc2xU9fxQF5fC8H
 lUJw10WPoNOggQZTeSUKn7wTXw3i4p3KsWNUczaW68VJdhqZUVaSp0+I6mnDSqzs
 9iGC+VffLYNa1OHq7mGRFrgDdLBCHes31aZ3CxlQsmyNpAPCwMzsD4TUfVnvOG6E
 oJOeaQ4mZM9PvqxEYJfoIL+vgRxmQ8sdIBtNY4in+C7J6eFnZNFO9xmPnJZuVU31
 PGtx60kjFCOVMXvqn34WkRNbgqGWI91IK0KcRwFO2LXVio1uY77TWL52kNK2IMsp
 Az+VDDvqnT3+BoV1yz0P6SrXAkwTpvFk2y+IdmEiUUN7zZFL5ZSA2epej9AzHTAK
 WID2bc5yVtIL6p6x5ICH
 =d9Wh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - disable the 32-bit vdso when building LE, so we can build with a
   64-bit only toolchain.

 - EEH fixes from Gavin & Richard.

 - enable the sys_kcmp syscall from Laurent.

 - sysfs control for fastsleep workaround from Shreyas.

 - expose OPAL events as an irq chip by Alistair.

 - MSI ops moved to pci_controller_ops by Daniel.

 - fix for kernel to userspace backtraces for perf from Anton.

 - merge pseries and pseries_le defconfigs from Cyril.

 - CXL in-kernel API from Mikey.

 - OPAL prd driver from Jeremy.

 - fix for DSCR handling & tests from Anshuman.

 - Powernv flash mtd driver from Cyril.

 - dynamic DMA Window support on powernv from Alexey.

 - LLVM clang fixes & workarounds from Anton.

 - reworked version of the patch to abort syscalls when transactional.

 - fix the swap encoding to support 4TB, from Aneesh.

 - various fixes as usual.

 - Freescale updates from Scott: Highlights include more 8xx
   optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
   t1024/t1023 support, and various fixes and cleanup.

* tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
  cxl: Fix typo in debug print
  cxl: Add CXL_KERNEL_API config option
  powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
  powerpc/mm: Change the swap encoding in pte.
  powerpc/mm: PTE_RPN_MAX is not used, remove the same
  powerpc/tm: Abort syscalls in active transactions
  powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
  powerpc/include: Add opal-prd to installed uapi headers
  powerpc/powernv: fix construction of opal PRD messages
  powerpc/powernv: Increase opal-irqchip initcall priority
  powerpc: Make doorbell check preemption safe
  powerpc/powernv: pnv_init_idle_states() should only run on powernv
  macintosh/nvram: Remove as unused
  powerpc: Don't use gcc specific options on clang
  powerpc: Don't use -mno-strict-align on clang
  powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
  powerpc: Only use -mabi=altivec if toolchain supports it
  powerpc: Fix duplicate const clang warning in user access code
  vfio: powerpc/spapr: Support Dynamic DMA windows
  vfio: powerpc/spapr: Register memory and define IOMMU v2
  ...
2015-06-24 08:46:32 -07:00
Michael Ellerman
6096f88451 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include more 8xx optimizations, an e6500 hugetlb optimization,
QMan device tree nodes, t1024/t1023 support, and various fixes and
cleanup."
2015-06-19 17:23:48 +10:00
Paul Gortmaker
6f114281c4 powerpc: don't use module_init for non-modular core hugetlb code
The hugetlbpage.o is obj-y (always built in).  It will never
be modular, so using module_init as an alias for __initcall is
somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of arch_initcall (which
makes sense for arch code) will thus change this registration
from level 6-device to level 3-arch (i.e. slightly earlier).
However no observable impact of that small difference has
been observed during testing, or is expected.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2015-06-16 14:12:34 -04:00
Alexey Kardashevskiy
15b244a88e powerpc/mmu: Add userspace-to-physical addresses translation cache
We are adding support for DMA memory pre-registration to be used in
conjunction with VFIO. The idea is that the userspace which is going to
run a guest may want to pre-register a user space memory region so
it all gets pinned once and never goes away. Having this done,
a hypervisor will not have to pin/unpin pages on every DMA map/unmap
request. This is going to help with multiple pinning of the same memory.

Another use of it is in-kernel real mode (mmu off) acceleration of
DMA requests where real time translation of guest physical to host
physical addresses is non-trivial and may fail as linux ptes may be
temporarily invalid. Also, having cached host physical addresses
(compared to just pinning at the start and then walking the page table
again on every H_PUT_TCE), we can be sure that the addresses which we put
into TCE table are the ones we already pinned.

This adds a list of memory regions to mm_context_t. Each region consists
of a header and a list of physical addresses. This adds API to:
1. register/unregister memory regions;
2. do final cleanup (which puts all pre-registered pages);
3. do userspace to physical address translation;
4. manage usage counters; multiple registration of the same memory
is allowed (once per container).

This implements 2 counters per registered memory region:
- @mapped: incremented on every DMA mapping; decremented on unmapping;
initialized to 1 when a region is just registered; once it becomes zero,
no more mappings allowe;
- @used: incremented on every "register" ioctl; decremented on
"unregister"; unregistration is allowed for DMA mapped regions unless
it is the very last reference. For the very last reference this checks
that the region is still mapped and returns -EBUSY so the userspace
gets to know that memory is still pinned and unregistration needs to
be retried; @used remains 1.

Host physical addresses are stored in vmalloc'ed array. In order to
access these in the real mode (mmu off), there is a real_vmalloc_addr()
helper. In-kernel acceleration patchset will move it from KVM to MMU code.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-11 15:16:54 +10:00
Aneesh Kumar K.V
cfcb3d80a2 powerpc/mm: Add trace point for tracking hash pte fault
This enables us to understand how many hash fault we are taking
when running benchmarks.

For ex:
-bash-4.2# ./perf stat -e  powerpc:hash_fault -e page-faults /tmp/ebizzy.ppc64 -S 30  -P -n 1000
...

 Performance counter stats for '/tmp/ebizzy.ppc64 -S 30 -P -n 1000':

       1,10,04,075      powerpc:hash_fault
       1,10,03,429      page-faults

      30.865978991 seconds time elapsed

NOTE:
The impact of the tracepoint was not noticeable when running test. It was
within the run-time variance of the test. For ex:

without-patch:
--------------

 Performance counter stats for './a.out 3000 300':

	       643      page-faults               #    0.089 M/sec
	  7.236562      task-clock (msec)         #    0.928 CPUs utilized
	 2,179,213      stalled-cycles-frontend   #    0.00% frontend cycles idle
	17,174,367      stalled-cycles-backend    #    0.00% backend  cycles idle
		 0      context-switches          #    0.000 K/sec

       0.007794658 seconds time elapsed

And with-patch:
---------------

 Performance counter stats for './a.out 3000 300':

	       643      page-faults               #    0.089 M/sec
	  7.233746      task-clock (msec)         #    0.921 CPUs utilized
		 0      context-switches          #    0.000 K/sec

       0.007854876 seconds time elapsed

 Performance counter stats for './a.out 3000 300':

	       643      page-faults               #    0.087 M/sec
	       649      powerpc:hash_fault        #    0.087 M/sec
	  7.430376      task-clock (msec)         #    0.938 CPUs utilized
	 2,347,174      stalled-cycles-frontend   #    0.00% frontend cycles idle
	17,524,282      stalled-cycles-backend    #    0.00% backend  cycles idle
		 0      context-switches          #    0.000 K/sec

       0.007920284 seconds time elapsed

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-10 14:06:29 +10:00
Greg Kroah-Hartman
987aec39a7 Merge 4.1-rc7 into driver-core-next
We want the fixes in this branch as well for testing and merge
resolution.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-06-08 10:19:40 -07:00
Michael Neuling
ec249dd860 cxl: Move include file cxl.h -> cxl-base.h
This moves the current include file from cxl.h -> cxl-base.h.  This current
include file is used only to pass information between the base driver that
needs to be built into the kernel and the cxl module.

This is to make way for a new include/misc/cxl.h which will
contain just the kernel API for other driver to use

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-03 13:27:19 +10:00
Michael Neuling
85a97da958 powerpc/copro: Fix faulting kernel segments
This fixes calculating the key bits (KP and KS) in the SLB VSID for kernel
mappings.

I'm not CCing this to stable as there are no uses of this currently.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-03 13:27:15 +10:00
Scott Wood
6c0cc62715 powerpc/mm: Use PFN_PHYS() in devmem_is_allowed()
This function can run on systems where physical addresses don't
fit in unsigned long, so make sure to use the macro that contains the
proper cast.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:23 -05:00
Scott Wood
c89ca8ab74 powerpc/e6500: Optimize hugepage TLB misses
Some workloads take a lot of TLB misses despite using traditional
hugepages.  Handle these TLB misses in the asm fastpath rather than
going through a bunch of C code.  With this patch I measured around a
5x speedup in handling hugepage TLB misses.

Signed-off-by: Scott Wood <scottwood@freescale.com>
2015-06-02 21:37:19 -05:00
Ingo Molnar
f407a82586 Merge branch 'linus' into sched/core, to resolve conflict
Conflicts:
	arch/sparc/include/asm/topology_64.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 08:05:42 +02:00
Michael Ellerman
09f3f326cd powerpc/mm: Fix build break with STRICT_MM_TYPECHECKS && DEBUG_PAGEALLOC
If both STRICT_MM_TYPECHECKS and DEBUG_PAGEALLOC are enabled, the code
in kernel_map_linear_page() is built, and so we fail with:

  arch/powerpc/mm/hash_utils_64.c:1478:2:
  error: incompatible type for argument 1 of 'htab_convert_pte_flags'

Fix it by using pgprot_val().

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-02 13:24:48 +10:00
Bartosz Golaszewski
06931e6224 sched/topology: Rename topology_thread_cpumask() to topology_sibling_cpumask()
Rename topology_thread_cpumask() to topology_sibling_cpumask()
for more consistency with scheduler code.

Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Benoit Cousson <bcousson@baylibre.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/1432645896-12588-2-git-send-email-bgolaszewski@baylibre.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27 15:22:15 +02:00
Luis R. Rodriguez
ecc8617053 module: add extra argument for parse_params() callback
This adds an extra argument onto parse_params() to be used
as a way to make the unused callback a bit more useful and
generic by allowing the caller to pass on a data structure
of its choice. An example use case is to allow us to easily
make module parameters for every module which we will do
next.

@ parse @
identifier name, args, params, num, level_min, level_max;
identifier unknown, param, val, doing;
type s16;
@@
 extern char *parse_args(const char *name,
 			 char *args,
 			 const struct kernel_param *params,
 			 unsigned num,
 			 s16 level_min,
 			 s16 level_max,
+			 void *arg,
 			 int (*unknown)(char *param, char *val,
					const char *doing
+					, void *arg
					));

@ parse_mod @
identifier name, args, params, num, level_min, level_max;
identifier unknown, param, val, doing;
type s16;
@@
 char *parse_args(const char *name,
 			 char *args,
 			 const struct kernel_param *params,
 			 unsigned num,
 			 s16 level_min,
 			 s16 level_max,
+			 void *arg,
 			 int (*unknown)(char *param, char *val,
					const char *doing
+					, void *arg
					))
{
	...
}

@ parse_args_found @
expression R, E1, E2, E3, E4, E5, E6;
identifier func;
@@

(
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   func);
|
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   &func);
|
	R =
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   NULL);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   func);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   &func);
|
	parse_args(E1, E2, E3, E4, E5, E6,
+		   NULL,
		   NULL);
)

@ parse_args_unused depends on parse_args_found @
identifier parse_args_found.func;
@@

int func(char *param, char *val, const char *unused
+		 , void *arg
		 )
{
	...
}

@ mod_unused depends on parse_args_found @
identifier parse_args_found.func;
expression A1, A2, A3;
@@

-	func(A1, A2, A3);
+	func(A1, A2, A3, NULL);

Generated-by: Coccinelle SmPL
Cc: cocci@systeme.lip6.fr
Cc: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Felipe Contreras <felipe.contreras@gmail.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Tejun Heo <tj@kernel.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-20 00:25:24 -07:00
David Hildenbrand
70ffdb9393 mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).

In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().

Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.

faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:15 +02:00
David Hildenbrand
2cb7c9cb42 sched/preempt, mm/kmap: Explicitly disable/enable preemption in kmap_atomic_*
The existing code relies on pagefault_disable() implicitly disabling
preemption, so that no schedule will happen between kmap_atomic() and
kunmap_atomic().

Let's make this explicit, to prepare for pagefault_disable() not
touching preemption anymore.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-5-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:14 +02:00
Aneesh Kumar K.V
7b868e81be powerpc/mm: Return NULL for not present hugetlb page
We need to check whether pte is present in follow_huge_addr() and
properly return NULL if mapping is not present. Also use READ_ONCE
when dereferencing pte_t address.

Without this patch, we may wrongly return a zero pfn page in
follow_huge_addr().

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12 11:04:29 +10:00
Aneesh Kumar K.V
13bd817bb8 powerpc/thp: Serialize pmd clear against a linux page table walk.
Serialize against find_linux_pte_or_hugepte() which does lock-less
lookup in page tables with local interrupts disabled. For huge pages it
casts pmd_t to pte_t. Since the format of pte_t is different from pmd_t
we want to prevent transit from pmd pointing to page table to pmd
pointing to huge page (and back) while interrupts are disabled.  We
clear pmd to possibly replace it with page table pointer in different
code paths. So make sure we wait for the parallel
find_linux_pte_or_hugepage() to finish.

Without this patch, a find_linux_pte_or_hugepte() running in parallel to
__split_huge_zero_page_pmd() or do_huge_pmd_wp_page_fallback() or
zap_huge_pmd() can run into the above issue. With
__split_huge_zero_page_pmd() and do_huge_pmd_wp_page_fallback() we clear
the hugepage pte before inserting the pmd entry with a regular pgtable
address. Such a clear need to wait for the parallel
find_linux_pte_or_hugepte() to finish.

With zap_huge_pmd(), we can run into issues, with a hugepage pte getting
zapped due to a MADV_DONTNEED while other cpu fault it in as small
pages.

Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-12 11:03:29 +10:00
Aneesh Kumar K.V
2e826695d8 powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled
This fix the below build error

arch/powerpc/mm/hash_utils_64.c: In function ‘flush_hash_hugepage’:
arch/powerpc/mm/hash_utils_64.c:1381:1: error: label at end of compound statement
 tm_abort:
 ^
make[1]: *** [arch/powerpc/mm/hash_utils_64.o] Error 1

Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-23 17:42:14 +10:00
Aneesh Kumar K.V
7d6e7f7ffa powerpc/mm/thp: Return pte address if we find trans_splitting.
For THP that is marked trans splitting, we return the pte.
This require the callers to handle the pmd_trans_splitting scenario,
if they care. All the current callers are either looking at pfn or
write_ok, hence we don't need to update them.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:40 +10:00
Aneesh Kumar K.V
691e95fd73 powerpc/mm/thp: Make page table walk safe against thp split/collapse
We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensure we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t to THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are
few steps we can take to avoid a hugepage collapse.One way is to take page
reference inside the irq disable region. Other option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifer update in between
using mmu_notifier_retry().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-17 11:23:39 +10:00
Michael Ellerman
1cbee462a5 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes 2015-04-17 11:22:51 +10:00
Linus Torvalds
d19d5efd8c powerpc updates for 4.1
- Numerous minor fixes, cleanups etc.
 - More EEH work from Gavin to remove its dependency on device_nodes.
 - Memory hotplug implemented entirely in the kernel from Nathan Fontenot.
 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.
 - Rewrite of VPHN parsing logic & tests from Greg Kurz.
 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.
 - Support for pstore on powernv from Hari Bathini.
 - Removal of old powerpc specific byte swap routines by David Gibson.
 - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing
   your firmware when it wasn't.
 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.
 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek.
 - Some fixes for migration from Tyrel Datwyler.
 - A new syscall to switch the cpu endian by Michael Ellerman.
 - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn.
 - A fix for the OPAL sensor driver from Cédric Le Goater.
 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.
 - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per
   machine.
 - Small patch from Sam Bobroff to explicitly abort non-suspended transactions
   on syscalls, plus a test to exercise it.
 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.
 - Small patch to enable the hard lockup detector from Anton Blanchard.
 - Fix from Dave Olson for missing L2 cache information on some CPUs.
 - Some fixes from Michael Ellerman to get Cell machines booting again.
 - Freescale updates from Scott: Highlights include BMan device tree nodes, an
   MSI erratum workaround, a couple minor performance improvements, config
   updates, and misc fixes/cleanup.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVL2cxAAoJEFHr6jzI4aWAR8cP/19VTo/CzCE4ffPSx7qR464n
 F+WFZcbNjIMXu6+B0YLuJZEsuWtKKrCit/MCg3+mSgE4iqvxmtI+HDD0445Buszj
 UD4E4HMdPrXQ+KUSUDORvRjv/FFUXIa94LSv/0g2UeMsPz/HeZlhMxEu7AkXw9Nf
 rTxsmRTsOWME85Y/c9ss7XHuWKXT3DJV7fOoK9roSaN3dJAuWTtG3WaKS0nUu0ok
 0M81D6ZczoD6ybwh2DUMPD9K6SGxLdQ4OzQwtW6vWzcQIBDfy5Pdeo0iAFhGPvXf
 T4LLPkv4cF4AwHsAC4rKDPHQNa+oZBoLlScrHClaebAlDiv+XYKNdMogawUObvSh
 h7avKmQr0Ygp1OvvZAaXLhuDJI9FJJ8lf6AOIeULgHsDR9SyKMjZWxRzPe11uarO
 Fyi0qj3oJaQu6LjazZraApu8mo+JBtQuD3z3o5GhLxeFtBBF60JXj6zAXJikufnl
 kk1/BUF10nKUhtKcDX767AMUCtMH3fp5hx8K/z9T5v+pobJB26Wup1bbdT68pNBT
 NjdKUppV6QTjZvCsA6U2/ECu6E9KeIaFtFSL2IRRoiI0dWBN5/5eYn3RGkO2ZFoL
 1NdwKA2XJcchwTPkpSRrUG70sYH0uM2AldNYyaLfjzrQqza7Y6lF699ilxWmCN/H
 OplzJAE5cQ8Am078veTW
 =03Yh
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:

 - Numerous minor fixes, cleanups etc.

 - More EEH work from Gavin to remove its dependency on device_nodes.

 - Memory hotplug implemented entirely in the kernel from Nathan
   Fontenot.

 - Removal of redundant CONFIG_PPC_OF by Kevin Hao.

 - Rewrite of VPHN parsing logic & tests from Greg Kurz.

 - A fix from Nish Aravamudan to reduce memory usage by clamping
   nodes_possible_map.

 - Support for pstore on powernv from Hari Bathini.

 - Removal of old powerpc specific byte swap routines by David Gibson.

 - Fix from Vasant Hegde to prevent the flash driver telling you it was
   flashing your firmware when it wasn't.

 - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver.

 - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan
   Stancek.

 - Some fixes for migration from Tyrel Datwyler.

 - A new syscall to switch the cpu endian by Michael Ellerman.

 - Large series from Wei Yang to implement SRIOV, reviewed and acked by
   Bjorn.

 - A fix for the OPAL sensor driver from Cédric Le Goater.

 - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman.

 - Large series from Daniel Axtens to make our PCI hooks per PHB rather
   than per machine.

 - Small patch from Sam Bobroff to explicitly abort non-suspended
   transactions on syscalls, plus a test to exercise it.

 - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu.

 - Small patch to enable the hard lockup detector from Anton Blanchard.

 - Fix from Dave Olson for missing L2 cache information on some CPUs.

 - Some fixes from Michael Ellerman to get Cell machines booting again.

 - Freescale updates from Scott: Highlights include BMan device tree
   nodes, an MSI erratum workaround, a couple minor performance
   improvements, config updates, and misc fixes/cleanup.

* tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits)
  powerpc/powermac: Fix build error seen with powermac smp builds
  powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE
  powerpc: Remove PPC32 code from pseries specific find_and_init_phbs()
  powerpc/cell: Fix iommu breakage caused by controller_ops change
  powerpc/eeh: Fix crash in eeh_add_device_early() on Cell
  powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
  powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails
  powerpc/pseries: Correct memory hotplug locking
  powerpc: Fix missing L2 cache size in /sys/devices/system/cpu
  powerpc: Add ppc64 hard lockup detector support
  oprofile: Disable oprofile NMI timer on ppc64
  powerpc/perf/hv-24x7: Add missing put_cpu_var()
  powerpc/perf/hv-24x7: Break up single_24x7_request
  powerpc/perf/hv-24x7: Define update_event_count()
  powerpc/perf/hv-24x7: Whitespace cleanup
  powerpc/perf/hv-24x7: Define add_event_to_24x7_request()
  powerpc/perf/hv-24x7: Rename hv_24x7_event_update
  powerpc/perf/hv-24x7: Move debug prints to separate function
  powerpc/perf/hv-24x7: Drop event_24x7_request()
  powerpc/perf/hv-24x7: Use pr_devel() to log message
  ...

Conflicts:
	tools/testing/selftests/powerpc/Makefile
	tools/testing/selftests/powerpc/tm/Makefile
2015-04-16 13:53:32 -05:00
Scott Wood
50c6a665b3 powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
Commit dc6c9a35b6 ("mm: account pmd page tables to the process")
added a counter that is incremented whenever a PMD is allocated and
decremented whenever a PMD is freed.  For hugepages on PPC, common code
is used to allocated PMDs, but arch-specific code is used to free PMDs.

This results in kernel output such as "BUG: non-zero nr_pmds on freeing
mm: 1" when using hugepages.

Update the PPC hugepage PMD freeing code to decrement the count, just
as the above commit did for free_pmd_range().

Fixes: dc6c9a35b6 ("mm: account pmd page tables to the process")
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.0.x
2015-04-15 15:24:22 -05:00
Kees Cook
2b68f6caea mm: expose arch_mmap_rnd when available
When an architecture fully supports randomizing the ELF load location,
a per-arch mmap_rnd() function is used to find a randomized mmap base.
In preparation for randomizing the location of ET_DYN binaries
separately from mmap, this renames and exports these functions as
arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
for describing this feature on architectures that support it
(which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
already supports a separated ET_DYN ASLR from mmap ASLR without the
ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Russell King <linux@arm.linux.org.uk>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Arun Chandran <achandran@mvista.com>
Cc: Yann Droneaud <ydroneaud@opteya.com>
Cc: Min-Hua Chen <orca.chen@gmail.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Vineeth Vijayan <vvijayan@mvista.com>
Cc: Jeff Bailey <jeffbailey@google.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Behan Webster <behanw@converseincode.com>
Cc: Ismael Ripoll <iripoll@upv.es>
Cc: Jan-Simon Mller <dl9pf@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Kees Cook
ed6322746a powerpc: standardize mmap_rnd() usage
In preparation for splitting out ET_DYN ASLR, this refactors the use of
mmap_rnd() to be used similarly to arm and x86.

(Can mmap ASLR be safely enabled in the legacy mmap case here?  Other
archs use "mm->mmap_base = TASK_UNMAPPED_BASE + random_factor".)

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:05 -07:00
Michael Ellerman
f691fa1080 powerpc: Replace mem_init_done with slab_is_available()
We have a powerpc specific global called mem_init_done which is "set on
boot once kmalloc can be called".

But that's not *quite* true. We set it at the bottom of mem_init(), and
rely on the fact that mm_init() calls kmem_cache_init() immediately
after that, and nothing is running in parallel.

So replace it with the generic and 100% correct slab_is_available().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:48 +10:00
Michael Ellerman
4f9c53c8cc powerpc: Fix compile errors with STRICT_MM_TYPECHECKS enabled
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fix the 32-bit code also]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-10 20:02:47 +10:00
Michael Ellerman
5dd4e4f6fe powerpc/mm: Change setbat() to take a pgprot_t rather than flags
The callers of setbat() are actually passing a pgprot_t for the flags
parameter. This doesn't matter unless STRICT_MM_TYPECHECKS is enabled.
So we can turn that on without breaking the build, change setbat() to
take a pgprot_t and have it convert it to an unsigned long internally.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-07 17:15:13 +10:00
Michael Ellerman
911083350e powerpc/mm: Remove duplicate declaration of setbat()
This is already declared in mmu_decl.h, so we don't need a second
version in the C file.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-04-07 17:15:13 +10:00
Michael Ellerman
df60f57684 Merge branch 'next-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc into test
Merge miscellaneous bits from benh. Fix a minor conflict with
OpalMessageType changing names to opal_msg_type.
2015-03-26 20:04:28 +11:00
Yanjiang Jin
e77553cb21 powerpc/mm: Free string after creating kmem cache
kmem_cache_create()->kmem_cache_create_memcg()->kstrdup() allocates new
space and copys name's content, so it is safe to free name memory after
calling kmem_cache_create(). Else kmemleak will report the below
warning:

unreferenced object 0xc0000000f9002160 (size 16):
  comm "swapper/0", pid 0, jiffies 4294892296 (age 1386.640s)
  hex dump (first 16 bytes):
    70 67 74 61 62 6c 65 2d 32 5e 39 00 de ad be ef  pgtable-2^9.....
  backtrace:
    [<c0000000004e03ec>] .kvasprintf+0x5c/0xa0
    [<c0000000004e045c>] .kasprintf+0x2c/0x50
    [<c00000000002e36c>] .pgtable_cache_add+0xac/0x100
    [<c00000000002e3e4>] .pgtable_cache_init+0x24/0x80
    [<c000000000c6c67c>] .start_kernel+0x228/0x4c8
    [<c000000000000594>] .start_here_common+0x24/0x90

Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-26 15:23:17 +11:00
Scott Wood
cc83458d3a powerpc/32: %pF is only for function pointers
Use %pS for actual addresses, otherwise you'll get bad output
on arches like ppc64 where %pF expects a function descriptor.  Even on
other architectures, refrain from setting a bad example that people
copy.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 15:14:48 +11:00
Nishanth Aravamudan
3af229f207 powerpc/numa: Reset node_possible_map to only node_online_map
Raghu noticed an issue with excessive memory allocation on power with a
simple cgroup test, specifically, in mem_cgroup_css_alloc ->
for_each_node -> alloc_mem_cgroup_per_zone_info(), which ends up blowing
up the kmalloc-2048 slab (to the order of 200MB for 400 cgroup
directories).

The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes
possible), which defines node_possible_map, which in turn defines the
value of nr_node_ids in setup_nr_node_ids and the iteration of
for_each_node.

In practice, we never see a system with 256 NUMA nodes, and in fact, we
do not support node hotplug on power in the first place, so the nodes
that are online when we come up are the nodes that will be present for
the lifetime of this kernel. So let's, at least, drop the NUMA possible
map down to the online map at runtime. This is similar to what x86 does
in its initialization routines.

mem_cgroup_css_alloc should also be fixed to only iterate over
memory-populated nodes and handle hotplug, but that is a separate
change.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 13:25:53 +11:00
Greg Kurz
3338a65bad powerpc/vphn: parsing code rewrite
The current VPHN parsing logic has some flaws that this patch aims to fix:

1) when the value 0xffff is read, the value 0xffffffff gets added to the
   the output list and its element count isn't incremented. This is wrong.
   According to PAPR+ the domain identifiers are packed into a sequence
   terminated by the "reserved value of all ones". This means that 0xffff
   is a stream terminator.

2) the combination of byteswaps and casts make the code hardly readable.
   Let's parse the stream one 16-bit field at a time instead.

3) it is assumed that the hypercall returns 12 32-bit values packed into
   6 64-bit registers. According to PAPR+, the domain identifiers may be
   streamed as 16-bit values. Let's increase the number of expected numbers
   to 24.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Greg Kurz
4b6cfb2a8c powerpc/vphn: move VPHN parsing logic to a separate file
The goal behind this patch is to be able to write userland tests for the
VPHN parsing code.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Greg Kurz
b1fc9484aa powerpc/vphn: move endianness fixing to vphn_unpack_associativity()
The first argument to vphn_unpack_associativity() is a const long *, but the
parsing code expects __be64 values actually. Let's move the endian fixing
down for consistency.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Greg Kurz
9d0e6d9db5 powerpc/vphn: clarify the H_HOME_NODE_ASSOCIATIVITY API
The number of values returned by the H_HOME_NODE_ASSOCIATIVITY h_call deserves
to be explicitly defined, for a better understanding of the code.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-18 10:48:59 +11:00
Kyle Moffett
b05ae4ee60 powerpc: Remove duplicate cacheable_memcpy/memzero functions
These functions are only used from one place each.  If the cacheable_*
versions really are more efficient, then those changes should be
migrated into the common code instead.

NOTE: The old routines are just flat buggy on kernels that support
      hardware with different cacheline sizes.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 11:25:50 +11:00
Kirill A. Shutemov
780fc5642f powerpc: drop _PAGE_FILE and pte_file()-related helpers
We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-16 17:56:05 -08:00