Thanks to commit 4b3ef9daa4 ("mm/swap: split swap cache into 64MB
trunks"), after swapoff the address_space associated with the swap
device will be freed. So page_mapping() users which may touch the
address_space need some mechanism to prevent the address_space from
being freed while it is being accessed.
The dcache flushing functions (flush_dcache_page(), etc.) in
architecture-specific code may access the address_space of the swap
device for anonymous pages in the swap cache via page_mapping(). But in
some cases there is no mechanism to prevent the swap device from being
swapped off, for example:
CPU1                                CPU2
__get_user_pages()                  swapoff()
  flush_dcache_page()
    mapping = page_mapping()
      ...                             exit_swap_address_space()
      ...                               kvfree(spaces)
      mapping_mapped(mapping)
The address space may be accessed after being freed.
But according to cachetlb.txt and Russell King, flush_dcache_page() only
cares about file cache pages; for anonymous pages, flush_anon_page()
should be used. The implementations of flush_dcache_page() in all
architectures follow this too. They check whether page_mapping() is NULL
and whether mapping_mapped() is true to determine whether to flush the
dcache immediately, and they use the interval tree (mapping->i_mmap) to
find all user space mappings. But mapping_mapped() and mapping->i_mmap
aren't used by anonymous pages in the swap cache at all.
So, to fix the race between swapoff and flush dcache, page_mapping_file()
is added to return the address_space for file cache pages and NULL
otherwise. All page_mapping() invocations in the flush dcache functions
are replaced with page_mapping_file().
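A minimal userspace sketch of the idea, not the kernel code: the structure
layouts, the in_swap_cache flag and every *_model name are assumptions made
for illustration. It shows a helper that yields NULL for swap cache pages,
so that a flush path shaped like the one described above never dereferences
the swap address_space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct address_space_model {
	int nr_user_mappings;	/* stands in for the mapping->i_mmap tree */
};

struct page_model {
	bool in_swap_cache;			/* stands in for PageSwapCache() */
	struct address_space_model *mapping;	/* file or swap address space */
};

/* Old behaviour: may return the swap address space, which swapoff can free. */
static struct address_space_model *page_mapping_model(const struct page_model *page)
{
	return page->mapping;
}

/* New behaviour: only file cache pages yield an address space. */
static struct address_space_model *page_mapping_file_model(const struct page_model *page)
{
	assert(page != NULL);
	if (page->in_swap_cache)
		return NULL;
	return page_mapping_model(page);
}

/* Shape of the check in a flush dcache function: flush now, or defer? */
static bool must_flush_now_model(const struct page_model *page)
{
	struct address_space_model *mapping = page_mapping_file_model(page);

	/* No file mapping (e.g. anonymous page): nothing to dereference. */
	if (!mapping)
		return true;
	return mapping->nr_user_mappings > 0;	/* mapping_mapped() analogue */
}
```

With this shape, a page in the swap cache takes the "no mapping" branch, so a
concurrent swapoff cannot leave the flush path holding a freed pointer.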
[akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xtensa memory initialization code frees high memory pages without
checking whether they are in the reserved memory regions or not. That
results in an invalid value of totalram_pages and duplicate page usage
by CMA and highmem. It produces a bunch of BUGs at startup looking like
this:
BUG: Bad page state in process swapper pfn:70800
page:be60c000 count:0 mapcount:-127 mapping: (null) index:0x1
flags: 0x80000000()
raw: 80000000 00000000 00000001 ffffff80 00000000 be60c014 be60c014 0000000a
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Tainted: G B 4.16.0-rc1-00015-g7928b2cbe55b-dirty #23
Stack:
bd839d33 00000000 00000018 ba97b64c a106578c bd839d70 be60c000 00000000
a1378054 bd86a000 00000003 ba97b64c a1066166 bd839da0 be60c000 ffe00000
a1066b58 bd839dc0 be504000 00000000 000002f4 bd838000 00000000 0000001e
Call Trace:
[<a1065734>] bad_page+0xac/0xd0
[<a106578c>] free_pages_check_bad+0x34/0x4c
[<a1066166>] __free_pages_ok+0xae/0x14c
[<a1066b58>] __free_pages+0x30/0x64
[<a1365de5>] init_cma_reserved_pageblock+0x35/0x44
[<a13682dc>] cma_init_reserved_areas+0xf4/0x148
[<a10034b8>] do_one_initcall+0x80/0xf8
[<a1361c16>] kernel_init_freeable+0xda/0x13c
[<a125b59d>] kernel_init+0x9/0xd0
[<a1004304>] ret_from_kernel_thread+0xc/0x18
Only free high memory pages that are not reserved.
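The fix can be pictured with a small userspace model. This is a sketch under
stated assumptions, not the xtensa code: struct pfn_range and
count_freeable_pages() are hypothetical names, and the reserved ranges are
assumed to be sorted and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>

struct pfn_range {
	unsigned long start;	/* first pfn in the range */
	unsigned long end;	/* one past the last pfn */
};

/*
 * Count the pages of [start, end) that lie outside every reserved range.
 * The reserved ranges must be sorted by start and must not overlap.
 */
static unsigned long count_freeable_pages(unsigned long start, unsigned long end,
					  const struct pfn_range *reserved, size_t n)
{
	unsigned long freed = 0;
	size_t i;

	assert(start <= end);
	for (i = 0; i < n && start < end; i++) {
		unsigned long res_start = reserved[i].start;
		unsigned long res_end = reserved[i].end;

		if (res_end <= start)
			continue;	/* reserved range entirely below us */
		if (res_start >= end)
			break;		/* reserved range entirely above us */
		if (res_start > start)
			freed += res_start - start;	/* gap before the reserved range */
		start = res_end;	/* skip the reserved pages */
	}
	if (start < end)
		freed += end - start;	/* tail after the last reserved range */
	return freed;
}
```

Freeing only the counted gaps keeps reserved pages (for example a CMA area)
out of the highmem free path.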
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Output virtual addresses and sizes occupied by the main kernel sections:
.text, .rodata, .data, .init and .bss.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Cover kernel addresses above 0x90000000 by the shadow map. Enable
HAVE_ARCH_KASAN when MMU is enabled. Provide kasan_early_init that fills
shadow map with writable copies of kasan_zero_page. Call
kasan_early_init right after mmu initialization in the setup_arch.
Provide kasan_init that allocates proper shadow map pages from the
memblock and puts these pages into the shadow map for addresses from
VMALLOC area to the end of KSEG. Call kasan_init right after memblock
initialization. Don't use KASAN for the boot code, MMU and KASAN
initialization and page fault handler. Make kernel stack size 4 times
larger when KASAN is enabled to avoid stack overflows.
GCC 7.3, 8 or newer is required to build the xtensa kernel with KASAN.
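As an illustration of how a shadow map covers addresses above 0x90000000,
here is a sketch of the usual KASAN address translation, where 8 bytes of
memory correspond to one shadow byte. The KASAN_SHADOW_START value below is
an assumption chosen for the example, not taken from this patch.

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3		/* 8 bytes of memory per shadow byte */
#define KASAN_START_VADDR 0x90000000u		/* lowest covered address, from the text */
#define KASAN_SHADOW_START 0x80400000u		/* assumed placement of the shadow map */

/* Bias chosen so that KASAN_START_VADDR maps to KASAN_SHADOW_START. */
#define KASAN_SHADOW_OFFSET \
	(KASAN_SHADOW_START - (KASAN_START_VADDR >> KASAN_SHADOW_SCALE_SHIFT))

/* Translate a covered address to the address of its shadow byte. */
static uint32_t mem_to_shadow(uint32_t addr)
{
	assert(addr >= KASAN_START_VADDR);
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Size of the shadow map needed to cover KASAN_START_VADDR..0xffffffff. */
static uint32_t shadow_size(void)
{
	return (0u - KASAN_START_VADDR) >> KASAN_SHADOW_SCALE_SHIFT;
}
```

Under these assumptions the shadow map needs 0x0e000000 bytes of virtual
address space.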
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
The virtual address space between the page table and the VMALLOC region
is big enough to host KASAN shadow map and there's enough space between
the VMALLOC area and KSEG for the fixmap and kmap.
Move fixmap and kmap to the gap between VMALLOC area and KSEG, just
above the KSEG. Reorder entries in the kernel memory layout printing
code. Drop duplicate PGTABLE_START definition, use
XCHAL_PAGE_TABLE_VADDR instead.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
swapper_pg_dir is located in the .bss, so it's zero-initialized anyway.
With KASAN enabled, paging_init will be called after KASAN
initialization, so it must not erase page directory entries set up for
the KASAN shadow map.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
KIO region placement may be specified in the device tree, that's why
it's initialized with the rest of MMU after the early_init_devtree. In
order to support KASAN the MMU must be initialized earlier.
Separate KIO initialization from the rest of MMU initialization.
Reinitialize KIO if its location is specified in the device tree.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Replace #ifdef'fed/commented out debug printk statements with pr_debug.
Replace printk statements with pr_* equivalents.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0                                                 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                               # files
-----------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                           270
GPL-2.0+ WITH Linux-syscall-note                          169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)        21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)        17
LGPL-2.1+ WITH Linux-syscall-note                          15
GPL-1.0+ WITH Linux-syscall-note                           14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)        5
LGPL-2.0+ WITH Linux-syscall-note                           4
LGPL-2.1 WITH Linux-syscall-note                            3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                  3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)                 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
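To illustrate the "different comment types" point, here is a hypothetical
sketch (it is not the script mentioned above; has_suffix() and
format_spdx_tag() are made-up names). It assumes the kernel convention of a
// comment for .c files and a /* */ comment for headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if name ends with the given suffix. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Format the SPDX line for a file into buf. Returns the snprintf() result,
 * or -1 for a file type this sketch does not handle.
 */
static int format_spdx_tag(char *buf, size_t len, const char *filename,
			   const char *license)
{
	assert(buf && filename && license);
	if (has_suffix(filename, ".h"))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", license);
	if (has_suffix(filename, ".c"))
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
	return -1;
}
```

Other file types (Makefiles, assembly, scripts) would need their own comment
styles, which this sketch leaves out.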
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Functions clear_user_highpage, copy_user_highpage, flush_dcache_page,
local_flush_cache_range and local_flush_cache_page may be used from
modules. Export them.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Currently, building a kernel for an xtensa core with an aliasing WT cache
fails with the following messages:
mm/memory.c:2152: undefined reference to `flush_dcache_page'
mm/memory.c:2332: undefined reference to `local_flush_cache_page'
mm/memory.c:1919: undefined reference to `local_flush_cache_range'
mm/memory.c:4179: undefined reference to `copy_to_user_page'
mm/memory.c:4183: undefined reference to `copy_from_user_page'
This happens because implementation of these functions is only compiled
when data cache is WB, which looks wrong: even when data cache doesn't
need flushing it still needs invalidation. The functions like
__flush_[invalidate_]dcache_* are correctly defined for both WB and WT
caches (and even if they weren't that'd still be ok, just slower).
Fix this by providing the same implementation of the above functions for
both WB and WT cache.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
This file was only including module.h for exception table related
functions. We've now separated that content out into its own file
"extable.h" so now move over to that and avoid all the extra header
content in module.h that we don't really need to compile this file.
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: linux-xtensa@linux-xtensa.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Enable HAVE_DMA_CONTIGUOUS, reserve contiguous memory at bootmem_init,
use dma_alloc_from_contiguous and dma_release_from_contiguous in
xtensa_dma_alloc/free.
This allows for big contiguous DMA buffer allocation from designated
area configured in the device tree.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Merge tag 'xtensa-20161005' of git://github.com/jcmvbkbc/linux-xtensa
Pull Xtensa updates from Max Filippov:
"Updates for the xtensa architecture. It is a combined set of patches
for 4.8 that never got to the mainline and new patches for 4.9.
- add new kernel memory layouts for MMUv3 cores: with 256MB and 512MB
KSEG size, starting at physical address other than 0
- make kernel load address configurable
- clean up kernel memory layout macros
- drop sysmem early allocator and switch to memblock
- enable kmemleak and memory reservation from the device tree
- wire up new syscalls: userfaultfd, membarrier, mlock2,
copy_file_range, preadv2 and pwritev2
- add new platform: Cadence Configurable System Platform (CSP) and
new core variant for it: xt_lnx
- rearrange CCOUNT calibration code, make most of it generic
- improve machine reset code (XTFPGA now reboots reliably with MMUv3
cores)
- provide default memmap command line option for configurations
without device tree support
- ISS fixes: simdisk is now capable of using highmem pages, panic
correctly terminates simulator"
* tag 'xtensa-20161005' of git://github.com/jcmvbkbc/linux-xtensa: (24 commits)
xtensa: disable MMU initialization option on MMUv2 cores
xtensa: add default memmap and mmio32native options to defconfigs
xtensa: add default memmap option to common_defconfig
xtensa: add default memmap option to iss_defconfig
xtensa: ISS: allow simdisk to use high memory buffers
xtensa: ISS: define simc_exit and use it instead of inline asm
xtensa: xtfpga: group platform_* functions together
xtensa: rearrange CCOUNT calibration
xtensa: xtfpga: use clock provider, don't update DT
xtensa: Tweak xuartps UART driver Rx watermark for Cadence CSP config.
xtensa: initialize MMU before jumping to reset vector
xtensa: fix icountlevel setting in cpu_reset
xtensa: extract common CPU reset code into separate function
xtensa: Added Cadence CSP kernel configuration for Xtensa
xtensa: fix default kernel load address
xtensa: wire up new syscalls
xtensa: support reserved-memory DT node
xtensa: drop sysmem and switch to memblock
xtensa: minimize use of PLATFORM_DEFAULT_MEM_{ADDR,SIZE}
xtensa: cleanup MMU setup and kernel layout macros
...
This allows reserving regions of physical memory from the device tree.
See Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
for more details.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Memblock is the standard kernel boot-time memory tracker/allocator. Use
it instead of the custom sysmem allocator. This allows using kmemleak,
CMA and device tree memory reservation.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
MMUv3 is able to support low memory bigger than 128MB.
Implement 256MB and 512MB KSEG layouts:
- add Kconfig selector for KSEG layout;
- add KSEG base address, size and alignment definitions to
arch/xtensa/include/asm/kmem_layout.h;
- use new definitions in TLB initialization;
- add build time memory map consistency checks.
See Documentation/xtensa/mmu.txt for the details of new memory layouts.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Create a header dedicated to memory layout definitions. Include it from
places where these definitions are needed.
Express vmalloc area address, VIRTUAL_MEMORY_ADDRESS and KERNELOFFSET
through KSEG address.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Merge tag 'xtensa-next-20160320' of git://github.com/czankel/xtensa-linux
Pull Xtensa updates from Chris Zankel:
"Xtensa improvements for 4.6:
- control whether perf IRQ is treated as NMI from Kconfig
- implement ioremap for regions outside KIO segment
- fix ISS serial port behaviour when EOF is reached
- fix preemption in {clear,copy}_user_highpage
- fix endianness issues for XTFPGA devices, big-endian cores are now
fully functional
- clean up debug infrastructure and add support for hardware
breakpoints and watchpoints
- add processor configurations for Three Core HiFi-2 MX and HiFi3
cpus"
* tag 'xtensa-next-20160320' of git://github.com/czankel/xtensa-linux:
xtensa: add test_kc705_hifi variant
xtensa: add Three Core HiFi-2 MX Variant.
xtensa: support hardware breakpoints/watchpoints
xtensa: use context structure for debug exceptions
xtensa: remove remaining non-functional KGDB bits
xtensa: clear all DBREAKC registers on start
xtensa: xtfpga: fix earlycon endianness
xtensa: xtfpga: fix i2c controller register width and endianness
xtensa: xtfpga: fix ethernet controller endianness
xtensa: xtfpga: fix serial port register width and endianness
xtensa: define CONFIG_CPU_{BIG,LITTLE}_ENDIAN
xtensa: fix preemption in {clear,copy}_user_highpage
xtensa: ISS: don't hang if stdin EOF is reached
xtensa: support ioremap for memory outside KIO region
xtensa: use XTENSA_INT_LEVEL macro in asm/timex.h
xtensa: make fake NMI configurable
The define has a comment from Nick Piggin from 2007:
/* For backwards compat. Remove me quickly. */
I guess 9 years is not too hurried a sense of 'quickly', even by
kernel standards.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Disabling pagefault makes little sense there, preemption disabling is
what was meant.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Let's define page_mapped() to be true for compound pages if any
sub-page of the compound page is mapped (with PMD or PTE).
On the other hand, page_mapcount() returns the mapcount for this
particular small page.
This will make cases like page_get_anon_vma() behave correctly once we
allow huge pages to be mapped with PTE.
Most users outside core-mm should use page_mapcount() instead of
page_mapped().
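The distinction can be sketched with a toy model. The structure and the
*_model helpers below are hypothetical and use plain counters; the kernel's
real bookkeeping is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SUBPAGES 4	/* illustrative; real huge pages have many more */

struct compound_page_model {
	int pmd_mapcount;		/* mappings of the whole page via PMD */
	int pte_mapcount[NR_SUBPAGES];	/* per-sub-page mappings via PTE */
};

/* Mapcount of one particular small page. */
static int page_mapcount_model(const struct compound_page_model *p, size_t idx)
{
	assert(idx < NR_SUBPAGES);
	return p->pmd_mapcount + p->pte_mapcount[idx];
}

/* True if any sub-page is mapped, with PMD or PTE. */
static bool page_mapped_model(const struct compound_page_model *p)
{
	size_t i;

	if (p->pmd_mapcount > 0)
		return true;
	for (i = 0; i < NR_SUBPAGES; i++)
		if (p->pte_mapcount[i] > 0)
			return true;
	return false;
}
```

In this model a compound page with a single PTE-mapped sub-page is "mapped"
as a whole, while the mapcount of its other sub-pages stays at zero.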
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).
In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().
Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.
faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.
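A toy userspace model of why the two mechanisms must not be confused. All
names are hypothetical, and CONFIG_PREEMPT_COUNT is modelled as a runtime
flag purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct task_model {
	bool preempt_count_enabled;	/* stands in for CONFIG_PREEMPT_COUNT */
	int preempt_count;
	int pagefault_disabled;
};

/* Without preempt counting, disabling preemption leaves no trace. */
static void preempt_disable_model(struct task_model *t)
{
	if (t->preempt_count_enabled)
		t->preempt_count++;
}

/* Disabling pagefaults always bumps its own dedicated counter. */
static void pagefault_disable_model(struct task_model *t)
{
	t->pagefault_disabled++;
}

static bool in_atomic_model(const struct task_model *t)
{
	return t->preempt_count != 0;
}

/* The fault handler must bail out if either condition holds. */
static bool faulthandler_disabled_model(const struct task_model *t)
{
	assert(t != 0);
	return t->pagefault_disabled != 0 || in_atomic_model(t);
}
```

In the model, a task that only disabled preemption on a configuration
without preempt counting is not seen as having pagefaults disabled, which is
why pagefault_disable() has to be used for that purpose.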
This patch is based on a patch from Thomas Gleixner.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
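A sketch of the kind of dispatch each architecture fault handler gains,
written as a userspace function. The flag values are illustrative copies and
fault_to_signal() is a hypothetical helper; the real handlers jump to their
existing bad_area / do_sigbus style paths instead of returning a signal
number.

```c
#include <signal.h>

/* Illustrative flag values; the kernel's definitions live in linux/mm.h. */
#define VM_FAULT_OOM		0x0001u
#define VM_FAULT_SIGBUS		0x0002u
#define VM_FAULT_SIGSEGV	0x0040u
#define VM_FAULT_ERROR		(VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV)

/*
 * Pick the signal for a fault result, or 0 if none is needed.
 * Out-of-memory handling is left out of this sketch.
 */
static int fault_to_signal(unsigned int fault)
{
	if (!(fault & VM_FAULT_ERROR))
		return 0;
	if (fault & VM_FAULT_SIGSEGV)
		return SIGSEGV;	/* the new case: reuse the existing SIGSEGV path */
	if (fault & VM_FAULT_SIGBUS)
		return SIGBUS;
	return 0;
}
```

The point of the new flag is visible here: a failed stack expansion can be
reported as SIGSEGV rather than falling into the SIGBUS branch.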
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
noMMU configuration doesn't use special area for vmalloc allocations,
don't print it in the memory map.
PAGE_OFFSET is fixed to 0 in noMMU, use min_low_pfn and max_low_pfn for
lowmem range display.
Make all XCHAL_KSEG_* constants unsigned long for consistency with other
addresses.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Memory accounting code can't handle pages below
PLATFORM_DEFAULT_MEM_START. Reserve those pages if they exist.
When PLATFORM_DEFAULT_MEM_START is zero, reserve one page at address 0 to
make sure that successful memory allocations don't return NULL.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Use __flush_invalidate_dcache_page_alias with alias set to color of the
page physical address instead of __flush_invalidate_dcache_page: this
works for high memory pages and mapping/unmapping to the TLBTEMP area is
virtually free.
Allow building configurations with aliasing cache and highmem enabled.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Define ARCH_PKMAP_COLORING and provide corresponding macro definitions
on cores with aliasing data cache.
Instead of single last_pkmap_nr maintain an array last_pkmap_nr_arr of
pkmap counters for each page color. Make sure that kmap maps physical
page at virtual address with color matching its physical address.
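A userspace sketch of per-color pkmap counters. The constants
(DCACHE_N_COLORS, LAST_PKMAP) and the helper names are assumptions for
illustration, and the pkmap base address is assumed to have color 0.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define DCACHE_N_COLORS 4u	/* assumed; must be a power of two here */
#define LAST_PKMAP 16u		/* assumed slot count, a multiple of the colors */

/* Color of an address: the page number bits that select the cache alias. */
static unsigned int addr_color(unsigned long addr)
{
	return (addr >> PAGE_SHIFT) & (DCACHE_N_COLORS - 1);
}

/* One "last used" counter per color instead of a single last_pkmap_nr. */
static unsigned int last_pkmap_nr_arr[DCACHE_N_COLORS];

/*
 * Pick the next pkmap slot for a physical page. Each counter advances in
 * steps of DCACHE_N_COLORS, so slot % DCACHE_N_COLORS == color always holds.
 */
static unsigned int next_pkmap_slot(unsigned long paddr)
{
	unsigned int color = addr_color(paddr);

	last_pkmap_nr_arr[color] =
		(last_pkmap_nr_arr[color] + DCACHE_N_COLORS) % LAST_PKMAP;
	return last_pkmap_nr_arr[color] + color;
}

/* Virtual address of a slot, assuming pkmap_base itself has color 0. */
static unsigned long pkmap_addr(unsigned long pkmap_base, unsigned int slot)
{
	assert(addr_color(pkmap_base) == 0);
	return pkmap_base + ((unsigned long)slot << PAGE_SHIFT);
}
```

Because the slot index and the physical page share their low color bits, the
resulting virtual address has the same color as the physical address.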
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Map high memory pages at virtual addresses whose color matches the color
of their physical address. Existing cache alias management mechanisms
may then be used with such pages.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Existing clear_user_page and copy_user_page cannot be used with highmem
because they calculate physical page address from its virtual address
and do it incorrectly in case of high memory page mapped with
kmap_atomic. Also kmap is not needed, as most likely userspace mapping
color would be different from the kmapped color.
Provide clear_user_highpage and copy_user_highpage functions that
determine if temporary mapping is needed for the pages. Move most of the
logic of the former clear_user_page and copy_user_page to
xtensa/mm/cache.c only leaving temporary mapping setup, invalidation and
clearing/copying in the xtensa/mm/misc.S. Rename these functions to
clear_page_alias and copy_page_alias.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
To support aliasing cache both kmap region sizes are multiplied by the
number of data cache colors. After that expansion page tables that cover
kmap regions may become larger than one page. Correctly allocate and
initialize page tables in this case.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
It's much easier to reason about alignment and coloring of regions
located in the fixmap when fixmap index is just a PFN within the fixmap
region. Change fixmap addressing so that index 0 corresponds to
FIXADDR_START instead of FIXADDR_TOP.
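The two addressing schemes can be compared in a small sketch. The region
placement and the function names are assumptions chosen for the example; only
the direction of the index is taken from the text.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1ul << PAGE_SHIFT)

/* Assumed, illustrative fixmap placement: 8 pages from an aligned start. */
#define FIXADDR_START 0xbff00000ul
#define NR_FIX_PAGES 8ul
#define FIXADDR_TOP (FIXADDR_START + (NR_FIX_PAGES - 1) * PAGE_SIZE)

/* Old scheme: index 0 is at the top and addresses grow downwards. */
static unsigned long fix_to_virt_top_down(unsigned long idx)
{
	assert(idx < NR_FIX_PAGES);
	return FIXADDR_TOP - (idx << PAGE_SHIFT);
}

/* New scheme: the index is simply the page number within the region. */
static unsigned long fix_to_virt_bottom_up(unsigned long idx)
{
	assert(idx < NR_FIX_PAGES);
	return FIXADDR_START + (idx << PAGE_SHIFT);
}

/* Inverse of the new scheme. */
static unsigned long virt_to_fix_bottom_up(unsigned long vaddr)
{
	assert(vaddr >= FIXADDR_START);
	return (vaddr - FIXADDR_START) >> PAGE_SHIFT;
}
```

With an aligned FIXADDR_START, the low bits of the index equal the low bits
of the virtual page number, which is what makes coloring easy to reason
about.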
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
When sysmem reservation occurs exactly at the end of an existing block
that block is deleted, because it is incorrectly included in the range
of memblocks to remove. Fix that by skipping such block.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Introduce fixmap area just below the vmalloc region. Use it for atomic
mapping of high memory pages.
High memory on cores with cache aliasing is not supported and is still
to be implemented. Fail build for such configurations for now.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Debug dump of physical memory configuration. Useful for inspection of
resulting memory map, esp. in the presence of memmap= kernel option.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
This option is useful for reserving memory regions for secondary cores
in AMP configurations.
Implement the following memmap variants:
- memmap=nn[KMG]@ss[KMG]: force usage of a specific region of memory;
- memmap=nn[KMG]$ss[KMG]: mark specified memory as reserved;
- memmap=nn[KMG]: set end of memory.
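A userspace sketch of parsing the three variants listed above. It is not the
kernel parser: parse_size() is a simplified analogue of a size parser that
accepts only upper-case K/M/G suffixes, and the enum, struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum memmap_kind { MEMMAP_INVALID, MEMMAP_ADD, MEMMAP_RESERVE, MEMMAP_LIMIT };

struct memmap_arg {
	enum memmap_kind kind;
	unsigned long long size;
	unsigned long long start;	/* unused for MEMMAP_LIMIT */
};

/* Parse a number with an optional K, M or G suffix; *end is set past it. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits consumed */
	switch (**end) {
	case 'G':
		v <<= 10;
		/* fall through */
	case 'M':
		v <<= 10;
		/* fall through */
	case 'K':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Parse the value of a memmap= option, e.g. "64M@128M". */
static struct memmap_arg parse_memmap(const char *s)
{
	struct memmap_arg arg = { MEMMAP_INVALID, 0, 0 };
	char *p;

	assert(s != NULL);
	arg.size = parse_size(s, &p);
	if (p == s)
		return arg;			/* no size given */
	if (*p == '\0') {
		arg.kind = MEMMAP_LIMIT;	/* memmap=nn[KMG] */
	} else if (*p == '@' || *p == '$') {
		char sep = *p;
		const char *q = p + 1;

		arg.start = parse_size(q, &p);
		if (p != q && *p == '\0')
			arg.kind = sep == '@' ? MEMMAP_ADD : MEMMAP_RESERVE;
	}
	return arg;
}
```

For example, "64M@128M" would be classified as forcing use of a 64MB region
at 128MB, and "16M$0x1000" as reserving 16MB starting at 0x1000.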
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Bootparam meminfo is a bootloader ABI; kernel meminfo is for kernel
bookkeeping. Keep them separate. The kernel doesn't care about memory
region types, so drop the type field and don't pass it to add_sysmem_bank.
Move kernel sysmem structures and prototypes to asm/sysmem.h and sysmem
variable and add_sysmem_bank to mm/init.c
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Merge tag 'xtensa-for-next-20140221-1' into for_next
Xtensa fixes for 3.14:
- allow booting xtfpga on boards with new uBoot and >128MBytes memory;
- drop nonexistent GPIO32 support from fsf variant;
- don't select USE_GENERIC_SMP_HELPERS;
- enable common clock framework support, set up ethoc clock on xtfpga;
- wire up sched_setattr and sched_getattr syscalls.
Signed-off-by: Chris Zankel <chris@zankel.net>
Use the simple-bus node to discover the io area, and remap the cached and
bypass io ranges. The parent-bus-address value of the first triplet in the
"ranges" property is used. This value is rounded down to the nearest 256MB
boundary. The length of the io area is fixed at 256MB; the "ranges" property
length value is ignored.
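The rounding rule can be written down as a small sketch. The helper names are
hypothetical; only the 256MB rounding and the fixed 256MB length come from the
description above.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M 0x10000000u

/* Round a parent bus address down to the nearest 256MB boundary. */
static uint32_t kio_base_from_ranges(uint32_t parent_bus_addr)
{
	return parent_bus_addr & ~(SZ_256M - 1);
}

/* True if paddr falls in the fixed-size 256MB io area starting at base. */
static int in_kio(uint32_t base, uint32_t paddr)
{
	assert((base & (SZ_256M - 1)) == 0);
	return paddr - base < SZ_256M;	/* unsigned wrap rejects paddr < base */
}
```

So a "ranges" parent bus address of 0xfd050000 would place the io area at
0xf0000000, regardless of the length given in the property.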
Other limitations: (1) only the first simple-bus node is considered, and (2)
only the first triplet of the "ranges" property is considered.
See ePAPR 1.1 §6.5 for the simple-bus node description, and §2.3.8 for the
"ranges" property description.
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
This is largely based on SMP code from the xtensa-2.6.29-smp tree by
Piet Delaney, Marc Gauthier, Joe Taylor, Christian Zankel (and possibly
other Tensilica folks).
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Chris Zankel <chris@zankel.net>