linux_dsm_epyc7002/mm
Yang Shi 89fdcd262f mm: shmem: make stat.st_blksize return huge page size if THP is on
Since tmpfs THP was supported in 4.8, hugetlbfs is no longer the only
filesystem with huge page support.  tmpfs can use huge pages via THP
when mounted with the "huge=" mount option.

When applications use huge pages on hugetlbfs, they only need to check
the filesystem magic number, but that is not enough for tmpfs.  Make
stat.st_blksize return the huge page size if tmpfs is mounted with an
appropriate "huge=" option, to give applications a hint for optimizing
their behavior with THP.
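
As an illustration of how an application might consume this hint, here
is a minimal userspace sketch.  The helper names blksize_hint() and
looks_huge() are hypothetical and not part of this patch; the sketch
only assumes that stat() fills in st_blksize as described above, and
that a value larger than the base page size suggests a huge tmpfs mount.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return st_blksize for `path`, or -1 if stat()
 * fails.  On a tmpfs mounted with a suitable "huge=" option this is
 * expected to be the PMD huge page size rather than the base page size.
 */
static long blksize_hint(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}

/*
 * Hypothetical helper: treat a hint larger than the base page size as a
 * sign that huge pages may be in use.  This is only a heuristic; other
 * filesystems are free to report a large st_blksize for other reasons.
 */
static int looks_huge(long hint, long page_size)
{
	return hint > page_size;
}
```

A caller would typically pass sysconf(_SC_PAGESIZE) as page_size and
treat the result as advice only, since tmpfs may still fall back to
small pages.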

Some applications may not use THP wisely.  For example, QEMU may mmap a
file at a non-huge-page-aligned hint address with MAP_FIXED, which
results in no pages being PMD mapped even though THP is used.  Some
applications may mmap a file with a non-huge-page-aligned offset.  Both
behaviors make THP pointless.
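
The alignment requirement can be sketched with two small helpers.  The
names align_up() and is_aligned() are hypothetical, and the sketch
assumes the alignment is a power of two (for example a 2 MiB PMD size,
which is an assumption about the platform, not something this patch
guarantees).

```c
#include <stdint.h>

/*
 * Hypothetical helper: round `value` up to the next multiple of `align`.
 * `align` must be a power of two, e.g. the size reported in st_blksize.
 */
static uintptr_t align_up(uintptr_t value, uintptr_t align)
{
	return (value + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: check whether `value` (an address or a file
 * offset) is a multiple of `align`.  `align` must be a power of two.
 */
static int is_aligned(uintptr_t value, uintptr_t align)
{
	return (value & (align - 1)) == 0;
}
```

An application that insists on MAP_FIXED could run both the hint
address and the file offset through is_aligned() before calling mmap(),
and round them with align_up() when they are not aligned.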

statfs.f_bsize still returns 4KB for tmpfs since a THP could be split,
and tmpfs may also silently fall back to 4KB pages if there are not
enough huge pages.  Furthermore, a different f_bsize makes the
max_blocks and free_blocks calculations harder without much benefit.
Returning the huge page size via stat.st_blksize sounds good enough.
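
To see the two values side by side from userspace, a sketch along the
following lines could be used.  The struct blk_sizes type and the
query_blk_sizes() helper are hypothetical names introduced here for
illustration; the sketch assumes a Linux system where statfs() is
declared in <sys/vfs.h>.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/vfs.h>

/* Hypothetical container for the two block-size fields discussed above. */
struct blk_sizes {
	long st_blksize;	/* from stat(): may be the huge page size */
	long f_bsize;		/* from statfs(): stays at the base page size on tmpfs */
};

/*
 * Hypothetical helper: fill `out` with both values for `path`.
 * Returns 0 on success and -1 if either stat() or statfs() fails.
 */
static int query_blk_sizes(const char *path, struct blk_sizes *out)
{
	struct stat st;
	struct statfs sfs;

	if (stat(path, &st) != 0 || statfs(path, &sfs) != 0)
		return -1;

	out->st_blksize = (long)st.st_blksize;
	out->f_bsize = (long)sfs.f_bsize;
	return 0;
}
```

On a tmpfs mounted with a suitable "huge=" option and a kernel carrying
this change, the two fields would be expected to differ; on other
mounts they will often be equal.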

Since PUD-size huge pages are not yet supported for THP, it just
returns HPAGE_PMD_SIZE for now.

Hugh said:

: Sorry, I have no enthusiasm for this patch; but do I feel strongly
: enough to override you and everyone else to NAK it?  No, I don't feel
: that strongly, maybe st_blksize isn't worth arguing over.
:
: We did look at struct stat when designing huge tmpfs, to see if there
: were any fields that should be adjusted for it; but concluded none.
: Yes, it would sometimes be nice to have a quickly accessible indicator
: for when tmpfs has been mounted huge (scanning /proc/mounts for options
: can be tiresome, agreed); but since tmpfs tries to supply huge (or not)
: pages transparently, no difference seemed right.
:
: So, because st_blksize is a not very useful field of struct stat, with
: "size" in the name, we're going to put HPAGE_PMD_SIZE in there instead
: of PAGE_SIZE, if the tmpfs was mounted with one of the huge "huge"
: options (force or always, okay; within_size or advise, not so much).
: Though HPAGE_PMD_SIZE is no more its "preferred I/O size" or "blocksize
: for file system I/O" than PAGE_SIZE was.
:
: Which we can expect to speed up some applications and disadvantage
: others, depending on how they interpret st_blksize: just like if we
: changed it in the same way on non-huge tmpfs.  (Did I actually try
: changing st_blksize early on, and find it broke something?  If so, I've
: now forgotten what, and a search through commit messages didn't find
: it; but I guess we'll find out soon enough.)
:
: If there were an mstat() syscall, returning a field "preferred
: alignment", then we could certainly agree to put HPAGE_PMD_SIZE in
: there; but in stat()'s st_blksize?  And what happens when (in future)
: mm maps this or that hard-disk filesystem's blocks with a pmd mapping -
: should that filesystem then advertise a bigger st_blksize, despite the
: same disk layout as before?  What happens with DAX?
:
: And this change is not going to help the QEMU suboptimality that
: brought you here (or does QEMU align mmaps according to st_blksize?).
: QEMU ought to work well with kernels without this change, and kernels
: with this change; and I hope it can easily deal with both by avoiding
: that use of MAP_FIXED which prevented the kernel's intended alignment.

[akpm@linux-foundation.org: remove unneeded `else']
Link: http://lkml.kernel.org/r/1524665633-83806-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:35 -07:00
kasan kasan: fix memory hotplug during boot 2018-05-25 18:12:11 -07:00
backing-dev.c bdi: Move cgroup bdi_writeback to a dedicated low concurrency workqueue 2018-05-23 15:28:50 -06:00
balloon_compaction.c virtio_balloon: fix deadlock on OOM 2017-11-14 23:57:38 +02:00
bootmem.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
cleancache.c docs/vm: rename documentation files to .rst 2018-04-16 14:18:15 -06:00
cma_debug.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cma.c Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" 2018-05-24 10:07:50 -07:00
cma.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compaction.c Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" 2018-05-24 10:07:50 -07:00
debug_page_ref.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
debug.c mm/debug.c: provide useful debugging information for VM_BUG 2018-01-04 16:45:09 -08:00
dmapool.c
early_ioremap.c mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep 2017-12-11 14:54:44 +01:00
fadvise.c mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() 2018-04-02 20:16:10 +02:00
failslab.c mm: make should_failslab always available for fault injection 2018-04-05 21:36:26 -07:00
filemap.c mm/filemap.c: fix NULL pointer in page_cache_tree_insert() 2018-04-20 17:18:36 -07:00
frame_vector.c mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()' 2017-12-14 16:00:48 -08:00
frontswap.c docs/vm: rename documentation files to .rst 2018-04-16 14:18:15 -06:00
gup_benchmark.c mm/gup_benchmark: handle gup failures 2018-04-13 17:10:27 -07:00
gup.c mm, gup: prevent pmd checking race in follow_pmd_mask() 2018-06-07 17:34:35 -07:00
highmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hmm.c Merge branch 'mm-rst' into docs-next 2018-04-16 14:25:08 -06:00
huge_memory.c There's been a fair amount of work in the docs tree this time around, 2018-06-04 12:34:27 -07:00
hugetlb_cgroup.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
hugetlb.c Merge branch 'mm-rst' into docs-next 2018-04-16 14:25:08 -06:00
hwpoison-inject.c mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
init-mm.c mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct 2018-06-07 17:34:34 -07:00
internal.h Changes for 4.18: 2018-06-05 13:24:20 -07:00
interval_tree.c mm/interval_tree.c: use vma_pages() helper 2018-01-31 17:18:37 -08:00
Kconfig mm: introduce ARCH_HAS_PTE_SPECIAL 2018-06-07 17:34:35 -07:00
Kconfig.debug kmemcheck: rip it out 2017-11-15 18:21:05 -08:00
khugepaged.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
kmemleak-test.c
kmemleak.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
ksm.c mm/ksm: docs: extend overview comment and make it "DOC:" 2018-04-27 17:19:24 -06:00
list_lru.c mm: make counting of list_lru_one::nr_items lockless 2018-04-05 21:36:27 -07:00
maccess.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
madvise.c mm/memory_failure: Remove unused trapno from memory_failure 2018-01-23 12:17:42 -06:00
Makefile mm: restructure memfd code 2018-06-07 17:34:35 -07:00
memblock.c mm/memblock: introduce PHYS_ADDR_MAX 2018-06-07 17:34:35 -07:00
memcontrol.c mm: treat memory.low value inclusive 2018-06-07 17:34:35 -07:00
memfd.c mm: restructure memfd code 2018-06-07 17:34:35 -07:00
memory_hotplug.c mm/memory_hotplug: fix leftover use of struct page during hotplug 2018-05-25 18:12:11 -07:00
memory-failure.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
memory.c mm: remove odd HAVE_PTE_SPECIAL 2018-06-07 17:34:35 -07:00
mempolicy.c mm: unclutter THP migration 2018-04-11 10:28:32 -07:00
mempool.c mempool: Add mempool_init()/mempool_exit() 2018-05-14 13:14:23 -06:00
memtest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
migrate.c mm: migrate: fix double call of radix_tree_replace_slot() 2018-05-11 17:28:45 -07:00
mincore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mlock.c mm, mlock, vmscan: no more skipping pagevecs 2018-02-21 15:35:42 -08:00
mm_init.c
mmap.c There's been a fair amount of work in the docs tree this time around, 2018-06-04 12:34:27 -07:00
mmu_context.c
mmu_notifier.c mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks 2018-01-31 17:18:38 -08:00
mmzone.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
mprotect.c sched/numa: avoid trapping faults and attempting migration of file-backed dirty pages 2018-04-11 10:28:31 -07:00
mremap.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
msync.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nobootmem.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nommu.c mm/nommu: remove description of alloc_vm_area 2018-04-05 21:36:26 -07:00
oom_kill.c mm: rename page_counter's count/limit into usage/max 2018-06-07 17:34:35 -07:00
page_alloc.c mm/page_alloc: remove realsize in free_area_init_core() 2018-06-07 17:34:35 -07:00
page_counter.c mm: memory.low hierarchical behavior 2018-06-07 17:34:35 -07:00
page_ext.c mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it 2018-01-31 17:18:39 -08:00
page_idle.c mm: thp: fix potential clearing to referenced flag in page_idle_clear_pte_refs_one() 2018-04-05 21:36:25 -07:00
page_io.c block: convert to bio_first_bvec_all & bio_first_page_all 2018-01-06 09:18:00 -07:00
page_isolation.c mm, migrate: remove reason argument from new_page_t 2018-04-11 10:28:32 -07:00
page_owner.c mm/page_owner.c: make early_page_owner_param() __init 2018-04-05 21:36:26 -07:00
page_poison.c mm/page_poison.c: make early_page_poison_param() __init 2018-04-05 21:36:26 -07:00
page_vma_mapped.c mm, page_vma_mapped: Introduce pfn_in_hpage() 2018-01-22 12:15:57 -08:00
page-writeback.c writeback: safer lock nesting 2018-04-20 17:18:35 -07:00
pagewalk.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
percpu-internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
percpu-km.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu-stats.c mm: reuse DEFINE_SHOW_ATTRIBUTE() macro 2018-04-05 21:36:25 -07:00
percpu-vm.c percpu: allow select gfp to be passed to underlying allocators 2018-02-18 05:33:01 -08:00
percpu.c arch: remove obsolete architecture ports 2018-04-02 20:20:12 -07:00
pgtable-generic.c mm: do not lose dirty and accessed bits in pmdp_invalidate() 2018-01-31 17:18:38 -08:00
process_vm_access.c mm: docs: add blank lines to silence sphinx "Unexpected indentation" errors 2018-02-06 18:32:48 -08:00
quicklist.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
readahead.c mm: split ->readpages calls to avoid non-contiguous pages lists 2018-06-01 18:37:32 -07:00
rmap.c Linux 4.17-rc2 2018-04-27 17:13:20 -06:00
rodata_test.c mm: fix RODATA_TEST failure "rodata_test: test data was not read only" 2017-10-03 17:54:24 -07:00
shmem.c mm: shmem: make stat.st_blksize return huge page size if THP is on 2018-06-07 17:34:35 -07:00
slab_common.c mm: make should_failslab always available for fault injection 2018-04-05 21:36:26 -07:00
slab.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slab.h slab, slub: skip unnecessary kasan_cache_shutdown() 2018-04-05 21:36:24 -07:00
slob.c slab: __GFP_ZERO is incompatible with a constructor 2018-06-07 17:34:34 -07:00
slub.c mm/slub: remove obsolete comment 2018-06-07 17:34:34 -07:00
sparse-vmemmap.c mm: merge vmem_altmap_alloc into altmap_alloc_block_buf 2018-01-08 11:46:23 -08:00
sparse.c mm/sparse.c: pass the __highest_present_section_nr + 1 to alloc_func() 2018-06-07 17:34:35 -07:00
swap_cgroup.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
swap_slots.c mm, memcontrol: move swap charge handling into get_swap_page() 2018-06-07 17:34:34 -07:00
swap_state.c mm, memcontrol: move swap charge handling into get_swap_page() 2018-06-07 17:34:34 -07:00
swap.c mm/swap.c: remove @cold parameter description for release_pages() 2018-04-05 21:36:26 -07:00
swapfile.c mm: fix nr_rotate_swap leak in swapon() error case 2018-05-25 18:12:10 -07:00
truncate.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
usercopy.c usercopy: WARN() on slab cache usercopy region violations 2018-01-15 12:07:48 -08:00
userfaultfd.c mm/userfaultfd.c: remove duplicate include 2018-02-06 18:32:47 -08:00
util.c Merge branch 'mm-rst' into docs-next 2018-04-16 14:25:08 -06:00
vmacache.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
vmalloc.c mm: vmalloc: pass proper vm_start into debugobjects 2018-06-07 17:34:35 -07:00
vmpressure.c
vmscan.c mm: fix the NULL mapping case in __isolate_lru_page() 2018-06-02 09:33:47 -07:00
vmstat.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
workingset.c page cache: use xa_lock 2018-04-11 10:28:39 -07:00
z3fold.c z3fold: fix reclaim lock-ups 2018-05-11 17:28:45 -07:00
zbud.c mm: docs: fix parameter names mismatch 2018-02-06 18:32:48 -08:00
zpool.c mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc 2018-02-21 15:35:43 -08:00
zsmalloc.c mm: kernel-doc: add missing parameter descriptions 2018-04-05 21:36:27 -07:00
zswap.c mm, swap, frontswap: fix THP swap if frontswap enabled 2018-02-21 15:35:43 -08:00