linux_dsm_epyc7002/mm
Nick Piggin 05fe478dd0 mm: write_cache_pages integrity fix
In write_cache_pages, nr_to_write is heeded even for data-integrity syncs,
so the function will return success after writing out nr_to_write pages,
even if that was not sufficient to guarantee data integrity.

The callers tend to set it to values that could break data integrity
semantics easily in practice.  For example, nr_to_write can be set to
mapping->nrpages * 2, however if a file has a single, dirty page, then
fsync is called, subsequent pages might be concurrently added and dirtied,
then write_cache_pages might writeout two of these newly dirty pages,
while not writing out the old page that should have been written out.

Fix this by ignoring nr_to_write if it is a data integrity sync.
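
The scenario above can be reproduced in a small user-space model. This is
only an illustrative sketch, not the kernel code: write_pages(), struct
wb_ctl, enum sync_mode and the honor_limit_on_sync switch are hypothetical
names invented here, and the "page cache" is just an array of dirty flags
scanned in index order. It assumes one old dirty page at a high index, two
concurrently dirtied pages at low indices, and nr_to_write = 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's WB_SYNC_NONE / WB_SYNC_ALL. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Hypothetical, heavily reduced model of a writeback control block. */
struct wb_ctl {
    enum sync_mode mode;
    long nr_to_write;
};

/*
 * Scan pages in index order and "write" (clean) every dirty one.
 * honor_limit_on_sync = true models the behaviour described as buggy:
 * the nr_to_write budget stops the scan even for an integrity sync.
 * honor_limit_on_sync = false models the fix: the budget only applies
 * to non-integrity writeback.
 * Returns the number of pages written.
 */
long write_pages(bool *dirty, size_t npages, struct wb_ctl *wbc,
                 bool honor_limit_on_sync)
{
    long written = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!dirty[i])
            continue;

        dirty[i] = false;   /* pretend the page went to disk */
        written++;

        bool limit_applies = (wbc->mode == SYNC_NONE) || honor_limit_on_sync;
        if (limit_applies && --wbc->nr_to_write <= 0)
            break;
    }
    return written;
}

/*
 * Builds the scenario and reports whether the old page (index 7) is
 * still dirty after a data-integrity sync.
 */
bool old_page_still_dirty(bool honor_limit_on_sync)
{
    bool dirty[8] = { false };
    struct wb_ctl wbc = { .mode = SYNC_ALL, .nr_to_write = 2 };

    dirty[7] = true;              /* the page fsync was meant to write */
    dirty[0] = dirty[1] = true;   /* pages dirtied concurrently */

    write_pages(dirty, 8, &wbc, honor_limit_on_sync);
    return dirty[7];
}
```

In this model, old_page_still_dirty(true) reports that the old page was
skipped because the budget was spent on the two new pages, while
old_page_still_dirty(false) reports that it was written.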

This is a data integrity bug.

The reason this has been done in the past is to avoid stalling sync
operations behind page dirtiers.

 "If a file has one dirty page at offset 1000000000000000 then someone
  does an fsync() and someone else gets in first and starts madly writing
  pages at offset 0, we want to write that page at 1000000000000000.
  Somehow."

What we do today is return success after an arbitrary number of pages are
written, whether or not we have provided the data-integrity semantics that
the caller has asked for.  Even this doesn't actually fix all stall cases
completely: in the above situation, if the file has a huge number of pages
in pagecache (but not dirty), then mapping->nrpages is going to be huge,
even if pages are being dirtied.

This change does indeed make the possibility of long stalls larger, and
that's not a good thing, but lying about data integrity is even worse.  We
have to either perform the sync, or return -ELINUXISLAME so at least the
caller knows what has happened.

There are subsequent competing approaches in the works to solve the stall
problems properly, without compromising data integrity.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:59 -08:00
allocpercpu.c mm/allocpercpu.c: make 4 functions static 2008-07-26 12:00:12 -07:00
backing-dev.c mm/backing-dev.c: remove recently-added WARN_ON() 2008-12-10 08:01:52 -08:00
bootmem.c misc: replace __FUNCTION__ with __func__ 2008-10-16 11:21:30 -07:00
bounce.c bounce: don't rely on a zeroed bio_vec list 2008-12-29 08:29:52 +01:00
dmapool.c dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too 2008-04-28 08:58:20 -07:00
fadvise.c Remove Andrew Morton's old email accounts 2008-10-16 11:21:32 -07:00
failslab.c SLUB: failslab support 2008-12-29 11:27:46 +02:00
filemap_xip.c mm: xip/ext2 fix block allocation race 2008-08-20 15:40:32 -07:00
filemap.c mm: write_cache_pages integrity fix 2009-01-06 15:58:59 -08:00
fremap.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
highmem.c x86, pat: avoid highmem cache attribute aliasing 2008-08-15 17:22:57 +02:00
hugetlb.c mm: report the MMU pagesize in /proc/pid/smaps 2009-01-06 15:58:58 -08:00
internal.h hugetlb: pull gigantic page initialisation out of the default path 2008-11-06 15:41:18 -08:00
Kconfig Unevictable LRU Infrastructure 2008-10-20 08:50:26 -07:00
maccess.c kgdb: fix optional arch functions and probe_kernel_* 2008-04-17 20:05:39 +02:00
madvise.c madvise: update function comment of madvise_dontneed 2008-07-30 09:41:45 -07:00
Makefile SLUB: failslab support 2008-12-29 11:27:46 +02:00
memcontrol.c memcg: fix page_cgroup allocation 2008-10-23 08:55:02 -07:00
memory_hotplug.c meminit section warnings 2008-11-30 10:03:35 -08:00
memory.c mm: don't mark_page_accessed in fault path 2009-01-06 15:58:58 -08:00
mempolicy.c Merge branch 'master' into next 2008-11-14 11:29:12 +11:00
mempool.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
migrate.c mm: move_pages: no need to set pp->page to ZERO_PAGE(0) by default 2009-01-06 15:58:58 -08:00
mincore.c mm: remove nopage 2008-04-28 08:58:18 -07:00
mlock.c x86, bts: memory accounting 2008-12-20 09:15:47 +01:00
mm_init.c mm: mminit_loglevel cannot be __meminitdata anymore 2008-08-20 15:40:30 -07:00
mmap.c mm: update my address 2009-01-05 17:44:42 -08:00
mmu_notifier.c mmu-notifiers: core 2008-07-28 16:30:21 -07:00
mmzone.c mm: mark the correct zone as full when scanning zonelists 2008-09-13 14:41:52 -07:00
mprotect.c mm: update my address 2009-01-05 17:44:42 -08:00
mremap.c mm: update my address 2009-01-05 17:44:42 -08:00
msync.c add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
nommu.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
oom_kill.c oom: print triggering task's cpuset and mems allowed 2009-01-06 15:58:59 -08:00
page_alloc.c cpusets: update mems allowed in page allocator 2008-11-12 17:17:16 -08:00
page_cgroup.c page_cgroup should ignore empty nodes 2008-12-10 08:01:53 -08:00
page_io.c mm: fix PageUptodate data race 2008-02-05 09:44:19 -08:00
page_isolation.c memory hotplug: fix page_zone() calculation in test_pages_isolated() 2008-11-06 15:41:19 -08:00
page-writeback.c mm: write_cache_pages integrity fix 2009-01-06 15:58:59 -08:00
pagewalk.c pagemap: pass mm into pagewalkers 2008-06-12 18:05:41 -07:00
pdflush.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
prio_tree.c spelling fixes: mm/ 2007-10-20 01:27:18 +02:00
quicklist.c mm: size of quicklists shouldn't be proportional to the number of CPUs 2008-09-02 19:21:38 -07:00
readahead.c vmscan: split LRU lists into anon & file sets 2008-10-20 08:50:25 -07:00
rmap.c make mm/rmap.c:anon_vma_cachep static 2008-10-20 08:52:40 -07:00
shmem_acl.c [PATCH] sanitize ->permission() prototype 2008-07-26 20:53:14 -04:00
shmem.c mm: don't mark_page_accessed in shmem_fault 2009-01-06 15:58:58 -08:00
slab.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
slob.c slob: do not pass the SLAB flags as GFP in kmem_cache_create() 2008-12-15 16:27:06 -08:00
slub.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
sparse-vmemmap.c vmemmap: warn about page_structs with remote distance 2008-11-06 15:41:19 -08:00
sparse.c meminit section warnings 2008-11-30 10:03:35 -08:00
swap_state.c mm: pagecache insertion fewer atomics 2008-10-20 08:52:31 -07:00
swap.c mm: remove UP version of lru_add_drain_all() 2008-12-10 08:01:53 -08:00
swapfile.c x86: consolidate __swp_XXX() macros 2008-12-16 18:34:51 +01:00
thrash.c Bug in mm/thrash.c function grab_swap_token() 2007-05-11 08:29:32 -07:00
tiny-shmem.c Export tiny shmem_file_setup for DRM-GEM 2008-10-20 16:17:42 -07:00
truncate.c mmap: handle mlocked pages during map, remap, unmap 2008-10-20 08:52:31 -07:00
util.c mm: Make generic weak get_user_pages_fast and EXPORT_GPL it 2008-08-12 17:52:53 +10:00
vmalloc.c vmalloc.c: fix flushing in vmap_page_range() 2009-01-04 13:33:20 -08:00
vmscan.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30
vmstat.c cpumask: convert mm/ 2009-01-01 10:12:29 +10:30