linux_dsm_epyc7002/Documentation
Kirill A. Shutemov 1d798ca3f1 mm: make compound_head() robust
Hugh has pointed out that a compound_head() call can be unsafe in some
contexts. Here is one example:

	CPU0					CPU1

isolate_migratepages_block()
  page_count()
    compound_head()
      !!PageTail() == true
					put_page()
					  tail->first_page = NULL
      head = tail->first_page
					alloc_pages(__GFP_COMP)
					   prep_compound_page()
					     tail->first_page = head
					     __SetPageTail(p);
      !!PageTail() == true
    <head == NULL dereferencing>

The race is purely theoretical. I don't think it's possible to trigger it
in practice. But who knows.

We can fix the race by changing how we encode PageTail() and
compound_head() within struct page so that both can be updated in one shot.

The patch introduces page->compound_head into the third double word block,
in front of compound_dtor and compound_order. Bit 0 encodes PageTail(), and
the remaining bits are a pointer to the head page if bit 0 is set.
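
The encoding described above can be sketched in user-space C. This is a
minimal illustration, not the kernel code: struct demo_page and the demo_*
helpers are hypothetical names, and the only assumption carried over from
the text is that bit 0 of the word is the tail flag and the remaining bits
hold the head pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; only the word discussed here. */
struct demo_page {
	uintptr_t compound_head;	/* bit 0: tail flag, rest: head pointer */
};

/* Mark @tail as a tail page of @head with a single store. */
static void demo_set_compound_head(struct demo_page *tail,
				   struct demo_page *head)
{
	/* The head pointer must be aligned so that bit 0 is free. */
	assert(((uintptr_t)head & 1) == 0);
	tail->compound_head = (uintptr_t)head | 1;
}

/* Clear both the tail flag and the head pointer with a single store. */
static void demo_clear_compound_head(struct demo_page *page)
{
	page->compound_head = 0;
}

/* Equivalent of PageTail() in this sketch: test bit 0. */
static bool demo_page_tail(const struct demo_page *page)
{
	return page->compound_head & 1;
}

/* Return the head page for a tail page, or the page itself otherwise. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	/* Read the word once so flag and pointer come from the same value. */
	uintptr_t word = *(volatile uintptr_t *)&page->compound_head;

	if (word & 1)
		return (struct demo_page *)(word - 1);
	return page;
}
```

Because the flag and the pointer live in the same word, a reader that
loads the word once can never observe the tail flag set together with a
NULL head pointer, which is the window the race above depends on.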

The patch moves page->pmd_huge_pte out of the word, just in case an
architecture defines pgtable_t as something that can have bit 0 set.

hugetlb_cgroup uses page->lru.next in the second tail page to store a
pointer to struct hugetlb_cgroup. The patch switches it to use
page->private in the second tail page instead. The space is free since
->first_page is removed from the union.

The patch also opens the possibility of removing the
HUGETLB_CGROUP_MIN_ORDER limitation, since there's now space in the first
tail page to store the struct hugetlb_cgroup pointer. But that's out of
scope for this patch.

That means page->compound_head shares storage space with:

 - page->lru.next;
 - page->next;
 - page->rcu_head.next;

That's too long a list to be absolutely sure, but it looks like nobody
uses bit 0 of the word.

page->rcu_head.next is guaranteed[1] to have bit 0 clear as long as we use
call_rcu(), call_rcu_bh(), call_rcu_sched(), or call_srcu(). But a future
call_rcu_lazy() is not allowed, as it makes use of the bit and we could
get a false positive PageTail().

[1] http://lkml.kernel.org/g/20150827163634.GD4029@linux.vnet.ibm.com

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
..
ABI There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
accounting
acpi mfd: core: redo ACPI matching of the children devices 2015-10-26 15:25:53 +01:00
aoe
arm arm64 updates for 4.4: 2015-11-04 14:47:13 -08:00
arm64 arm64 updates for 4.4: 2015-11-04 14:47:13 -08:00
auxdisplay
backlight
blackfin
block block: add an API for Persistent Reservations 2015-10-21 14:46:56 -06:00
blockdev zram: update documentation 2015-09-24 15:39:42 -06:00
bus-devices
cdrom
cgroups There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
cma
connector
console
cpu-freq cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event 2015-09-01 15:50:38 +02:00
cpuidle
cris
crypto KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
development-process
device-mapper - Revert a dm-multipath change that caused a regression for unprivledged 2015-11-04 21:19:53 -08:00
devicetree - New Device Support 2015-11-06 10:53:48 -08:00
dmaengine Documentation: dmaengine: Add DMA_CTRL_REUSE documentation 2015-08-17 13:46:22 +05:30
DocBook There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
driver-model driver-core: platform: Provide helpers for multi-driver modules 2015-10-05 05:02:40 +01:00
dvb
early-userspace
EDID
extcon
fault-injection futex: Fault/error injection capabilities 2015-07-20 11:45:45 +02:00
fb Documentation/fb: add documentation for sm712fb 2015-08-07 15:05:01 -07:00
features arm64 updates for 4.4: 2015-11-04 14:47:13 -08:00
filesystems Merge branch 'akpm' (patches from Andrew) 2015-11-05 23:10:54 -08:00
firmware_class
fmc
fpga usage documentation for FPGA manager core 2015-10-07 18:07:20 +01:00
frv
gpio There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
hid
hwmon hwmon: (lm75) Add support for TMP75C 2015-10-14 07:57:14 -07:00
i2c i2c: support 10 bit and slave addresses in sysfs 'new_device' 2015-08-24 14:05:15 +02:00
ia64
ide
infiniband IB/hfi1: add driver files 2015-08-28 22:59:36 -04:00
input Input: fix typo in MT documentation 2015-09-19 11:39:02 -07:00
ioctl char/misc drivers for 4.4-rc1 2015-11-04 22:15:15 -08:00
isdn
ja_JP Doc: ja_JP: Fix typo in HOWTO 2015-06-08 16:43:09 -06:00
kbuild modsign: Allow password to be specified for signing key 2015-08-07 16:26:14 +01:00
kdump
ko_KR
laptops Move freefall program from Documentation/ to tools/ 2015-06-08 16:42:07 -06:00
leds Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds 2015-07-01 19:09:11 -07:00
locking Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-11-03 16:10:43 -08:00
m68k
memory-devices
metag
mic misc: mic: Update MIC host daemon with COSM changes 2015-10-04 12:54:54 +01:00
mips
misc-devices Doc:misc-devices: Fix typo in Documentation/misc-devices 2015-09-18 10:04:24 -06:00
mmc mmc: core: Remove MMC_CLKGATE 2015-10-26 16:00:09 +01:00
mn10300
mtd
namespaces
netlabel
networking There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
nfc NFC: Fix typo in nfc-hci.txt 2015-06-08 23:15:45 +02:00
nios2
nvdimm libnvdimm: Non-Volatile Devices 2015-06-26 11:23:38 -04:00
nvmem Documentation: nvmem: add nvmem api level and how-to doc 2015-08-05 13:43:45 -07:00
parisc
PCI
pcmcia pcmcia: Fix typo in locking documentation 2015-08-07 14:34:58 +02:00
phy
platform
power PCI / PM: Update runtime PM documentation for PCI devices 2015-09-25 02:48:44 +02:00
powerpc SCSI misc on 20150901 2015-09-02 12:22:54 -07:00
pps Doc: pps: Fix file name in pps.txt 2015-07-14 12:35:42 -06:00
prctl Documentation/prctl: don't build tsc tests when cross compiling 2015-06-22 16:05:04 -06:00
pti
ptp testptp: Silence compiler warnings on ppc64 2015-09-29 21:16:56 -07:00
rapidio
RCU Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD 2015-10-07 16:06:25 -07:00
s390 KVM: s390: remove outdated documentation 2015-07-29 11:02:35 +02:00
scheduler
scsi Merge branch 'for-4.2/sg' of git://git.kernel.dk/linux-block 2015-06-25 15:22:36 -07:00
security KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
serial Documentation: improve line discipline method descriptions 2015-10-05 04:53:26 +01:00
sh
sound Doc: sound:oss: Fix typo in sound/oss 2015-06-09 17:23:00 +02:00
spi
sysctl kernel/watchdog.c: perform all-CPU backtrace in case of hard lockup 2015-11-05 19:34:48 -08:00
target Documentation/target: Fix tcm_mod_builder.py build breakage 2015-07-24 17:48:55 -07:00
thermal thermal: power_allocator: relax the requirement of two passive trip points 2015-09-14 07:41:45 -07:00
timers
tpm
trace intel_th: Add driver infrastructure for Intel(R) Trace Hub devices 2015-10-04 20:28:58 +01:00
usb usb: interface authorization: Documentation part 2015-09-22 12:08:40 -07:00
vDSO Documentation/vDSO: don't build tests when cross compiling 2015-06-22 16:04:57 -06:00
video4linux [media] media: videobuf2: Change queue_setup argument 2015-10-20 14:48:39 -02:00
virtual s390: A bunch of fixes and optimizations for interrupt and time 2015-11-05 16:26:26 -08:00
vm mm: make compound_head() robust 2015-11-06 17:50:42 -08:00
w1 w1: masters: omap_hdq: add support for 1-wire mode 2015-10-05 04:47:09 +01:00
watchdog Documentation/watchdog: add timeout and ping rate control to watchdog-test.c 2015-09-09 21:33:36 +02:00
wimax
x86 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 10:07:40 -07:00
xtensa
zh_CN sysfs.txt: fix pre-kernfs sysfs_dirent reference 2015-09-13 14:38:50 -06:00
00-INDEX
adding-syscalls.txt Documentation: describe how to add a system call 2015-08-13 17:54:06 -06:00
applying-patches.txt
assoc_array.txt
atomic_ops.txt locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h 2015-09-13 10:35:46 +02:00
bad_memory.txt
basic_profiling.txt
bcache.txt
binfmt_misc.txt
braille-console.txt
bt8xxgpio.txt
btmrvl.txt
BUG-HUNTING
bus-virt-phys-mapping.txt
cachetlb.txt
Changes There is a nice new document from Neil on how pathname lookups work and 2015-11-05 15:59:24 -08:00
circular-buffers.txt
clk.txt clk: change clk_ops' ->determine_rate() prototype 2015-07-27 18:12:01 -07:00
coccinelle.txt
CodeOfConflict
CodingStyle Documentation: CodingStyle: remove broken links in the References section 2015-07-10 13:54:34 -06:00
cpu-hotplug.txt
cpu-load.txt
cputopology.txt
crc32.txt
dcdbas.txt
debugging-modules.txt
debugging-via-ohci1394.txt Doc: Change wikipedia's URL from http to https 2015-06-22 10:14:05 -06:00
dell_rbu.txt
devices.txt
digsig.txt
DMA-API-HOWTO.txt Documentation: DMA API: Be more explicit that nents is always the same 2015-09-24 15:50:06 -06:00
DMA-API.txt Documentation: DMA API: Be more explicit that nents is always the same 2015-09-24 15:50:06 -06:00
DMA-attributes.txt
dma-buf-sharing.txt
DMA-ISA-LPC.txt
dontdiff
dynamic-debug-howto.txt
edac.txt Documentation/EDAC: Add reference documents section for amd64_edac 2015-09-29 13:42:41 +02:00
efi-stub.txt
eisa.txt
email-clients.txt Documentation/email-clients.txt: remove trailing whitespace 2015-10-11 15:31:57 -06:00
flexible-arrays.txt
futex-requeue-pi.txt
gcov.txt
gdb-kernel-debugging.txt
highuid.txt
HOWTO docs: update HOWTO for 3.x -> 4.x versioning 2015-08-24 11:28:17 -06:00
hsi.txt
hw_random.txt hwrng: doc - Fix device node name reference /dev/hw_random => /dev/hwrng 2015-09-21 22:00:41 +08:00
hwspinlock.txt
init.txt
initrd.txt
intel_txt.txt
Intel-IOMMU.txt x86/vt-d: Fix documentation of DRHD 2015-08-25 10:44:49 +02:00
io_ordering.txt
io-mapping.txt
iostats.txt
IPMI.txt
IRQ-affinity.txt
IRQ-domain.txt irqdomain: Documentation updates 2015-10-13 19:01:25 +02:00
IRQ.txt
irqflags-tracing.txt
isapnp.txt
java.txt
kasan.txt mm, slub, kasan: enable user tracking by default with KASAN=y 2015-11-05 19:34:48 -08:00
kernel-doc-nano-HOWTO.txt Documenation: Update location of docproc.c 2015-07-14 12:36:39 -06:00
kernel-docs.txt kernel-docs.txt: update kernelnewbies reference 2015-10-11 15:36:43 -06:00
kernel-parameters.txt Merge branch 'akpm' (patches from Andrew) 2015-11-05 23:10:54 -08:00
kernel-per-CPU-kthreads.txt
kmemcheck.txt
kmemleak.txt Doc: Change wikipedia's URL from http to https 2015-06-22 10:14:05 -06:00
kobject.txt
kprobes.txt
kref.txt
kselftest.txt Documentation: Update kselftest.txt 2015-09-24 15:51:53 -06:00
ldm.txt
local_ops.txt
lockup-watchdogs.txt kernel/watchdog.c: add sysctl knob hardlockup_panic 2015-11-05 19:34:48 -08:00
logo.gif
logo.txt
lzo.txt
magic-number.txt
mailbox.txt Documentation: minor typo fix in mailbox.txt 2015-08-13 18:03:18 -06:00
Makefile
ManagementStyle
md-cluster.txt md-cluster: fix deadlock issue on message lock 2015-08-31 19:41:41 +02:00
md.txt doc:md: fix typo in md.txt. 2015-06-23 06:49:44 -06:00
media-framework.txt
memory-barriers.txt atomic: remove all traces of READ_ONCE_CTRL() and atomic*_read_ctrl() 2015-11-03 17:22:17 -08:00
memory-hotplug.txt
men-chameleon-bus.txt Documentation: Minor changes to men-chameleon-bus.txt 2015-07-24 15:15:17 +02:00
module-signing.txt Move certificate handling to its own directory 2015-08-14 16:06:13 +01:00
mono.txt
nommu-mmap.txt
ntb.txt NTB: Rename Intel code names to platform names 2015-07-04 14:09:25 -04:00
numastat.txt
oops-tracing.txt
padata.txt
parport-lowlevel.txt
parport.txt
percpu-rw-semaphore.txt
phy.txt
pi-futex.txt
pinctrl.txt
pnp.txt
preempt-locking.txt
printk-formats.txt
pwm.txt
ramoops.txt
rbtree.txt documentation: fix small typo in rbtree.txt 2015-09-13 14:38:50 -06:00
remoteproc.txt remoteproc: introduce rproc_get_by_phandle API 2015-06-16 21:12:52 +03:00
rfkill.txt
robust-futex-ABI.txt
robust-futexes.txt
rpmsg.txt
rtc.txt
SAK.txt
SecurityBugs
serial-console.txt
sgi-ioc4.txt
SM501.txt
smsc_ece1099.txt
sparse.txt
stable_api_nonsense.txt
stable_kernel_rules.txt
static-keys.txt locking/static_keys: Fix up the static keys documentation 2015-09-15 07:12:06 +02:00
SubmitChecklist
SubmittingDrivers
SubmittingPatches SubmittingPatches: make Subject examples match the de facto standard 2015-09-24 15:57:42 -06:00
svga.txt
sysfs-rules.txt
sysrq.txt mm, oom: do not panic for oom kills triggered from sysrq 2015-09-08 15:35:28 -07:00
this_cpu_ops.txt
unaligned-memory-access.txt
unicode.txt
unshare.txt
vfio.txt vfio: powerpc/spapr: Support Dynamic DMA windows 2015-06-11 15:16:55 +10:00
VGA-softcursor.txt
vgaarbiter.txt
video-output.txt
vme_api.txt Documentation: mention vme_master_mmap() in VME API 2015-06-12 17:26:56 -07:00
volatile-considered-harmful.txt
workqueue.txt
xillybus.txt
xz.txt
zorro.txt