linux_dsm_epyc7002/arch
Li Zhong cce606feb4 powerpc: Set cpu sibling mask before online cpu
It seems following race is possible:

	cpu0					cpux
smp_init->cpu_up->_cpu_up
	__cpu_up
		kick_cpu(1)
-------------------------------------------------------------------------
		waiting online			...
		...				notify CPU_STARTING
							set cpux active
						set cpux online
-------------------------------------------------------------------------
		finish waiting online
		...
sched_init_smp
	init_sched_domains(cpu_active_mask)
		build_sched_domains
						set cpux sibling info
-------------------------------------------------------------------------

Execution of cpu0 and cpux could be concurrent between two separator
lines.

So if the cpux sibling information was set too late (normally
impossible, but could be triggered by adding some delay in
start_secondary, after setting cpu online), build_sched_domains()
running on cpu0 might see cpux active, with an empty sibling mask, then
cause some bad address accessing like following:

[    0.099855] Unable to handle kernel paging request for data at address 0xc00000038518078f
[    0.099868] Faulting instruction address: 0xc0000000000b7a64
[    0.099883] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[    0.099922] Modules linked in:
[    0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-rc1-00120-gb973425-dirty #16
[    0.099956] task: c0000001fed80000 ti: c0000001fed7c000 task.ti: c0000001fed7c000
[    0.099971] NIP: c0000000000b7a64 LR: c0000000000b7a40 CTR: c0000000000b4934
[    0.099985] REGS: c0000001fed7f760 TRAP: 0300   Not tainted  (3.10.0-rc1-00120-gb973425-dirty)
[    0.099997] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24272828  XER: 20000003
[    0.100045] SOFTE: 1
[    0.100053] CFAR: c000000000445ee8
[    0.100064] DAR: c00000038518078f, DSISR: 40000000
[    0.100073]
GPR00: 0000000000000080 c0000001fed7f9e0 c000000000c84d48 0000000000000010
GPR04: 0000000000000010 0000000000000000 c0000001fc55e090 0000000000000000
GPR08: ffffffffffffffff c000000000b80b30 c000000000c962d8 00000003845ffc5f
GPR12: 0000000000000000 c00000000f33d000 c00000000000b9e4 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
GPR20: c000000000ccf750 0000000000000000 c000000000c94d48 c0000001fc504000
GPR24: c0000001fc504000 c0000001fecef848 c000000000c94d48 c000000000ccf000
GPR28: c0000001fc522090 0000000000000010 c0000001fecef848 c0000001fed7fae0
[    0.100293] NIP [c0000000000b7a64] .get_group+0x84/0xc4
[    0.100307] LR [c0000000000b7a40] .get_group+0x60/0xc4
[    0.100318] Call Trace:
[    0.100332] [c0000001fed7f9e0] [c0000000000dbce4] .lock_is_held+0xa8/0xd0 (unreliable)
[    0.100354] [c0000001fed7fa70] [c0000000000bf62c] .build_sched_domains+0x728/0xd14
[    0.100375] [c0000001fed7fbe0] [c000000000af67bc] .sched_init_smp+0x4fc/0x654
[    0.100394] [c0000001fed7fce0] [c000000000adce24] .kernel_init_freeable+0x17c/0x30c
[    0.100413] [c0000001fed7fdb0] [c00000000000ba08] .kernel_init+0x24/0x12c
[    0.100431] [c0000001fed7fe30] [c000000000009f74] .ret_from_kernel_thread+0x5c/0x68
[    0.100445] Instruction dump:
[    0.100456] 38800010 38a00000 4838e3f5 60000000 7c6307b4 2fbf0000 419e0040 3d220001
[    0.100496] 78601f24 39491590 e93e0008 7d6a002a <7d69582a> f97f0000 7d4a002a e93e0010
[    0.100559] ---[ end trace 31fd0ba7d8756001 ]---

This patch tries to move the sibling maps updating before
notify_cpu_starting() and cpu online, and a write barrier there to make
sure sibling maps are updated before active and online mask.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:46:55 +10:00
..
alpha Removal of GENERIC_GPIO for v3.10 2013-05-09 09:59:16 -07:00
arc ARC: lazy dcache flush broke gdb in non-aliasing configs 2013-05-25 14:15:55 +05:30
arm Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm 2013-06-09 17:15:56 -07:00
arm64 arm64: don't kill the kernel on a bad esr from el0 2013-05-31 16:04:51 +01:00
avr32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 2013-05-22 18:06:57 -07:00
blackfin blackfin updates for Linux 3.10 2013-05-10 07:21:16 -07:00
c6x dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
cris - Lots of cleanups from Artem, including deletion of some obsolete drivers 2013-05-09 10:15:46 -07:00
frv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
h8300 We get rid of the general module prefix confusion with a binary config option, 2013-05-05 10:58:06 -07:00
hexagon Removal of GENERIC_GPIO for v3.10 2013-05-09 09:59:16 -07:00
ia64 arch, mm: Remove tlb_fast_mode() 2013-06-06 10:07:26 +09:00
m32r Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2013-05-01 14:08:52 -07:00
m68k Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu 2013-06-08 11:50:17 -07:00
metag Removal of GENERIC_GPIO for v3.10 2013-05-09 09:59:16 -07:00
microblaze microblaze: Use static inline functions in cacheflush.h 2013-06-03 11:33:23 +02:00
mips Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2013-06-12 11:48:14 -07:00
mn10300 MN10300: Need pci_iomap() and __pci_ioport_map() defining 2013-05-30 13:38:48 +09:00
openrisc Removal of GENERIC_GPIO for v3.10 2013-05-09 09:59:16 -07:00
parisc parisc: kernel: using strlcpy() instead of strcpy() 2013-06-01 14:29:01 +02:00
powerpc powerpc: Set cpu sibling mask before online cpu 2013-07-01 11:46:55 +10:00
s390 mm/THP: add pmd args to pgtable deposit and withdraw APIs 2013-06-20 16:55:07 +10:00
score score: remove redundant kcore_list entries 2013-05-25 10:27:27 -07:00
sh Removal of GENERIC_GPIO for v3.10 2013-05-09 09:59:16 -07:00
sparc mm/THP: add pmd args to pgtable deposit and withdraw APIs 2013-06-20 16:55:07 +10:00
tile Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile 2013-05-09 14:34:58 -07:00
um Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-05-07 15:14:53 -07:00
unicore32 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-10 09:21:05 -07:00
x86 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-06-13 13:08:51 -07:00
xtensa Xtensa patchset for v3.10-rc1 2013-05-09 14:38:16 -07:00
.gitignore
Kconfig Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-05-15 14:04:00 -07:00