linux_dsm_epyc7002/arch/powerpc
Sourabh Jain ba608c4fa1 powerpc/fadump: fix race between pstore write and fadump crash trigger
When we enter into fadump crash path via system reset we fail to update
the pstore.

On the system reset path we first update the pstore then we go for fadump
crash. But the problem here is when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPUs will acquire the lock
and proceed with the pstore write. Since it in NMI context CPUs that fail
to get lock do not wait for their turn to write to the pstore and simply
proceed with the next operation which is fadump crash. One of the CPU who
proceeded with fadump crash path triggers the crash and does not wait for
the CPU who gets the pstore lock to complete the pstore update.

Timeline diagram to depicts the sequence of events that leads to an
unsuccessful pstore update when we hit fadump crash path via system reset.

                 1    2     3    ...      n   CPU Threads
                 |    |     |             |
                 |    |     |             |
 Reached to   -->|--->|---->| ----------->|
 system reset    |    |     |             |
 path            |    |     |             |
                 |    |     |             |
 Try to       -->|--->|---->|------------>|
 acquire the     |    |     |             |
 pstore lock     |    |     |             |
                 |    |     |             |
                 |    |     |             |
 Got the      -->| +->|     |             |<-+
 pstore lock     | |  |     |             |  |-->  Didn't get the
                 | --------------------------+     lock and moving
                 |    |     |             |        ahead on fadump
                 |    |     |             |        crash path
                 |    |     |             |
  Begins the  -->|    |     |             |
  process to     |    |     |             |<-- Got the chance to
  update the     |    |     |             |    trigger the crash
  pstore         | -> |     |    ... <-   |
                 | |  |     |         |   |
                 | |  |     |         |   |<-- Triggers the
                 | |  |     |         |   |    crash
                 | |  |     |         |   |      ^
                 | |  |     |         |   |      |
  Writing to  -->| |  |     |         |   |      |
  pstore         | |  |     |         |   |      |
                   |                  |          |
       ^           |__________________|          |
       |               CPU Relax                 |
       |                                         |
       +-----------------------------------------+
                          |
                          v
            Race: crash triggered before pstore
                  update completes

To avoid this race condition a barrier is added on crash_fadump path, it
prevents the CPU to trigger the crash until all the online CPUs completes
their task.

A barrier is added to make sure all the secondary CPUs hit the
crash_fadump function before we initiates the crash. A timeout is kept to
ensure the primary CPU (one who initiates the crash) do not wait for
secondary CPUs indefinitely.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713052435.183750-1-sourabhjain@linux.ibm.com
2020-07-16 13:12:44 +10:00
..
boot powerpc/boot/dts: Fix dtc "pciex" warnings 2020-06-30 14:38:00 +10:00
configs powerpc: Drop CONFIG_MTD_M25P80 in 85xx-hw.config 2020-07-06 23:11:04 +10:00
crypto crypto: lib/sha1 - remove unnecessary includes of linux/cryptohash.h 2020-05-08 15:32:17 +10:00
include powerpc: Add cputime_to_nsecs() 2020-07-16 13:12:43 +10:00
kernel powerpc/fadump: fix race between pstore write and fadump crash trigger 2020-07-16 13:12:44 +10:00
kexec powerpc/crash: Use NMI context for printk when starting to crash 2020-06-02 20:59:07 +10:00
kvm maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault 2020-06-17 10:57:41 -07:00
lib powerpc/ppc-opcode: Move ppc instruction encoding from test_emulate_step 2020-07-16 13:12:42 +10:00
math-emu treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
mm powerpc/numa: remove arch_update_cpu_topology 2020-07-16 13:12:39 +10:00
net powerpc/ppc-opcode: Consolidate powerpc instructions from bpf_jit.h 2020-07-16 13:12:42 +10:00
oprofile maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault 2020-06-17 10:57:41 -07:00
perf powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask 2020-07-16 13:12:41 +10:00
platforms powerpc/pseries: remove obsolete memory hotplug DT notifier code 2020-07-16 13:12:41 +10:00
purgatory .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
sysdev powerpc/xive: Ignore kmemleak false positives 2020-06-22 10:37:57 +10:00
tools powerpc/head_check: Avoid broken pipe 2020-05-19 00:10:35 +10:00
xmon powerpc/xmon: Reset RCU and soft lockup watchdogs 2020-07-15 11:07:12 +10:00
Kbuild powerpc/kexec: Move kexec files into a dedicated subdir. 2019-11-21 15:41:34 +11:00
Kconfig powerpc: Remove inaccessible CMDLINE default 2020-06-22 10:37:56 +10:00
Kconfig.debug powerpc: Remove Xilinx PPC405/PPC440 support 2020-05-28 23:24:34 +10:00
Makefile powerpc/4xx: ppc4xx compile flag optimizations 2020-06-22 14:19:12 +10:00
Makefile.postlink powerpc: Do not consider weak unresolved symbol relocations as bad 2020-01-31 20:17:22 +11:00