mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-11-24 07:50:53 +07:00
eb9036b4cf
commit 8a8109f303e25a27f92c1d8edd67d7cbbc60a4eb upstream.
printk_safe_flush_on_panic() caused the following deadlock on our
server:
CPU0: CPU1:
panic rcu_dump_cpu_stacks
kdump_nmi_shootdown_cpus nmi_trigger_cpumask_backtrace
register_nmi_handler(crash_nmi_callback) printk_safe_flush
__printk_safe_flush
raw_spin_lock_irqsave(&read_lock)
// send NMI to other processors
apic_send_IPI_allbutself(NMI_VECTOR)
// NMI interrupt, dead loop
crash_nmi_callback
printk_safe_flush_on_panic
printk_safe_flush
__printk_safe_flush
// deadlock
raw_spin_lock_irqsave(&read_lock)
DEADLOCK: read_lock is taken on CPU1 and will never get released.
It happens when panic() stops a CPU by NMI while it has been in
the middle of printk_safe_flush().
Handle the lock the same way as logbuf_lock. The printk_safe buffers
are flushed only when both locks can be safely taken. It can avoid
the deadlock _in this particular case_ at expense of losing contents
of printk_safe buffers.
Note: It would actually be safe to re-init the locks when all CPUs were
stopped by NMI. But it would require passing this information
from arch-specific code. It is not worth the complexity.
Especially because logbuf_lock and printk_safe buffers have been
obsoleted by the lockless ring buffer.
Fixes:
|
||
---|---|---|
.. | ||
braille.c | ||
braille.h | ||
console_cmdline.h | ||
internal.h | ||
Makefile | ||
printk_ringbuffer.c | ||
printk_ringbuffer.h | ||
printk_safe.c | ||
printk.c |