mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-25 16:45:25 +07:00
54f64d5c98
Since the 5.0 merge window opened, I've been seeing frequent
crashes on suspend and reboot with the trace:
[ 36.911170] Unable to handle kernel paging request at virtual address ffffff801153d660
[ 36.912769] Unable to handle kernel paging request at virtual address ffffff800004b564
...
[ 36.950666] Call trace:
[ 36.950670] queued_spin_lock_slowpath+0x1cc/0x2c8
[ 36.950681] _raw_spin_lock_irqsave+0x64/0x78
[ 36.950692] complete+0x28/0x70
[ 36.950703] ffs_epfile_io_complete+0x3c/0x50
[ 36.950713] usb_gadget_giveback_request+0x34/0x108
[ 36.950721] dwc3_gadget_giveback+0x50/0x68
[ 36.950723] dwc3_thread_interrupt+0x358/0x1488
[ 36.950731] irq_thread_fn+0x30/0x88
[ 36.950734] irq_thread+0x114/0x1b0
[ 36.950739] kthread+0x104/0x130
[ 36.950747] ret_from_fork+0x10/0x1c
I isolated this down to in ffs_epfile_io():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
Where the completion done is setup on the stack:
DECLARE_COMPLETION_ONSTACK(done);
Then later we setup a request and queue it, and wait for it:
if (unlikely(wait_for_completion_interruptible(&done))) {
/*
* To avoid race condition with ffs_epfile_io_complete,
* dequeue the request first then check
* status. usb_ep_dequeue API should guarantee no race
* condition with req->complete callback.
*/
usb_ep_dequeue(ep->ep, req);
interrupted = ep->status < 0;
}
The problem is, that we end up being interrupted, dequeue the
request, and exit.
But then the irq triggers and we try calling complete() on the
context pointer which points to now random stack space, which
results in the panic.
Alan Stern pointed out there is a bug here, in that the snippet
above "assumes that usb_ep_dequeue() waits until the request has
been completed." And that:
wait_for_completion(&done);
Is needed right after the usb_ep_dequeue().
Thus this patch implements that change. With it I no longer see
the crashes on suspend or reboot.
This issue seems to have been uncovered by behavioral changes in
the dwc3 driver in commit
|
||
---|---|---|
arch | ||
block | ||
certs | ||
crypto | ||
Documentation | ||
drivers | ||
firmware | ||
fs | ||
include | ||
init | ||
ipc | ||
kernel | ||
lib | ||
LICENSES | ||
mm | ||
net | ||
samples | ||
scripts | ||
security | ||
sound | ||
tools | ||
usr | ||
virt | ||
.clang-format | ||
.cocciconfig | ||
.get_maintainer.ignore | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
COPYING | ||
CREDITS | ||
Kbuild | ||
Kconfig | ||
MAINTAINERS | ||
Makefile | ||
README |
Linux kernel ============ There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/ There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.