2018-01-27 03:22:04 +07:00
|
|
|
// SPDX-License-Identifier: GPL-2.0+
|
2005-04-17 05:20:36 +07:00
|
|
|
/*
|
|
|
|
* PCI Express PCI Hot Plug Driver
|
|
|
|
*
|
|
|
|
* Copyright (C) 1995,2001 Compaq Computer Corporation
|
|
|
|
* Copyright (C) 2001 Greg Kroah-Hartman (greg@kroah.com)
|
|
|
|
* Copyright (C) 2001 IBM Corp.
|
|
|
|
* Copyright (C) 2003-2004 Intel Corporation
|
|
|
|
*
|
|
|
|
* All rights reserved.
|
|
|
|
*
|
2005-08-17 05:16:10 +07:00
|
|
|
* Send feedback to <greg@kroah.com>,<kristen.c.accardi@intel.com>
|
2005-04-17 05:20:36 +07:00
|
|
|
*/
|
|
|
|
|
2019-05-08 06:24:51 +07:00
|
|
|
#define dev_fmt(fmt) "pciehp: " fmt
|
|
|
|
|
2019-10-26 02:00:47 +07:00
|
|
|
#include <linux/dmi.h>
|
2005-04-17 05:20:36 +07:00
|
|
|
#include <linux/kernel.h>
|
|
|
|
#include <linux/types.h>
|
2006-01-08 16:02:05 +07:00
|
|
|
#include <linux/jiffies.h>
|
2018-07-20 05:27:39 +07:00
|
|
|
#include <linux/kthread.h>
|
2005-04-17 05:20:36 +07:00
|
|
|
#include <linux/pci.h>
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
#include <linux/pm_runtime.h>
|
2005-11-14 07:06:39 +07:00
|
|
|
#include <linux/interrupt.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 15:04:11 +07:00
|
|
|
#include <linux/slab.h>
|
2005-11-14 07:06:39 +07:00
|
|
|
|
2005-04-17 05:20:36 +07:00
|
|
|
#include "../pci.h"
|
|
|
|
#include "pciehp.h"
|
|
|
|
|
2019-10-26 02:00:47 +07:00
|
|
|
static const struct dmi_system_id inband_presence_disabled_dmi_table[] = {
|
|
|
|
/*
|
|
|
|
* Match all Dell systems, as some Dell systems have inband
|
|
|
|
* presence disabled on NVMe slots (but don't support the bit to
|
|
|
|
* report it). Setting inband presence disabled should have no
|
|
|
|
* negative effect, except on broken hotplug slots that never
|
|
|
|
* assert presence detect--and those will still work, they will
|
|
|
|
* just have a bit of extra delay before being probed.
|
|
|
|
*/
|
|
|
|
{
|
|
|
|
.ident = "Dell System",
|
|
|
|
.matches = {
|
|
|
|
DMI_MATCH(DMI_OEM_STRING, "Dell System"),
|
|
|
|
},
|
|
|
|
},
|
|
|
|
{}
|
|
|
|
};
|
|
|
|
|
2013-05-10 00:26:16 +07:00
|
|
|
static inline struct pci_dev *ctrl_dev(struct controller *ctrl)
|
2006-12-22 08:01:06 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
return ctrl->pcie->port;
|
2006-12-22 08:01:06 +07:00
|
|
|
}
|
2005-04-17 05:20:36 +07:00
|
|
|
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
static irqreturn_t pciehp_isr(int irq, void *dev_id);
|
|
|
|
static irqreturn_t pciehp_ist(int irq, void *dev_id);
|
2018-07-20 05:27:39 +07:00
|
|
|
static int pciehp_poll(void *data);
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2008-04-26 04:39:08 +07:00
|
|
|
static inline int pciehp_request_irq(struct controller *ctrl)
|
|
|
|
{
|
2008-08-22 15:16:48 +07:00
|
|
|
int retval, irq = ctrl->pcie->irq;
|
2008-04-26 04:39:08 +07:00
|
|
|
|
|
|
|
if (pciehp_poll_mode) {
|
2018-07-20 05:27:39 +07:00
|
|
|
ctrl->poll_thread = kthread_run(&pciehp_poll, ctrl,
|
|
|
|
"pciehp_poll-%s",
|
2018-09-19 02:46:17 +07:00
|
|
|
slot_name(ctrl));
|
2018-07-20 05:27:39 +07:00
|
|
|
return PTR_ERR_OR_ZERO(ctrl->poll_thread);
|
2008-04-26 04:39:08 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/* Installs the interrupt handler */
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
retval = request_threaded_irq(irq, pciehp_isr, pciehp_ist,
|
2019-05-09 04:34:01 +07:00
|
|
|
IRQF_SHARED, "pciehp", ctrl);
|
2008-04-26 04:39:08 +07:00
|
|
|
if (retval)
|
2008-09-05 10:11:26 +07:00
|
|
|
ctrl_err(ctrl, "Cannot get irq %d for the hotplug controller\n",
|
|
|
|
irq);
|
2008-04-26 04:39:08 +07:00
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void pciehp_free_irq(struct controller *ctrl)
|
|
|
|
{
|
|
|
|
if (pciehp_poll_mode)
|
2018-07-20 05:27:39 +07:00
|
|
|
kthread_stop(ctrl->poll_thread);
|
2008-04-26 04:39:08 +07:00
|
|
|
else
|
2008-08-22 15:16:48 +07:00
|
|
|
free_irq(ctrl->pcie->irq, ctrl);
|
2008-04-26 04:39:08 +07:00
|
|
|
}
|
|
|
|
|
PCI: pciehp: Compute timeout from hotplug command start time
If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.
Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.
For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:
At time T1 (during boot):
- Write DLLSCE, ABPE, PDCE, etc. to Slot Control
At time T2 (hotplug event):
- Wait for command completion (CC) in Slot Status
- Timeout at T2 + 1 second because CC is never set in Slot Status
- Write PCC, PIC, etc. to Slot Control
With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.
We still emit a "Timeout on hotplug command" message if it timed out; we
should see this on the first hotplug event on every controller with this
erratum, as well as on real errors on controllers without the erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 22:55:49 +07:00
|
|
|
static int pcie_poll_cmd(struct controller *ctrl, int timeout)
|
2008-05-27 17:05:26 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2008-05-27 17:05:26 +07:00
|
|
|
u16 slot_status;
|
|
|
|
|
2019-11-08 18:18:55 +07:00
|
|
|
do {
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
|
PCI: pciehp: Handle invalid data when reading from non-existent devices
It's platform-dependent, but an MMIO read to a non-existent PCI device
generally returns data with all bits set. This happens when the host
bridge or Root Complex times out waiting for a response from the device and
fabricates return data to complete the CPU's read.
One example, reported in the bugzilla below, involved this hierarchy:
pci 0000:00:1c.0: PCI bridge to [bus 02-3a] Root Port
pci 0000:02:00.0: PCI bridge to [bus 03-0a] Upstream Port
pci 0000:03:03.0: PCI bridge to [bus 05-07] Downstream Port
pci 0000:05:00.0: PCI bridge to [bus 06-07] Thunderbolt Upstream Port
pci 0000:06:00.0: PCI bridge to [bus 07] Thunderbolt Downstream Port
pci 0000:07:00.0: BCM57762 NIC
Unplugging the Thunderbolt switch and the NIC below it resulted in this:
pciehp 0000:03:03.0: Surprise Removal
tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
pciehp 0000:06:00.0: unloading service driver pciehp
pciehp 0000:06:00.0: pcie_isr: intr_loc 11f
pciehp 0000:06:00.0: Switch interrupt received
pciehp 0000:06:00.0: Latch open on Slot
pciehp 0000:06:00.0: Attention button interrupt received
pciehp 0000:06:00.0: Button pressed on Slot
pciehp 0000:06:00.0: Presence/Notify input change
pciehp 0000:06:00.0: Card present on Slot
pciehp 0000:06:00.0: Power fault interrupt received
pciehp 0000:06:00.0: Data Link Layer State change
pciehp 0000:06:00.0: Link Up event
The pciehp driver correctly noticed that the Thunderbolt switch (05:00.0
and 06:00.0) and NIC (07:00.0) had been removed, and it called their driver
remove methods.
Since the NIC was already gone, tg3 received 0xffffffff when it tried to
read from the device. The resulting timeout is a tg3 issue and not of
interest here.
Similarly, since the 06:00.0 Thunderbolt switch was already gone,
pcie_isr() received 0xffff when it tried to read PCI_EXP_SLTSTA, and pciehp
thought that was valid status showing that many events had happened: the
latch had been opened, the attention button had been pressed, a card was
now present, and the link was now up. These are all wrong, of course, but
pciehp went on to try to power up and enumerate devices below the
non-existent bridge:
pciehp 0000:06:00.0: PCI slot - powering on due to button press
pciehp 0000:06:00.0: Surprise Insertion
pci 0000:07:00.0 id reading try 50 times with interval 20 ms to get ffffffff
[bhelgaas: changelog, also check in pcie_poll_cmd() & pcie_do_write_cmd()]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99841
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-07-21 23:25:30 +07:00
|
|
|
if (slot_status == (u16) ~0) {
|
|
|
|
ctrl_info(ctrl, "%s: no response from device\n",
|
|
|
|
__func__);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
if (slot_status & PCI_EXP_SLTSTA_CC) {
|
2013-05-10 00:26:16 +07:00
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
|
|
|
|
PCI_EXP_SLTSTA_CC);
|
2008-12-19 13:19:02 +07:00
|
|
|
return 1;
|
2008-06-20 10:04:33 +07:00
|
|
|
}
|
2015-06-19 14:57:45 +07:00
|
|
|
msleep(10);
|
|
|
|
timeout -= 10;
|
2019-11-08 18:18:55 +07:00
|
|
|
} while (timeout >= 0);
|
2008-05-27 17:05:26 +07:00
|
|
|
return 0; /* timeout */
|
|
|
|
}
|
|
|
|
|
2014-06-14 02:58:35 +07:00
|
|
|
static void pcie_wait_cmd(struct controller *ctrl)
|
2006-12-22 08:01:09 +07:00
|
|
|
{
|
2006-12-22 08:01:10 +07:00
|
|
|
unsigned int msecs = pciehp_poll_mode ? 2500 : 1000;
|
PCI: pciehp: Compute timeout from hotplug command start time
If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.
Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.
For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:
At time T1 (during boot):
- Write DLLSCE, ABPE, PDCE, etc. to Slot Control
At time T2 (hotplug event):
- Wait for command completion (CC) in Slot Status
- Timeout at T2 + 1 second because CC is never set in Slot Status
- Write PCC, PIC, etc. to Slot Control
With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.
We still emit a "Timeout on hotplug command" message if it timed out; we
should see this on the first hotplug event on every controller with this
erratum, as well as on real errors on controllers without the erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 22:55:49 +07:00
|
|
|
unsigned long duration = msecs_to_jiffies(msecs);
|
|
|
|
unsigned long cmd_timeout = ctrl->cmd_started + duration;
|
|
|
|
unsigned long now, timeout;
|
2006-12-22 08:01:10 +07:00
|
|
|
int rc;
|
|
|
|
|
2014-06-14 02:58:35 +07:00
|
|
|
/*
|
|
|
|
* If the controller does not generate notifications for command
|
|
|
|
* completions, we never need to wait between writes.
|
|
|
|
*/
|
2014-06-27 01:58:55 +07:00
|
|
|
if (NO_CMD_CMPL(ctrl))
|
2014-06-14 02:58:35 +07:00
|
|
|
return;
|
|
|
|
|
|
|
|
if (!ctrl->cmd_busy)
|
|
|
|
return;
|
|
|
|
|
PCI: pciehp: Compute timeout from hotplug command start time
If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.
Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.
For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:
At time T1 (during boot):
- Write DLLSCE, ABPE, PDCE, etc. to Slot Control
At time T2 (hotplug event):
- Wait for command completion (CC) in Slot Status
- Timeout at T2 + 1 second because CC is never set in Slot Status
- Write PCC, PIC, etc. to Slot Control
With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.
We still emit a "Timeout on hotplug command" message if it timed out; we
should see this on the first hotplug event on every controller with this
erratum, as well as on real errors on controllers without the erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 22:55:49 +07:00
|
|
|
/*
|
|
|
|
* Even if the command has already timed out, we want to call
|
|
|
|
* pcie_poll_cmd() so it can clear PCI_EXP_SLTSTA_CC.
|
|
|
|
*/
|
|
|
|
now = jiffies;
|
|
|
|
if (time_before_eq(cmd_timeout, now))
|
|
|
|
timeout = 1;
|
|
|
|
else
|
|
|
|
timeout = cmd_timeout - now;
|
|
|
|
|
2014-06-14 02:58:35 +07:00
|
|
|
if (ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE &&
|
|
|
|
ctrl->slot_ctrl & PCI_EXP_SLTCTL_CCIE)
|
2008-05-28 12:59:44 +07:00
|
|
|
rc = wait_event_timeout(ctrl->queue, !ctrl->cmd_busy, timeout);
|
2014-06-14 02:58:35 +07:00
|
|
|
else
|
2014-09-23 09:05:45 +07:00
|
|
|
rc = pcie_poll_cmd(ctrl, jiffies_to_msecs(timeout));
|
PCI: pciehp: Compute timeout from hotplug command start time
If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.
Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.
For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:
At time T1 (during boot):
- Write DLLSCE, ABPE, PDCE, etc. to Slot Control
At time T2 (hotplug event):
- Wait for command completion (CC) in Slot Status
- Timeout at T2 + 1 second because CC is never set in Slot Status
- Write PCC, PIC, etc. to Slot Control
With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.
We still emit a "Timeout on hotplug command" message if it timed out; we
should see this on the first hotplug event on every controller with this
erratum, as well as on real errors on controllers without the erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 22:55:49 +07:00
|
|
|
|
2006-12-22 08:01:10 +07:00
|
|
|
if (!rc)
|
2014-08-16 06:18:44 +07:00
|
|
|
ctrl_info(ctrl, "Timeout on hotplug command %#06x (issued %u msec ago)\n",
|
PCI: pciehp: Compute timeout from hotplug command start time
If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.
Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.
For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:
At time T1 (during boot):
- Write DLLSCE, ABPE, PDCE, etc. to Slot Control
At time T2 (hotplug event):
- Wait for command completion (CC) in Slot Status
- Timeout at T2 + 1 second because CC is never set in Slot Status
- Write PCC, PIC, etc. to Slot Control
With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.
We still emit a "Timeout on hotplug command" message if it timed out; we
should see this on the first hotplug event on every controller with this
erratum, as well as on real errors on controllers without the erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 22:55:49 +07:00
|
|
|
ctrl->slot_ctrl,
|
2014-09-23 09:07:35 +07:00
|
|
|
jiffies_to_msecs(jiffies - ctrl->cmd_started));
|
2006-12-22 08:01:09 +07:00
|
|
|
}
|
|
|
|
|
PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits. Command Completed is never set for writes that only change
software notification "Enable" bits. This results in timeouts like this:
pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.
Here's the text of the Intel erratum CF118. We assume this applies to all
Intel parts:
CF118 PCIe Slot Status Register Command Completed bit not always
updated on any configuration write to the Slot Control
Register
Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug,
the Slot Status Register (offset AAh) Command Completed
(bit[4]) status is updated under the following condition:
IOH will set Command Completed bit after delivering the new
commands written in the Slot Controller register (offset
A8h) to VPP. The IOH detects new commands written in Slot
Control register by checking the change of value for Power
Controller Control (bit[10]), Power Indicator Control
(bits[9:8]), Attention Indicator Control (bits[7:6]), or
Electromechanical Interlock Control (bit[11]) fields. Any
other configuration writes to the Slot Control register
without changing the values of these fields will not cause
Command Completed bit to be set.
The PCIe Base Specification Revision 2.0 or later describes
the “Slot Control Register” in section 7.8.10, as follows
(Reference section 7.8.10, Slot Control Register, Offset
18h). In hot-plug capable Downstream Ports, a write to the
Slot Control register must cause a hot-plug command to be
generated (see Section 6.7.3.2 for details on hot-plug
commands). A write to the Slot Control register in a
Downstream Port that is not hotplug capable must not cause a
hot-plug command to be executed.
The PCIe Spec intended that every write to the Slot Control
Register is a command and expected a command complete status
to abstract the VPP implementation specific nuances from the
OS software. IOH PCIe Slot Control Register implementation
is not fully conforming to the PCIe Specification in this
respect.
Implication: Software checking on the Command Completed status after
writing to the Slot Control register may time out.
Workaround: Software can read the Slot Control register and compare the
existing and new values to determine if it should check the
Command Completed status after writing to the Slot Control
register.
Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 06:39:38 +07:00
|
|
|
#define CC_ERRATUM_MASK (PCI_EXP_SLTCTL_PCC | \
|
|
|
|
PCI_EXP_SLTCTL_PIC | \
|
|
|
|
PCI_EXP_SLTCTL_AIC | \
|
|
|
|
PCI_EXP_SLTCTL_EIC)
|
|
|
|
|
2015-06-09 06:10:50 +07:00
|
|
|
static void pcie_do_write_cmd(struct controller *ctrl, u16 cmd,
|
|
|
|
u16 mask, bool wait)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits. Command Completed is never set for writes that only change
software notification "Enable" bits. This results in timeouts like this:
pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.
Here's the text of the Intel erratum CF118. We assume this applies to all
Intel parts:
CF118 PCIe Slot Status Register Command Completed bit not always
updated on any configuration write to the Slot Control
Register
Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug,
the Slot Status Register (offset AAh) Command Completed
(bit[4]) status is updated under the following condition:
IOH will set Command Completed bit after delivering the new
commands written in the Slot Controller register (offset
A8h) to VPP. The IOH detects new commands written in Slot
Control register by checking the change of value for Power
Controller Control (bit[10]), Power Indicator Control
(bits[9:8]), Attention Indicator Control (bits[7:6]), or
Electromechanical Interlock Control (bit[11]) fields. Any
other configuration writes to the Slot Control register
without changing the values of these fields will not cause
Command Completed bit to be set.
The PCIe Base Specification Revision 2.0 or later describes
the “Slot Control Register” in section 7.8.10, as follows
(Reference section 7.8.10, Slot Control Register, Offset
18h). In hot-plug capable Downstream Ports, a write to the
Slot Control register must cause a hot-plug command to be
generated (see Section 6.7.3.2 for details on hot-plug
commands). A write to the Slot Control register in a
Downstream Port that is not hotplug capable must not cause a
hot-plug command to be executed.
The PCIe Spec intended that every write to the Slot Control
Register is a command and expected a command complete status
to abstract the VPP implementation specific nuances from the
OS software. IOH PCIe Slot Control Register implementation
is not fully conforming to the PCIe Specification in this
respect.
Implication: Software checking on the Command Completed status after
writing to the Slot Control register may time out.
Workaround: Software can read the Slot Control register and compare the
existing and new values to determine if it should check the
Command Completed status after writing to the Slot Control
register.
Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 06:39:38 +07:00
|
|
|
u16 slot_ctrl_orig, slot_ctrl;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2006-12-22 08:01:09 +07:00
|
|
|
mutex_lock(&ctrl->ctrl_lock);
|
|
|
|
|
2015-06-09 06:10:50 +07:00
|
|
|
/*
|
|
|
|
* Always wait for any previous command that might still be in progress
|
|
|
|
*/
|
PCI: pciehp: Wait for hotplug command completion lazily
Previously we issued a hotplug command and waited for it to complete. But
there's no need to wait until we're ready to issue the *next* command. The
next command will probably be much later, so the first one may have already
completed and we may not have to actually wait at all.
Because of hardware errata, some controllers generate command completion
events for some commands but not others. In the case of Intel CF118 (see
spec update reference), the controller indicates command completion only
for Slot Control writes that change the value of the following bits:
Power Controller Control
Power Indicator Control
Attention Indicator Control
Electromechanical Interlock Control
Changes to other bits, e.g., the interrupt enable bits, do not cause the
Command Completed bit to be set. Controllers from AMD and Nvidia are
reported to have similar errata.
These errata cause timeouts when pcie_enable_notification() enables
interrupts. Previously that timeout occurred at boot-time. With this
change, the timeout occurs later, when we change the state of the slot
power, indicators, or interlock. This speeds up boot but causes a timeout
at the first hotplug event on the slot. Subsequent events don't timeout
because only the first (boot-time) hotplug command updates Slot Control
without touching the power/indicator/interlock controls.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 04:06:40 +07:00
|
|
|
pcie_wait_cmd(ctrl);
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
|
PCI: pciehp: Handle invalid data when reading from non-existent devices
It's platform-dependent, but an MMIO read to a non-existent PCI device
generally returns data with all bits set. This happens when the host
bridge or Root Complex times out waiting for a response from the device and
fabricates return data to complete the CPU's read.
One example, reported in the bugzilla below, involved this hierarchy:
pci 0000:00:1c.0: PCI bridge to [bus 02-3a] Root Port
pci 0000:02:00.0: PCI bridge to [bus 03-0a] Upstream Port
pci 0000:03:03.0: PCI bridge to [bus 05-07] Downstream Port
pci 0000:05:00.0: PCI bridge to [bus 06-07] Thunderbolt Upstream Port
pci 0000:06:00.0: PCI bridge to [bus 07] Thunderbolt Downstream Port
pci 0000:07:00.0: BCM57762 NIC
Unplugging the Thunderbolt switch and the NIC below it resulted in this:
pciehp 0000:03:03.0: Surprise Removal
tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
pciehp 0000:06:00.0: unloading service driver pciehp
pciehp 0000:06:00.0: pcie_isr: intr_loc 11f
pciehp 0000:06:00.0: Switch interrupt received
pciehp 0000:06:00.0: Latch open on Slot
pciehp 0000:06:00.0: Attention button interrupt received
pciehp 0000:06:00.0: Button pressed on Slot
pciehp 0000:06:00.0: Presence/Notify input change
pciehp 0000:06:00.0: Card present on Slot
pciehp 0000:06:00.0: Power fault interrupt received
pciehp 0000:06:00.0: Data Link Layer State change
pciehp 0000:06:00.0: Link Up event
The pciehp driver correctly noticed that the Thunderbolt switch (05:00.0
and 06:00.0) and NIC (07:00.0) had been removed, and it called their driver
remove methods.
Since the NIC was already gone, tg3 received 0xffffffff when it tried to
read from the device. The resulting timeout is a tg3 issue and not of
interest here.
Similarly, since the 06:00.0 Thunderbolt switch was already gone,
pcie_isr() received 0xffff when it tried to read PCI_EXP_SLTSTA, and pciehp
thought that was valid status showing that many events had happened: the
latch had been opened, the attention button had been pressed, a card was
now present, and the link was now up. These are all wrong, of course, but
pciehp went on to try to power up and enumerate devices below the
non-existent bridge:
pciehp 0000:06:00.0: PCI slot - powering on due to button press
pciehp 0000:06:00.0: Surprise Insertion
pci 0000:07:00.0 id reading try 50 times with interval 20 ms to get ffffffff
[bhelgaas: changelog, also check in pcie_poll_cmd() & pcie_do_write_cmd()]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99841
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-07-21 23:25:30 +07:00
|
|
|
if (slot_ctrl == (u16) ~0) {
|
|
|
|
ctrl_info(ctrl, "%s: no response from device\n", __func__);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits. Command Completed is never set for writes that only change
software notification "Enable" bits. This results in timeouts like this:
pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.
Here's the text of the Intel erratum CF118. We assume this applies to all
Intel parts:
CF118 PCIe Slot Status Register Command Completed bit not always
updated on any configuration write to the Slot Control
Register
Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug,
the Slot Status Register (offset AAh) Command Completed
(bit[4]) status is updated under the following condition:
IOH will set Command Completed bit after delivering the new
commands written in the Slot Controller register (offset
A8h) to VPP. The IOH detects new commands written in Slot
Control register by checking the change of value for Power
Controller Control (bit[10]), Power Indicator Control
(bits[9:8]), Attention Indicator Control (bits[7:6]), or
Electromechanical Interlock Control (bit[11]) fields. Any
other configuration writes to the Slot Control register
without changing the values of these fields will not cause
Command Completed bit to be set.
The PCIe Base Specification Revision 2.0 or later describes
the “Slot Control Register” in section 7.8.10, as follows
(Reference section 7.8.10, Slot Control Register, Offset
18h). In hot-plug capable Downstream Ports, a write to the
Slot Control register must cause a hot-plug command to be
generated (see Section 6.7.3.2 for details on hot-plug
commands). A write to the Slot Control register in a
Downstream Port that is not hotplug capable must not cause a
hot-plug command to be executed.
The PCIe Spec intended that every write to the Slot Control
Register is a command and expected a command complete status
to abstract the VPP implementation specific nuances from the
OS software. IOH PCIe Slot Control Register implementation
is not fully conforming to the PCIe Specification in this
respect.
Implication: Software checking on the Command Completed status after
writing to the Slot Control register may time out.
Workaround: Software can read the Slot Control register and compare the
existing and new values to determine if it should check the
Command Completed status after writing to the Slot Control
register.
Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 06:39:38 +07:00
|
|
|
slot_ctrl_orig = slot_ctrl;
|
2007-05-31 23:43:34 +07:00
|
|
|
slot_ctrl &= ~mask;
|
2008-04-26 04:39:14 +07:00
|
|
|
slot_ctrl |= (cmd & mask);
|
2007-05-31 23:43:34 +07:00
|
|
|
ctrl->cmd_busy = 1;
|
2008-04-26 04:39:02 +07:00
|
|
|
smp_mb();
|
2019-01-07 20:09:40 +07:00
|
|
|
ctrl->slot_ctrl = slot_ctrl;
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_SLTCTL, slot_ctrl);
|
PCI: pciehp: Compute timeout from hotplug command start time
If we issue a hotplug command, go do something else, then come back and
wait for the command to complete, we don't have to wait the whole timeout
period, because some of it elapsed while we were doing something else.
Keep track of the time we issued the command, and wait only until the
timeout period from that point has elapsed.
For controllers with errata like Intel CF118, we previously timed out
before issuing the second hotplug command:
At time T1 (during boot):
- Write DLLSCE, ABPE, PDCE, etc. to Slot Control
At time T2 (hotplug event):
- Wait for command completion (CC) in Slot Status
- Timeout at T2 + 1 second because CC is never set in Slot Status
- Write PCC, PIC, etc. to Slot Control
With this change, we wait until T1 + 1 second instead of T2 + 1 second.
If the hotplug event is more than 1 second after the boot-time
initialization, we won't wait for the timeout at all.
We still emit a "Timeout on hotplug command" message if it timed out; we
should see this on the first hotplug event on every controller with this
erratum, as well as on real errors on controllers without the erratum.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Tested-by: Rajat Jain <rajatxjain@gmail.com> (IDT 807a controller)
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
2014-06-14 22:55:49 +07:00
|
|
|
ctrl->cmd_started = jiffies;
|
2007-05-31 23:43:34 +07:00
|
|
|
|
PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits. Command Completed is never set for writes that only change
software notification "Enable" bits. This results in timeouts like this:
pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.
Here's the text of the Intel erratum CF118. We assume this applies to all
Intel parts:
CF118 PCIe Slot Status Register Command Completed bit not always
updated on any configuration write to the Slot Control
Register
Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug,
the Slot Status Register (offset AAh) Command Completed
(bit[4]) status is updated under the following condition:
IOH will set Command Completed bit after delivering the new
commands written in the Slot Controller register (offset
A8h) to VPP. The IOH detects new commands written in Slot
Control register by checking the change of value for Power
Controller Control (bit[10]), Power Indicator Control
(bits[9:8]), Attention Indicator Control (bits[7:6]), or
Electromechanical Interlock Control (bit[11]) fields. Any
other configuration writes to the Slot Control register
without changing the values of these fields will not cause
Command Completed bit to be set.
The PCIe Base Specification Revision 2.0 or later describes
the “Slot Control Register” in section 7.8.10, as follows
(Reference section 7.8.10, Slot Control Register, Offset
18h). In hot-plug capable Downstream Ports, a write to the
Slot Control register must cause a hot-plug command to be
generated (see Section 6.7.3.2 for details on hot-plug
commands). A write to the Slot Control register in a
Downstream Port that is not hotplug capable must not cause a
hot-plug command to be executed.
The PCIe Spec intended that every write to the Slot Control
Register is a command and expected a command complete status
to abstract the VPP implementation specific nuances from the
OS software. IOH PCIe Slot Control Register implementation
is not fully conforming to the PCIe Specification in this
respect.
Implication: Software checking on the Command Completed status after
writing to the Slot Control register may time out.
Workaround: Software can read the Slot Control register and compare the
existing and new values to determine if it should check the
Command Completed status after writing to the Slot Control
register.
Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 06:39:38 +07:00
|
|
|
/*
|
|
|
|
* Controllers with the Intel CF118 and similar errata advertise
|
|
|
|
* Command Completed support, but they only set Command Completed
|
|
|
|
* if we change the "Control" bits for power, power indicator,
|
|
|
|
* attention indicator, or interlock. If we only change the
|
|
|
|
* "Enable" bits, they never set the Command Completed bit.
|
|
|
|
*/
|
|
|
|
if (pdev->broken_cmd_compl &&
|
|
|
|
(slot_ctrl_orig & CC_ERRATUM_MASK) == (slot_ctrl & CC_ERRATUM_MASK))
|
|
|
|
ctrl->cmd_busy = 0;
|
|
|
|
|
2015-06-09 06:10:50 +07:00
|
|
|
/*
|
|
|
|
* Optionally wait for the hardware to be ready for a new command,
|
|
|
|
* indicating completion of the above issued command.
|
|
|
|
*/
|
|
|
|
if (wait)
|
|
|
|
pcie_wait_cmd(ctrl);
|
|
|
|
|
PCI: pciehp: Handle invalid data when reading from non-existent devices
It's platform-dependent, but an MMIO read to a non-existent PCI device
generally returns data with all bits set. This happens when the host
bridge or Root Complex times out waiting for a response from the device and
fabricates return data to complete the CPU's read.
One example, reported in the bugzilla below, involved this hierarchy:
pci 0000:00:1c.0: PCI bridge to [bus 02-3a] Root Port
pci 0000:02:00.0: PCI bridge to [bus 03-0a] Upstream Port
pci 0000:03:03.0: PCI bridge to [bus 05-07] Downstream Port
pci 0000:05:00.0: PCI bridge to [bus 06-07] Thunderbolt Upstream Port
pci 0000:06:00.0: PCI bridge to [bus 07] Thunderbolt Downstream Port
pci 0000:07:00.0: BCM57762 NIC
Unplugging the Thunderbolt switch and the NIC below it resulted in this:
pciehp 0000:03:03.0: Surprise Removal
tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE will not clear MAC_TX_MODE=ffffffff
pciehp 0000:06:00.0: unloading service driver pciehp
pciehp 0000:06:00.0: pcie_isr: intr_loc 11f
pciehp 0000:06:00.0: Switch interrupt received
pciehp 0000:06:00.0: Latch open on Slot
pciehp 0000:06:00.0: Attention button interrupt received
pciehp 0000:06:00.0: Button pressed on Slot
pciehp 0000:06:00.0: Presence/Notify input change
pciehp 0000:06:00.0: Card present on Slot
pciehp 0000:06:00.0: Power fault interrupt received
pciehp 0000:06:00.0: Data Link Layer State change
pciehp 0000:06:00.0: Link Up event
The pciehp driver correctly noticed that the Thunderbolt switch (05:00.0
and 06:00.0) and NIC (07:00.0) had been removed, and it called their driver
remove methods.
Since the NIC was already gone, tg3 received 0xffffffff when it tried to
read from the device. The resulting timeout is a tg3 issue and not of
interest here.
Similarly, since the 06:00.0 Thunderbolt switch was already gone,
pcie_isr() received 0xffff when it tried to read PCI_EXP_SLTSTA, and pciehp
thought that was valid status showing that many events had happened: the
latch had been opened, the attention button had been pressed, a card was
now present, and the link was now up. These are all wrong, of course, but
pciehp went on to try to power up and enumerate devices below the
non-existent bridge:
pciehp 0000:06:00.0: PCI slot - powering on due to button press
pciehp 0000:06:00.0: Surprise Insertion
pci 0000:07:00.0 id reading try 50 times with interval 20 ms to get ffffffff
[bhelgaas: changelog, also check in pcie_poll_cmd() & pcie_do_write_cmd()]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99841
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2015-07-21 23:25:30 +07:00
|
|
|
out:
|
2006-12-22 08:01:09 +07:00
|
|
|
mutex_unlock(&ctrl->ctrl_lock);
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2015-06-09 06:10:50 +07:00
|
|
|
/**
|
|
|
|
* pcie_write_cmd - Issue controller command
|
|
|
|
* @ctrl: controller to which the command is issued
|
|
|
|
* @cmd: command value written to slot control register
|
|
|
|
* @mask: bitmask of slot control register to be modified
|
|
|
|
*/
|
|
|
|
static void pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
|
|
|
|
{
|
|
|
|
pcie_do_write_cmd(ctrl, cmd, mask, true);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Same as above without waiting for the hardware to latch */
|
|
|
|
static void pcie_write_cmd_nowait(struct controller *ctrl, u16 cmd, u16 mask)
|
|
|
|
{
|
|
|
|
pcie_do_write_cmd(ctrl, cmd, mask, false);
|
|
|
|
}
|
|
|
|
|
2019-10-30 00:00:22 +07:00
|
|
|
/**
|
|
|
|
* pciehp_check_link_active() - Is the link active
|
|
|
|
* @ctrl: PCIe hotplug controller
|
|
|
|
*
|
|
|
|
* Check whether the downstream link is currently active. Note it is
|
|
|
|
* possible that the card is removed immediately after this so the
|
|
|
|
* caller may need to take it into account.
|
|
|
|
*
|
|
|
|
* If the hotplug controller itself is not available anymore returns
|
|
|
|
* %-ENODEV.
|
|
|
|
*/
|
|
|
|
int pciehp_check_link_active(struct controller *ctrl)
|
2008-10-22 12:31:44 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2012-01-28 01:55:12 +07:00
|
|
|
u16 lnk_status;
|
2019-10-30 00:00:22 +07:00
|
|
|
int ret;
|
2008-10-22 12:31:44 +07:00
|
|
|
|
2019-10-30 00:00:22 +07:00
|
|
|
ret = pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
|
|
|
|
if (ret == PCIBIOS_DEVICE_NOT_FOUND || lnk_status == (u16)~0)
|
|
|
|
return -ENODEV;
|
2012-01-28 01:55:12 +07:00
|
|
|
|
2019-10-30 00:00:22 +07:00
|
|
|
ret = !!(lnk_status & PCI_EXP_LNKSTA_DLLLA);
|
|
|
|
ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status);
|
2012-01-28 01:55:12 +07:00
|
|
|
|
|
|
|
return ret;
|
2008-10-22 12:31:44 +07:00
|
|
|
}
|
|
|
|
|
2012-01-28 01:55:11 +07:00
|
|
|
static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
|
|
|
|
{
|
|
|
|
u32 l;
|
|
|
|
int count = 0;
|
|
|
|
int delay = 1000, step = 20;
|
|
|
|
bool found = false;
|
|
|
|
|
|
|
|
do {
|
|
|
|
found = pci_bus_read_dev_vendor_id(bus, devfn, &l, 0);
|
|
|
|
count++;
|
|
|
|
|
|
|
|
if (found)
|
|
|
|
break;
|
|
|
|
|
|
|
|
msleep(step);
|
|
|
|
delay -= step;
|
|
|
|
} while (delay > 0);
|
|
|
|
|
2019-05-09 02:59:00 +07:00
|
|
|
if (count > 1)
|
2019-05-08 06:24:53 +07:00
|
|
|
pr_debug("pci %04x:%02x:%02x.%d id reading try %d times with interval %d ms to get %08x\n",
|
2012-01-28 01:55:11 +07:00
|
|
|
pci_domain_nr(bus), bus->number, PCI_SLOT(devfn),
|
|
|
|
PCI_FUNC(devfn), count, step, l);
|
|
|
|
|
|
|
|
return found;
|
|
|
|
}
|
|
|
|
|
2019-10-26 02:00:46 +07:00
|
|
|
static void pcie_wait_for_presence(struct pci_dev *pdev)
|
|
|
|
{
|
|
|
|
int timeout = 1250;
|
|
|
|
u16 slot_status;
|
|
|
|
|
|
|
|
do {
|
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
|
|
|
|
if (slot_status & PCI_EXP_SLTSTA_PDS)
|
|
|
|
return;
|
|
|
|
msleep(10);
|
|
|
|
timeout -= 10;
|
|
|
|
} while (timeout > 0);
|
|
|
|
|
|
|
|
pci_info(pdev, "Timeout waiting for Presence Detect\n");
|
|
|
|
}
|
|
|
|
|
2009-09-15 15:30:48 +07:00
|
|
|
int pciehp_check_link_status(struct controller *ctrl)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2013-12-15 03:06:07 +07:00
|
|
|
bool found;
|
2005-04-17 05:20:36 +07:00
|
|
|
u16 lnk_status;
|
|
|
|
|
2018-09-20 23:27:17 +07:00
|
|
|
if (!pcie_wait_for_link(pdev, true))
|
|
|
|
return -1;
|
2008-10-22 12:31:44 +07:00
|
|
|
|
2019-10-26 02:00:46 +07:00
|
|
|
if (ctrl->inband_presence_disabled)
|
|
|
|
pcie_wait_for_presence(pdev);
|
|
|
|
|
2012-01-28 01:55:11 +07:00
|
|
|
found = pci_bus_check_dev(ctrl->pcie->port->subordinate,
|
|
|
|
PCI_DEVFN(0, 0));
|
2011-11-10 14:40:37 +07:00
|
|
|
|
PCI: pciehp: Tolerate initially unstable link
When a device is hotplugged, Presence Detect and Link Up events often do
not occur simultaneously, but with a lag of a few milliseconds. Only
the first event received is relevant, the other one can be disregarded.
Moreover, Stefan Roese reports that on certain platforms, Link State and
Presence Detect may flap for up to 100 ms before stabilizing, suggesting
that such events should be disregarded for at least this long:
https://lkml.kernel.org/r/20180130084121.18653-1-sr@denx.de
On slot enablement, pciehp_check_link_status() waits for 100 ms per
PCIe r4.0, sec 6.7.3.3, then probes the hotplugged device's vendor
register for up to 1 second.
If this succeeds, the link is definitely up, so ignore any Presence
Detect or Link State events that occurred up to this point.
pciehp_check_link_status() then checks the Link Training bit in the
Link Status register. This is the final opportunity to detect
inaccessibility of the device and abort slot enablement. Any link
or presence change that occurs afterwards will cause the slot to be
disabled again immediately after attempting to enable it.
The astute reviewer may appreciate that achieving this behavior would be
more complicated had pciehp not just been converted to enable/disable
the slot exclusively from the IRQ thread: When the slot is enabled via
sysfs, each link or presence flap would otherwise cause the IRQ thread
to run and it would have to sense that those events are belonging to a
concurrent slot enablement operation and disregard them. It would be
much more difficult than this mere 3 line change.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stefan Roese <sr@denx.de>
2018-07-20 05:27:49 +07:00
|
|
|
/* ignore link or presence changes up to this point */
|
|
|
|
if (found)
|
|
|
|
atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
|
|
|
|
&ctrl->pending_events);
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnk_status);
|
2008-09-05 10:11:26 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: lnk_status = %x\n", __func__, lnk_status);
|
2008-12-19 13:19:02 +07:00
|
|
|
if ((lnk_status & PCI_EXP_LNKSTA_LT) ||
|
|
|
|
!(lnk_status & PCI_EXP_LNKSTA_NLW)) {
|
2015-06-16 04:28:29 +07:00
|
|
|
ctrl_err(ctrl, "link training error: status %#06x\n",
|
|
|
|
lnk_status);
|
2013-12-15 03:06:07 +07:00
|
|
|
return -1;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2011-11-07 22:53:23 +07:00
|
|
|
pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status);
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
if (!found)
|
|
|
|
return -1;
|
2012-01-28 01:55:11 +07:00
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
return 0;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2012-01-28 01:55:14 +07:00
|
|
|
static int __pciehp_link_set(struct controller *ctrl, bool enable)
|
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2012-01-28 01:55:14 +07:00
|
|
|
u16 lnk_ctrl;
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnk_ctrl);
|
2012-01-28 01:55:14 +07:00
|
|
|
|
|
|
|
if (enable)
|
|
|
|
lnk_ctrl &= ~PCI_EXP_LNKCTL_LD;
|
|
|
|
else
|
|
|
|
lnk_ctrl |= PCI_EXP_LNKCTL_LD;
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnk_ctrl);
|
2012-01-28 01:55:14 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: lnk_ctrl = %x\n", __func__, lnk_ctrl);
|
2013-12-15 03:06:07 +07:00
|
|
|
return 0;
|
2012-01-28 01:55:14 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static int pciehp_link_enable(struct controller *ctrl)
|
|
|
|
{
|
|
|
|
return __pciehp_link_set(ctrl, true);
|
|
|
|
}
|
|
|
|
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
int pciehp_get_raw_indicator_status(struct hotplug_slot *hotplug_slot,
|
|
|
|
u8 *status)
|
|
|
|
{
|
2018-09-08 14:59:01 +07:00
|
|
|
struct controller *ctrl = to_ctrl(hotplug_slot);
|
2018-09-19 02:46:17 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
u16 slot_ctrl;
|
|
|
|
|
2018-07-20 05:27:57 +07:00
|
|
|
pci_config_pm_runtime_get(pdev);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
|
2018-07-20 05:27:57 +07:00
|
|
|
pci_config_pm_runtime_put(pdev);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
*status = (slot_ctrl & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-19 21:29:00 +07:00
|
|
|
int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2018-09-08 14:59:01 +07:00
|
|
|
struct controller *ctrl = to_ctrl(hotplug_slot);
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
u16 slot_ctrl;
|
|
|
|
|
2018-07-20 05:27:57 +07:00
|
|
|
pci_config_pm_runtime_get(pdev);
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
|
2018-07-20 05:27:57 +07:00
|
|
|
pci_config_pm_runtime_put(pdev);
|
2009-11-11 12:34:52 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x, value read %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2013-12-15 03:06:53 +07:00
|
|
|
switch (slot_ctrl & PCI_EXP_SLTCTL_AIC) {
|
|
|
|
case PCI_EXP_SLTCTL_ATTN_IND_ON:
|
2005-04-17 05:20:36 +07:00
|
|
|
*status = 1; /* On */
|
|
|
|
break;
|
2013-12-15 03:06:53 +07:00
|
|
|
case PCI_EXP_SLTCTL_ATTN_IND_BLINK:
|
2005-04-17 05:20:36 +07:00
|
|
|
*status = 2; /* Blink */
|
|
|
|
break;
|
2013-12-15 03:06:53 +07:00
|
|
|
case PCI_EXP_SLTCTL_ATTN_IND_OFF:
|
2005-04-17 05:20:36 +07:00
|
|
|
*status = 0; /* Off */
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
*status = 0xFF;
|
|
|
|
break;
|
|
|
|
}
|
2018-08-19 21:29:00 +07:00
|
|
|
|
|
|
|
return 0;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2018-09-19 02:46:17 +07:00
|
|
|
void pciehp_get_power_status(struct controller *ctrl, u8 *status)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
u16 slot_ctrl;
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &slot_ctrl);
|
2009-11-11 12:34:52 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x value read %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, slot_ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2013-12-15 03:06:53 +07:00
|
|
|
switch (slot_ctrl & PCI_EXP_SLTCTL_PCC) {
|
|
|
|
case PCI_EXP_SLTCTL_PWR_ON:
|
|
|
|
*status = 1; /* On */
|
2005-04-17 05:20:36 +07:00
|
|
|
break;
|
2013-12-15 03:06:53 +07:00
|
|
|
case PCI_EXP_SLTCTL_PWR_OFF:
|
|
|
|
*status = 0; /* Off */
|
2005-04-17 05:20:36 +07:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
*status = 0xFF;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-09-19 02:46:17 +07:00
|
|
|
void pciehp_get_latch_status(struct controller *ctrl, u8 *status)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2018-09-19 02:46:17 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
u16 slot_status;
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
|
2008-12-19 13:19:02 +07:00
|
|
|
*status = !!(slot_status & PCI_EXP_SLTSTA_MRLSS);
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2019-10-30 00:00:22 +07:00
|
|
|
/**
|
|
|
|
* pciehp_card_present() - Is the card present
|
|
|
|
* @ctrl: PCIe hotplug controller
|
|
|
|
*
|
|
|
|
* Function checks whether the card is currently present in the slot and
|
|
|
|
* in that case returns true. Note it is possible that the card is
|
|
|
|
* removed immediately after the check so the caller may need to take
|
|
|
|
* this into account.
|
|
|
|
*
|
|
|
|
* It the hotplug controller itself is not available anymore returns
|
|
|
|
* %-ENODEV.
|
|
|
|
*/
|
|
|
|
int pciehp_card_present(struct controller *ctrl)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2018-09-08 14:59:01 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
u16 slot_status;
|
2019-10-30 00:00:22 +07:00
|
|
|
int ret;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2019-10-30 00:00:22 +07:00
|
|
|
ret = pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
|
|
|
|
if (ret == PCIBIOS_DEVICE_NOT_FOUND || slot_status == (u16)~0)
|
|
|
|
return -ENODEV;
|
|
|
|
|
|
|
|
return !!(slot_status & PCI_EXP_SLTSTA_PDS);
|
2018-09-08 14:59:01 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* pciehp_card_present_or_link_active() - whether given slot is occupied
|
|
|
|
* @ctrl: PCIe hotplug controller
|
|
|
|
*
|
|
|
|
* Unlike pciehp_card_present(), which determines presence solely from the
|
|
|
|
* Presence Detect State bit, this helper also returns true if the Link Active
|
|
|
|
* bit is set. This is a concession to broken hotplug ports which hardwire
|
|
|
|
* Presence Detect State to zero, such as Wilocity's [1ae9:0200].
|
2019-10-30 00:00:22 +07:00
|
|
|
*
|
|
|
|
* Returns: %1 if the slot is occupied and %0 if it is not. If the hotplug
|
|
|
|
* port is not present anymore returns %-ENODEV.
|
2018-09-08 14:59:01 +07:00
|
|
|
*/
|
2019-10-30 00:00:22 +07:00
|
|
|
int pciehp_card_present_or_link_active(struct controller *ctrl)
|
2018-09-08 14:59:01 +07:00
|
|
|
{
|
2019-10-30 00:00:22 +07:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = pciehp_card_present(ctrl);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
return pciehp_check_link_active(ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2018-09-19 02:46:17 +07:00
|
|
|
int pciehp_query_power_fault(struct controller *ctrl)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2018-09-19 02:46:17 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2005-04-17 05:20:36 +07:00
|
|
|
u16 slot_status;
|
|
|
|
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
|
2008-12-19 13:19:02 +07:00
|
|
|
return !!(slot_status & PCI_EXP_SLTSTA_PFD);
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
int pciehp_set_raw_indicator_status(struct hotplug_slot *hotplug_slot,
|
|
|
|
u8 status)
|
|
|
|
{
|
2018-09-08 14:59:01 +07:00
|
|
|
struct controller *ctrl = to_ctrl(hotplug_slot);
|
2018-07-20 05:27:57 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
|
2018-07-20 05:27:57 +07:00
|
|
|
pci_config_pm_runtime_get(pdev);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
pcie_write_cmd_nowait(ctrl, status << 6,
|
|
|
|
PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC);
|
2018-07-20 05:27:57 +07:00
|
|
|
pci_config_pm_runtime_put(pdev);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-09-03 18:10:18 +07:00
|
|
|
/**
|
|
|
|
* pciehp_set_indicators() - set attention indicator, power indicator, or both
|
|
|
|
* @ctrl: PCIe hotplug controller
|
|
|
|
* @pwr: one of:
|
|
|
|
* PCI_EXP_SLTCTL_PWR_IND_ON
|
|
|
|
* PCI_EXP_SLTCTL_PWR_IND_BLINK
|
|
|
|
* PCI_EXP_SLTCTL_PWR_IND_OFF
|
|
|
|
* @attn: one of:
|
|
|
|
* PCI_EXP_SLTCTL_ATTN_IND_ON
|
|
|
|
* PCI_EXP_SLTCTL_ATTN_IND_BLINK
|
|
|
|
* PCI_EXP_SLTCTL_ATTN_IND_OFF
|
|
|
|
*
|
|
|
|
* Either @pwr or @attn can also be INDICATOR_NOOP to leave that indicator
|
|
|
|
* unchanged.
|
|
|
|
*/
|
|
|
|
void pciehp_set_indicators(struct controller *ctrl, int pwr, int attn)
|
|
|
|
{
|
|
|
|
u16 cmd = 0, mask = 0;
|
|
|
|
|
|
|
|
if (PWR_LED(ctrl) && pwr != INDICATOR_NOOP) {
|
|
|
|
cmd |= (pwr & PCI_EXP_SLTCTL_PIC);
|
|
|
|
mask |= PCI_EXP_SLTCTL_PIC;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (ATTN_LED(ctrl) && attn != INDICATOR_NOOP) {
|
|
|
|
cmd |= (attn & PCI_EXP_SLTCTL_AIC);
|
|
|
|
mask |= PCI_EXP_SLTCTL_AIC;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (cmd) {
|
|
|
|
pcie_write_cmd_nowait(ctrl, cmd, mask);
|
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-09-19 02:46:17 +07:00
|
|
|
int pciehp_power_on_slot(struct controller *ctrl)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2007-05-31 23:43:34 +07:00
|
|
|
u16 slot_status;
|
2013-12-15 03:06:07 +07:00
|
|
|
int retval;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2018-09-06 03:35:41 +07:00
|
|
|
/* Clear power-fault bit from previous power failures */
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
|
2013-12-15 03:06:40 +07:00
|
|
|
if (slot_status & PCI_EXP_SLTSTA_PFD)
|
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
|
|
|
|
PCI_EXP_SLTSTA_PFD);
|
2009-11-13 13:14:10 +07:00
|
|
|
ctrl->power_fault_detected = 0;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2013-12-15 03:06:53 +07:00
|
|
|
pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_ON, PCI_EXP_SLTCTL_PCC);
|
2009-11-11 12:34:52 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
2013-12-15 03:06:53 +07:00
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
|
|
|
|
PCI_EXP_SLTCTL_PWR_ON);
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2012-01-28 01:55:15 +07:00
|
|
|
retval = pciehp_link_enable(ctrl);
|
|
|
|
if (retval)
|
|
|
|
ctrl_err(ctrl, "%s: Can not enable the link!\n", __func__);
|
|
|
|
|
2005-04-17 05:20:36 +07:00
|
|
|
return retval;
|
|
|
|
}
|
|
|
|
|
2018-09-19 02:46:17 +07:00
|
|
|
void pciehp_power_off_slot(struct controller *ctrl)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2013-12-15 03:06:53 +07:00
|
|
|
pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PWR_OFF, PCI_EXP_SLTCTL_PCC);
|
2009-11-11 12:34:52 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
2013-12-15 03:06:53 +07:00
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL,
|
|
|
|
PCI_EXP_SLTCTL_PWR_OFF);
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
PCI: pciehp: Process all hotplug events before looking for new ones
Previously we accumulated hotplug events, then processed them, essentially
like this:
events = 0
do {
status = read(Slot Status)
status &= EVENT_MASK # only look at events
events |= status # accumulate events
write(Slot Status, events) # clear events
} while (status)
process events
The problem is that as soon as we clear events in Slot Status, the hardware
may send notifications for new events, and we lose information about the
first events. For example, we might see two Presence Detect Changed
events, but lose the fact that the slot was temporarily empty:
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS clear # slot empty
write PCI_EXP_SLTSTA_PDC # clear PDC event
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS set # slot occupied
The current code does not process a removal; it only processes the
insertion, which fails because we didn't remove the original device.
To avoid this problem, read Slot Status once and process all the events
before reading it again, like this:
do {
read events
clear events
process events
} while (events)
[bhelgaas: changelog, add external loop around pciehp_isr()]
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-09-09 03:07:56 +07:00
|
|
|
static irqreturn_t pciehp_isr(int irq, void *dev_id)
|
2005-04-17 05:20:36 +07:00
|
|
|
{
|
2006-12-22 08:01:04 +07:00
|
|
|
struct controller *ctrl = (struct controller *)dev_id;
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
struct device *parent = pdev->dev.parent;
|
PCI: pciehp: Fix MSI interrupt race
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
2020-02-19 21:31:13 +07:00
|
|
|
u16 status, events = 0;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
/*
|
2018-09-28 04:41:46 +07:00
|
|
|
* Interrupts only occur in D3hot or shallower and only if enabled
|
|
|
|
* in the Slot Control register (PCIe r4.0, sec 6.7.3.4).
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
*/
|
2018-09-28 04:41:46 +07:00
|
|
|
if (pdev->current_state == PCI_D3cold ||
|
|
|
|
(!(ctrl->slot_ctrl & PCI_EXP_SLTCTL_HPIE) && !pciehp_poll_mode))
|
2016-05-13 18:15:31 +07:00
|
|
|
return IRQ_NONE;
|
|
|
|
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
/*
|
|
|
|
* Keep the port accessible by holding a runtime PM ref on its parent.
|
|
|
|
* Defer resume of the parent to the IRQ thread if it's suspended.
|
|
|
|
* Mask the interrupt until then.
|
|
|
|
*/
|
|
|
|
if (parent) {
|
|
|
|
pm_runtime_get_noresume(parent);
|
|
|
|
if (!pm_runtime_active(parent)) {
|
|
|
|
pm_runtime_put(parent);
|
|
|
|
disable_irq_nosync(irq);
|
|
|
|
atomic_or(RERUN_ISR, &ctrl->pending_events);
|
|
|
|
return IRQ_WAKE_THREAD;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
PCI: pciehp: Fix MSI interrupt race
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
2020-02-19 21:31:13 +07:00
|
|
|
read_status:
|
PCI: pciehp: Process all hotplug events before looking for new ones
Previously we accumulated hotplug events, then processed them, essentially
like this:
events = 0
do {
status = read(Slot Status)
status &= EVENT_MASK # only look at events
events |= status # accumulate events
write(Slot Status, events) # clear events
} while (status)
process events
The problem is that as soon as we clear events in Slot Status, the hardware
may send notifications for new events, and we lose information about the
first events. For example, we might see two Presence Detect Changed
events, but lose the fact that the slot was temporarily empty:
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS clear # slot empty
write PCI_EXP_SLTSTA_PDC # clear PDC event
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS set # slot occupied
The current code does not process a removal; it only processes the
insertion, which fails because we didn't remove the original device.
To avoid this problem, read Slot Status once and process all the events
before reading it again, like this:
do {
read events
clear events
process events
} while (events)
[bhelgaas: changelog, add external loop around pciehp_isr()]
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-09-09 03:07:56 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status);
|
|
|
|
if (status == (u16) ~0) {
|
|
|
|
ctrl_info(ctrl, "%s: no response from device\n", __func__);
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
if (parent)
|
|
|
|
pm_runtime_put(parent);
|
PCI: pciehp: Process all hotplug events before looking for new ones
Previously we accumulated hotplug events, then processed them, essentially
like this:
events = 0
do {
status = read(Slot Status)
status &= EVENT_MASK # only look at events
events |= status # accumulate events
write(Slot Status, events) # clear events
} while (status)
process events
The problem is that as soon as we clear events in Slot Status, the hardware
may send notifications for new events, and we lose information about the
first events. For example, we might see two Presence Detect Changed
events, but lose the fact that the slot was temporarily empty:
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS clear # slot empty
write PCI_EXP_SLTSTA_PDC # clear PDC event
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS set # slot occupied
The current code does not process a removal; it only processes the
insertion, which fails because we didn't remove the original device.
To avoid this problem, read Slot Status once and process all the events
before reading it again, like this:
do {
read events
clear events
process events
} while (events)
[bhelgaas: changelog, add external loop around pciehp_isr()]
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-09-09 03:07:56 +07:00
|
|
|
return IRQ_NONE;
|
|
|
|
}
|
|
|
|
|
2008-04-26 04:38:57 +07:00
|
|
|
/*
|
PCI: pciehp: Process all hotplug events before looking for new ones
Previously we accumulated hotplug events, then processed them, essentially
like this:
events = 0
do {
status = read(Slot Status)
status &= EVENT_MASK # only look at events
events |= status # accumulate events
write(Slot Status, events) # clear events
} while (status)
process events
The problem is that as soon as we clear events in Slot Status, the hardware
may send notifications for new events, and we lose information about the
first events. For example, we might see two Presence Detect Changed
events, but lose the fact that the slot was temporarily empty:
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS clear # slot empty
write PCI_EXP_SLTSTA_PDC # clear PDC event
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS set # slot occupied
The current code does not process a removal; it only processes the
insertion, which fails because we didn't remove the original device.
To avoid this problem, read Slot Status once and process all the events
before reading it again, like this:
do {
read events
clear events
process events
} while (events)
[bhelgaas: changelog, add external loop around pciehp_isr()]
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-09-09 03:07:56 +07:00
|
|
|
* Slot Status contains plain status bits as well as event
|
|
|
|
* notification bits; right now we only want the event bits.
|
2008-04-26 04:38:57 +07:00
|
|
|
*/
|
PCI: pciehp: Fix MSI interrupt race
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
2020-02-19 21:31:13 +07:00
|
|
|
status &= PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
|
|
|
|
PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_CC |
|
|
|
|
PCI_EXP_SLTSTA_DLLSC;
|
2017-08-01 14:11:52 +07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If we've already reported a power fault, don't report it again
|
|
|
|
* until we've done something to handle it.
|
|
|
|
*/
|
|
|
|
if (ctrl->power_fault_detected)
|
PCI: pciehp: Fix MSI interrupt race
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
2020-02-19 21:31:13 +07:00
|
|
|
status &= ~PCI_EXP_SLTSTA_PFD;
|
2017-08-01 14:11:52 +07:00
|
|
|
|
PCI: pciehp: Fix MSI interrupt race
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
2020-02-19 21:31:13 +07:00
|
|
|
events |= status;
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
if (!events) {
|
|
|
|
if (parent)
|
|
|
|
pm_runtime_put(parent);
|
PCI: pciehp: Process all hotplug events before looking for new ones
Previously we accumulated hotplug events, then processed them, essentially
like this:
events = 0
do {
status = read(Slot Status)
status &= EVENT_MASK # only look at events
events |= status # accumulate events
write(Slot Status, events) # clear events
} while (status)
process events
The problem is that as soon as we clear events in Slot Status, the hardware
may send notifications for new events, and we lose information about the
first events. For example, we might see two Presence Detect Changed
events, but lose the fact that the slot was temporarily empty:
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS clear # slot empty
write PCI_EXP_SLTSTA_PDC # clear PDC event
read PCI_EXP_SLTSTA_PDC set, PCI_EXP_SLTSTA_PDS set # slot occupied
The current code does not process a removal; it only processes the
insertion, which fails because we didn't remove the original device.
To avoid this problem, read Slot Status once and process all the events
before reading it again, like this:
do {
read events
clear events
process events
} while (events)
[bhelgaas: changelog, add external loop around pciehp_isr()]
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2016-09-09 03:07:56 +07:00
|
|
|
return IRQ_NONE;
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
}
|
2007-08-10 06:09:34 +07:00
|
|
|
|
PCI: pciehp: Fix MSI interrupt race
Without this commit, a PCIe hotplug port can stop generating interrupts on
hotplug events, so device adds and removals will not be seen:
The pciehp interrupt handler pciehp_isr() reads the Slot Status register
and then writes back to it to clear the bits that caused the interrupt. If
a different interrupt event bit gets set between the read and the write,
pciehp_isr() returns without having cleared all of the interrupt event
bits. If this happens when the MSI isn't masked (which by default it isn't
in handle_edge_irq(), and which it will never be when MSI per-vector
masking is not supported), we won't get any more hotplug interrupts from
that device.
That is expected behavior, according to the PCIe Base Spec r5.0, section
6.7.3.4, "Software Notification of Hot-Plug Events".
Because the Presence Detect Changed and Data Link Layer State Changed event
bits can both get set at nearly the same time when a device is added or
removed, this is more likely to happen than it might seem. The issue was
found (and can be reproduced rather easily) by connecting and disconnecting
an NVMe storage device on at least one system model where the NVMe devices
were being connected to an AMD PCIe port (PCI device 0x1022/0x1483).
Fix the issue by modifying pciehp_isr() to loop back and re-read the Slot
Status register immediately after writing to it, until it sees that all of
the event status bits have been cleared.
[lukas: drop loop count limitation, write "events" instead of "status",
don't loop back in INTx and poll modes, tweak code comment & commit msg]
Link: https://lore.kernel.org/r/78b4ced5072bfe6e369d20e8b47c279b8c7af12e.1582121613.git.lukas@wunner.de
Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
2020-02-19 21:31:13 +07:00
|
|
|
if (status) {
|
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* In MSI mode, all event bits must be zero before the port
|
|
|
|
* will send a new interrupt (PCIe Base Spec r5.0 sec 6.7.3.4).
|
|
|
|
* So re-read the Slot Status register in case a bit was set
|
|
|
|
* between read and write.
|
|
|
|
*/
|
|
|
|
if (pci_dev_msi_enabled(pdev) && !pciehp_poll_mode)
|
|
|
|
goto read_status;
|
|
|
|
}
|
|
|
|
|
2016-09-09 05:30:38 +07:00
|
|
|
ctrl_dbg(ctrl, "pending interrupts %#06x from Slot Status\n", events);
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
if (parent)
|
|
|
|
pm_runtime_put(parent);
|
2007-08-10 06:09:34 +07:00
|
|
|
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
/*
|
|
|
|
* Command Completed notifications are not deferred to the
|
|
|
|
* IRQ thread because it may be waiting for their arrival.
|
|
|
|
*/
|
2016-09-09 05:30:38 +07:00
|
|
|
if (events & PCI_EXP_SLTSTA_CC) {
|
2006-12-22 08:01:10 +07:00
|
|
|
ctrl->cmd_busy = 0;
|
2008-04-26 04:39:02 +07:00
|
|
|
smp_mb();
|
2008-05-28 12:59:44 +07:00
|
|
|
wake_up(&ctrl->queue);
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
|
|
|
|
if (events == PCI_EXP_SLTSTA_CC)
|
|
|
|
return IRQ_HANDLED;
|
|
|
|
|
|
|
|
events &= ~PCI_EXP_SLTSTA_CC;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2018-07-20 05:27:34 +07:00
|
|
|
if (pdev->ignore_hotplug) {
|
|
|
|
ctrl_dbg(ctrl, "ignoring hotplug event %#06x\n", events);
|
|
|
|
return IRQ_HANDLED;
|
PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device
Powering off a hot-pluggable device, e.g., with pci_set_power_state(D3cold),
normally generates a hot-remove event that unbinds the driver.
Some drivers expect to remain bound to a device even while they power it
off and back on again. This can be dangerous, because if the device is
removed or replaced while it is powered off, the driver doesn't know that
anything changed. But some drivers accept that risk.
Add pci_ignore_hotplug() for use by drivers that know their device cannot
be removed. Using pci_ignore_hotplug() tells the PCI core that hot-plug
events for the device should be ignored.
The radeon and nouveau drivers use this to switch between a low-power,
integrated GPU and a higher-power, higher-performance discrete GPU. They
power off the unused GPU, but they want to remain bound to it.
This is a reimplementation of f244d8b623da ("ACPIPHP / radeon / nouveau:
Fix VGA switcheroo problem related to hotplug") but extends it to work with
both acpiphp and pciehp.
This fixes a problem where systems with dual GPUs using the radeon drivers
become unusable, freezing every few seconds (see bugzillas below). The
resume of the radeon device may also fail, e.g.,
This fixes problems on dual GPU systems where the radeon driver becomes
unusable because of problems while suspending the device, as in bug 79701:
[drm] radeon: finishing device.
radeon 0000:01:00.0: Userspace still has active objects !
radeon 0000:01:00.0: ffff8800cb4ec288 ffff8800cb4ec000 16384 4294967297 force free
...
WARNING: CPU: 0 PID: 67 at /home/apw/COD/linux/drivers/gpu/drm/radeon/radeon_gart.c:234 radeon_gart_unbind+0xd2/0xe0 [radeon]()
trying to unbind memory from uninitialized GART !
or while resuming it, as in bug 77261:
radeon 0000:01:00.0: ring 0 stalled for more than 10158msec
radeon 0000:01:00.0: GPU lockup ...
radeon 0000:01:00.0: GPU pci config reset
pciehp 0000:00:01.0:pcie04: Card not present on Slot(1-1)
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
*ERROR* radeon: dpm resume failed
radeon 0000:01:00.0: Wait for MC idle timedout !
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77261
Link: https://bugzilla.kernel.org/show_bug.cgi?id=79701
Reported-by: Shawn Starr <shawn.starr@rogers.com>
Reported-by: Jose P. <lbdkmjdf@sharklasers.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Rajat Jain <rajatxjain@gmail.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
CC: stable@vger.kernel.org # v3.15+
2014-09-11 02:45:01 +07:00
|
|
|
}
|
|
|
|
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
/* Save pending events for consumption by IRQ thread. */
|
|
|
|
atomic_or(events, &ctrl->pending_events);
|
|
|
|
return IRQ_WAKE_THREAD;
|
|
|
|
}
|
|
|
|
|
|
|
|
static irqreturn_t pciehp_ist(int irq, void *dev_id)
|
|
|
|
{
|
|
|
|
struct controller *ctrl = (struct controller *)dev_id;
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
|
|
|
irqreturn_t ret;
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
u32 events;
|
|
|
|
|
2019-08-09 17:28:43 +07:00
|
|
|
ctrl->ist_running = true;
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
pci_config_pm_runtime_get(pdev);
|
|
|
|
|
|
|
|
/* rerun pciehp_isr() if the port was inaccessible on interrupt */
|
|
|
|
if (atomic_fetch_and(~RERUN_ISR, &ctrl->pending_events) & RERUN_ISR) {
|
|
|
|
ret = pciehp_isr(irq, dev_id);
|
|
|
|
enable_irq(irq);
|
2020-03-18 18:33:12 +07:00
|
|
|
if (ret != IRQ_WAKE_THREAD)
|
|
|
|
goto out;
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
}
|
|
|
|
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
synchronize_hardirq(irq);
|
|
|
|
events = atomic_xchg(&ctrl->pending_events, 0);
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
if (!events) {
|
2020-03-18 18:33:12 +07:00
|
|
|
ret = IRQ_NONE;
|
|
|
|
goto out;
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
}
|
PCI: pciehp: Convert to threaded IRQ
pciehp's IRQ handler queues up a work item for each event signaled by
the hardware. A more modern alternative is to let a long running
kthread service the events. The IRQ handler's sole job is then to check
whether the IRQ originated from the device in question, acknowledge its
receipt to the hardware to quiesce the interrupt and wake up the kthread.
One benefit is reduced latency to handle the IRQ, which is a necessity
for realtime environments. Another benefit is that we can make pciehp
simpler and more robust by handling events synchronously in process
context, rather than asynchronously by queueing up work items. pciehp's
usage of work items is a historic artifact, it predates the introduction
of threaded IRQ handlers by two years. (The former was introduced in
2007 with commit 5d386e1ac402 ("pciehp: Event handling rework"), the
latter in 2009 with commit 3aa551c9b4c4 ("genirq: add threaded interrupt
handler support").)
Convert pciehp to threaded IRQ handling by retrieving the pending events
in pciehp_isr(), saving them for later consumption by the thread handler
pciehp_ist() and clearing them in the Slot Status register.
By clearing the Slot Status (and thereby acknowledging the events) in
pciehp_isr(), we can avoid requesting the IRQ with IRQF_ONESHOT, which
would have the unpleasant side effect of starving devices sharing the
IRQ until pciehp_ist() has finished.
pciehp_isr() does not count how many times each event occurred, but
merely records the fact *that* an event occurred. If the same event
occurs a second time before pciehp_ist() is woken, that second event
will not be recorded separately, which is problematic according to
commit fad214b0aa72 ("PCI: pciehp: Process all hotplug events before
looking for new ones") because we may miss removal of a card in-between
two back-to-back insertions. We're about to make pciehp_ist() resilient
to missed events. The present commit regresses the driver's behavior
temporarily in order to separate the changes into reviewable chunks.
This doesn't affect regular slow-motion hotplug, only plug-unplug-plug
operations that happen in a timespan shorter than wakeup of the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:38 +07:00
|
|
|
|
2008-04-26 04:38:57 +07:00
|
|
|
/* Check Attention Button Pressed */
|
2016-09-09 05:30:38 +07:00
|
|
|
if (events & PCI_EXP_SLTSTA_ABP) {
|
2016-09-09 03:19:58 +07:00
|
|
|
ctrl_info(ctrl, "Slot(%s): Attention button pressed\n",
|
2018-09-19 02:46:17 +07:00
|
|
|
slot_name(ctrl));
|
|
|
|
pciehp_handle_button_press(ctrl);
|
2015-06-15 09:35:13 +07:00
|
|
|
}
|
2006-12-22 08:01:04 +07:00
|
|
|
|
2018-09-06 03:35:41 +07:00
|
|
|
/* Check Power Fault Detected */
|
|
|
|
if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
|
|
|
|
ctrl->power_fault_detected = 1;
|
2018-09-19 02:46:17 +07:00
|
|
|
ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(ctrl));
|
2019-09-03 18:10:19 +07:00
|
|
|
pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
|
|
|
|
PCI_EXP_SLTCTL_ATTN_IND_ON);
|
2018-09-06 03:35:41 +07:00
|
|
|
}
|
|
|
|
|
2016-11-19 15:32:45 +07:00
|
|
|
/*
|
PCI: pciehp: Enable/disable exclusively from IRQ thread
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 05:27:46 +07:00
|
|
|
* Disable requests have higher priority than Presence Detect Changed
|
|
|
|
* or Data Link Layer State Changed events.
|
2016-11-19 15:32:45 +07:00
|
|
|
*/
|
2018-07-28 12:18:00 +07:00
|
|
|
down_read(&ctrl->reset_lock);
|
PCI: pciehp: Enable/disable exclusively from IRQ thread
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 05:27:46 +07:00
|
|
|
if (events & DISABLE_SLOT)
|
2018-09-19 02:46:17 +07:00
|
|
|
pciehp_handle_disable_request(ctrl);
|
PCI: pciehp: Become resilient to missed events
A hotplug port's Slot Status register does not count how often each type
of event occurred, it only records the fact *that* an event has occurred.
Previously pciehp queued a work item for each event. But if it missed
an event, e.g. removal of a card in-between two back-to-back insertions,
it queued up the wrong work item or no work item at all. Commit
fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking
for new ones") sought to improve the situation by shrinking the window
during which events may be missed.
But Stefan Roese reports unbalanced Card present and Link Up events,
suggesting that we're still missing events if they occur very rapidly.
Bjorn Helgaas responds that he considers pciehp's event handling
"baroque" and calls for its simplification and rationalization:
https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com
It gets worse once a hotplug port is runtime suspended: The port can
signal an interrupt while it and its parents are in D3hot, i.e. while
it is inaccessible. By the time we've runtime resumed all parents to D0
and read the port's Slot Status register, we may have missed an arbitrary
number of events. Event handling therefore needs to be reworked to
become resilient to missed events.
Assume that a Presence Detect Changed event has occurred.
Consider the following truth table:
- Slot is in OFF_STATE and is currently empty. => Do nothing.
(The event is trailing a Link Down or we've
missed an insertion and subsequent removal.)
- Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
- Slot is in ON_STATE and is currently empty. => Turn the slot off.
- Slot is in ON_STATE and is currently occupied. => Turn the slot off,
(Be cautious and assume the card in then back on.
the slot isn't the same as before.)
This leads to the following simple algorithm:
1 If the slot is in ON_STATE, turn it off unconditionally.
2 If the slot is currently occupied, turn it on.
Because those actions are now carried out synchronously, rather than by
scheduled work items, pciehp reacts to the *current* situation and
missed events no longer matter.
Data Link Layer State Changed events can be handled identically to
Presence Detect Changed events. Note that in the above truth table,
a Link Up trailing a Card present event didn't have to be accounted for:
It is filtered out by pciehp_check_link_status().
As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
"Once the Power Indicator begins blinking, a 5-second abort interval
exists during which a second depression of the Attention Button cancels
the operation." In other words, the user can only expect the system to
react to a button press after it starts blinking. Missed button presses
that occur in-between are irrelevant.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stefan Roese <sr@denx.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
2018-07-20 05:27:49 +07:00
|
|
|
else if (events & (PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC))
|
2018-09-19 02:46:17 +07:00
|
|
|
pciehp_handle_presence_or_link_change(ctrl, events);
|
2018-07-28 12:18:00 +07:00
|
|
|
up_read(&ctrl->reset_lock);
|
2006-12-22 08:01:04 +07:00
|
|
|
|
2020-03-18 18:33:12 +07:00
|
|
|
ret = IRQ_HANDLED;
|
|
|
|
out:
|
PCI: pciehp: Support interrupts sent from D3hot
If a hotplug port is able to send an interrupt, one would naively assume
that it is accessible at that moment. After all, if it wouldn't be
accessible, i.e. if its parent is in D3hot and the link to the hotplug
port is thus down, how should an interrupt come through?
It turns out that assumption is wrong at least for Thunderbolt: Even
though its parents are in D3hot, a Thunderbolt hotplug port is able to
signal interrupts. Because the port's config space is inaccessible and
resuming the parents may sleep, the hard IRQ handler has to defer
runtime resuming the parents and reading the Slot Status register to the
IRQ thread.
If the hotplug port uses a level-triggered INTx interrupt, it needs to
be masked until the IRQ thread has cleared the signaled events. For
simplicity, this commit also masks edge-triggered MSI/MSI-X interrupts.
Note that if the interrupt is shared (which can only happen for INTx),
other devices are starved from receiving interrupts until the IRQ thread
is scheduled, has runtime resumed the hotplug port's parents and has
read and cleared the Slot Status register.
That delay is dominated by the 10 ms D3hot->D0 transition time of each
parent port. The worst case is a Thunderbolt downstream port at the
end of a daisy chain: There may be up to six Thunderbolt controllers
in-between it and the root port, each comprising an upstream and
downstream port, plus its own upstream port. That's 13 x 10 = 130 ms.
Possible mitigations are polling the interrupt while it's disabled or
reducing the d3_delay of Thunderbolt ports if possible.
Open code masking of the interrupt instead of requesting it with the
IRQF_ONESHOT flag to minimize the period during which it is masked.
(IRQF_ONESHOT unmasks the IRQ only after the IRQ thread has finished.)
PCIe r4.0 sec 6.7.3.4 states that "If wake generation is required by the
associated form factor specification, a hotplug capable Downstream Port
must support generation of a wakeup event (using the PME mechanism) on
hotplug events that occur when the system is in a sleep state or the
Port is in device state D1, D2, or D3Hot."
This would seem to imply that PME needs to be enabled on the hotplug
port when it is runtime suspended. pci_enable_wake() currently doesn't
enable PME on bridges, it may be necessary to add an exemption for
hotplug bridges there. On "Light Ridge" Thunderbolt controllers, the
PME_Status bit is not set when an interrupt occurs while the hotplug
port is in D3hot, even if PME is enabled. (I've tested this on a Mac
and we hardcode the OSC_PCI_EXPRESS_PME_CONTROL bit to 0 on Macs in
negotiate_os_control(), modifying it to 1 didn't change the behavior.)
(Side note: Section 6.7.3.4 also states that "PME and Hot-Plug Event
interrupts (when both are implemented) always share the same MSI or
MSI-X vector". That would only seem to apply to Root Ports, however
the section never mentions Root Ports, only Downstream Ports. This is
explained in the definition of "Downstream Port" in the "Terms and
Acronyms" section of the PCIe Base Spec: "The Ports on a Switch that
are not the Upstream Port are Downstream Ports. All Ports on a Root
Complex are Downstream Ports.")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
2018-07-28 12:18:00 +07:00
|
|
|
pci_config_pm_runtime_put(pdev);
|
2019-08-09 17:28:43 +07:00
|
|
|
ctrl->ist_running = false;
|
PCI: pciehp: Enable/disable exclusively from IRQ thread
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 05:27:46 +07:00
|
|
|
wake_up(&ctrl->requester);
|
2020-03-18 18:33:12 +07:00
|
|
|
return ret;
|
2005-04-17 05:20:36 +07:00
|
|
|
}
|
|
|
|
|
2018-07-20 05:27:39 +07:00
|
|
|
static int pciehp_poll(void *data)
|
|
|
|
{
|
|
|
|
struct controller *ctrl = data;
|
|
|
|
|
|
|
|
schedule_timeout_idle(10 * HZ); /* start with 10 sec delay */
|
|
|
|
|
|
|
|
while (!kthread_should_stop()) {
|
PCI: pciehp: Enable/disable exclusively from IRQ thread
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 05:27:46 +07:00
|
|
|
/* poll for interrupt events or user requests */
|
|
|
|
while (pciehp_isr(IRQ_NOTCONNECTED, ctrl) == IRQ_WAKE_THREAD ||
|
|
|
|
atomic_read(&ctrl->pending_events))
|
2018-07-20 05:27:39 +07:00
|
|
|
pciehp_ist(IRQ_NOTCONNECTED, ctrl);
|
|
|
|
|
|
|
|
if (pciehp_poll_time <= 0 || pciehp_poll_time > 60)
|
|
|
|
pciehp_poll_time = 2; /* clamp to sane value */
|
|
|
|
|
|
|
|
schedule_timeout_idle(pciehp_poll_time * HZ);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-24 05:14:39 +07:00
|
|
|
static void pcie_enable_notification(struct controller *ctrl)
|
2007-11-22 06:07:55 +07:00
|
|
|
{
|
2008-04-26 04:39:05 +07:00
|
|
|
u16 cmd, mask;
|
2005-04-17 05:20:36 +07:00
|
|
|
|
2009-11-13 13:14:10 +07:00
|
|
|
/*
|
|
|
|
* TBD: Power fault detected software notification support.
|
|
|
|
*
|
|
|
|
* Power fault detected software notification is not enabled
|
|
|
|
* now, because it caused power fault detected interrupt storm
|
|
|
|
* on some machines. On those machines, power fault detected
|
|
|
|
* bit in the slot status register was set again immediately
|
|
|
|
* when it is cleared in the interrupt service routine, and
|
|
|
|
* next power fault detected interrupt was notified again.
|
|
|
|
*/
|
2014-02-05 09:29:23 +07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Always enable link events: thus link-up and link-down shall
|
|
|
|
* always be treated as hotplug and unplug respectively. Enable
|
|
|
|
* presence detect only if Attention Button is not present.
|
|
|
|
*/
|
|
|
|
cmd = PCI_EXP_SLTCTL_DLLSCE;
|
2008-04-26 04:39:06 +07:00
|
|
|
if (ATTN_BUTTN(ctrl))
|
2008-12-19 13:19:02 +07:00
|
|
|
cmd |= PCI_EXP_SLTCTL_ABPE;
|
2014-02-05 09:29:23 +07:00
|
|
|
else
|
|
|
|
cmd |= PCI_EXP_SLTCTL_PDCE;
|
2008-04-26 04:39:05 +07:00
|
|
|
if (!pciehp_poll_mode)
|
2008-12-19 13:19:02 +07:00
|
|
|
cmd |= PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE;
|
2008-04-26 04:39:05 +07:00
|
|
|
|
2008-12-19 13:19:02 +07:00
|
|
|
mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
|
2015-07-02 05:17:49 +07:00
|
|
|
PCI_EXP_SLTCTL_PFDE |
|
2014-02-05 09:29:23 +07:00
|
|
|
PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
|
|
|
|
PCI_EXP_SLTCTL_DLLSCE);
|
2008-04-26 04:39:05 +07:00
|
|
|
|
2015-06-09 06:10:50 +07:00
|
|
|
pcie_write_cmd_nowait(ctrl, cmd, mask);
|
2014-09-23 09:36:09 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, cmd);
|
2008-06-20 10:07:08 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void pcie_disable_notification(struct controller *ctrl)
|
|
|
|
{
|
|
|
|
u16 mask;
|
2013-12-15 03:06:16 +07:00
|
|
|
|
2008-12-19 13:19:02 +07:00
|
|
|
mask = (PCI_EXP_SLTCTL_PDCE | PCI_EXP_SLTCTL_ABPE |
|
|
|
|
PCI_EXP_SLTCTL_MRLSCE | PCI_EXP_SLTCTL_PFDE |
|
2009-10-05 15:40:02 +07:00
|
|
|
PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_CCIE |
|
|
|
|
PCI_EXP_SLTCTL_DLLSCE);
|
2013-12-15 03:06:16 +07:00
|
|
|
pcie_write_cmd(ctrl, 0, mask);
|
2014-09-23 09:36:09 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
|
2008-06-20 10:07:08 +07:00
|
|
|
}
|
|
|
|
|
2018-07-20 05:27:53 +07:00
|
|
|
void pcie_clear_hotplug_events(struct controller *ctrl)
|
|
|
|
{
|
|
|
|
pcie_capability_write_word(ctrl_dev(ctrl), PCI_EXP_SLTSTA,
|
|
|
|
PCI_EXP_SLTSTA_PDC | PCI_EXP_SLTSTA_DLLSC);
|
|
|
|
}
|
|
|
|
|
2018-09-28 04:38:19 +07:00
|
|
|
void pcie_enable_interrupt(struct controller *ctrl)
|
|
|
|
{
|
2019-02-01 00:07:46 +07:00
|
|
|
u16 mask;
|
|
|
|
|
|
|
|
mask = PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_DLLSCE;
|
|
|
|
pcie_write_cmd(ctrl, mask, mask);
|
2018-09-28 04:38:19 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
void pcie_disable_interrupt(struct controller *ctrl)
|
|
|
|
{
|
2019-02-01 00:07:46 +07:00
|
|
|
u16 mask;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Mask hot-plug interrupt to prevent it triggering immediately
|
|
|
|
* when the link goes inactive (we still get PME when any of the
|
|
|
|
* enabled events is detected). Same goes with Link Layer State
|
|
|
|
* changed event which generates PME immediately when the link goes
|
|
|
|
* inactive so mask it as well.
|
|
|
|
*/
|
|
|
|
mask = PCI_EXP_SLTCTL_HPIE | PCI_EXP_SLTCTL_DLLSCE;
|
|
|
|
pcie_write_cmd(ctrl, 0, mask);
|
2018-09-28 04:38:19 +07:00
|
|
|
}
|
|
|
|
|
2013-08-09 03:09:37 +07:00
|
|
|
/*
|
|
|
|
* pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
|
2014-02-19 09:53:19 +07:00
|
|
|
* bus reset of the bridge, but at the same time we want to ensure that it is
|
|
|
|
* not seen as a hot-unplug, followed by the hot-plug of the device. Thus,
|
|
|
|
* disable link state notification and presence detection change notification
|
|
|
|
* momentarily, if we see that they could interfere. Also, clear any spurious
|
2013-08-09 03:09:37 +07:00
|
|
|
* events after.
|
|
|
|
*/
|
2018-08-19 21:29:00 +07:00
|
|
|
int pciehp_reset_slot(struct hotplug_slot *hotplug_slot, int probe)
|
2013-08-09 03:09:37 +07:00
|
|
|
{
|
2018-09-08 14:59:01 +07:00
|
|
|
struct controller *ctrl = to_ctrl(hotplug_slot);
|
2013-05-10 00:26:16 +07:00
|
|
|
struct pci_dev *pdev = ctrl_dev(ctrl);
|
2014-02-05 09:30:40 +07:00
|
|
|
u16 stat_mask = 0, ctrl_mask = 0;
|
2018-07-20 06:04:09 +07:00
|
|
|
int rc;
|
2013-08-09 03:09:37 +07:00
|
|
|
|
|
|
|
if (probe)
|
|
|
|
return 0;
|
|
|
|
|
2018-07-28 12:18:00 +07:00
|
|
|
down_write(&ctrl->reset_lock);
|
|
|
|
|
2014-02-19 09:53:19 +07:00
|
|
|
if (!ATTN_BUTTN(ctrl)) {
|
2014-02-05 09:30:40 +07:00
|
|
|
ctrl_mask |= PCI_EXP_SLTCTL_PDCE;
|
|
|
|
stat_mask |= PCI_EXP_SLTSTA_PDC;
|
2013-08-09 03:09:37 +07:00
|
|
|
}
|
2014-02-05 09:30:40 +07:00
|
|
|
ctrl_mask |= PCI_EXP_SLTCTL_DLLSCE;
|
|
|
|
stat_mask |= PCI_EXP_SLTSTA_DLLSC;
|
|
|
|
|
|
|
|
pcie_write_cmd(ctrl, 0, ctrl_mask);
|
2014-09-23 09:36:09 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, 0);
|
2013-08-09 03:09:37 +07:00
|
|
|
|
2018-07-20 06:04:11 +07:00
|
|
|
rc = pci_bridge_secondary_bus_reset(ctrl->pcie->port);
|
2013-08-09 03:09:37 +07:00
|
|
|
|
2014-02-05 09:30:40 +07:00
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, stat_mask);
|
2015-06-09 06:10:50 +07:00
|
|
|
pcie_write_cmd_nowait(ctrl, ctrl_mask, ctrl_mask);
|
2014-09-23 09:36:09 +07:00
|
|
|
ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
|
|
|
|
pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
|
2018-07-28 12:18:00 +07:00
|
|
|
|
|
|
|
up_write(&ctrl->reset_lock);
|
2018-07-20 06:04:09 +07:00
|
|
|
return rc;
|
2013-08-09 03:09:37 +07:00
|
|
|
}
|
|
|
|
|
2009-01-29 10:31:18 +07:00
|
|
|
int pcie_init_notification(struct controller *ctrl)
|
2008-06-20 10:07:08 +07:00
|
|
|
{
|
|
|
|
if (pciehp_request_irq(ctrl))
|
|
|
|
return -1;
|
2013-12-15 03:06:16 +07:00
|
|
|
pcie_enable_notification(ctrl);
|
2009-01-29 10:31:18 +07:00
|
|
|
ctrl->notification_enabled = 1;
|
2008-06-20 10:07:08 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-20 05:27:32 +07:00
|
|
|
void pcie_shutdown_notification(struct controller *ctrl)
|
2008-06-20 10:07:08 +07:00
|
|
|
{
|
2009-01-29 10:31:18 +07:00
|
|
|
if (ctrl->notification_enabled) {
|
|
|
|
pcie_disable_notification(ctrl);
|
|
|
|
pciehp_free_irq(ctrl);
|
|
|
|
ctrl->notification_enabled = 0;
|
|
|
|
}
|
2008-06-20 10:07:08 +07:00
|
|
|
}
|
|
|
|
|
2008-04-26 04:39:08 +07:00
|
|
|
static inline void dbg_ctrl(struct controller *ctrl)
|
2007-11-29 06:11:46 +07:00
|
|
|
{
|
2009-09-15 15:30:14 +07:00
|
|
|
struct pci_dev *pdev = ctrl->pcie->port;
|
2015-06-16 04:28:29 +07:00
|
|
|
u16 reg16;
|
2007-11-29 06:11:46 +07:00
|
|
|
|
2019-05-09 02:59:00 +07:00
|
|
|
ctrl_dbg(ctrl, "Slot Capabilities : 0x%08x\n", ctrl->slot_cap);
|
2013-05-10 00:26:16 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, ®16);
|
2019-05-09 02:59:00 +07:00
|
|
|
ctrl_dbg(ctrl, "Slot Status : 0x%04x\n", reg16);
|
2013-05-10 00:26:16 +07:00
|
|
|
pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, ®16);
|
2019-05-09 02:59:00 +07:00
|
|
|
ctrl_dbg(ctrl, "Slot Control : 0x%04x\n", reg16);
|
2008-04-26 04:39:08 +07:00
|
|
|
}
|
2007-11-29 06:11:46 +07:00
|
|
|
|
2014-04-19 07:13:49 +07:00
|
|
|
#define FLAG(x, y) (((x) & (y)) ? '+' : '-')
|
2013-12-15 03:06:36 +07:00
|
|
|
|
2008-06-20 10:07:08 +07:00
|
|
|
struct controller *pcie_init(struct pcie_device *dev)
|
2008-04-26 04:39:08 +07:00
|
|
|
{
|
2008-06-20 10:07:08 +07:00
|
|
|
struct controller *ctrl;
|
PCI: pciehp: Disable in-band presence detect when possible
The presence detect state (PDS) is normally a logical OR of in-band and
out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to
disable in-band presence so that the PDS bit always reflects the state of
the out-of-band presence.
The recommendation of the PCIe spec is to disable in-band presence whenever
supported (PCIe r5.0, appendix I implementation note):
Due to architectural issues, the in-band (Physical-Layer-based) portion
of the PD mechanism is deprecated for use with async hot-plug. One issue
is that in-band PD as architected does not detect adapter removal during
certain LTSSM states, notably the L1 and Disabled States. Another issue
is that when both in-band and OOB PD are being used together, the
Presence Detect State bit and its associated interrupt mechanism always
reflect the logical OR of the inband and OOB PD states, and with some
hot-plug hardware configurations, it is important for software to detect
and respond to in-band and OOB PD events independently. If OOB PD is
being used and the associated DSP supports In-Band PD Disable, it is
recommended that the In-Band PD Disable bit be Set, and the Presence
Detect State bit and its associated interrupt mechanism be used
exclusively for OOB PD. As a substitute for in-band PD with async
hot-plug, the reference model uses either the DPC or the DLL Link Active
mechanism.
Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
[bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
value (suggested by Lukas)]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2019-10-26 02:00:45 +07:00
|
|
|
u32 slot_cap, slot_cap2, link_cap;
|
2018-09-08 14:59:01 +07:00
|
|
|
u8 poweron;
|
2008-04-26 04:39:08 +07:00
|
|
|
struct pci_dev *pdev = dev->port;
|
2018-09-19 02:46:17 +07:00
|
|
|
struct pci_bus *subordinate = pdev->subordinate;
|
2007-11-29 06:11:46 +07:00
|
|
|
|
2008-06-20 10:07:08 +07:00
|
|
|
ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
|
PCI: Remove unnecessary messages for memory allocation failures
Per ebfdc40969f2 ("checkpatch: attempt to find unnecessary 'out of memory'
messages"), when a memory allocation fails, the memory subsystem emits
generic "out of memory" messages (see slab_out_of_memory() for some of this
logging). Therefore, additional error messages in the caller don't add
much value.
Remove messages that merely report "out of memory".
This preserves some messages that report additional information, e.g.,
allocation failures that mean we drop hotplug events.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[bhelgaas: changelog, squash patches, make similar changes to acpiphp,
cpqphp, ibmphp, keep warning when dropping hotplug event]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-12-29 18:15:16 +07:00
|
|
|
if (!ctrl)
|
2018-09-19 02:46:17 +07:00
|
|
|
return NULL;
|
PCI: Remove unnecessary messages for memory allocation failures
Per ebfdc40969f2 ("checkpatch: attempt to find unnecessary 'out of memory'
messages"), when a memory allocation fails, the memory subsystem emits
generic "out of memory" messages (see slab_out_of_memory() for some of this
logging). Therefore, additional error messages in the caller don't add
much value.
Remove messages that merely report "out of memory".
This preserves some messages that report additional information, e.g.,
allocation failures that mean we drop hotplug events.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[bhelgaas: changelog, squash patches, make similar changes to acpiphp,
cpqphp, ibmphp, keep warning when dropping hotplug event]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-12-29 18:15:16 +07:00
|
|
|
|
2008-08-22 15:16:48 +07:00
|
|
|
ctrl->pcie = dev;
|
2013-12-15 03:06:07 +07:00
|
|
|
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
|
PCI: pciehp: Allow exclusive userspace control of indicators
PCIe hotplug supports optional Attention and Power Indicators, which are
used internally by pciehp. Users can't control the Power Indicator, but
they can control the Attention Indicator by writing to a sysfs "attention"
file.
The Slot Control register has two bits for each indicator, and the PCIe
spec defines the encodings for each as (Reserved/On/Blinking/Off). For
sysfs "attention" writes, pciehp_set_attention_status() maps into these
encodings, so the only useful write values are 0 (Off), 1 (On), and 2
(Blinking).
However, some platforms use all four bits for platform-specific indicators,
and they need to allow direct user control of them while preventing pciehp
from using them at all.
Add a "hotplug_user_indicators" flag to the pci_dev structure. When set,
pciehp does not use either the Attention Indicator or the Power Indicator,
and the low four bits (values 0x0 - 0xf) of sysfs "attention" write values
are written directly to the Attention Indicator Control and Power Indicator
Control fields.
[bhelgaas: changelog, rename flag and accessors to s/attention/indicator/]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-09-13 23:31:59 +07:00
|
|
|
|
|
|
|
if (pdev->hotplug_user_indicators)
|
|
|
|
slot_cap &= ~(PCI_EXP_SLTCAP_AIP | PCI_EXP_SLTCAP_PIP);
|
|
|
|
|
2018-01-17 22:48:39 +07:00
|
|
|
/*
|
|
|
|
* We assume no Thunderbolt controllers support Command Complete events,
|
|
|
|
* but some controllers falsely claim they do.
|
|
|
|
*/
|
|
|
|
if (pdev->is_thunderbolt)
|
|
|
|
slot_cap |= PCI_EXP_SLTCAP_NCCS;
|
|
|
|
|
2008-04-26 04:39:08 +07:00
|
|
|
ctrl->slot_cap = slot_cap;
|
2007-11-29 06:11:46 +07:00
|
|
|
mutex_init(&ctrl->ctrl_lock);
|
2018-09-08 14:59:01 +07:00
|
|
|
mutex_init(&ctrl->state_lock);
|
2018-07-28 12:18:00 +07:00
|
|
|
init_rwsem(&ctrl->reset_lock);
|
PCI: pciehp: Enable/disable exclusively from IRQ thread
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-07-20 05:27:46 +07:00
|
|
|
init_waitqueue_head(&ctrl->requester);
|
2007-11-29 06:11:46 +07:00
|
|
|
init_waitqueue_head(&ctrl->queue);
|
2018-09-08 14:59:01 +07:00
|
|
|
INIT_DELAYED_WORK(&ctrl->button_work, pciehp_queue_pushbutton_work);
|
2008-04-26 04:39:08 +07:00
|
|
|
dbg_ctrl(ctrl);
|
2014-06-14 23:56:31 +07:00
|
|
|
|
2018-09-19 02:46:17 +07:00
|
|
|
down_read(&pci_bus_sem);
|
|
|
|
ctrl->state = list_empty(&subordinate->devices) ? OFF_STATE : ON_STATE;
|
|
|
|
up_read(&pci_bus_sem);
|
|
|
|
|
PCI: pciehp: Disable in-band presence detect when possible
The presence detect state (PDS) is normally a logical OR of in-band and
out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to
disable in-band presence so that the PDS bit always reflects the state of
the out-of-band presence.
The recommendation of the PCIe spec is to disable in-band presence whenever
supported (PCIe r5.0, appendix I implementation note):
Due to architectural issues, the in-band (Physical-Layer-based) portion
of the PD mechanism is deprecated for use with async hot-plug. One issue
is that in-band PD as architected does not detect adapter removal during
certain LTSSM states, notably the L1 and Disabled States. Another issue
is that when both in-band and OOB PD are being used together, the
Presence Detect State bit and its associated interrupt mechanism always
reflect the logical OR of the inband and OOB PD states, and with some
hot-plug hardware configurations, it is important for software to detect
and respond to in-band and OOB PD events independently. If OOB PD is
being used and the associated DSP supports In-Band PD Disable, it is
recommended that the In-Band PD Disable bit be Set, and the Presence
Detect State bit and its associated interrupt mechanism be used
exclusively for OOB PD. As a substitute for in-band PD with async
hot-plug, the reference model uses either the DPC or the DLL Link Active
mechanism.
Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
[bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
value (suggested by Lukas)]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2019-10-26 02:00:45 +07:00
|
|
|
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP2, &slot_cap2);
|
|
|
|
if (slot_cap2 & PCI_EXP_SLTCAP2_IBPD) {
|
|
|
|
pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_IBPD_DISABLE,
|
|
|
|
PCI_EXP_SLTCTL_IBPD_DISABLE);
|
|
|
|
ctrl->inband_presence_disabled = 1;
|
|
|
|
}
|
|
|
|
|
2019-10-26 02:00:47 +07:00
|
|
|
if (dmi_first_match(inband_presence_disabled_dmi_table))
|
|
|
|
ctrl->inband_presence_disabled = 1;
|
|
|
|
|
2014-04-19 07:13:49 +07:00
|
|
|
/* Check if Data Link Layer Link Active Reporting is implemented */
|
|
|
|
pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
|
2008-10-22 12:31:44 +07:00
|
|
|
|
PCI: pciehp: Always enable occupied slot on probe
Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
there are hot-plug events that occur while interrupt generation is
disabled, and interrupt generation is subsequently enabled."
On probe, we currently clear all event bits in the Slot Status register
with the notable exception of the Presence Detect Changed bit. Thereby
we seek to receive an interrupt for an already occupied slot once event
notification is enabled.
But because the interrupt is optional, users may have to specify the
pciehp_force parameter on the command line, which is inconvenient.
Moreover, now that pciehp's event handling has become resilient to
missed events, a Presence Detect Changed interrupt for a slot which is
powered on is interpreted as removal of the card. If the slot has
already been brought up by the BIOS, receiving such an interrupt on
probe causes the slot to be powered off and immediately back on, which
is likewise undesirable.
Avoid both issues by making the behavior of pciehp_force the default and
clearing the Presence Detect Changed bit on probe.
Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
("Force pciehp, even if OSHP is missing") seems nonsensical because the
OSHP control method is only relevant for SHCP slots according to the
PCI Firmware specification r3.0, sec 4.8.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-20 05:27:50 +07:00
|
|
|
/* Clear all remaining event bits in Slot Status register. */
|
2013-12-15 03:06:47 +07:00
|
|
|
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
|
|
|
|
PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
|
2017-10-14 01:35:47 +07:00
|
|
|
PCI_EXP_SLTSTA_MRLSC | PCI_EXP_SLTSTA_CC |
|
PCI: pciehp: Always enable occupied slot on probe
Per PCIe r4.0, sec 6.7.3.4, a "port may optionally send an MSI when
there are hot-plug events that occur while interrupt generation is
disabled, and interrupt generation is subsequently enabled."
On probe, we currently clear all event bits in the Slot Status register
with the notable exception of the Presence Detect Changed bit. Thereby
we seek to receive an interrupt for an already occupied slot once event
notification is enabled.
But because the interrupt is optional, users may have to specify the
pciehp_force parameter on the command line, which is inconvenient.
Moreover, now that pciehp's event handling has become resilient to
missed events, a Presence Detect Changed interrupt for a slot which is
powered on is interpreted as removal of the card. If the slot has
already been brought up by the BIOS, receiving such an interrupt on
probe causes the slot to be powered off and immediately back on, which
is likewise undesirable.
Avoid both issues by making the behavior of pciehp_force the default and
clearing the Presence Detect Changed bit on probe.
Note that the stated purpose of pciehp_force per the MODULE_PARM_DESC
("Force pciehp, even if OSHP is missing") seems nonsensical because the
OSHP control method is only relevant for SHCP slots according to the
PCI Firmware specification r3.0, sec 4.8.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-20 05:27:50 +07:00
|
|
|
PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC);
|
2007-11-29 06:11:46 +07:00
|
|
|
|
PCI: pciehp: Disable in-band presence detect when possible
The presence detect state (PDS) is normally a logical OR of in-band and
out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to
disable in-band presence so that the PDS bit always reflects the state of
the out-of-band presence.
The recommendation of the PCIe spec is to disable in-band presence whenever
supported (PCIe r5.0, appendix I implementation note):
Due to architectural issues, the in-band (Physical-Layer-based) portion
of the PD mechanism is deprecated for use with async hot-plug. One issue
is that in-band PD as architected does not detect adapter removal during
certain LTSSM states, notably the L1 and Disabled States. Another issue
is that when both in-band and OOB PD are being used together, the
Presence Detect State bit and its associated interrupt mechanism always
reflect the logical OR of the inband and OOB PD states, and with some
hot-plug hardware configurations, it is important for software to detect
and respond to in-band and OOB PD events independently. If OOB PD is
being used and the associated DSP supports In-Band PD Disable, it is
recommended that the In-Band PD Disable bit be Set, and the Presence
Detect State bit and its associated interrupt mechanism be used
exclusively for OOB PD. As a substitute for in-band PD with async
hot-plug, the reference model uses either the DPC or the DLL Link Active
mechanism.
Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
[bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
value (suggested by Lukas)]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2019-10-26 02:00:45 +07:00
|
|
|
ctrl_info(ctrl, "Slot #%d AttnBtn%c PwrCtrl%c MRL%c AttnInd%c PwrInd%c HotPlug%c Surprise%c Interlock%c NoCompl%c IbPresDis%c LLActRep%c%s\n",
|
2013-12-15 03:06:36 +07:00
|
|
|
(slot_cap & PCI_EXP_SLTCAP_PSN) >> 19,
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_ABP),
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_PCP),
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_MRLSP),
|
2015-06-16 04:28:29 +07:00
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_AIP),
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_PIP),
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_HPC),
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_HPS),
|
2013-12-15 03:06:36 +07:00
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
|
|
|
|
FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
|
PCI: pciehp: Disable in-band presence detect when possible
The presence detect state (PDS) is normally a logical OR of in-band and
out-of-band (OOB) presence detect. As of PCIe 4.0, there is the option to
disable in-band presence so that the PDS bit always reflects the state of
the out-of-band presence.
The recommendation of the PCIe spec is to disable in-band presence whenever
supported (PCIe r5.0, appendix I implementation note):
Due to architectural issues, the in-band (Physical-Layer-based) portion
of the PD mechanism is deprecated for use with async hot-plug. One issue
is that in-band PD as architected does not detect adapter removal during
certain LTSSM states, notably the L1 and Disabled States. Another issue
is that when both in-band and OOB PD are being used together, the
Presence Detect State bit and its associated interrupt mechanism always
reflect the logical OR of the inband and OOB PD states, and with some
hot-plug hardware configurations, it is important for software to detect
and respond to in-band and OOB PD events independently. If OOB PD is
being used and the associated DSP supports In-Band PD Disable, it is
recommended that the In-Band PD Disable bit be Set, and the Presence
Detect State bit and its associated interrupt mechanism be used
exclusively for OOB PD. As a substitute for in-band PD with async
hot-plug, the reference model uses either the DPC or the DLL Link Active
mechanism.
Link: https://lore.kernel.org/r/20191025190047.38130-2-stuart.w.hayes@gmail.com
[bhelgaas: move PCI_EXP_SLTCAP2 read earlier & print PCI_EXP_SLTCAP2_IBPD
value (suggested by Lukas)]
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
2019-10-26 02:00:45 +07:00
|
|
|
FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
|
PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits. Command Completed is never set for writes that only change
software notification "Enable" bits. This results in timeouts like this:
pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.
Here's the text of the Intel erratum CF118. We assume this applies to all
Intel parts:
CF118 PCIe Slot Status Register Command Completed bit not always
updated on any configuration write to the Slot Control
Register
Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug,
the Slot Status Register (offset AAh) Command Completed
(bit[4]) status is updated under the following condition:
IOH will set Command Completed bit after delivering the new
commands written in the Slot Controller register (offset
A8h) to VPP. The IOH detects new commands written in Slot
Control register by checking the change of value for Power
Controller Control (bit[10]), Power Indicator Control
(bits[9:8]), Attention Indicator Control (bits[7:6]), or
Electromechanical Interlock Control (bit[11]) fields. Any
other configuration writes to the Slot Control register
without changing the values of these fields will not cause
Command Completed bit to be set.
The PCIe Base Specification Revision 2.0 or later describes
the “Slot Control Register” in section 7.8.10, as follows
(Reference section 7.8.10, Slot Control Register, Offset
18h). In hot-plug capable Downstream Ports, a write to the
Slot Control register must cause a hot-plug command to be
generated (see Section 6.7.3.2 for details on hot-plug
commands). A write to the Slot Control register in a
Downstream Port that is not hotplug capable must not cause a
hot-plug command to be executed.
The PCIe Spec intended that every write to the Slot Control
Register is a command and expected a command complete status
to abstract the VPP implementation specific nuances from the
OS software. IOH PCIe Slot Control Register implementation
is not fully conforming to the PCIe Specification in this
respect.
Implication: Software checking on the Command Completed status after
writing to the Slot Control register may time out.
Workaround: Software can read the Slot Control register and compare the
existing and new values to determine if it should check the
Command Completed status after writing to the Slot Control
register.
Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 06:39:38 +07:00
|
|
|
FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
|
|
|
|
pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
|
2008-06-20 10:07:08 +07:00
|
|
|
|
PCI: pciehp: Deduplicate presence check on probe & resume
On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot. Both code paths are
identical, so deduplicate them per Mika's request.
On probe, an additional check is performed to disable power of an
unoccupied slot. This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume: The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled. Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.
To allow for deduplication of the presence check, move the power check
to pcie_init(). This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.
However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested. If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message. Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat. Avoid by disabling
interrupts before disabling power. (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-28 12:22:00 +07:00
|
|
|
/*
|
|
|
|
* If empty slot's power status is on, turn power off. The IRQ isn't
|
|
|
|
* requested yet, so avoid triggering a notification with this command.
|
|
|
|
*/
|
|
|
|
if (POWER_CTRL(ctrl)) {
|
2018-09-19 02:46:17 +07:00
|
|
|
pciehp_get_power_status(ctrl, &poweron);
|
2018-09-08 14:59:01 +07:00
|
|
|
if (!pciehp_card_present_or_link_active(ctrl) && poweron) {
|
PCI: pciehp: Deduplicate presence check on probe & resume
On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot. Both code paths are
identical, so deduplicate them per Mika's request.
On probe, an additional check is performed to disable power of an
unoccupied slot. This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume: The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled. Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.
To allow for deduplication of the presence check, move the power check
to pcie_init(). This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.
However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested. If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message. Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat. Avoid by disabling
interrupts before disabling power. (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-28 12:22:00 +07:00
|
|
|
pcie_disable_notification(ctrl);
|
2018-09-19 02:46:17 +07:00
|
|
|
pciehp_power_off_slot(ctrl);
|
PCI: pciehp: Deduplicate presence check on probe & resume
On driver probe and on resume from system sleep, pciehp checks the
Presence Detect State bit in the Slot Status register to bring up an
occupied slot or bring down an unoccupied slot. Both code paths are
identical, so deduplicate them per Mika's request.
On probe, an additional check is performed to disable power of an
unoccupied slot. This can e.g. happen if power was enabled by BIOS.
It cannot happen once pciehp has taken control, hence is not necessary
on resume: The Slot Control register is set to the same value that it
had on suspend by pci_restore_state(), so if the slot was occupied,
power is enabled and if it wasn't, power is disabled. Should occupancy
have changed during the system sleep transition, power is adjusted by
bringing up or down the slot per the paragraph above.
To allow for deduplication of the presence check, move the power check
to pcie_init(). This seems safer anyway, because right now it is
performed while interrupts are already enabled, and although I can't
think of a scenario where pciehp_power_off_slot() and the IRQ thread
collide, it does feel brittle.
However this means that pcie_init() may now write to the Slot Control
register before the IRQ is requested. If both the CCIE and HPIE bits
happen to be set, pcie_wait_cmd() will wait for an interrupt (instead
of polling the Command Completed bit) and eventually emit a timeout
message. Additionally, if a level-triggered INTx interrupt is used,
the user may see a spurious interrupt splat. Avoid by disabling
interrupts before disabling power. (Normally the HPIE and CCIE bits
should be clear on probe, but conceivably they may already have been
set e.g. by BIOS.)
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-07-28 12:22:00 +07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2008-06-20 10:07:08 +07:00
|
|
|
return ctrl;
|
|
|
|
}
|
|
|
|
|
2009-09-15 15:30:48 +07:00
|
|
|
void pciehp_release_ctrl(struct controller *ctrl)
|
2008-06-20 10:07:08 +07:00
|
|
|
{
|
2018-09-08 14:59:01 +07:00
|
|
|
cancel_delayed_work_sync(&ctrl->button_work);
|
2008-06-20 10:07:08 +07:00
|
|
|
kfree(ctrl);
|
2007-11-29 06:11:46 +07:00
|
|
|
}
|
PCI: pciehp: Add quirk for Command Completed errata
Several PCIe hotplug controllers have errata that mean they do not set the
Command Completed bit unless writes to the Slot Command register change
"Control" bits. Command Completed is never set for writes that only change
software notification "Enable" bits. This results in timeouts like this:
pciehp 0000:00:1c.0:pcie004: Timeout on hotplug command 0x1038 (issued 65284 msec ago)
When this erratum is present, avoid these timeouts by marking commands
"completed" immediately unless they change the "Control" bits.
Here's the text of the Intel erratum CF118. We assume this applies to all
Intel parts:
CF118 PCIe Slot Status Register Command Completed bit not always
updated on any configuration write to the Slot Control
Register
Problem: For PCIe root ports (devices 0 - 10) supporting hot-plug,
the Slot Status Register (offset AAh) Command Completed
(bit[4]) status is updated under the following condition:
IOH will set Command Completed bit after delivering the new
commands written in the Slot Controller register (offset
A8h) to VPP. The IOH detects new commands written in Slot
Control register by checking the change of value for Power
Controller Control (bit[10]), Power Indicator Control
(bits[9:8]), Attention Indicator Control (bits[7:6]), or
Electromechanical Interlock Control (bit[11]) fields. Any
other configuration writes to the Slot Control register
without changing the values of these fields will not cause
Command Completed bit to be set.
The PCIe Base Specification Revision 2.0 or later describes
the “Slot Control Register” in section 7.8.10, as follows
(Reference section 7.8.10, Slot Control Register, Offset
18h). In hot-plug capable Downstream Ports, a write to the
Slot Control register must cause a hot-plug command to be
generated (see Section 6.7.3.2 for details on hot-plug
commands). A write to the Slot Control register in a
Downstream Port that is not hotplug capable must not cause a
hot-plug command to be executed.
The PCIe Spec intended that every write to the Slot Control
Register is a command and expected a command complete status
to abstract the VPP implementation specific nuances from the
OS software. IOH PCIe Slot Control Register implementation
is not fully conforming to the PCIe Specification in this
respect.
Implication: Software checking on the Command Completed status after
writing to the Slot Control register may time out.
Workaround: Software can read the Slot Control register and compare the
existing and new values to determine if it should check the
Command Completed status after writing to the Slot Control
register.
Per Sinan, the Qualcomm QDF2400 controller also does not set the Command
Completed bit unless writes to the Slot Command register change "Control"
bits.
Link: http://www.intel.com/content/www/us/en/processors/xeon/xeon-e7-v2-spec-update.html
Link: https://lkml.kernel.org/r/8770820b-85a0-172b-7230-3a44524e6c9f@molgen.mpg.de
Reported-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Tested-by: Paul Menzel <pmenzel+linux-pci@molgen.mpg.de> # Lenovo X60
Signed-off-by: Sinan Kaya <okaya@codeaurora.org> # Qcom quirk
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2018-05-04 06:39:38 +07:00
|
|
|
|
|
|
|
static void quirk_cmd_compl(struct pci_dev *pdev)
|
|
|
|
{
|
|
|
|
u32 slot_cap;
|
|
|
|
|
|
|
|
if (pci_is_pcie(pdev)) {
|
|
|
|
pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP, &slot_cap);
|
|
|
|
if (slot_cap & PCI_EXP_SLTCAP_HPC &&
|
|
|
|
!(slot_cap & PCI_EXP_SLTCAP_NCCS))
|
|
|
|
pdev->broken_cmd_compl = 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
|
|
|
|
PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
|
|
|
|
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0400,
|
|
|
|
PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
|
|
|
|
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0401,
|
|
|
|
PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
|
2018-11-07 14:25:05 +07:00
|
|
|
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HXT, 0x0401,
|
|
|
|
PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
|