On a given PCI-E fabric, each device, bridge, and root port can have a
different PCI-E maximum payload size. There is a sizable performance
boost for having the largest possible maximum payload size on each PCI-E
device. However, if improperly configured, fatal bus errors can occur.
Thus, it is important to ensure that PCI-E payloads sends by a device
are never larger than the MPS setting of all devices on the way to the
destination.
This can be achieved two ways:
- A conservative approach is to use the smallest common denominator of
the entire tree below a root complex for every device on that fabric.
This means for example that having a 128 bytes MPS USB controller on one
leg of a switch will dramatically reduce performances of a video card or
10GE adapter on another leg of that same switch.
It also means that any hierarchy supporting hotplug slots (including
expresscard or thunderbolt I suppose, dbl check that) will have to be
entirely clamped to 128 bytes since we cannot predict what will be
plugged into those slots, and we cannot change the MPS on a "live"
system.
- A more optimal way is possible, if it falls within a couple of
constraints:
* The top-level host bridge will never generate packets larger than the
smallest TLP (or if it can be controlled independently from its MPS at
least)
* The device will never generate packets larger than MPS (which can be
configured via MRRS)
* No support of direct PCI-E <-> PCI-E transfers between devices without
some additional code to specifically deal with that case
Then we can use an approach that basically ignores downstream requests
and focuses exclusively on upstream requests. In that case, all we need
to care about is that a device MPS is no larger than its parent MPS,
which allows us to keep all switches/bridges to the max MPS supported by
their parent and eventually the PHB.
In this case, your USB controller would no longer "starve" your 10GE
Ethernet and your hotplug slots won't affect your global MPS.
Additionally, the hotplugged devices themselves can be configured to a
larger MPS up to the value configured in the hotplug bridge.
To choose between the two available options, two PCI kernel boot args
have been added to the PCI calls. "pcie_bus_safe" will provide the
former behavior, while "pcie_bus_perf" will perform the latter behavior.
By default, the latter behavior is used.
NOTE: due to the location of the enablement, each arch will need to add
calls to this function. This patch only enables x86.
This patch includes a number of changes recommended by Benjamin
Herrenschmidt.
Tested-by: Jordan_Hargrave@dell.com
Signed-off-by: Jon Mason <mason@myri.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When setting the PCI-E MRRS, pcie_set_readrq queries the current
settings via a pci_read_config_word call but writes the modified result
via a pci_write_config_dword. This results in writing 16 more bits than
were queried.
Also, the function description comment is slightly incorrect.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The function pci_enable_ari() may mistakenly set the downstream port
of a v1 PCIe switch in ARI Forwarding mode. This is a PCIe v2 feature,
and with an SR-IOV device on that switch port believing the switch above
is ARI capable it may attempt to use functions 8-255, translating into
invalid (non-zero) device numbers for that bus. This has been seen
to cause Completion Timeouts and general misbehaviour including hangs
and panics.
Cc: stable@kernel.org
Acked-by: Don Dutile <ddutile@redhat.com>
Tested-by: Don Dutile <ddutile@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Multiple attempts to dynamically reallocate pci resources have
unfortunately lead to regressions. Though we continue to fix the
regressions and fine tune the dynamic-reallocation behavior, we have not
reached a acceptable state yet.
This patch provides a interim solution. It disables dynamic reallocation
by default, but adds the ability to enable it through pci=realloc kernel
command line parameter.
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI/ACPI: fix type mismatch
PCI: fix new kernel-doc warning
PCI: Fix warning in drivers/pci/probe.c on sparc64
When I added 3448a19da4
I forgot about the special uv handling code for this, so this
patch fixes it up.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Ingo Molnar
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix pci.c kernel-doc warnings:
Warning(drivers/pci/pci.c:3292): No description found for parameter 'flags'
Warning(drivers/pci/pci.c:3292): Excess function parameter 'change_bridge_flags' description in 'pci_set_vga_state'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'drm-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (169 commits)
drivers/gpu/drm/radeon/atom.c: fix warning
drm/radeon/kms: bump kms version number
drm/radeon/kms: properly set num banks for fusion asics
drm/radeon/kms/atom: move dig phy init out of modesetting
drm/radeon/kms/cayman: fix typo in register mask
drm/radeon/kms: fix typo in spread spectrum code
drm/radeon/kms: fix tile_config value reported to userspace on cayman.
drm/radeon/kms: fix incorrect comparison in cayman setup code.
drm/radeon/kms: add wait idle ioctl for eg->cayman
drm/radeon/cayman: setup hdp to invalidate and flush when asked
drm/radeon/evergreen/btc/fusion: setup hdp to invalidate and flush when asked
agp/uninorth: Fix lockups with radeon KMS and >1x.
drm/radeon/kms: the SS_Id field in the LCD table if for LVDS only
drm/radeon/kms: properly set the CLK_REF bit for DCE3 devices
drm/radeon/kms: fixup eDP connector handling
drm/radeon/kms: bail early for eDP in hotplug callback
drm/radeon/kms: simplify hotplug handler logic
drm/radeon/kms: rewrite DP handling
drm/radeon/kms/atom: add support for setting DP panel mode
drm/radeon/kms: atombios.h updates for DP panel mode
...
For KVM device assignment, we'd like to save off the state of a device
prior to passing it to the guest and restore it later. We also want
to allow pci_reset_funciton() to be called while the device is owned
by the guest. This however overwrites and invalidates the struct pci_dev
buffers, so we can't just manually call save and restore. Add generic
interfaces for the saved state to be stored and reloaded back into
struct pci_dev at a later time.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This will allow us to store and load it later.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Latency tolerance reporting allows devices to send messages to the root
complex indicating their latency tolerance for snooped & unsnooped
memory transactions. Add support for enabling & disabling this
feature, along with a routine to set the max latencies a device should
send upstream.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
OBFF (optimized buffer flush/fill), where supported, can help improve
energy efficiency by giving devices information about when interrupts
and other activity will have a reduced power impact. It requires
support from both the device and system (i.e. not only does the device
need to respond to OBFF messages, but the platform must be capable of
generating and routing them to the end point).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add support to allow drivers to enable/disable ID-based ordering. Where
supported, ID-based ordering can significantly improve the latency of
individual requests by preventing them from queueing up behind unrelated
traffic.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The pci_pm_reset() function is not a very nice interface due to its
limitations and conditional behavior (e.g. it doesn't affect devices
in low-power states), but it cannot be simply dropped, because
existing device drivers may depend on it. However, its behavior and
limitations should be well documented, so add an appropriate
kerneldoc comment to it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
So in a lot of modern systems, a GPU will always be below a parent bridge that won't share with any other GPUs. This means VGA arbitration on those GPUs can be controlled by using the bridge routing instead of io/mem decodes.
The problem is locating which GPUs share which upstream bridges. This patch attempts to identify all the GPUs which can be controlled via bridges, and ones that can't. This patch endeavours to work out the bridge sharing semantics.
When disabling GPUs via a bridge, it doesn't do irq callbacks or touch the io/mem decodes for the gpu.
Signed-off-by: Dave Airlie <airlied@redhat.com>
v3 -> v2: Moved ASPM enabling logic to pci_set_power_state()
v2 -> v1: Preserved the logic in pci_raw_set_power_state()
: Added ASPM enabling logic after scanning Root Bridge
: http://marc.info/?l=linux-pci&m=130046996216391&w=2
v1 : http://marc.info/?l=linux-pci&m=130013164703283&w=2
The assumption made in commit 41cd766b06
(PCI: Don't enable aspm before drivers have had a chance to veto it) that
pci_enable_device() will result in re-configuring ASPM when aspm_policy is
POWERSAVE is no longer valid. This is due to commit
97c145f7c8 (PCI: read current power state
at enable time) which resets dev->current_state to D0. Due to this the
call to pcie_aspm_pm_state_change() is never made. Note the equality check
(below) that returns early:
./drivers/pci/pci.c: pci_raw_set_pci_power_state()
546 /* Check if we're already there */
547 if (dev->current_state == state)
548 return 0;
Therefore OSPM never configures the PCIe links for ASPM to turn them "on".
Fix it by configuring ASPM from the pci_enable_device() code path. This
also allows a driver such as the e1000e networking driver a chance to
disable ASPM (L0s, L1), if need be, prior to enabling the device. A
driver may perform this action if the device is known to mis-behave
wrt ASPM.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Make wakeup events be reported by the PCI subsystem before attempting to
resume devices or queuing up runtime resume requests for them, because
wakeup events should be reported as soon as they have been detected.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After recent changes related to wakeup events pm_wakeup_event()
automatically checks if the given device is configured to signal wakeup,
so pci_wakeup_event() may be a static inline function calling
pm_wakeup_event() directly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_restore_state only ever returns 0, thus there is no benefit in
having it return any value. Also, a large majority of the callers do
not check the return code of pci_restore_state. Make the
pci_restore_state a void return and avoid the overhead.
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Jon Mason <jon.mason@exar.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When we enable a PCI device, we avoid doing a lot of the initial setup
work if the device's enable count is non-zero. If we don't fetch the
power state though, we may later fail to set up MSI due to the unknown
status. So pick it up before we short circuit the rest due to a
pre-existing enable or mismatched enable/disable pair (as happens with
VGA devices, which are special in a special way).
Tested-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Reported-by: Dave Airlie <airlied@linux.ie>
Tested-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Not all hardware vendors hook up the PME line for legacy PCI devices,
meaning that wakeup events get lost. The only way around this is to poll
the devices to see if their state has changed, so add support for doing
that on legacy PCI devices that aren't part of the core chipset.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Indent the branch of an if.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@
(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (30 commits)
PCI: update for owner removal from struct device_attribute
PCI: Fix warnings when CONFIG_DMI unset
PCI: Do not run NVidia quirks related to MSI with MSI disabled
x86/PCI: use for_each_pci_dev()
PCI: use for_each_pci_dev()
PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
PCI: export SMBIOS provided firmware instance and label to sysfs
PCI: Allow read/write access to sysfs I/O port resources
x86/PCI: use host bridge _CRS info on ASRock ALiveSATA2-GLAN
PCI: remove unused HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_{SIZE|BOUNDARY}
PCI: disable mmio during bar sizing
PCI: MSI: Remove unsafe and unnecessary hardware access
PCI: Default PCIe ASPM control to on and require !EMBEDDED to disable
PCI: kernel oops on access to pci proc file while hot-removal
PCI: pci-sysfs: remove casts from void*
ACPI: Disable ASPM if the platform won't provide _OSC control for PCIe
PCI hotplug: make sure child bridges are enabled at hotplug time
PCI hotplug: shpchp: Removed check for hotplug of display devices
PCI hotplug: pciehp: Fixed return value sign for pciehp_unconfigure_device
PCI: Don't enable aspm before drivers have had a chance to veto it
...
In 2.6.34, we transformed the PCI DMA API into the generic device
mode. The PCI DMA API is just the wrapper of the DMA API.
So we don't need HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_SIZE or
HAVE_ARCH_PCI_SET_DMA_SEGMENT_BOUNDARY (which enable architectures to
have the own implementations). Both haven't been used anyway.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
One of the arguments during the suspend blockers discussion was that
the mainline kernel didn't contain any mechanisms making it possible
to avoid races between wakeup and system suspend.
Generally, there are two problems in that area. First, if a wakeup
event occurs exactly when /sys/power/state is being written to, it
may be delivered to user space right before the freezer kicks in, so
the user space consumer of the event may not be able to process it
before the system is suspended. Second, if a wakeup event occurs
after user space has been frozen, it is not generally guaranteed that
the ongoing transition of the system into a sleep state will be
aborted.
To address these issues introduce a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup
events and three helper functions, pm_stay_awake(), pm_relax(), and
pm_wakeup_event(), that may be used by kernel subsystems to control
the behavior of this attribute and to request the PM core to abort
system transitions into a sleep state already in progress.
The /sys/power/wakeup_count file may be read from or written to by
user space. Reads will always succeed (unless interrupted by a
signal) and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to
the current value of the wakeup events counter. If a write is
successful, it will cause the kernel to save the current value of the
wakeup events counter and to abort the subsequent system transition
into a sleep state if any wakeup events are reported after the write
has returned.
[The assumption is that before writing to /sys/power/state user space
will first read from /sys/power/wakeup_count. Next, user space
consumers of wakeup events will have a chance to acknowledge or
veto the upcoming system transition to a sleep state. Finally, if
the transition is allowed to proceed, /sys/power/wakeup_count will
be written to and if that succeeds, /sys/power/state will be written
to as well. Still, if any wakeup events are reported to the PM core
by kernel subsystems after that point, the transition will be
aborted.]
Additionally, put a wakeup events counter into struct dev_pm_info and
make these per-device wakeup event counters available via sysfs,
so that it's possible to check the activity of various wakeup event
sources within the kernel.
To illustrate how subsystems can use pm_wakeup_event(), make the
low-level PCI runtime PM wakeup-handling code use it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
virtio-pci resets the device at startup by writing to the status
register, but this does not clear the pci config space,
specifically msi enable status which affects register
layout.
This breaks things like kdump when they try to use e.g. virtio-blk.
Fix by forcing msi off at startup. Since pci.c already has
a routine to do this, we export and use it instead of duplicating code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
vlynq: make whole Kconfig-menu dependant on architecture
add descriptive comment for TIF_MEMDIE task flag declaration.
EEPROM: max6875: Header file cleanup
EEPROM: 93cx6: Header file cleanup
EEPROM: Header file cleanup
agp: use NULL instead of 0 when pointer is needed
rtc-v3020: make bitfield unsigned
PCI: make bitfield unsigned
jbd2: use NULL instead of 0 when pointer is needed
cciss: fix shadows sparse warning
doc: inode uses a mutex instead of a semaphore.
uml: i386: Avoid redefinition of NR_syscalls
fix "seperate" typos in comments
cocbalt_lcdfb: correct sections
doc: Change urls for sparse
Powerpc: wii: Fix typo in comment
i2o: cleanup some exit paths
Documentation/: it's -> its where appropriate
UML: Fix compiler warning due to missing task_struct declaration
UML: add kernel.h include to signal.c
...
This fixes all occurrences of pci_enable_device and pci_disable_device
in all comments. There are no code changes involved.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch (as1353) removes a couple of unnecessary assignments from
the PCI core. The should_wakeup flag is naturally initialized to 0;
there's no need to clear it.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If the firmware puts a device back into D0 state at resume time, we'll
update its state in resume_noirq and thus skip the platform resume code.
Calling that code twice should be safe and we ought to avoid getting to
that point anyway, so remove the check and also allow the platform pci
code to be called for D0.
Fixes USB not being powered after resume on recent Lenovo machines.
Acked-by: Alex Chiang <achiang@canonical.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
pcix_get_mmrbc() returns the maximum memory read byte count (mmrbc), if
successful, or an appropriate error value, if not.
Distinguishing errors from correct values and understanding the meaning of an
error can be somewhat confusing in that:
correct values: 512, 1024, 2048, 4096
errors: -EINVAL -22
PCIBIOS_FUNC_NOT_SUPPORTED 0x81
PCIBIOS_BAD_VENDOR_ID 0x83
PCIBIOS_DEVICE_NOT_FOUND 0x86
PCIBIOS_BAD_REGISTER_NUMBER 0x87
PCIBIOS_SET_FAILED 0x88
PCIBIOS_BUFFER_TOO_SMALL 0x89
The PCIBIOS_ errors are returned from the PCI functions generated by the
PCI_OP_READ() and PCI_OP_WRITE() macros.
In a similar manner, pcix_set_mmrbc() also returns the PCIBIOS_ error values
returned from pci_read_config_[word|dword]() and pci_write_config_word().
Following pcix_get_max_mmrbc()'s example, the following patch simply returns
-EINVAL for all PCIBIOS_ errors encountered by pcix_get_mmrbc(), and -EINVAL
or -EIO for those encountered by pcix_set_mmrbc().
This simplification was chosen in light of the fact that none of the current
callers of these functions are interested in the specific type of error
encountered. In the future, should this change, one could simply create a
function that maps each PCIBIOS_ error to a corresponding unique errno value,
which could be called by pcix_get_max_mmrbc(), pcix_get_mmrbc(), and
pcix_set_mmrbc().
Additionally, this patch eliminates some unnecessary variables.
Cc: stable@kernel.org
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
An e1000 driver on a system with a PCI-X bus was always being returned
a value of 135 from both pcix_get_mmrbc() and pcix_set_mmrbc(). This
value reflects an error return of PCIBIOS_BAD_REGISTER_NUMBER from
pci_bus_read_config_dword(,, cap + PCI_X_CMD,).
This is because for a dword, the following portion of the PCI_OP_READ()
macro:
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER;
expands to:
if (pos & 3) return PCIBIOS_BAD_REGISTER_NUMBER;
And is always true for 'cap + PCI_X_CMD', which is 0xe4 + 2 = 0xe6. ('cap' is
the result of calling pci_find_capability(, PCI_CAP_ID_PCIX).)
The same problem exists for pci_bus_write_config_dword(,, cap + PCI_X_CMD,).
In both cases, instead of calling _dword(), _word() should be called.
Cc: stable@kernel.org
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When pci_register_set_vga_state() was made __init, the EXPORT_SYMBOL() was
retained, which now leaves us with a section mismatch.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For the PCI_X_STATUS register, pcix_get_max_mmrbc() is returning an incorrect
value, which is based on:
(stat & PCI_X_STATUS_MAX_READ) >> 12
Valid return values are 512, 1024, 2048, 4096, which correspond to a 'stat'
(masked and right shifted by 21) of 0, 1, 2, 3, respectively.
A right shift by 11 would generate the correct return value when 'stat' (masked
and right shifted by 21) has a value of 1 or 2. But for a value of 0 or 3 it's
not possible to generate the correct return value by only right shifting.
Fix is based on pcix_get_mmrbc()'s similar dealings with the PCI_X_CMD register.
Cc: stable@kernel.org
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
We can use pci-dma-compat.h to implement pci_set_dma_mask and
pci_set_consistent_dma_mask as we do with the other PCI DMA API.
We can remove HAVE_ARCH_PCI_SET_DMA_MASK too.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_set_coherent_mask corresponds to pci_set_consistent_dma_mask. This is
necessary to move to the generic device model DMA API from the PCI bus
specific API in the long term.
dma_set_coherent_mask works in the exact same way that
pci_set_consistent_dma_mask does. So this patch also changes
pci_set_consistent_dma_mask to call dma_set_coherent_mask.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This changes pci_set_dma_mask to call the generic DMA API, dma_set_mask.
pci_set_dma_mask (in drivers/pci/pci.c) does the same things that
dma_set_mask does on all the architectures that use pci_set_dma_mask;
calls dma_supprted and sets dev->dma_mask. So we safely change
pci_set_dma_mask to simply call dma_set_mask.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the future, we are going to be changing the lock type for struct
device (once we get the lockdep infrastructure properly worked out) To
make that changeover easier, and to possibly burry the lock in a
different part of struct device, let's create some functions to lock and
unlock a device so that no out-of-core code needs to be changed in the
future.
This patch creates the device_lock/unlock/trylock() functions, and
converts all in-tree users to them.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jean Delvare <khali@linux-fr.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Cc: Magnus Damm <damm@igel.co.jp>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Alex Chiang <achiang@hp.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Yu Zhao <yu.zhao@intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Cc: CHENG Renquan <rqcheng@smu.edu.sg>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: Frans Pop <elendil@planet.nl>
Cc: David Vrabel <david.vrabel@csr.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Make the run-time power management of PCI devices be inactive by
default by calling pm_runtime_forbid() for each PCI device during its
initialization. This setting may be overriden by the user space with
the help of the /sys/devices/.../power/control interface.
That's necessary to avoid breakage on systems where ACPI-based
wake-up is known to fail for some devices.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Enable NMI on all cpus on UV
vgaarb: Add user selectability of the number of GPUS in a system
vgaarb: Fix VGA arbiter to accept PCI domains other than 0
x86, uv: Update UV arch to target Legacy VGA I/O correctly.
pci: Update pci_set_vga_state() to call arch functions
Set power.async_suspend for all PCI devices and PCIe port services,
so that they can be suspended and resumed in parallel with other
devices they don't depend on in a known way (i.e. devices which are
not their parents or children).
This only affects the "regular" suspend and resume stages, which
means in particular that the restoration of the PCI devices' standard
configuration registers during resume will still be carried out
synchronously (at the "early" resume stage).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
No functional change; this converts loops that iterate from 0 to
PCI_BUS_NUM_RESOURCES through pci_bus resource[] table to use the
pci_bus_for_each_resource() iterator instead.
This doesn't change the way resources are stored; it merely removes
dependencies on the fact that they're in a table.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce run-time PM callbacks for the PCI bus type. Make the new
callbacks work in analogy with the existing system sleep PM
callbacks, so that the drivers already converted to struct dev_pm_ops
can use their suspend and resume routines for run-time PM without
modifications.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Although the majority of PCI devices can generate PMEs that in
principle may be used to wake up devices suspended at run time,
platform support is generally necessary to convert PMEs into wake-up
events that can be delivered to the kernel. If ACPI is used for this
purpose, PME signals generated by a PCI device will trigger the ACPI
GPE associated with the device to generate an ACPI wake-up event that
we can set up a handler for, provided that everything is configured
correctly.
Unfortunately, the subset of PCI devices that have GPEs associated
with them is quite limited. The devices without dedicated GPEs have
to rely on the GPEs associated with other devices (in the majority of
cases their upstream bridges and, possibly, the root bridge) to
generate ACPI wake-up events in response to PME signals from them.
Add ACPI platform support for PCI PME wake-up:
o Add a framework making is possible to use ACPI system notify
handlers for run-time PM.
o Add new PCI platform callback ->run_wake() to struct
pci_platform_pm_ops allowing us to enable/disable the platform to
generate wake-up events for given device. Implemet this callback
for the ACPI platform.
o Define ACPI wake-up handlers for PCI devices and PCI root buses and
make the PCI-ACPI binding code register wake-up notifiers for all
PCI devices present in the ACPI tables.
o Add function pci_dev_run_wake() which can be used by PCI drivers to
check if given device is capable of generating wake-up events at
run time.
Developed in cooperation with Matthew Garrett <mjg@redhat.com>.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add function pci_check_pme_status() that will check the PME status
bit of given device and clear it along with the PME enable bit. It
will be necessary for PCI run-time power management.
Based on a patch from Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently, drivers/pci/quirks.c is built unconditionally, but if
CONFIG_PCI_QUIRKS is unset, the only things actually built in this
file are definitions of global variables and empty functions (due to
the #ifdef CONFIG_PCI_QUIRKS embracing all of the code inside the
file). This is not particularly nice and if someone overlooks
the #ifdef CONFIG_PCI_QUIRKS, build errors are introduced.
To clean that up, move the definitions of the global variables in
quirks.c that are always built to pci.c, move the definitions of
the empty functions (compiled when CONFIG_PCI_QUIRKS is unset) to
headers (additionally make these functions static inline) and modify
drivers/pci/Makefile so that quirks.c is only built if
CONFIG_PCI_QUIRKS is set.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For use by code that needs to walk extended capability lists before
pci_dev structures are set up.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFD@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Update pci_set_vga_state to call arch dependent functions to enable Legacy
VGA I/O transactions to be redirected to correct target.
[akpm@linux-foundation.org: make pci_register_set_vga_state() __init]
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McE1J018723@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
It turns out that some PCI devices require extra delays when changing
power state from D3 to D0 (and the other way around). Although this
is against the PCI specification, we can handle it quite easily by
allowing drivers to define arbitrary D3 delays for devices known to
require extra time for switching power states.
Introduce additional field d3_delay in struct pci_dev and use it to
store the value of the device's D0->D3 delay, in miliseconds. Make
the PCI PM core code use the per-device d3_delay unless
pci_pm_d3_delay is greater (in which case the latter is used).
[This also allows the driver to specify d3_delay shorter than the
10 ms required by the PCI standard if the device is known to be able
to handle that.]
Make the sky2 driver set d3_delay to 150 for devices handled by it.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14730 which is a
listed regression from 2.6.30.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
After commit b9c3b26641 ("PCI: support
device-specific reset methods") the kernel build is broken if
CONFIG_PCI_QUIRKS is unset.
Fix this by moving pci_dev_specific_reset() to drivers/pci/quirks.c and
providing an empty replacement for !CONFIG_PCI_QUIRKS builds.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cardbus code creates PCI devices without ever going through the
necessary fixup bits and pieces that normal PCI devices go through.
There's in fact a commented out call to pcibios_fixup_bus() in there,
it's commented because ... it doesn't work.
I could make pcibios_fixup_bus() do the right thing on powerpc easily
but I felt it cleaner instead to provide a specific hook pci_fixup_cardbus
for which a weak empty implementation is provided by the PCI core.
This fixes cardbus on powerbooks and probably all other PowerPC
platforms which was broken completely for ever on some platforms and
since 2.6.31 on others such as PowerBooks when we made the DMA ops
mandatory (since those are setup by the fixups).
Acked-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add a new type of quirk for resetting devices at pci_dev_reset time.
This is necessary to handle device with nonstandard reset procedures,
especially useful for guest drivers.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove a stray space in pci_save_state().
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Commit ae21ee65e8 "PCI: acs p2p upsteram
forwarding enabling" doesn't actually enable ACS.
Add a function to pci core to allow an IOMMU to request that ACS
be enabled. The existing mechanism of using iommu_found() in the pci
core to know when ACS should be enabled doesn't actually work due to
initialization order; iommu has only been detected not initialized.
Have Intel and AMD IOMMUs request ACS, and Xen does as well during early
init of dom0.
Cc: Allen Kay <allen.m.kay@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The pcie_flr routine writes the device control register with the FLR bit
set clearing all other fields for the FLR duration. Among other fields,
the Max_Payload_Size is also cleared which can cause errors if there are
transactions lurking in the HW pipeline. The patch replaces the blank
write with read-modify-write of the control register keeping the other
fields intact.
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Change for PCI core to use pci_is_pcie() instead of checking
pci_dev->is_pcie.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
I'm not entirely sure it needs to go into 32, but it's probably the right
thing to do. Another way of explaining the patch is:
- we currently pick the _first_ exactly matching bus resource entry, but
the _last_ inexactly matching one. Normally first/last shouldn't
matter, but bus resource entries aren't actually all created equal: in
a transparent bus, the last resources will be the parent resources,
which we should generally try to avoid unless we have no choice. So
"first matching" is the thing we should always aim for.
- the patch is a bit bigger than it needs to be, because I simplified the
logic at the same time. It used to be a fairly incomprehensible
if ((res->flags & IORESOURCE_PREFETCH) && !(r->flags & IORESOURCE_PREFETCH))
best = r; /* Approximating prefetchable by non-prefetchable */
and technically, all the patch did was to make that complex choice be
even more complex (it basically added a "&& !best" to say that if we
already gound a non-prefetchable window for the prefetchable resource,
then we won't override an earlier one with that later one: remember
"first matching").
- So instead of that complex one with three separate conditionals in one,
I split it up a bit, and am taking advantage of the fact that we
already handled the exact case, so if 'res->flags' has the PREFETCH
bit, then we already know that 'r->flags' will _not_ have it. So the
simplified code drops the redundant test, and does the new '!best' test
separately. It also uses 'continue' as a way to ignore the bus
resource we know doesn't work (ie a prefetchable bus resource is _not_
acceptable for anything but an exact match), so it turns into:
/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
if (r->flags & IORESOURCE_PREFETCH)
continue;
/* .. but we can put a prefetchable resource inside a non-prefetchable one */
if (!best)
best = r;
instead. With the comments, it's now six lines instead of two, but it's
conceptually simpler, and I _could_ have written it as two lines:
if ((res->flags & IORESOURCE_PREFETCH) && !best)
best = r; /* Approximating prefetchable by non-prefetchable */
but I thought that was too damn subtle.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead.
Make the lock static while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.
Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Messages about PME# being supported and enabled/disabled are probably
useful for debug, but maybe don't need to be on the console.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Jesse accidentally applied v1 [1] of the patchset instead of v2 [2]. This
is the diff between v1 and v2.
The changes in this patch are:
- tidied vsprintf stack buffer to shrink and compute size more
accurately
- use %pR for decoding and %pr for "raw" (with type and flags) instead
of adding %pRt and %pRf
[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Note: dom0 checking in v4 has been separated out into 2/2.
This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:
1) Unintentional failure caused by guest physical address programmed
into the device's DMA that happens to match the memory address range
of other downstream ports in the same PCIe switch. This causes the PCI
transaction to go to the matching downstream port instead of go to the
root complex to get translated by VT-d as it should be.
2) Malicious guest software intentionally attacks another downstream
PCIe device by programming the DMA address into the assigned device
that matches memory address range of the downstream PCIe port.
We are in process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled. This patch is intended to work for both KVM and Xen environments.
Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_dfl_cache_line_size is marked as __initdata but referenced by
pci_init() which is __devinit. Make it __devinitdata instead of
__initdata.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For non hotplug PCI devices, the system firmware usually configures
CLS correctly. For pccard devices system firmware can't do it and
Linux PCI layer doesn't do it either. Unfortunately this leads to
poor performance for certain devices (sata_sil). Unless MWI, which
requires separate configuration, is to be used, CLS doesn't affect
correctness, so the configuration should be harmless.
This patch makes pci_set_cacheline_size() always built and export it
and make pccard call it during attach.
Please note that some other PCI hotplug drivers (shpchp and pciehp)
also configure CLS on hotplug.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg KH <greg@kroah.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Cc: Axel Birndt <towerlexa@gmx.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
sparc64 is now the only user of PCI_CACHE_LINE_BYTES. Drop it and set
pci_dfl_cache_line_size from pcibios_init() instead and drop
PCI_CACHE_LINE_BYTES handling from generic pci code.
Orignally-From: David Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES. Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right. On most configurations, the chance is that
firmware configures the correct value during boot.
This patch makes pci_init() determine CLS by looking at what firmware
has configured. It scans all devices and if all non-zero values
agree, the value is used. If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used. arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.
ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.
While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* git://git.infradead.org/~dwmw2/iommu-2.6.32:
x86: Move pci_iommu_init to rootfs_initcall()
Run pci_apply_final_quirks() sooner.
Mark pci_apply_final_quirks() __init rather than __devinit
Rename pci_init() to pci_apply_final_quirks(), move it to quirks.c
intel-iommu: Yet another BIOS workaround: Isoch DMAR unit with no TLB space
intel-iommu: Decode (and ignore) RHSA entries
intel-iommu: Make "Unknown DMAR structure" message more informative
This function may have done more in the past, but all it does now is
apply the PCI_FIXUP_FINAL quirks. So name it sensibly and put it where
it belongs.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
After attempting to change the power state of a PCI device
pci_raw_set_power_state() doesn't check if the value it wrote into
the device's PCI_PM_CTRL register has been stored in there, but
unconditionally modifies the device's current_state field to reflect
the change. This may cause problems to happen if the power state of
the device hasn't been changed in fact, because it will make the PCI
PM core make a wrong assumption.
To prevent such situations from happening modify
pci_raw_set_power_state() so that it reads the device's PCI_PM_CTRL
register after writing into it and uses the value read from the
register to update the device's current_state field. Also make it
print a message saying that the device refused to change its power
state as requested (returning an error code in such cases would cause
suspend regressions to appear on some systems, where device drivers'
suspend routines return error codes if pci_set_power_state() fails).
Reviewed-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some PCI devices fail if their standard configuration registers are
restored twice in a row. Prevent this from happening by making
pci_restore_state() clear the saved_state flag of the device right
after the device's standard configuration registers have been
populated with the previously saved values.
Simplify PCI PM callbacks by removing the direct clearing of
state_saved from them, as it shouldn't be necessary any more (except
in pci_pm_thaw(), where it has to be cleared, so that the values saved
during the "freeze" phase of hibernation are not used later by mistake).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce a new PCI device flag, wakeup_prepared, to prevent PCI
wake-up preparation code from being executed twice in a row for the
same device and for the same purpose.
Reviewed-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Rework the PCI wake-up code so that it's easier to read without
changing the functionality.
Reviewed-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
In general a BIOS may goof or we may hotplug in a hotplug controller.
In either case the kernel needs to reserve resources for plugging
in more devices in the future instead of creating a minimal resource
assignment.
We already do this for cardbus bridges I am just adding a variant
for pcie bridges.
v2: Make testing for pcie hotplug bridges based on a flag.
So far we only set the flag for pcie but a header_quirk
could easily be added for the non-standard pci hotplug
bridges.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Background:
Graphic devices are accessed through ranges in I/O or memory space. While most
modern devices allow relocation of such ranges, some "Legacy" VGA devices
implemented on PCI will typically have the same "hard-decoded" addresses as
they did on ISA. For more details see "PCI Bus Binding to IEEE Std 1275-1994
Standard for Boot (Initialization Configuration) Firmware Revision 2.1"
Section 7, Legacy Devices.
The Resource Access Control (RAC) module inside the X server currently does
the task of arbitration when more than one legacy device co-exists on the same
machine. But the problem happens when these devices are trying to be accessed
by different userspace clients (e.g. two server in parallel). Their address
assignments conflict. Therefore an arbitration scheme _outside_ of the X
server is needed to control the sharing of these resources. This document
introduces the operation of the VGA arbiter implemented for Linux kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some devices allow an individual function to be reset without affecting
other functions in the same device: that's what pci_reset_function does.
For devices that have this support, expose reset attribite in sysfs.
This is useful e.g. for virtualization, where a qemu userspace
process wants to reset the device when the guest is reset,
to emulate machine reboot as closely as possible.
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Without the check, the config space may be filled with zeros. Though
the driver should try to avoid call restoring before saving, but the
pci layer also should check this.
Also removes the existing check in pci_restore_standard_config, since
it's superfluous with the new check in restore_state.
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Alek Du <alek.du@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For many purposes, including interrupt-swizzling, devices with ARI
enabled behave as if they have one device (number 0) and 256 functions.
This probably hasn't bitten us in practice because all ARI devices I've
seen are also IOV devices, and IOV devices are required to use MSI.
This isn't guaranteed, and there are legitimate reasons to use ARI
without IOV, and hence potentially use pin-based interrupts.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For devices attached to the root bus, we can't trigger Secondary Bus
Reset because there is no bridge device associated with the bus. So
need to check bus->self again NULL first before using it.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (74 commits)
PCI: make msi_free_irqs() to use msix_mask_irq() instead of open coded write
PCI: Fix the NIU MSI-X problem in a better way
PCI ASPM: remove get_root_port_link
PCI ASPM: cleanup pcie_aspm_sanity_check
PCI ASPM: remove has_switch field
PCI ASPM: cleanup calc_Lx_latency
PCI ASPM: cleanup pcie_aspm_get_cap_device
PCI ASPM: cleanup clkpm checks
PCI ASPM: cleanup __pcie_aspm_check_state_one
PCI ASPM: cleanup initialization
PCI ASPM: cleanup change input argument of aspm functions
PCI ASPM: cleanup misc in struct pcie_link_state
PCI ASPM: cleanup clkpm state in struct pcie_link_state
PCI ASPM: cleanup latency field in struct pcie_link_state
PCI ASPM: cleanup aspm state field in struct pcie_link_state
PCI ASPM: fix typo in struct pcie_link_state
PCI: drivers/pci/slot.c should depend on CONFIG_SYSFS
PCI: remove redundant __msi_set_enable()
PCI PM: consistently use type bool for wake enable variable
x86/ACPI: Correct maximum allowed _CRS returned resources and warn if exceeded
...
Other functions use type bool, so use that for pci_enable_wake as well.
Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a PCI device is not power-manageable either by the platform, or
with the help of the native PCI PM interface, pci_target_state() will
return either PCI_D3hot, or PCI_POWER_ERROR for it, depending on
whether or not the device is configured to wake up the system. Alas,
none of these return values is correct, because each of them causes
pci_prepare_to_sleep() to return error code, although it should
complete successfully in such a case.
Fix this problem by making pci_target_state() always return PCI_D0
for devices that cannot be power managed.
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI-to-PCI Bridge 1.2 specifies that the Secondary Bus Reset bit can
force the assertion of RST# on the secondary interface, which can be
used to reset all devices including subordinates under this bus. This
can be used to reset a function if this function is the only device
under this bus.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI PM 1.2 specifies that the device will perform an internal reset upon
transitioning from D3hot to D0 when the NO_SOFT_RESET bit is clear. This
method can be used to reset a function if neither PCIe FLR nor PCI AF FLR
are supported.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch enhances the FLR functions:
1) remove disable_irq() so the shared IRQ won't be disabled.
2) replace the 1s wait with 100, 200 and 400ms wait intervals
for the Pending Transaction.
3) replace mdelay() with msleep().
4) add might_sleep().
5) lock the device to prevent PM suspend from accessing the CSRs
during the reset.
6) coding style fixes.
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_is_root_bus() in pci_common_swizzle() for checking if the pci
bus is root, for code consistency.
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Use pci_is_root_bus() in pci_get_interrupt_pin() for checking if the
pci bus is root, for code consistency.
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch (as1235) adds an array of PCI power-state names, together
with a simple inline accessor routine.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adds support for PCI Express transaction layer end-to-end CRC checking
(ECRC). This patch will enable/disable ECRC checking by setting/clearing
the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
support ECRC.
The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
option. If this option is not set or is set to 'bios", the enable and
generation bits are left in whatever state that firmware/BIOS set them to.
The "off" setting turns them off, and the "on" option turns them on (if the
device supports it).
Turning ECRC on or off can be a data integrity versus performance
tradeoff. In theory, turning it on will catch more data errors, turning
it off means possibly better performance since CRC does not need to be
calculated by the PCIe hardware and packet sizes are reduced.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
According to the PCI PM specification (PCI Bus Power Management
Interface Specification, Rev. 1.2, Section 5.4.1) we are supposed to
reinitialize devices that have PCI_PM_CTRL_NO_SOFT_RESET clear during
all transitions from PCI_D3hot to PCI_D0, but we only do it if the
device's current_state field is equal to PCI_UNKNOWN.
This may lead to problems if a device with PCI_PM_CTRL_NO_SOFT_RESET
unset is put into PCI_D3hot at run time by its driver and
pci_set_power_state() is used to put it back into PCI_D0, because in
that case the device will remain uninitialized after
pci_set_power_state() has returned. Prevent that from happening by
modifying pci_raw_set_power_state() to reinitialize devices with
PCI_PM_CTRL_NO_SOFT_RESET unset during all transitions from D3 to D0.
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Recent PCI PM changes introduced a bug that causes some devices to be
mishandled after kexec and during early initialization. The failure
scenario in the kexec case is the following:
* Assume a PCI device is not power-manageable by the platform and has
PCI_PM_CTRL_NO_SOFT_RESET set in PMCSR.
* The device is put into D3 before kexec (using the native PCI PM).
* After kexec, pci_setup_device() sets the device's power state to
PCI_UNKNOWN.
* pci_set_power_state(dev, PCI_D0) is called by the device's driver.
* __pci_start_power_transition(dev, PCI_D0) is called and since the
device is not power-manageable by the platform, it causes
pci_update_current_state(dev, PCI_D0) to be called. As a result
the device's current_state field is updated to PCI_D3, in
accordance with the contents of its PCI PM registers.
* pci_raw_set_power_state() is called and it changes the device power
state to D0. *However*, it should also call pci_restore_bars() to
reinitialize the device, but it doesn't, because the device's
current_state field has been modified earlier.
To prevent this from happening, modify pci_platform_power_transition()
so that it doesn't use pci_update_current_state() to update the
current_state field for devices that aren't power-manageable by the
platform. Instead, this field should be updated directly for devices
that don't support the native PCI PM.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCIe 1.1 base neither requires the endpoint to implement the entire
PCIe capability structure nor specifies default values of registers
that are not implemented by the device. So we only save and restore
registers that must be implemented by different device types if the
device PCIe capability version is 1.
PCIe 1.1 Capability Structure Expansion ECN and PCIe 2.0 requires
all registers in the PCIe capability to be either implemented or
hardwired to 0. Their PCIe capability version is 2.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch sets up disabled bridges even if buses have already been
added.
pci_assign_unassigned_resources is called after buses are added.
pci_assign_unassigned_resources calls pci_bus_assign_resources.
pci_bus_assign_resources calls pci_setup_bridge to configure BARs of
bridges.
Currently pci_setup_bridge returns immediately if the bus have already
been added. So pci_assign_unassigned_resources can't configure BARs of
bridges that were added in a disabled state; this patch fixes the issue.
On logical hot-add, we need to prevent the kernel from re-initializing
bridges that have already been initialized. To achieve this,
pci_setup_bridge returns immediately if the bridge have already been
enabled.
We don't need to check whether the specified bus is a root bus or not.
pci_setup_bridge is not called on a root bus, because a root bus does
not have a bridge.
The patch adds a new helper function, pci_is_enabled. I made the
function name similar to pci_is_managed. The codes which use
enable_cnt directly are changed to use pci_is_enabled.
Acked-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
trivial: Update my email address
trivial: NULL noise: drivers/mtd/tests/mtd_*test.c
trivial: NULL noise: drivers/media/dvb/frontends/drx397xD_fw.h
trivial: Fix misspelling of "Celsius".
trivial: remove unused variable 'path' in alloc_file()
trivial: fix a pdlfush -> pdflush typo in comment
trivial: jbd header comment typo fix for JBD_PARANOID_IOFAIL
trivial: wusb: Storage class should be before const qualifier
trivial: drivers/char/bsr.c: Storage class should be before const qualifier
trivial: h8300: Storage class should be before const qualifier
trivial: fix where cgroup documentation is not correctly referred to
trivial: Give the right path in Documentation example
trivial: MTD: remove EOL from MODULE_DESCRIPTION
trivial: Fix typo in bio_split()'s documentation
trivial: PWM: fix of #endif comment
trivial: fix typos/grammar errors in Kconfig texts
trivial: Fix misspelling of firmware
trivial: cgroups: documentation typo and spelling corrections
trivial: Update contact info for Jochen Hein
trivial: fix typo "resgister" -> "register"
...
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
If the device is not supposed to wake up the system, ie. when
device_may_wakeup(&dev->dev) returns 'false', pci_prepare_to_sleep()
should pass 'false' to pci_enable_wake() so that it calls the
platform to disable the wake-up capability of the device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The radeonfb driver needs to program the device's PMCSR directly due
to some quirky hardware it has to handle (see
http://bugzilla.kernel.org/show_bug.cgi?id=12846 for details) and
after doing that it needs to call the platform (usually ACPI) to
finish the power transition of the device. Currently it uses
pci_set_power_state() for this purpose, however making a specific
assumption about the internal behavior of this function, which has
changed recently so that this assumption is no longer satisfied.
For this reason, introduce __pci_complete_power_transition() that may
be called by the radeonfb driver to complete the power transition of
the device. For symmetry, introduce __pci_start_power_transition().
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is a problem with PCI devices without any PM support (either
native or through the platform) that pci_set_power_state() always
returns error code for them, even if they are being put into D0.
However, such devices are always in D0, so pci_set_power_state()
should return success when attempting to put such a device into D0.
It also should update the current_state field for these devices as
appropriate. This modification is necessary so that the standard
configuration registers of these devices are successfully restored by
pci_restore_standard_config() during the "early" phase of resume.
In addition, pci_set_power_state() should check the value of
current_state before calling the platform to change the power state
of the device to avoid doing that unnecessarily.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Move pci_restore_standard_config() from pci.c to pci-driver.c and
make it static.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Once we have allowed timer interrupts to be enabled during the early
phase of resuming devices, we are now able to use the generic
pci_set_power_state() to put PCI devices into D0 at that time. Then,
the platform-specific PM code will have a chance to handle devices
that don't implement the native PCI PM or that require some
additional, platform-specific operations to be carried out to power
them up. Also, by doing this we can simplify the code quite a bit.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCIe 2.0 defines several new registers (Device Control 2, Link Control 2,
and Slot Control 2). Save and retore them in pci_save_pcie_state() and
pci_restore_pcie_state().
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Restore the volatile registers in the SR-IOV capability after the
D3->D0 transition.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If a device has the SR-IOV capability, initialize it (set the ARI
Capable Hierarchy in the lowest numbered PF if necessary; calculate
the System Page Size for the VF MMIO, probe the VF Offset, Stride
and BARs). A lock for the VF bus allocation is also initialized if
a PF is the lowest numbered PF.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch allows memory resources to be assigned with a specified
alignment at boot-time or run-time. The patch is useful when we use PCI
pass-through, because page-aligned memory resources are required to
securely share PCI resources with guest drivers.
If you want to assign the resource at boot time, please set
"pci=resource_alignment=" boot parameter.
This is format of "pci=resource_alignment=" boot parameter:
[<order of align>@][<domain>:]<bus>:<slot>.<func>[; ...]
Specifies alignment and device to reassign
aligned memory resources.
If <order of align> is not specified, PAGE_SIZE is
used as alignment.
PCI-PCI bridge can be specified, if resource
windows need to be expanded.
This is example:
pci=resource_alignment=20@07:00.0;18@0f:00.0;00:1d.7
If you want to assign the resource at run-time, please set
"/sys/bus/pci/resource_alignment" file, and hot-remove the device and
hot-add the device. For this purpose, fakephp or PCI hotplug interfaces
can be used.
The format of "/sys/bus/pci/resource_alignment" file is the same with
boot parameter. You can use "," instead of ";".
For example:
# cd /sys/bus/pci
# echo -n 20@12:00.0 > resource_alignment
# echo 1 > devices/0000:12:00.0/remove
# echo 1 > rescan
Reviewed-by: Alex Chiang <achiang@hp.com>
Reviewed-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Yuji Shimada <shimada-yxb@necst.nec.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_common_swizzle() seems to have a assumption that
pci_bus->self is NULL on the pci root bus. But it might not be true on
some platforms. Because of this wrong assumption, pci_common_swizzle()
might cause endless loop. We must check pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Current pci_get_interrupt_pin() seems to have an assumption that
pci_bus->self is NULL on the root pci bus. But it might not be true on
some platforms. Because of this wrong assumption, current
pci_get_interrupt_pin() might cause endless loop. We must check
pci_bus->parent instead.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
For all devices need to do function level reset, currently we need wait for
at least 200ms, which can be too long if we have lots of devices...
The patch checked pending bit before msleep() to skip some unnecessary
sleeping interval.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix pci kernel-doc parameter missing notation, correct
function name, and fix typo:
Warning(linux-2.6.28-git10//drivers/pci/pci.c:1511): No description found for parameter 'exclusive'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_restore_standard_config() unconditionally changes current_state
to PCI_D0 after attempting to change the device's power state, but
it should rather read the actual current power state from the
device.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Check if the standard configuration registers of a PCI device have
been saved during suspend before trying to restore them during
resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-By: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
pci_restore_standard_config() adds extra delay for PCI buses in
low power states (B2 or B3), but this is only correct for buses in
B2, because the buses in B3 are reset when they are put back into
B0. Thus we should wait for such buses to settle after the reset,
but it's not a good idea to wait that long (1.1 s) with interrupts
off.
On the other hand, we have never waited for buses in B2 and B3
during resume and it seems reasonable to go back to this well
tested behaviour.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Devices that have MSI-X enabled before suspend to RAM or hibernation
and that are in a low power state during resume will not be handled
correctly by pci_restore_standard_config(). Namely, it first calls
pci_restore_state() which calls pci_restore_msi_state(), which in turn
executes __pci_restore_msix_state() that accesses the device's memory
space to restore the contents of the MSI-X table. However, if the
device is in a low power state at this point, it's memory space is
not accessible.
The easiest way to fix this potential problem is to make
pci_restore_standard_config() call pci_restore_state() after
it has put the device into the full power state, D0. Fortunately,
all of this is done with interrupts off, so the change of ordering
should not cause any trouble.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
There is a problem in our handling of suspend-resume of PCI devices that
many of them have their standard config registers restored with
interrupts enabled and they are put into the full power state with
interrupts enabled as well. This may lead to the following scenario:
* an interrupt vector is shared between two or more devices
* one device is resumed earlier and generates an interrupt
* the interrupt handler of another device tries to handle it and
attempts to access the device the config space of which hasn't been
restored yet and/or which still is in a low power state
* the system crashes as a result
To prevent this from happening we should restore the standard
configuration registers of all devices with interrupts disabled and we
should put them into the D0 power state right after that.
Unfortunately, this cannot be done using the existing
pci_set_power_state(), because it can sleep. Also, to do it we have to
make sure that the config spaces of all devices were actually saved
during suspend.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This reverts commit 98e6e286d7, as Yinghai
Lu reports that it breaks kexec with at least the e1000 and e1000e
drivers. The reason is that the shutdown sequence puts the hardware
into D3 sleep, and the commit causes us to claim that it then is in D0
(running) state just because we don't understand the PM capabilities.
Which then later makes "pci_set_power_state()" not do anything, and the
device never wakes up properly and just returns 0xff to everything.
Reported-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: From: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Jesse Barnes <jesse.barnes@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the observation that the power state of a PCI device can be
loaded into its pci_dev structure as soon as pci_pm_init() is run for
it and make that happen.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It generally is better to avoid accessing devices behind bridges that
may not be in the D0 power state, because in that case the bridges'
secondary buses may not be accessible. For this reason, during the
early phase of resume (ie. with interrupts disabled), before
restoring the standard config registers of a device, check the power
state of the bridge the device is behind and postpone the restoration
of the device's config space, as well as any other operations that
would involve accessing the device, if that state is not D0.
In such cases the restoration of the device's config space will be
retried during the "normal" phase of resume (ie. with interrupts
enabled), so that the bridge can be put into D0 before that happens.
Also, save standard configuration registers of PCI devices during the
"normal" phase of suspend (ie. with interrupts enabled), so that the
bridges the devices are behind can be put into low power states (we
don't put bridges into low power states at the moment, but we may
want to do it in the future and it seems reasonable to design for
that).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
PCI devices without drivers are not disabled during suspend and
hibernation, but they are enabled during resume, with the help of
pci_reenable_device(), so there is an unbalanced execution of
pcibios_enable_device() in the resume code path.
To correct this introduce function pci_disable_enabled_device()
that will disable the argument device, if it is enabled when the
function is being run, without updating the device's pci_dev
structure and use it in the suspend code path to balance the
pci_reenable_device() executed during resume.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
During an online device reset it may be useful to disable bus-mastering.
pci_disable_device() does that, and far more besides, so is not suitable
for an online reset.
Add pci_clear_master() which does just this.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds pci_common_swizzle(), which swizzles INTx values all the
way up to a root bridge.
This common implementation can replace several architecture-specific
ones. This should someday be combined with pci_get_interrupt_pin(),
but I left it separate for now to make reviewing easier.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently, PCI devices without the PM capability that are power
manageable by the platform (eg. ACPI) are not handled correctly
by pci_set_power_state(), because their current_state field is not
updated to reflect the new power state of the device. Fix this by
making pci_update_current_state() accept additional argument
representing the power state of the device as set by the platform.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When PCI devices are initialized, we check whether they support PCI PM
caps and set the device can_wakeup flag if so. However, some devices
may have platform provided wakeup events rather than PCI PME signals, so
we need to set can_wakeup in that case too. Doing so should allow
wakeups from many more devices, especially on cost constrained systems.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Joseph Chan <JosephChan@via.com.tw>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Add a function to map a given resource number to a corresponding
register so drivers can get the offset and type of device specific BARs.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Remove the unnecessary number of resources condition checks because
the pci_update_resource() will check availability of the resources.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This cleanup removes unnecessary argument 'struct resource *res' in
pci_update_resource(), so it takes same arguments as other companion
functions (pci_assign_resource(), etc.).
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
It's too large to be inlined.
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch (as1186) fixes a minor mistake in pci_enable_wake(). When
the routine is asked to disable remote wakeup, it should not return an
error merely because the device is not allowed to do wakeups!
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds pci_swizzle_interrupt_pin(), which implements the
INTx swizzling algorithm specified in Table 9-1 of the "PCI-to-PCI
Bridge Architecture Specification," revision 1.2.
There are many architecture-specific implementations of this
swizzle that can be replaced by this common one.
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch makes pci_get_interrupt_pin() return values encoded
the same way as the "Interrupt Pin" value in PCI config space,
i.e., 1=INTA, ..., 4=INTD.
pirq_bios_set() is the only in-tree caller of pci_get_interrupt_pin()
and pci_get_interrupt_pin() is not exported.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since interrupts will soon be disabled at PCI resume time, we need to
pre-allocate memory to save/restore PCI config space (or use GFP_ATOMIC,
but this is safer).
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.
This patch adds the option for device drivers to cause their reserved
regions to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.
In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose, in the field,
drivers issues from userspace.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The _OSC capability OSC_MSI_SUPPORT is set when the root bridge is added
with pci_acpi_osc_support(), so we no longer need to do it in the PCI
MSI driver. Also adds the function pci_msi_enabled, which returns true
if pci=nomsi is not on the kernel command-line.
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The _OSC capability OSC_EXT_PCI_CONFIG_SUPPORT is set when the root
bridge is added with pci_acpi_osc_support() if we can access PCI
extended config space.
This adds the function pci_ext_cfg_avail which returns true if we can
access PCI extended config space (offset greater than 0xff). It
currently only returns false if arch=x86 and raw_pci_ext_ops is not set
(which might happen if pci=nommcfg is set on the kernel command-line).
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Some PCI devices implement PCI Advanced Features, which means they
support Function Level Reset(FLR). Implement support for that in
pci_reset_function.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Separate out function level reset so that pci_reset_function can be more
easily extended.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
for fsck sake, it's used only when parsing kernel command line...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before initialization, dev->irq may be zero. Make sure we don't disable
it at reset time in that case.
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The original ARI support code has a compatibility problem with non-ARI
devices. If a device doesn't support ARI, turning on ARI forwarding on
its upper level bridge will cause undefined behavior.
This fix turns on ARI forwarding only when the subordinate devices
support it.
Tested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Sometimes, it's necessary to enable software's ability to quiesce and
reset endpoint hardware with function-level granularity, so provide
support for it.
The patch implement Function Level Reset(FLR) feature following PCI-e
spec. And this is the first step. We would add more generic method, like
D0/D3, to allow more devices support this function.
The patch contains two functions. pcie_reset_function() is the new
driver API, and, contains some action to quiesce a device. The other
function is a helper: pcie_execute_reset_function() just executes the
reset for a particular device function.
Current the usage model is in KVM. Function reset is necessary for
assigning device to a guest, or moving it between partitions.
For Function Level Reset(FLR), please refer to PCI Express spec chapter
6.6.2.
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Currently linux doesn't have any code to set the "MSI supported" bit in
Support Fireld of _OSC. This patch adds the code for that.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch adds support for PCI Express Alternative Routing-ID
Interpretation (ARI) capability.
The ARI capability extends the Function Number field of the PCI Express
Endpoint by reusing the Device Number which is otherwise hardwired to 0.
With ARI, an Endpoint can have up to 256 functions.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This is a cleanup that changes all PCI configuration space size
representations to the macros (PCI_CFG_SPACE_SIZE and
PCI_CFG_SPACE_EXP_SIZE). And the macros are also moved from
drivers/pci/probe.c to drivers/pci/pci.h.
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Checkpatch would have complained about this but neither Bjorn nor myself
ran it prior to pushing. Fixup the issues Andrew pointed out.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This patch changes these two messages:
pci 0000:00:03.0: supports D1
pci 0000:00:03.0: supports D2
to this:
pci 0000:00:03.0: supports D1 D2
It also trivially converts a "dev_printk(KERN_INFO, ...)" to
"dev_info(...)".
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Many device drivers use the following sequence of statements to enable
the device to wake up the system while being in the D3_hot or D3_cold
low power state:
pci_enable_wake(pdev, PCI_D3hot, 1);
pci_enable_wake(pdev, PCI_D3cold, 1);
However, the second call is not necessary if the first one succeeds (the
ordering of the statements above doesn't matter here) and it may even be
harmful, because we are not supposed to enable PME# after the wake-up
power has been enabled for the device.
To allow drivers to overcome this problem, introduce function
pci_wake_from_d3() that will enable the device to wake up the system
from any of D3_hot and D3_cold as long as the wake-up from at least one
of them is supported.
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This converts things in drivers/pci to use %pR to printout the
content of a struct resource instead of hand-casted %llx or
other variants.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Export pci_pme_active() to drivers, so that they can clear the
PME_status bit and disable PME# for their devices without involving
ACPI.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Libata has some hacks to deal with certain controllers going silly in D3
state. The right way to handle this is to keep a PCI device flag for
such devices. That can then be generalised for no ATA devices with power
problems.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Make more PCI PM core functionality available to drivers
* Export pci_pme_capable() so that it can be called directly by
drivers (for example, tg3 needs that).
* Move the state choosing part of pci_prepare_to_sleep() to a
separate function, pci_target_state(), that can be called directly
by drivers (for example, tg3 needs that).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Fix kernel-doc comments so that they don't produce errors.
Also cut some extraneous copy-paste text.
Error(linhead//drivers/pci/pci.c:1133): duplicate section name 'Description'
Error(linhead//drivers/pci/pci.c:1189): duplicate section name 'Description'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/pci/pci.c needs pm_wakeup.h since it uses device_set_wakup_capable().
The latter also needs to be stubbed out for !CONFIG_PM.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
The recently introduced pci_prepare_to_sleep() needs the following fix,
because there are systems which are not power manageable by ACPI (ie.
ACPI doesn't provide methods to put the device into low power states and
back), but require ACPI hooks to be executed for wake-up to work.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
If the offset of PCI device's PM capability in its configuration space,
the mask of states that the device supports PME# from and the D1 and D2
support bits are cached in the corresponding struct pci_dev, the PCI
device PM code can be simplified quite a bit.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce functions pci_prepare_to_sleep() and pci_back_from_sleep(),
to be used by the PCI drivers that want to place their devices into
the lowest power state appropiate for them (PCI_D3hot, if the device
is not supposed to wake up the system, or the deepest state from
which the wake-up is possible, otherwise) while the system is being
prepared to go into a sleeping state and to put them back into D0
during the subsequent transition to the working state.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* Introduce function acpi_pm_device_sleep_wake() for enabling and
disabling the system wake-up capability of devices that are power
manageable by ACPI.
* Introduce function acpi_bus_can_wakeup() allowing other (dependent)
subsystems to check if ACPI is able to enable the system wake-up
capability of given device.
* Introduce callback .sleep_wake() in struct pci_platform_pm_ops and
for the ACPI PCI 'driver' make it use acpi_pm_device_sleep_wake().
* Introduce callback .can_wakeup() in struct pci_platform_pm_ops and
for the ACPI 'driver' make it use acpi_bus_can_wakeup().
* Move the PME# handlig code out of pci_enable_wake() and split it
into two functions, pci_pme_capable() and pci_pme_active(),
allowing the caller to check if given device is capable of
generating PME# from given power state and to enable/disable the
device's PME# functionality, respectively.
* Modify pci_enable_wake() to use the new ACPI callbacks and the new
PME#-related functions.
* Drop the generic .platform_enable_wakeup() callback that is not
used any more.
* Introduce device_set_wakeup_capable() that will set the
power.can_wakeup flag of given device.
* Rework PCI device PM initialization so that, if given device is
capable of generating wake-up events, either natively through the
PME# mechanism, or with the help of the platform, its
power.can_wakeup flag is set and its power.should_wakeup flag is
unset as appropriate.
* Make ACPI set the power.can_wakeup flag for devices found to be
wake-up capable by it.
* Make the ACPI wake-up code enable/disable GPEs for devices that
have the wakeup.flags.prepared flag set (which means that their
wake-up power has been enabled).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Rework pci_set_power_state() so that the platform callback is
invoked before the native mechanism, if necessary. Also, make
the function check if the device is power manageable by the
platform before invoking the platform callback.
This may matter if the device dependent on additional power
resources controlled by the platform is being put into D0, in which
case those power resources must be turned on before we attempt to
handle the device itself.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Introduce function pointer platform_pci_power_manageable to be used
by the platform-related code to point to a function allowing us to
check if given device is power manageable by the platform.
Introduce acpi_pci_power_manageable() playing that role for ACPI.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Convert printks to use dev_printk().
I converted pr_debug() to dev_dbg(). Both use KERN_DEBUG and are enabled
only when DEBUG is defined.
I converted printk(KERN_DEBUG) to dev_printk(KERN_DEBUG), not to dev_dbg(),
because dev_dbg() is only enabled when DEBUG is defined.
I converted DBG(KERN_INFO) (only in setup-bus.c) to dev_info(). The DBG()
name makes it sound like debug, but it's been enabled forever, so dev_info()
preserves the previous behavior.
I tried to make the resource assignment formats more consistent, e.g.,
"BAR %d: got res [%#llx-%#llx] bus [%#llx-%#llx] flags %#lx\n"
instead of sometimes using "start-end" and sometimes using "size@start".
I'm not attached to one or the other; I'd just like them consistent.
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Since the second argument of acpi_pci_choose_state() and
platform_pci_choose_state() is never used, remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.
This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.
In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.
Note: some devices might not work well with aspm, either because chipset
issue or device issue. The patch provide API (pci_disable_link_state),
driver can disable ASPM for specific device.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Revert as it is reported to cause problems for people.
commit 4348a2dc49
Author: Shaohua Li <shaohua.li@intel.com>
Date: Wed Oct 24 10:45:08 2007 +0800
pcie: utilize pcie transaction pending bit
PCIE has a mechanism to wait for Non-Posted request to complete. I think
pci_disable_device is a good place to do this.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Due to the regression reported at
http://bugzilla.kernel.org/show_bug.cgi?id=10065
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Soeren Sonnenburg <kernel@nn7.de>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
During the last step of hibernation in the "platform" mode (with the
help of ACPI) we use the suspend code, including the devices'
->suspend() methods, to prepare the system for entering the ACPI S4
system sleep state.
But at least for some devices the operations performed by the
->suspend() callback in that case must be different from its operations
during regular suspend.
For this reason, introduce the new PM event type PM_EVENT_HIBERNATE and
pass it to the device drivers' ->suspend() methods during the last phase
of hibernation, so that they can distinguish this case and handle it as
appropriate. Modify the drivers that handle PM_EVENT_SUSPEND in a
special way and need to handle PM_EVENT_HIBERNATE in the same way.
These changes are necessary to fix a hibernation regression related
to the i915 driver (ref. http://lkml.org/lkml/2008/2/22/488).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds PCI's accessor for segment_boundary_mask in device_dma_parameters.
The default segment_boundary is set to 0xffffffff, same to the block layer's
default value (and the scsi mid layer uses the same value).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds struct device_dma_parameters in struct pci_dev and properly
sets up a pointer in struct device.
The default max_segment_size is set to 64K, same to the block layer's
default value.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Mostly-acked-by: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 6c723d5bd8.
It caused build errors on non-x86 platforms, config file confusion, and
even some boot errors on some x86-64 boxes. All around, not quite ready
for prime-time :(
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: (64 commits)
PCI: make pci_bus a struct device
PCI: fix codingstyle issues in include/linux/pci.h
PCI: fix codingstyle issues in drivers/pci/pci.h
PCI: PCIE ASPM support
PCI: Fix fakephp deadlock
PCI: modify SB700 SATA MSI quirk
PCI: Run ACPI _OSC method on root bridges only
PCI ACPI: AER driver should only register PCIe devices with _OSC
PCI ACPI: Added a function to register _OSC with only PCIe devices.
PCI: constify function pointer tables
PCI: Convert drivers/pci/proc.c to use unlocked_ioctl
pciehp: block new requests from the device before power off
pciehp: workaround against Bad DLLP during power off
pciehp: wait for 1000ms before LED operation after power off
PCI: Remove pci_enable_device_bars() from documentation
PCI: Remove pci_enable_device_bars()
PCI: Remove users of pci_enable_device_bars()
PCI: Add pci_enable_device_{io,mem} intefaces
PCI: avoid save the same type of cap multiple times
PCI: correctly initialize a structure for pcie_save_pcix_state()
...
PCI Express ASPM defines a protocol for PCI Express components in the D0
state to reduce Link power by placing their Links into a low power state
and instructing the other end of the Link to do likewise. This
capability allows hardware-autonomous, dynamic Link power reduction
beyond what is achievable by software-only controlled power management.
However, The device should be configured by software appropriately.
Enabling ASPM will save power, but will introduce device latency.
This patch adds ASPM support in Linux. It introduces a global policy for
ASPM, a sysfs file /sys/module/pcie_aspm/parameters/policy can control
it. The interface can be used as a boot option too. Currently we have
below setting:
-default, BIOS default setting
-powersave, highest power saving mode, enable all available ASPM
state
and clock power management
-performance, highest performance, disable ASPM and clock power
management
By default, the 'default' policy is used currently.
In my test, power difference between powersave mode and performance mode
is about 1.3w in a system with 3 PCIE links.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Now that all in-tree users are gone, this removes pci_enable_device_bars()
completely.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The pci_enable_device_bars() interface isn't well suited to PCI
because you can't actually enable/disable BARs individually on
a device. So for example, if a device has 2 memory BARs 0 and 1,
and one of them (let's say 1) has not been successfully allocated
by the firmware or the kernel, then enabling memory decoding
shouldn't be permitted for the entire device since it will decode
whatever random address is still in that BAR 1.
So a device must be either fully enabled for IO, for Memory, or
for both. Not on a per-BAR basis.
This provides two new functions, pci_enable_device_io() and
pci_enable_device_mem() to replace pci_enable_device_bars(). The
implementation internally builds a BAR mask in order to be able
to use existing arch infrastructure.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Avoid adding the same type of cap multiple times, otherwise we will see dead loop.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
save_state->cap_nr should be correctly set, otherwise we can't find the
saved cap at resume.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
pci_save/store_state has multiple bugs, which will cause cap can't be
saved/restored correctly. Below 3 patches fix them.
fix the typo in pci_save_pcix_state
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
PCIE has a mechanism to wait for Non-Posted request to complete. I think
pci_disable_device is a good place to do this.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch makes the needlessly global pci_restore_bars() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There's no reason not to allow multiple calls to pcim_enable_device().
Calls after the first one can simply be noop. All PCI resources will
be released when the initial pcim_enable_device() resource is
released.
This allows more flexibility to managed PCI users.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* Introduce pci_domains_supported global, hardcoded to zero if
!CONFIG_PCI_DOMAINS.
* Introduce 'nodomains' boot option, which clears pci_domains_supported
on platforms that enable it by default (x86, x86-64, and others when
they are converted to use this).
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For cases in which CONFIG_PCIEAER=y (such as distro kernels), allow users
to disable PCIE Advanced Error Reporting by using "pci=noaer" on the
kernel command line.
This can be used to work around hardware or (kernel) software problems.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Replacing n & (n - 1) for power of 2 check by is_power_of_2(n)
Signed-off-by: vignesh babu <vignesh.babu@wipro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix undocumented function parameters in PCI and drivers/base.
Warning(linux-2.6.23-rc1//drivers/pci/pci.c:1526): No description found for parameter 'rq'
Warning(linux-2.6.23-rc1//drivers/base/firmware_class.c:245): No description found for parameter 'bin_attr'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source
ACPI: quiet ACPI Exceptions due to no _PTC or _TSS
ACPI: Remove references to ACPI_STATE_S2 from acpi_pm_enter
ACPI: Kconfig: always enable CONFIG_ACPI_SLEEP on X86
ACPI: Kconfig: fold /proc/acpi/sleep under CONFIG_ACPI_PROCFS
ACPI: Kconfig: CONFIG_ACPI_PROCFS now defaults to N
ACPI: autoload modules - Create __mod_acpi_device_table symbol for all ACPI drivers
ACPI: autoload modules - Create ACPI alias interface
ACPI: autoload modules - ACPICA modifications
ACPI: asus-laptop: Fix failure exits
ACPI: fix oops due to typo in new throttling code
ACPI: ignore _PSx method for hotplugable PCI devices
ACPI: Use ACPI methods to select PCI device suspend state
ACPI, PNP: hook ACPI D-state to PNP suspend/resume
ACPI: Add acpi_pm_device_sleep_state helper routine
ACPI: Implement the set_target() callback from pm_ops
Some odd ACPI implementations choke if certain controller is disabled
when ACPI suspend is invoked but we still need to make sure the PCI
device is enabled during resume. Simply using pci_enable_device()
unbalances device enable count. Export __pci_reenable_device().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
applied after Rafel's 'PM: Update global suspend and hibernation
operations framework' patch set
Signed-off-by: Shaohua Li<shaohua.li@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Check for PCI_CAP_ID_PM before checking the device state. Apparently fixes
some log spam via the 3c59x driver.
Signed-off-by: Andrew Lunn <andrew.lunn@ascom.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As suggested by Andrew, add pci_try_set_mwi(), which does not require
return-value checking.
- add pci_try_set_mwi() without __must_check
- make it return 0 on success, errno if the "try" failed or error
- review callers
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch introduces an interface to read and write PCI-X / PCI-Express
maximum read byte count values from PCI config space. There is a second
function that returns the maximum _designed_ read byte count, which marks the
maximum value for a device, since some drivers try to set MMRBC to the
highest allowed value and rely on such a function.
Based on patch set by Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Functions marked __devinit will be removed after kernel init. But being
exported they are potentially called by a module much later.
So the safer choice seems to be to keep the function even in the non
CONFIG_HOTPLUG case.
This silence the follwoing section mismatch warnings:
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_add_device from __ksymtab_gpl between '__ksymtab_pci_bus_add_device' (at offset 0x20) and '__ksymtab_pci_walk_bus'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_create_bus from __ksymtab_gpl between '__ksymtab_pci_create_bus' (at offset 0x40) and '__ksymtab_pci_stop_bus_device'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_max_busnr from __ksymtab_gpl between '__ksymtab_pci_bus_max_busnr' (at offset 0xc0) and '__ksymtab_pci_assign_resource_fixed'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_claim_resource from __ksymtab_gpl between '__ksymtab_pci_claim_resource' (at offset 0xe0) and '__ksymtab_pcie_port_bus_type'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_add_devices from __ksymtab between '__ksymtab_pci_bus_add_devices' (at offset 0x70) and '__ksymtab_pci_bus_alloc_resource'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_scan_bus_parented from __ksymtab between '__ksymtab_pci_scan_bus_parented' (at offset 0x90) and '__ksymtab_pci_root_buses'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_assign_resources from __ksymtab between '__ksymtab_pci_bus_assign_resources' (at offset 0x4d0) and '__ksymtab_pci_bus_size_bridges'
WARNING: drivers/built-in.o - Section mismatch: reference to .init.text:pci_bus_size_bridges from __ksymtab between '__ksymtab_pci_bus_size_bridges' (at offset 0x4e0) and '__ksymtab_pci_setup_cardbus'
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Warning(linux-2621-rc3g7/drivers/pci/pci.c:1283): No description found for parameter 'dev'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Adds a new API which can be used to issue various types
of PCI-E reset, including PCI-E warm reset and PCI-E hot reset.
This is needed for an ipr PCI-E adapter which does not properly
implement BIST. Running BIST on this adapter results in PCI-E
errors. The only reliable reset mechanism that exists on this
hardware is PCI Fundamental reset (warm reset). Since driving
this type of reset is architecture unique, this provides the
necessary hooks for architectures to add this support.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This defines a platform hook to enable/disable a device as a wakeup event
source. It's initially for use with ACPI, but more generally it could be used
whenever enable_irq_wake()/disable_irq_wake() don't suffice.
The hook is called -- if available -- inside pci_enable_wake(); and the
semantics of that call are enhanced so that support for PCI PME# is no longer
needed. It can now work for devices with "legacy PCI PM", when platform
support allows it. (That support would use some board-specific signal for for
the same purpose as PME#.)
[akpm@linux-foundation.org: Make it compile with CONFIG_PM=n]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Because we do not reserve space for the pci-x and pci-e state in struct
pci dev we need to dynamically allocate it. However because we need
to support restore being called multiple times after a single save
it is never safe to free the buffers we have allocated to hold the
state.
So this patch modifies the save routines to first check to see
if we have already allocated a state buffer before allocating
a new one. Then the restore routines are modified to not free
the state after restoring it. Simple and it fixes some subtle
error path handling bugs, that are hard to test for.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are two ways pci_save_state and pci_restore_state are used. As
helper functions during suspend/resume, and as helper functions around
a hardware reset event. When used as helper functions around a hardware
reset event there is no reason to believe the calls will be paired, nor
is there a good reason to believe that if we restore the msi state from
before the reset that it will match the current msi state. Since arch
code may change the msi message without going through the driver, drivers
currently do not have enough information to even know when to call
pci_save_state to ensure they will have msi state in sync with the other
kernel irq reception data structures.
It turns out the solution is straight forward, cache the state in the
existing msi data structures (not the magic pci saved things) and
have the msi code update the cached state each time we write to the hardware.
This means we never need to read the hardware to figure out what the hardware
state should be.
By modifying the caching in this manner we get to remove our save_state
routines and only need to provide restore_state routines.
The only fields that were at all tricky to regenerate were the msi and msi-x
control registers and the way we regenerate them currently is a bit dependent
upon assumptions on how we use the allow msi registers to be configured and used
making the code a little bit brittle. If we ever change what cases we allow
or how we configure the msi bits we can address the fragility then.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sanity check in pcim_pin_device() was too restrictive in that it didn't
allow multiple calls to the function, which is against the devres
philosohpy of fire-and-forget. Track pinned status separately and allow
pinning multiple times.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In some cases when we are not using msi we need a way to ensure that the
hardware does not have an msi capability enabled. Currently the code has been
calling disable_msi_mode to try and achieve that. However disable_msi_mode
has several other side effects and is only available when msi support is
compiled in so it isn't really appropriate.
Instead this patch implements pci_msi_off which disables all msi and msix
capabilities unconditionally with no additional side effects.
pci_disable_device was redundantly clearing the bus master enable flag and
clearing the msi enable bit. A device that is not allowed to perform bus
mastering operations cannot generate intx or msi interrupt messages as those
are essentially a special case of dma, and require bus mastering. So the call
in pci_disable_device to disable msi capabilities was redundant.
quirk_pcie_pxh also called disable_msi_mode and is updated to use pci_msi_off.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CARDBUS_MEM_SIZE was increased to 64MB on 2.6.20-rc2, but larger size might
result in allocation failure for the reserving itself on some platforms
(for example typical 32bit MIPS). Make it (and CARDBUS_IO_SIZE too)
customizable by "pci=" option for such platforms.
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Implement device resource management, in short, devres. A device
driver can allocate arbirary size of devres data which is associated
with a release function. On driver detach, release function is
invoked on the devres data, then, devres data is freed.
devreses are typed by associated release functions. Some devreses are
better represented by single instance of the type while others need
multiple instances sharing the same release function. Both usages are
supported.
devreses can be grouped using devres group such that a device driver
can easily release acquired resources halfway through initialization
or selectively release resources (e.g. resources for port 1 out of 4
ports).
This patch adds devres core including documentation and the following
managed interfaces.
* alloc/free : devm_kzalloc(), devm_kzfree()
* IO region : devm_request_region(), devm_release_region()
* IRQ : devm_request_irq(), devm_free_irq()
* DMA : dmam_alloc_coherent(), dmam_free_coherent(),
dmam_declare_coherent_memory(), dmam_pool_create(),
dmam_pool_destroy()
* PCI : pcim_enable_device(), pcim_pin_device(), pci_is_managed()
* iomap : devm_ioport_map(), devm_ioport_unmap(), devm_ioremap(),
devm_ioremap_nocache(), devm_iounmap(), pcim_iomap_table(),
pcim_iomap(), pcim_iounmap()
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
The PCI save/restore code doesn't need to care about MSI vs MSI-X, all
it really wants is to say "save/restore all MSI(-X) info for this device".
This is borne out in the code, we call the MSI and MSI-X save routines
side by side, and similarly with the restore routines.
So combine the MSI/MSI-X routines into pci_save_msi_state() and
pci_restore_msi_state(). It is up to those routines to decide what state
needs to be saved.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Return early from pci_set_power_state() if hardware does not support
power management. This way, we do not generate noise in the logs.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Since isa_bridge is neither assigned any value !NULL nor used on !Alpha,
there's no reason for providing it.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch contains the following cleanups:
- move all EXPORT_SYMBOL's directly below the code they are exporting
- move all DECLARE_PCI_FIXUP_*'s directly below the functions they
are calling
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch adds the following changes into generic PCI code especially
for PCI legacy I/O port free drivers.
- Added new pci_request_selected_regions() and
pci_release_selected_regions() for PCI legacy I/O port free
drivers in order to request/release only the selected regions.
- Added helper routine pci_select_bars() which makes proper mask
of BARs from the specified resource type. This would be very
helpful for users of pci_enable_device_bars().
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Original patch was posted as "PCI : Move pci_fixup_device and is_enabled".
This 1 of 3 patches does:
- reverts small part of Inaky's patch
(remove __pci_enable_device)
This change will be recovered by 3rd patch.
- temporarily remove pci_fixup_device.
This change will be recovered by 2nd patch.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
While testing 2.6.20-rc3 on a machine with some CK804 chipsets, we noticed
that quirk_nvidia_ck804_msi_ht_cap() was not detecting HT MSI capabilities
anymore. It is actually caused by the MSI mapping on the root chipset
being the 2nd HT capability in the chain. pci_find_ht_capability() does
not seem to find anything but the first HT cap correctly, because it
forgets to increment the position before looking for the next cap. The
following patch seems to fix it.
At least, this proves that having a ttl is good idea since the machine
would have been stucked in an infinite loop if we didn't have a ttl :)
We have to pass pos + PCI_CAP_LIST_NEXT to __pci_find_next_cap_ttl to
get the next HT cap instead of the same one again.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There are already several places in the kernel that want to search a PCI
device for a given Hypertransport capability. Although this is possible
using pci_find_capability() etc., it makes sense to encapsulate that
logic in a helper - pci_find_ht_capability().
To cater for searching exhaustively for a capability, we also provide
pci_find_next_ht_capability().
We also need to cater for the fact that the HT capability fields may be
either 3 or 5 bits wide. pci_find_ht_capability() deals with this for you,
but callers using the #defines directly must handle that themselves.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The current implementation of __pci_bus_find_cap() does two things,
first it determines the start of the capability chain for the device,
and then it trys to find the requested capability.
Split these out, so that we can use the two parts independantly in
a subsequent patch. Externally visible behaviour should be unchanged.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changes the pci_{enable,disable}_device() functions to work in a
nested basis, so that eg, three calls to enable_device() require three
calls to disable_device().
The reason for this is to simplify PCI drivers for
multi-interface/capability devices. These are devices that cram more
than one interface in a single function. A relevant example of that is
the Wireless [USB] Host Controller Interface (similar to EHCI) [see
http://www.intel.com/technology/comms/wusb/whci.htm].
In these kind of devices, multiple interfaces are accessed through a
single bar and IRQ line. For that, the drivers map only the smallest
area of the bar to access their register banks and use shared IRQ
handlers.
However, because the order at which those drivers load cannot be known
ahead of time, the sequence in which the calls to pci_enable_device()
and pci_disable_device() cannot be predicted. Thus:
1. driverA starts pci_enable_device()
2. driverB starts pci_enable_device()
3. driverA shutdown pci_disable_device()
4. driverB shutdown pci_disable_device()
between steps 3 and 4, driver B would loose access to it's device,
even if it didn't intend to.
By using this modification, the device won't be disabled until all the
callers to enable() have called disable().
This is implemented by replacing 'struct pci_dev->is_enabled' from a
bitfield to an atomic use count. Each caller to enable increments it,
each caller to disable decrements it. When the count increments from 0
to 1, __pci_enable_device() is called to actually enable the
device. When it drops to zero, pci_disable_device() actually does the
disabling.
We keep the backend __pci_enable_device() for pci_default_resume() to
use and also change the sysfs method implementation, so that userspace
enabling/disabling the device doesn't disable it one time too much.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
pSeries is the only architecture left using HAVE_ARCH_PCI_MWI and it's
really inappropriate for its needs. It really wants to disable MWI
altogether. So here are a pair of stub implementations for pci_set_mwi
and pci_clear_mwi.
Also rename pci_generic_prep_mwi to pci_set_cacheline_size since that
better reflects what it does.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The setting of the CACHE_LINE_SIZE register in sparc64's pci
initialisation code isn't quite adequate as the device may have
incompatible requirements. The generic code tests for this, so switch
sparc64 over to using it.
Since sparc64 has different L1 cache line size and PCI cache line size,
it would need to override the generic code like i386 and ia64 do. We
know what the cache line size is at compile time though, so introduce a
new optional constant PCI_CACHE_LINE_BYTES.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David Miller <davem@davemloft.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Shouldn't PCI-X state be saved/restored? No device really needs this
right now. qla24xx (fc HBA) and mthca (infiniband) don't do suspend,
and sky2 resets its tweaks when links are brought up.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Restore PCI Express capability registers after PM event.
This includes maxumum MTU for PCI express and other vital data.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
[PATCH] Don't set calgary iommu as default y
[PATCH] i386/x86-64: New Intel feature flags
[PATCH] x86: Add a cumulative thermal throttle event counter.
[PATCH] i386: Make the jiffies compares use the 64bit safe macros.
[PATCH] x86: Refactor thermal throttle processing
[PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
[PATCH] Fix unwinder warning in traps.c
[PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
[PATCH] x86: Move direct PCI scanning functions out of line
[PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
[PATCH] Don't leak NT bit into next task
[PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
[PATCH] Fix some broken white space in ia32_signal.c
[PATCH] Initialize argument registers for 32bit signal handlers.
[PATCH] Remove all traces of signal number conversion
[PATCH] Don't synchronize time reading on single core AMD systems
[PATCH] Remove outdated comment in x86-64 mmconfig code
[PATCH] Use string instructions for Core2 copy/clear
[PATCH] x86: - restore i8259A eoi status on resume
[PATCH] i386: Split multi-line printk in oops output.
...
Some buggy systems can machine check when config space accesses
happen for some non existent devices. i386/x86-64 do some early
device scans that might trigger this. Allow pci=noearly to disable
this. Also when type 1 is disabling also don't do any early
accesses which are always type1.
This moves the pci= configuration parsing to be a early parameter.
I don't think this can break anything because it only changes
a single global that is only used by PCI.
Cc: gregkh@suse.de
Cc: Trammell Hudson <hudson@osresearch.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Convert some framework code to handle the new PRETHAW message.
- IDE just treats it like a FREEZE.
- The pci_choose_state() thingie still doesn't use PCI_D0 when it gets a
FREEZE (and now PRETHAW) event, which seems rather buglike but wasn't
something to change with this patch.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When changing power states from D0->DX and then from DX->D0, some
Intel PCIE chipsets will cause a device reset to occur. This will
cause problems for any D State other than D3, since any state
information that the driver will expect to be present coming from
a D1 or D2 state will have been cleared. This patch addes a
flag to the pci_dev structure to indicate that devices should
not use states D1 or D2, and will set that flag for the affected
chipsets. This patch also modifies pci_set_power_state() so that
when a device driver tries to set the power state on
a device that is downstream from an affected chipset, or on one
of the affected devices it only allows state changes to or
from D0 & D3. In addition, this patch allows the delay time
between D3->D0 to be changed via a quirk. These chipsets also
need additional time to change states beyond the normal 10ms.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is needed if we wish to change the size of the resource structures.
Based on an original patch from Vivek Goyal <vgoyal@in.ibm.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Brice said the pci_save_msi_state breaks his driver in his special usage
(not in suspend/resume), as pci_save_msi_state will disable msi mode. In
his usage, pci_save_state will be called at runtime, and later (after
the device operates for some time and has an error) pci_restore_state
will be called.
In another hand, suspend/resume needs disable msi mode, as device should
stop working completely. This patch try to workaround this issue.
Drivers are expected call pci_disable_device in suspend time after
pci_save_state.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If a device is already enabled, don't bother reenabling it.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Acked-By: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
According to Intel ICH spec, there are several rules that Base Address
should be programmed before IOSE (PCICMD register ) enabled.
For example ICH7:
12.1.3 SATA : the base address register for the bus master register
should be programmed before this bit is set.
11.1.3: PCICMD (USB): The base address register for USB should be
programmed before this bit is set.
....
To make sure kernel code follow this rule , and prevent unnecessary
confusion. I proposal this patch.
Signed-off-by: Luming Yu <luming.yu@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
At least one laptop blew up on resume from suspend with a black screen due
to a lack of this patch. By only writing back config space that is
different, we minimise the possibility of accidents like this.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch revives pci_find_ext_capability (has been disabled a couple month
ago since it was not used anywhere. See http://lkml.org/lkml/2006/1/20/247).
It will now be used by the myri10ge driver.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andrew J. Gallatin <gallatin@myri.com>
drivers/pci/pci.c | 3 +--
include/linux/pci.h | 2 ++
2 files changed, 3 insertions(+), 2 deletions(-)
Signed-off-by: Jeff Garzik <jeff@garzik.org>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (169 commits)
commit 78a596b449
Author: Adrian Bunk <bunk@stusta.de>
Date: Fri Mar 31 01:38:12 2006 -0800
[PATCH] remove kernel/power/pm.c:pm_unregister()
Since the last user is removed in -mm, we can now remove this long deprecated
function.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
commit 21440d3133
Author: David Brownell <david-b@pacbell.net>
Date: Sat Apr 1 10:21:52 2006 -0800
[PATCH] dma doc updates
...
Print more diagnostic info to help identify the source of power management
suspend failures.
Example:
usb_hcd_pci_suspend(): pci_set_power_state+0x0/0x1af() returns -22
pci_device_suspend(): usb_hcd_pci_suspend+0x0/0x11b() returns -22
suspend_device(): pci_device_suspend+0x0/0x34() returns -22
Work-in-progress. It needs lots more suspend_report_result() calls sprinkled
everywhere.
Cc: Patrick Mochel <mochel@digitalimplant.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@suspend2.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add missing 'const' to pci_request_region[s] 'res_name' arg,
since we pass it directly to __request_region(), whose 'name' arg
is also const.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Several drivers are starting to grow options to disable MSI. However,
it's often a host chipset issue, not something which individual drivers
should handle. So we add the pci=nomsi kernel parameter to allow the user
to disable MSI modes for systems we haven't added to the quirk list yet.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Change the semantics of this call to return the max reserved
bus number instead of just the max assigned bus number.
Signed-off-by: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Clean up: move assignments outside of if() statements.
AFAICT, no functional change. Easier to read/understand.
Depends on "[PATCH 1/3] msi vector targeting abstractions"
by Mark Maule <maule@sgi.com>.
I expect one hunk to fail if applied against 2.6.15.
This is essentially Joe Perches' patch.
I've cleaned up the one instance added by Mark's patch.
Signed-off-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>