Store the firmware registers and other relevant data to a firmware crash dump
file and provide it to user-space via debugfs. Should help with figuring out
why the firmware crashed.
kvalo: remove dbglog support, rework and refactor the code to avoid ifdefs and
otherwise simplify it as well
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k_pci_diag_read32() is for reading u32 from a device and ath10k_pci_diag_read_hi()
is a helper for reading data using "host interest" table.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Make probe/remove functions shorter and easier to
understand.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The ATH10K_PCI_FEATURE_MSI_X was originally
introduced to support both chips QCA988Xv1 and
QCA988Xv2. Since v1 isn't supported anymore it
doesn't make sense to keep the feature flag
around. Since this is the last one remove the
whole thing.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The soc powersave was disabled by default. It
never was fully tested. Some hw apparently had
problems with it and the implementation itself had
a possible race.
Just remove the refcounting and simply wake up the
device when probing and put to sleep when
removing.
kvalo: make ath10k_pci_wake() and _sleep() static
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Use the common convention of embedding private
structures inside parent structures. This
reduces allocations and simplifies pci probing
code.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The 10.2 firmware is a successor of 10.1 firmware
(formerly identified as 10.x). Both share a lot
but have some slight ABI differences that need to
be taken care of.
The 10.2 firmware introduces some new features but
those can be added in subsequent patches. This
patch makes ath10k boot and work with 10.2 with
comparable functionality to 10.1.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible to enter an endless loop while
processing a single pci copy engine pipe. This
could effectively render ath10k incapable of
responding to any requests.
An example case when this could happen is when
firmware generates a lot of events, e.g. spectral
scan phyerr via WMI.
Reported-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible for tx completion not to be
processed. In that case an old stack pointer was
left on copy engine tx ring. Next bmi exchange
would immediately pop it and use complete() on the
completion struct there causing corruption.
Make sure to wait for both tx and rx completions
properly.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This prevents leaving incomplete scatter-gather
transfer on CE rings which can lead firmware to
crash.
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible to read invalid state of CE ring
buffer indexes. This could lead to scatter-gather
transfer failure in mid-way and crash firmware
later by leaving garbage data on the ring.
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The tasklet is already guaranteed to be killed on
the teardown path.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Recently there was a bug discovered that involved
hif_stop() being called twice that ended up with a
double free_irq() call but it only manifested with
multiple MSI interrupts mapping.
Catch this kind of a problem early in driver
regardless of interrupt mapping.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
On ARM-based (MSM mach), the pci_assign_resource() is passing
some invalid pointers and leading to L2 cache errors,
what prevents the PCI communication completly.
So far I have not found this funtion to be directly called by
any other wifi driver and did not found this assigning needed
on any other platform. So removing it completely.
Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This can be useful for early initialization
debugging, i.e. ROM crashes.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Sometimes warm reset works upon retry. It might be
related to imperfect warm reset routine, but for
now let's just do the retries.
This should improve the reliability of some chips
that hang/crash with cold reset which is used as a
last resort if warm reset fails.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Warm reset is now able to recover after device
crashes which required a cold reset before.
This should greatly reduce chances of getting data
bus errors or host system freezes due to buggy
cold reset on some chips.
kvalo: use ath10k_pci_soc_*()
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
When warm resetting it's possible for device to
crash during initialization. Instead of waiting 3
seconds just return failure as soon as
FW_IND_EVENT_PENDING is set.
This speeds up device bootup and recovery in some
cases.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This just makes it easier to tell apart different
kinds of bringup failure.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Since copy engine allocation has been revised the
ath10k_pci_ce_deinit() now simply zeroes copy
engine registers. It's probably a good idea to do
that before reseting for a more graceful device
reset.
Before ath10k_pci_ce_deinit() freed copy engine
ringbuffer memory so it was required to call it
after resetting. Otherwise it was possible for
device to access unmapped/freed copy engine
ringbuffer memory.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Definitions by which copy engine structure are
allocated do not change so it doesn't make much
sense to re-create those structures each time
device is booted (e.g. due to firmware recovery).
This should decrease chance of memory allocation
failures.
While at it remove per_transfer_context pointer
indirection. The array has been trailing the copy
engine ringbuffer structure anyway. This also
saves pointer size worth of bytes for each copy
engine ringbuffer.
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This allows to use GFP_KERNEL allocation. This
should decrease chance of allocation failure, e.g.
during firmware recovery.
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
To make it easier to debug pci problems improve the log messages in pci.c. Also
change some debug messages to warning messages to more easily catch problems.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
We do not really support older firmware API 1 anymore, so better remove
MODULE_FIRMWARE() declarations for them and only list for API 2 files.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The parameter name was ath10k_target_ps, but actually it should be just
target_ps. Module parameter names should not use the ath10k_ prefix.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
As cold reset is not reliable with CUS223 boards, make it possible
to disable cold reset entirely and only use warm reset. This makes it also
easier to debug warm reset problems.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k_pci_wait_for_target_init() did really follow the style used elsewhere in
ath10k. Use ath10k_pci_read/write() wrappers, simplify the while loop and
improve warning messages.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
One of the premises was to guarantee serialized
completion handling for upper layers
(HTC/WMI/HTT). Since quite some time now it is no
longer necessary.
The other premise was to batch up tx/rx
completions to take advantage of hot caches.
However frame tx/rx completion indications come in
on a single pipe already so they are already
batched up. More meaningful batching is done in
HTT itself.
This means PCI completion is no longer necessary
to keep around. It just wastes memory, cycles and
SLOC.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Going through full htc tx path for htt tx is a
waste of resources. By skipping it it's possible
to easily submit scatter-gather to the pci hif for
reduced host cpu load and improved performance.
The new approach uses dma pool to store the
following metadata for each tx request:
* msdu fragment list
* htc header
* htt tx command
The htt tx command contains a msdu prefetch.
Instead of copying it original mapped msdu address
is used to submit a second scatter-gather item to
hif to make a complete htt tx command.
The htt tx command itself hands over dma mapped
pointers to msdus and completion of the command
itself doesn't mean the frame has been sent and
can be unmapped/freed. This is why htc tx
completion is skipped for htt tx as all tx related
resources are freed upon htt tx completion
indication event (which also implicitly means htt
tx command itself was completed).
Since now each htt tx request effectively consists
of 2 copy engine items CE_HTT_H2T_MSG_SRC_NENTRIES
is updated to allow maximum of
TARGET_10X_NUM_MSDU_DESC msdus being queued. This
keeps the tx path resource management simple.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
PCI is capable of handling scatter-gather lists.
This can be used to avoid copying memory.
Change the name of the callback while at to
reflect its purpose.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The flag wasn't used anymore. No need to keep it.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
As result deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
In case IRQ configuration is unknown possibly enabled MSIs
are left enabled in ath10k_pci_deinit_irq(). This update
fixes the described misbehaviour.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The documentation states that pci_enable_msi_block() returns the number of
requests 'could have been allocated', not 'could allocate'. IOW, MSIs are *not*
enabled if a positive value returned.
kvalo: add commit log based on Alexander's email
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Hardware CUS232 version 2 has some issues with cold
reset that lead to Data Bus Errors or system hangs
in some cases. It's safer to use warm reset when
possible as it shouldn't trigger the
aforementioned issues.
Prefer warm reset over cold reset. However since
warm reset doesn't work after FW crash make sure to
fallback to cold reset when booting up the HW.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Marek Puzyniak <marek.puzyniak@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
10.x FW has no structure member sw_version_1. Thus,
both fw_version_release and fw_version_build are not
available. The provided fw_version_major is also wrong.
Fix this by using the fw_version from struct wiphy.
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This can be useful for testing and debugging.
This introduces new ath10k_pci module parameter
`irq_mode`. By default it is 0, meaning automatic
irq mode (MSI-X as long as both target HW and host
platform supports it). The parameter works on a
best effort basis.
kvalo: fix typo "ayto"
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's possible for FW to panic during early boot.
The patch re-introduces support to detect and
print those crashes.
This introduces an additional irq handler that is
set for the duration of early boot and shutdown.
The handler is then overriden with regular
handlers upon hif start().
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Preparation for code re-use. Also use ioread/write
wrappers.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's impossible to rely on disable_irq() and/or CE
interrupt masking with legacy shared interrupts.
Other devices sharing the same irq line may assert
it while ath10k is doing something that requires
no interrupts.
Irq handlers are now registered after all
preparations are complete so spurious/foreign
interrupts won't do any harm. The handlers are
unregistered when no interrupts are required (i.e.
during driver teardown).
This also removes the ability to receive FW early
indication (since interrupts are not registered
until early boot is complete). This is not mission
critical (it's more of a hint that early boot
failed due to unexpected FW crash) and will be
re-added in a follow up patch.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The function did a couple of things: it allocated
CE completions, registered CE callbacks and
enabled CE interrupts through HW registers.
This cannot be so. Split the function into one
that allocates CE completions and the other one
that starts off CE operation.
This is required for future legacy shared
interrupt handling.
This also fixes possible memory leak if post rx
failed.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's not really necessary for interrupts to be
used for BMI. BMI already assumes there's only one
caller at a time and it works directly with CE.
Make BMI poll for CE completions instead of
waiting for interrupts. This makes disabling
interrupts during early boot possible.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Hardware waits until host signals whether it has
chosen MSI(-X) or shared legacy interrupts. It is
not required for the driver to register interrupt
handlers immediately.
This patch prepares the pci irq code for more
changes.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k assumed all interrupts were directed to it.
This isn't the case for legacy shared interrupts.
ath10k consumed interrupts for other devices.
Check device irq status and return IRQ_NONE when
appropriate.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Convert the MSI failure warnings to a debug message to make them less spammy.
Also convert the irq mode printout to a single print to make it easier to
show it only once.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...