Delay 1ms is too long for both diag read and write operations.
This is observed when writing a big memory buffer to target or
reading a big memory buffer from target. Take writing/reading
512k bytes as example, the delay itself is 256ms as the maximum
length of every write/read is 2k size.
Reduce the delay to 50us for read and write operations.
Take the ath10k_pci_targ_cpu_to_ce_addr() out of loop and put it
in the beginning of the loop for ath10k_pci_diag_read_mem().
The ath10k_pci_targ_cpu_to_ce_addr() is to convert the address
from target cpu's perspective to CE's perspective, so it makes
no sense to convert a CE's perspective address again in the loop.
It's a wrong implementation but happens to work.
If the target address is below 1M space, then the convert in the loop
from the second time becomes wrong because the previously converted address
is larger than 1M. The counterpart ath10k_pci_diag_write_mem() has the
correct implementation.
With this change, ath10k_pci_diage_read_mem() works correctly no matter
the target address is below 1M or above 1M.
It's tested with QCA6174 hw3.2 and
firmware-6.bin_WLAN.RM.4.4.1-00111-QCARMSWP-1. QCA9377 is also affected.
Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
It's easier to violate abstractions and introduce bugs when snoc.h is
including pci.h. Let's not do that.
I'm not extremely familiar with this driver yet, but several of the
shared PCI/SNOC bits seem to be related to the Copy Engine, so move them
to ce.h.
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Remove bus specific dependencies from CE layer
to have common CE layer across multiple targets.
This is required for adding support for WCN3990
chipset support as WCN3990 chipset uses SNOC
bus interface with Copy Engine endpoint.
Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
With QCA4019 platform, SRAM address can be accessed directly from host but
currently, we are assuming sram addresses cannot be accessed directly and
hence we convert the addresses.
While there, clean up growing hw checks during conversion of target CPU
address to CE address. Now we have function pointer pertaining to different
chips.
Signed-off-by: Ashok Raj Nagarajan <arnagara@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
DIAG_TRANSFER_LIMIT is redefined with same value and comments
just below this entry, remove this duplicate entry.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Add NAPI support for rx and tx completion. NAPI poll is scheduled
from interrupt handler. The design is as below
- on interrupt
- schedule napi and mask interrupts
- on poll
- process all pipes (no actual Tx/Rx)
- process Rx within budget
- if quota exceeds budget reschedule napi poll by returning budget
- process Tx completions and update budget if necessary
- process Tx fetch indications (pull-push)
- push any other pending Tx (if possible)
- before resched or napi completion replenish htt rx ring buffer
- if work done < budget, complete napi poll and unmask interrupts
This change also get rid of two tasklets (intr_tq and txrx_compl_task).
Measured peak throughput with NAPI on IPQ4019 platform in controlled
environment. No noticeable reduction in throughput is seen and also
observed improvements in CPU usage. Approx. 15% CPU usage got reduced
in UDP uplink case.
DL: AP DUT Tx
UL: AP DUT Rx
IPQ4019 (avg. cpu usage %)
========
TOT +NAPI
=========== =============
TCP DL 644 Mbps (42%) 645 Mbps (36%)
TCP UL 673 Mbps (30%) 675 Mbps (26%)
UDP DL 682 Mbps (49%) 680 Mbps (49%)
UDP UL 720 Mbps (28%) 717 Mbps (11%)
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Store pci chip secific reset funtions in struct ath10k_pci
as callbacks during early ath10k_pci_probe() and use the
callback to perform chip specific resets. This patch essentially
adds two callback in ath10k_pci, one for doing soft reset and
the other for hard reset. By using callbacks we can get rid of
those hw revision checks in ath10k_pci_safe_chip_reset() and
ath10k_pci_chip_reset(). As such this patch does not fix
any issue.
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
MSI-X is never well-tested, might contain bugs and generally isn't
really all that useful to maintain. Also ath10k is mainly used with
shared/singly-MSI interrupt systems. Hence removing MSI range support.
This change will be useful for further cleanup in copy engine lock
and to add NAPI support.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
qca4019 deals with below register memory region to control the clock,
reset, etc.
- Memory to control wifi core
- gcc (outside of wifi)
- tcsr (outside of wifi)
Add new helper functions to perform read/write in above registers
spaces. Actual ioremap for above registers are done in later patch.
Struct ath10k_ahb is introduced to maintain ahb specific info and
memory this struct will be allocated in the continuation of struct
ath10k_pci (again, memory ath10k_ahb is allocated in the later patch).
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
qca4019 uses ahb instead of pci where it slightly differs in device
enumeration, clock control, reset control, etc. Good thing is that
ahb also uses copy engine for the data transaction. So, the most of
the stuff implemented in pci.c/ce.c are reusable in ahb case too.
Device enumeration in ahb case comes through platform driver/device
model. All resource details like irq, memory map, clocks, etc for
qca4019 can be fetched from of_node of platform device.
Simply flow would look like,
device tree => platform device (kernel) => platform driver (ath10k)
Device tree entry will have all qca4019 resource details and the same
info will be passed to kernel. Kernel will prepare new platform device
for that entry and expose DT info to of_node in platform device.
Later, ath10k would register platform driver with unique compatible name
and then kernels binds to corresponding compatible entry & calls ath10k
ahb probe functions. From there onwards, ath10k will take control of it
and move forward.
New bool flag CONFIG_ATH10K_AHB is added in Kconfig to conditionally
enable ahb support in ath10k. On enabling this flag, ath10k_pci.ko
will have ahb support. This patch adds only basic skeleton and few
macros to support ahb in the context of qca4019.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Some of the code present in ath10k_pci_{probe|remove} are reusable
in ahb case too. To avoid code duplication, move reusable code to
new functions. Later, those new functions can be called from ahb
module's probe and exit functions.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k_pci_read32/write32() does work more specific to
PCI by ensuring pci wake/sleep for every read and write.
There is a plan to use most of stuff available in pci.c
(irq stuff, copy engine, etc) for AHB case. Such kind
of pci wake/sleep for every read/write is not required
in AHB case (qca4019). All those reusable areas in pci.c
and ce.c calls ath10k_pci_read32/write32() for low level
read and write.
In fact, ath10k_pci_read32/write32() should do what it does
today for PCI case. But for AHB, it has to do differently.
To make ath10k_pci_read32/write32() more generic, new function
pointers are added in ar_pci for the function which does
operation more close to the bus. Later, corresponding bus
specific read and write function will be mapped to that.
ath10k_pci_read32/write32() are changed to call directly
those function pointers without worrying which bus underlying
to it. Also, the function to get number of bank is changed
in the same way.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Some of static functions present in pci.c file are reusable
in ahb (qca4019) case. Remove static word for those reusable
functions and have those function prototype declaration in
pci.h file. So that, pci.h header file can be included in
ahb module and reused. There is no functionality changes done
in this patch.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This patch disables PCI PS for QCA988X and QCA99X0, Since PCI PS is
validated for QCA6174, let it be enabled only for QCA6174. It would be
better to execute PCI PS related functions only for the supported devices.
PCI time out issue is observed with QCA99X0 on x86 platform, We will
disable PCI PS for QCA988X and QCA99X0 until PCI PS is properly implemented.
Taking and releasing ps_lock is causing higher CPU consumption. Michal Kazior
suggested ps_lock overhead to be reworked so that ath10k_pci_wake/sleep
functions are called less often, i.e. move the powersave logic up (only during
irq handling, tx path, submitting fw commands) but that's a bigger change and
can be implemented later.
Signed-off-by: Anilkumar Kolli <akolli@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register receive callbacks for every copy engines (CE) separately
instead of having common receive handler. Some of the copy engines
receives different type of messages (i.e HTT/HTC/pktlog) from target.
Hence to service them accordingly, register per copy engine receive
callbacks.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It is noticed that pci wakeup time is exceeding current timeout (10ms)
randomly which is tested on QCA988x. So, the wake up time is increased
to 30 ms and added debug prints to log total timeout.
Signed-off-by: Maharaja Kennadyrajan <c_mkenna@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Make the helper converting target virtual address space to CE address
space a target type specific to support QCA99X0. Also make this as
function instead of macro.
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible to force an out of bounds MMIO
read/write via debugfs. E.g. on QCA988X this could
be triggered with:
echo 0x2080e0 | tee /sys/kernel/debug/ieee80211/*/ath10k/reg_addr
cat /sys/kernel/debug/ieee80211/*/ath10k/reg_value
BUG: unable to handle kernel paging request at ffffc90001e080e0
IP: [<ffffffff8135c860>] ioread32+0x40/0x50
...
Call Trace:
[<ffffffffa00d0c7f>] ? ath10k_pci_read32+0x4f/0x70 [ath10k_pci]
[<ffffffffa0080f50>] ath10k_reg_value_read+0x90/0xf0 [ath10k_core]
[<ffffffff8115c2c1>] ? handle_mm_fault+0xa91/0x1050
[<ffffffff81189758>] __vfs_read+0x28/0xe0
[<ffffffff812e4694>] ? security_file_permission+0x84/0xa0
[<ffffffff81189ce3>] ? rw_verify_area+0x53/0x100
[<ffffffff81189e1a>] vfs_read+0x8a/0x140
[<ffffffff8118acb9>] SyS_read+0x49/0xb0
[<ffffffff8104e39c>] ? trace_do_page_fault+0x3c/0xc0
[<ffffffff8196596e>] system_call_fastpath+0x12/0x71
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
By using SOC_WAKE register it is possible to bring
down power consumption of QCA61X4 from 36mA to
16mA when associated and idle.
Currently the sleep threshold/grace period is at a
very conservative value of 60ms.
Contrary to QCA61X4 the QCA988X firmware doesn't
have Rx/beacon filtering available for client mode
and SWBA events are used for beaconing in AP/IBSS
so the SoC needs to be woken up at least every
~100ms in most cases. This means that QCA988X
is at a disadvantage and the power consumption
won't drop as much as for QCA61X4.
Due to putting irq-safe spinlocks on every MMIO
read/write it is expected this can cause a little
performance regression on some systems. I haven't
done any thorough measurements but some of my
tests don't show any extreme degradation.
The patch removes some explicit pci_wake calls
that were added in 320e14b8db51aa ("ath10k: fix
some pci wake/sleep issues"). This is safe because
all MMIO accesses are now wrapped and the device
is woken up automatically if necessary.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It is actually safe to enable ASPM after the
device is booted up.
This reduces power drain of QCA61X4 when driver is
simply loaded (no interface is up) from 31mA to
14mA. QCA988X wasn't measured but doesn't seem to
regress in any other way.
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
During drv_start/drv_stop stress testing in ARM platform,
sometimes target is taking more that 5ms to wake up. Similar
behaviour also noted during driver load and unload iterations.
On such cases, the wakup duration lies between 5-6ms. Hence
increasing pci wakup timeout 10ms to be more safer. With this
changes, able to complete power down/up >100 iterations without
any issues.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This will make it easier to extend and maintain
list of supported hardware.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Some copy engine structures are target specific
and are uploaded to the device during
init/configuration.
This also cleans up a bit diag_mem_read/write
implicit byteswap mess leaving only
diag_access_read/write with an implicit endianess
byteswap.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
There are basically no more uses for
ar_pci->started. It is also perfectly safe to call
hif_stop without hif_start now.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible on a host system running low on
memory to end up with no rx buffers on pci pipes.
This makes the driver more robust as it won't fail
to start if it can't allocate all rx buffers right
away. If it is fatal then upper layers will notice
trouble anyway.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's not really necessary to have a dedicated irq
handler just for the sake of catching early fw
crashes anymore. It is now safe to use one handler
even during early stages of device boot up.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Store the firmware registers and other relevant data to a firmware crash dump
file and provide it to user-space via debugfs. Should help with figuring out
why the firmware crashed.
kvalo: remove dbglog support, rework and refactor the code to avoid ifdefs and
otherwise simplify it as well
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The ATH10K_PCI_FEATURE_MSI_X was originally
introduced to support both chips QCA988Xv1 and
QCA988Xv2. Since v1 isn't supported anymore it
doesn't make sense to keep the feature flag
around. Since this is the last one remove the
whole thing.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The soc powersave was disabled by default. It
never was fully tested. Some hw apparently had
problems with it and the implementation itself had
a possible race.
Just remove the refcounting and simply wake up the
device when probing and put to sleep when
removing.
kvalo: make ath10k_pci_wake() and _sleep() static
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Use the common convention of embedding private
structures inside parent structures. This
reduces allocations and simplifies pci probing
code.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible for tx completion not to be
processed. In that case an old stack pointer was
left on copy engine tx ring. Next bmi exchange
would immediately pop it and use complete() on the
completion struct there causing corruption.
Make sure to wait for both tx and rx completions
properly.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
One of the premises was to guarantee serialized
completion handling for upper layers
(HTC/WMI/HTT). Since quite some time now it is no
longer necessary.
The other premise was to batch up tx/rx
completions to take advantage of hot caches.
However frame tx/rx completion indications come in
on a single pipe already so they are already
batched up. More meaningful batching is done in
HTT itself.
This means PCI completion is no longer necessary
to keep around. It just wastes memory, cycles and
SLOC.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's possible for FW to panic during early boot.
The patch re-introduces support to detect and
print those crashes.
This introduces an additional irq handler that is
set for the duration of early boot and shutdown.
The handler is then overriden with regular
handlers upon hif start().
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It wasn't really useful to have it to begin with.
This makes it a little simpler to re-arrange PCI
init code as some function depended on
ar_pci->ce_count being set.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The value provided by num_sends_allowed is now
derived from CE source ringbuffer state.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
We should not try to access hw if wakeup fails so add
proper error checking for that. Also add the timeout lenght
to the warning message.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Improve code readability by using enum and a
switch-case.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Since the firmware support is no longer available for hw1.0,
drop all code (especially workarounds) for those units.
Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Unify the PCI options location.
By default the SoC PS option is disabled to boost the
performance and due to poor stability on early HW revisions.
In future we can remove the module parameter and turn on/off
the PS for given hardware.
This change also makes the pci module parameter for SoC PS static.
Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>