Remove bus-specific dependencies from the CE layer so that a common CE
layer can be shared across multiple targets. This is required for adding
WCN3990 support, as the WCN3990 chipset uses the SNOC bus interface with
a Copy Engine endpoint.
Signed-off-by: Govind Singh <govinds@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Define structures for the copy engine ctrl/misc registers, including
CE CMD halt, watermark source, watermark destination, host IE, and the
source, destination and dmax ring registers. This avoids conditional
compilation, optimizes the code and allows dynamic configuration of the
copy engine register map for the respective hardware bus interface.
Signed-off-by: Sarada Prasanna Garnayak <c_sgarna@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k firmware checks nbytes == 0 as part of determining if DMA
has completed successfully. To help make this work more often,
have the driver initialize nbytes to zero when freeing the descriptor
slot.
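For illustration only, a minimal sketch of the idea ('desc' stands for
the destination ring descriptor being handed back; this is not the
actual patch hunk):

    /* When the completed slot is freed/revoked, zero nbytes so the
     * firmware's "nbytes == 0" check sees a clean descriptor instead
     * of a stale byte count from the previous transfer. */
    desc->nbytes = 0;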
Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
'ath10k_ce_alloc_pipe' has a compile-time sanity check to ensure that
there are sufficient buffers in CE4 for HTT Tx MSDU descriptors, but
this did not take into account the case where 'peer flow control' is
enabled. Fix this.
Cc: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Remove obsolete Copy Engine comments referring to the function
ath10k_ce_sendlist_send, as this function was removed long ago by
commit 2e761b5a52 ("ath10k: remove ce_sendlist_send").
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Fixes checkpatch warnings:
drivers/net/wireless/ath/ath10k/pci.c:1593: Statements should start on a tabstop
drivers/net/wireless/ath/ath10k/ce.c:962: Alignment should match open parenthesis
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Use dma_zalloc_coherent() instead of dma_alloc_coherent() and memset().
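For example (illustrative fragment only; 'dev', 'size' and 'paddr' stand
in for the driver's actual variables):

    /* before: allocate, then zero by hand */
    vaddr = dma_alloc_coherent(dev, size, &paddr, GFP_KERNEL);
    if (vaddr)
        memset(vaddr, 0, size);

    /* after: dma_zalloc_coherent() returns already-zeroed memory */
    vaddr = dma_zalloc_coherent(dev, size, &paddr, GFP_KERNEL);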
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Firmware runs a watchdog timer to track the copy engine ring read index
and write index. Whenever both indices are stuck at the same location
for a given duration, the watchdog triggers a target assert. While
updating the copy engine destination ring write index, the driver
ensures that the write index will not be the same as the read index by
finding the delta between these two indices (CE_RING_DELTA).
The HTT target-to-host copy engine (CE5) is a special case where ring
buffers are reused and the delta check is not applied while updating the
write index. In a rare scenario, whenever the CE5 ring is full, both
indices refer to the same location, which causes the CE ring stuck issue
explained above. This issue was originally reported on IPQ4019 during
long-hour stress testing and during Veriwave max-clients test suites.
The same issue is also observed on other chips. Fix this by ensuring
that the write index is one less than the read index, which means that
the full ring is available for receiving data.
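A rough sketch of the fix's intent (helper and variable names here are
illustrative, not the actual driver symbols):

    /* Keep the CE5 destination ring write index one entry behind the
     * software read index so the two never become equal on a full
     * ring, which would otherwise look like a stuck ring to the
     * firmware watchdog. */
    static unsigned int ce_rx_clamp_write_index(unsigned int write_index,
                                                unsigned int sw_index,
                                                unsigned int nentries_mask)
    {
        if (write_index == sw_index)
            write_index = (write_index - 1) & nentries_mask;

        return write_index;
    }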
Cc: stable@vger.kernel.org
Tested-by: Tamizh chelvam <c_traja@qti.qualcomm.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Correct some trivial comment typos.
Remove unnecessary parentheses in a long line.
Signed-off-by: Joe Perches <joe@perches.com>
[kvalo@qca.qualcomm.com: drop the change for return]
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
With the %pK format specifier we hide kernel addresses with the help of
the kptr_restrict sysctl. In this patch, %p is changed to %pK in the
driver code. The sysctl is documented in Documentation/sysctl/kernel.txt.
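For example (the debug message itself is made up for illustration):

    /* before: always prints the raw kernel pointer */
    ath10k_dbg(ar, ATH10K_DBG_PCI, "ce desc %p\n", desc);

    /* after: %pK hides the address unless kptr_restrict allows it */
    ath10k_dbg(ar, ATH10K_DBG_PCI, "ce desc %pK\n", desc);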
Signed-off-by: Maharaja Kennadyrajan <c_mkenna@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Fix checkpatch warnings about use of spaces with operators:
spaces preferred around that '*' (ctx:VxV)
This has been recently added to checkpatch.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Whenever HTT rx indications, i.e. target-to-host messages, are received
on the rx copy engine (CE5), the message is freed after processing the
response. Then CE5 is refilled with new descriptors during post-rx
processing. These memory alloc and free operations can be avoided by
reusing the same descriptors. During CE pipe allocation, the full ring
is not initialized, i.e. only n-1 entries are filled up. So for CE5 the
full ring should be filled up to reuse descriptors. Moreover, the CE5
write index is now updated in a single shot instead of via incremental
access. This avoids multiple pci_write and ce_ring accesses. From
experiments, it improves CPU usage by ~3% on the IPQ4019 platform.
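Roughly, the single-shot update looks like this (simplified fragment;
the refill helpers and the register-write helper name are assumptions,
not the exact driver symbols):

    /* Count refilled entries locally and push the write index to the
     * target register once, instead of once per buffer. */
    unsigned int count = 0;

    while (ce_rx_buf_needs_refill(pipe)) {
        __ce_rx_reuse_buf(pipe);        /* no per-buffer register write */
        count++;
    }

    if (count) {
        write_index = (write_index + count) & nentries_mask;
        ath10k_ce_dest_ring_write_index_set(ar, ce_ctrl_addr, write_index);
    }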
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. For diag register read and write
operations, 'paddr' is stored in the transfer context. ath10k doesn't
rely on the meta/transfer_id. So the unused output arguments {bufferp,
nbytesp and transfer_idp} are removed from the CE recv_next completion.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
For messages from host to target, a shadow copy of the CE descriptors
is maintained in the source ring. Before writing the actual CE
descriptor, the shadow copy is filled first and then copied into CE
address space. To optimize the download path and reduce d-cache
pressure, remove the shadow copy of the CE descriptors. This also
reduces driver memory consumption by 33 KB during device probing.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. ath10k doesn't rely on the meta/transfer_id
when handling send completion (the htc ep id is stored in the sk_buff
control buffer). So the unused output arguments {bufferp, nbytesp and
transfer_idp} are removed from the CE send completion. This change is
needed before removing the shadow copy of copy engine (CE) descriptors
in a follow-up patch.
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Currently, to avoid uncached memory access while filling up copy engine
descriptors, shadow descriptors are used. This can be optimized further
by removing the shadow descriptors. To achieve that, the shadow ring
dependency in ce_send is removed first by creating a local copy of the
descriptor on the stack and making a one-shot copy into the "uncached"
descriptor.
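The idea, sketched with a simplified descriptor layout (field names are
an approximation of ce.h, not a verbatim copy):

    struct ce_desc {
        __le32 addr;
        __le16 nbytes;
        __le16 flags;
    };

    /* Build the descriptor in cached stack memory, then copy it into
     * the uncached ring slot in one shot instead of touching the ring
     * memory field by field. */
    static void ce_fill_desc(struct ce_desc *ring_desc, u32 paddr,
                             u16 nbytes, u16 flags)
    {
        struct ce_desc sdesc;

        sdesc.addr = __cpu_to_le32(paddr);
        sdesc.nbytes = __cpu_to_le16(nbytes);
        sdesc.flags = __cpu_to_le16(flags);

        *ring_desc = sdesc;     /* one-shot copy into uncached memory */
    }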
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register receive callbacks for every copy engine (CE) separately
instead of having a common receive handler. Some of the copy engines
receive different types of messages (i.e. HTT/HTC/pktlog) from the
target. Hence, to service them accordingly, register per-copy-engine
receive callbacks.
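A possible shape of the registration (callback names and pipe numbers
below are illustrative, not necessarily what the driver uses):

    /* Each pipe entry carries its own receive handler, so HTC control,
     * HTT rx indications and pktlog messages are serviced directly
     * without a common demux step. */
    struct ce_rx_cb_entry {
        unsigned int pipe_num;
        void (*recv_cb)(struct ath10k_ce_pipe *ce_state);
    };

    static const struct ce_rx_cb_entry rx_cbs[] = {
        { 1, htc_rx_cb },      /* HTC control/WMI */
        { 5, htt_rx_cb },      /* HTT rx indications */
        { 8, pktlog_rx_cb },   /* pktlog */
    };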
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Register send completion callbacks for every copy engine (CE) separately
instead of having a common completion handler. Since some of the copy
engines deliver different types of messages, per-CE callbacks help to
service them differently.
Reviewed-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
After processing packets received from the copy engine, the host
allocates new buffers and queues them back to the copy engine ring for
further packet reception. On the post-rx processing path, skb allocation
and DMA mapping are unnecessarily handled within ce_lock. This affects
peak throughput and also causes higher CPU consumption. Optimize this
by acquiring ce_lock only when accessing the copy engine ring and moving
skb allocation out of ce_lock.
On the AP148 platform with QCA99X0 in a conducted environment, UDP
uplink peak throughput improves from ~1320 Mbps to ~1450 Mbps and TCP
uplink peak throughput increases from ~1240 Mbps (70% host CPU load) to
~1300 Mbps (71% CPU load). Similarly, a ~40 Mbps improvement is observed
in the downlink path.
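Schematically (a simplified fragment; the ring-enqueue helper is a
stand-in name, not the exact driver symbol):

    /* Allocate and DMA-map the rx buffer without holding ce_lock ... */
    skb = dev_alloc_skb(pipe->buf_sz);
    if (!skb)
        return -ENOMEM;

    paddr = dma_map_single(ar->dev, skb->data, pipe->buf_sz,
                           DMA_FROM_DEVICE);
    if (dma_mapping_error(ar->dev, paddr)) {
        dev_kfree_skb_any(skb);
        return -EIO;
    }

    /* ... and take the lock only around the actual ring access. */
    spin_lock_bh(&ar_pci->ce_lock);
    ret = __ce_rx_post_buf(ce_pipe, skb, paddr);
    spin_unlock_bh(&ar_pci->ce_lock);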
Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
QCA99X0 uses two new copy engine src desc flags for interrupt
indication. Bit 2 marks whether the host interrupt is disabled after
processing the current descriptor and bit 3 marks whether the target
interrupt is disabled after processing the current descriptor.
CE_DESC_FLAGS_META_DATA_MASK and CE_DESC_FLAGS_META_DATA_LSB are based
on the target type.
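Hypothetical definitions just to show where the bits sit (the real
names live in ce.h and may differ):

    #define CE_DESC_FLAGS_BYTE_SWAP      BIT(0)
    #define CE_DESC_FLAGS_GATHER         BIT(1)
    /* new for QCA99X0: per-descriptor interrupt disable bits */
    #define CE_DESC_FLAGS_HOST_INT_DIS   BIT(2)
    #define CE_DESC_FLAGS_TGT_INT_DIS    BIT(3)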
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The QCA6174 in combination with the new wmi-tlv firmware is capable of
multi-channel, beamforming, TDLS and other features.
This patch just makes it possible to boot these devices and do some
basic things like connect to an AP without encryption. Some things may
not work or may be unreliable; this will be addressed in future patches
as new features are implemented.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The latest main firmware branch introduced a new WMI
ABI called wmi-tlv. It is not strictly speaking a TLV
but something that resembles one, because it is ordered
and may have duplicate id entries.
This prepares ath10k to support new hw.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
While testing other things I've found that CE
items aren't cleared properly. This could lead to
null dereferences in BMI.
To prevent that, make sure CE revoking clears the
nbytes value (which is used as a buffer completion
indication) and memset the entire CE ring data
shared between host and target when
(re)initializing.
Also make sure to check the BMI xfer pointer and
print a splat instead of crashing the kernel.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Make ath10k_pci_init_pipes() effectively only
alter shared target-host data.
The per_transfer_context is a host-only thing.
It is necessary to preserve its contents for a
more robust ring cleanup.
This is required for future warm reset fixes.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Calling init to reinit CE pipe state would also
re-set all the static structure links and settings
(which don't change over the driver lifecycle).
Make it so that the alloc part links structures
and initializes static data, while the init part
sets up state variables and clears stale state.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
diag_read() is used for reading from firmware memory via the diagnostic
window. The first user will be the cal_data debugfs file.
To serialise diagnostic window access and make it safe to use while the
firmware is running, take ce_lock in both ath10k_pci_diag_write_mem()
and ath10k_pci_diag_read_mem(). Because of that, all the CE calls had to
be changed to _nolock variants.
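In outline (a simplified fragment; error handling is trimmed and the
completion-wait helper is a made-up name):

    /* Hold ce_lock for the whole diagnostic-window transaction and use
     * the _nolock CE helpers inside, so normal CE processing cannot
     * interleave with the diag access while firmware is running. */
    spin_lock_bh(&ar_pci->ce_lock);

    ret = ath10k_ce_send_nolock(ce_diag, NULL, ce_data, nbytes, 0, 0);
    if (ret == 0)
        ret = wait_for_diag_completion_nolock(ce_diag);

    spin_unlock_bh(&ar_pci->ce_lock);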
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The diagnostic window (CE7) uses polling and is not initialised to
deliver interrupts, so disable interrupts altogether for CE7. Otherwise
ath10k crashes when using the diagnostic window while the firmware is
running, due to a NULL dereference, and polling reads time out.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This makes it a lot easier to log and debug
messages if there's more than 1 ath10k device on a
system.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It was possible on a host system running low on
memory to end up with no rx buffers on pci pipes.
This makes the driver more robust as it won't fail
to start if it can't allocate all rx buffers right
away. If it is fatal then upper layers will notice
trouble anyway.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It doesn't make much sense to overwrite send_cb
and recv_cb callbacks over and over again whenever
transport starts. Just make sure to unmask copy
engine interrupts when starting.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The soc powersave was disabled by default. It
was never fully tested. Some hw apparently had
problems with it and the implementation itself had
a possible race.
Just remove the refcounting and simply wake up the
device when probing and put it to sleep when
removing.
kvalo: make ath10k_pci_wake() and _sleep() static
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The tx ring index was immediately trimmed with a
bitmask. This discarded the 0xFFFFFFFF error case
(which theoretically can happen when a device is
abruptly disconnected) and led to using an invalid
tx ring index. This could lead to memory
corruption.
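A sketch of the defensive check (fragment only; the index-read helper
name is an approximation of the driver's accessor):

    u32 write_index = ath10k_ce_src_ring_write_index_get(ar, ce_ctrl_addr);

    /* A dead/removed device reads back as all-ones; masking that value
     * would silently turn it into a bogus but valid-looking index. */
    if (write_index == 0xffffffff)
        return -ENODEV;

    write_index &= nentries_mask;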
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This prevents leaving an incomplete scatter-gather
transfer on the CE rings, which can lead the
firmware to crash.
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The definitions by which copy engine structures are
allocated do not change, so it doesn't make much
sense to re-create those structures each time the
device is booted (e.g. due to firmware recovery).
This should decrease the chance of memory
allocation failures.
While at it, remove the per_transfer_context
pointer indirection. The array has been trailing
the copy engine ringbuffer structure anyway. This
also saves a pointer's worth of bytes for each copy
engine ringbuffer.
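The layout change, roughly (struct and field names simplified for
illustration):

    struct ce_ring_example {
        unsigned int nentries;
        unsigned int nentries_mask;
        unsigned int sw_index;
        unsigned int write_index;
        /* per-transfer contexts trail the ring struct itself, so no
         * separate pointer and allocation are needed any more */
        void *per_transfer_context[];
    };

    ring = kzalloc(sizeof(*ring) + nentries * sizeof(void *), GFP_KERNEL);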
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This allows the use of GFP_KERNEL allocations. This
should decrease the chance of allocation failure,
e.g. during firmware recovery.
Reported-By: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Going through the full htc tx path for htt tx is a
waste of resources. By skipping it, it's possible
to easily submit scatter-gather to the pci hif for
reduced host cpu load and improved performance.
The new approach uses dma pool to store the
following metadata for each tx request:
* msdu fragment list
* htc header
* htt tx command
The htt tx command contains an msdu prefetch.
Instead of copying it, the original mapped msdu
address is used to submit a second scatter-gather
item to the hif to make a complete htt tx command.
The htt tx command itself hands over dma-mapped
pointers to msdus, and completion of the command
itself doesn't mean the frame has been sent and
can be unmapped/freed. This is why htc tx
completion is skipped for htt tx: all tx-related
resources are freed upon the htt tx completion
indication event (which also implicitly means the
htt tx command itself was completed).
Since each htt tx request now effectively consists
of 2 copy engine items, CE_HTT_H2T_MSG_SRC_NENTRIES
is updated to allow a maximum of
TARGET_10X_NUM_MSDU_DESC msdus to be queued. This
keeps the tx path resource management simple.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
PCI is capable of handling scatter-gather lists.
This can be used to avoid copying memory.
Change the name of the callback while at it to
reflect its purpose.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It's impossible to rely on disable_irq() and/or CE
interrupt masking with legacy shared interrupts.
Other devices sharing the same irq line may assert
it while ath10k is doing something that requires
no interrupts.
Irq handlers are now registered after all
preparations are complete so spurious/foreign
interrupts won't do any harm. The handlers are
unregistered when no interrupts are required (i.e.
during driver teardown).
This also removes the ability to receive FW early
indication (since interrupts are not registered
until early boot is complete). This is not mission
critical (it's more of a hint that early boot
failed due to unexpected FW crash) and will be
re-added in a follow up patch.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This patch moves irq registering to after the
necessary structures have been allocated and
initialized. This should prevent interrupts from
causing tasklets to access invalid memory pointers.
Reported-By: Ben Greear <greearb@candelatech.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
This shouldn't be silenced. This will be necessary
for PCI init code reordering.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Make sure to put target back to sleep. This was a
minor issue as it didn't really matter if we put
target back to sleep at this point. It just looked
wrong.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
CE error interrupts were not disabled. This could
lead to invalid memory accesses / memory
corruption.
Also make sure CE watermark interrupts are
disabled.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It wasn't really useful to have it to begin with.
This makes it a little simpler to re-arrange the
PCI init code as some functions depended on
ar_pci->ce_count being set.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The 10.1.389 firmware has some differences in the
calculation of the number of outstanding HTT TX
completions. This led to FW crashes with 10.1.389
while the main firmware branch was unaffected.
The patch makes sure ath10k doesn't queue up more
MSDUs than it should.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
The value provided by num_sends_allowed is now
derived from CE source ringbuffer state.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It is completely pointless to keep this function
around. It doesn't do anything different from
ce_send except introduce more overhead.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
struct ce_sendlist is useless as we always add just one buffer onto it.
And most importantly, it's ugly as it doesn't use skb properly.
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>