There is a difference in the bit position of the normal interrupt summary
enable (NIE) and abnormal interrupt summary enable (AIE) between revisions
of the hardware. For older revisions, the NIE and AIE bits are at
positions 16 and 15 respectively. For newer revisions, the NIE and AIE
bits are at positions 15 and 14. The effect of this change in bit position
is that newer hardware will not receive AIE interrupts with the current version of the
driver. Specifically, the driver uses this interrupt to collect
statistics on when a receive buffer unavailable event occurs and to
restart the driver/device when a fatal bus error occurs.
Update the driver to set the interrupt enable bit based on the reported
version of the hardware.
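A minimal sketch of the version-based selection (the macro names, flag, and
helper below are illustrative, not the driver's actual identifiers):

  #include <linux/bits.h>
  #include <linux/types.h>

  /* Bit positions differ between hardware revisions (values per above). */
  #define IER_NIE_POS_OLD    16
  #define IER_AIE_POS_OLD    15
  #define IER_NIE_POS_NEW    15
  #define IER_AIE_POS_NEW    14

  static u32 ier_summary_enable_bits(bool newer_hw)
  {
      if (newer_hw)
          return BIT(IER_NIE_POS_NEW) | BIT(IER_AIE_POS_NEW);

      return BIT(IER_NIE_POS_OLD) | BIT(IER_AIE_POS_OLD);
  }

The driver would then OR the returned value into the channel interrupt
enable register when enabling interrupts.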
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add some additional statistics for tracking VXLAN packets and checksum
errors.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware is capable of checksum offload (both Tx and Rx) and TSO
for VXLAN packets. Add the support
required to enable this.
The hardware can only support a single VXLAN port for offload. If more
than one VXLAN port is added then the offload capabilities have to be
disabled and can no longer be advertised.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add per queue Tx and Rx packet and byte counts.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently whenever the driver needs to enable or disable interrupts for
a DMA channel it reads the interrupt enable register (IER), updates the
value and then writes the new value back to the IER. Since the hardware
does not change the IER, software can track this value and eliminate the
need to read it each time.
Add the IER value to the channel related data structure and use that as
the base for enabling and disabling interrupts, thus removing the need
for the MMIO read.
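A rough sketch of the pattern (the structure, field name, and register
offset are assumptions used only for illustration):

  #include <linux/io.h>
  #include <linux/types.h>

  struct channel_sketch {
      void __iomem *dma_regs;
      u32 curr_ier;                       /* software copy of the channel IER */
  };

  /* Update the cached value and write it out; no MMIO read is needed. */
  static void channel_irq_enable(struct channel_sketch *ch, u32 bits)
  {
      ch->curr_ier |= bits;
      writel(ch->curr_ier, ch->dma_regs + 0x34);  /* hypothetical IER offset */
  }

  static void channel_irq_disable(struct channel_sketch *ch, u32 bits)
  {
      ch->curr_ier &= ~bits;
      writel(ch->curr_ier, ch->dma_regs + 0x34);
  }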
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When transmitting a TSO packet, the driver only increments the TSO packet
statistic by one rather than by the total number of packets that were sent.
Update the driver to record the total number of packets that resulted from
TSO transmit.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for changing some general performance settings and for
applying performance settings based on the device that is probed.
This includes:
- Setting the maximum read/write outstanding request limit
- Reducing the AXI interface burst length size
- Selectively setting the Tx and Rx descriptor pre-fetch threshold
- Selectively setting additional cache coherency controls
Tested and verified on all versions of the hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the driver hardcodes the PBLx8 setting. Remove the need for
specifying the PBLx8 setting and calculate it automatically based on the
specified PBL value. Since the PBLx8 setting applies to both Tx and Rx,
use the same PBL value for both of them.
Also, the driver currently uses a bit field to set the AXI master burst
length setting. Change this to use the full bit field range and set the
burst length based on the specified value.
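The general shape of the calculation might be as follows (the
divide-by-eight rule and the names are assumptions, shown only to
illustrate deriving the PBLx8 setting from the PBL value):

  /* Derive the PBLx8 multiplier from a single requested PBL value. */
  static void calc_pbl(unsigned int pbl, unsigned int *reg_pbl, bool *pblx8)
  {
      if (pbl > 32) {
          *pblx8 = true;          /* hardware multiplies the programmed field by 8 */
          *reg_pbl = pbl / 8;
      } else {
          *pblx8 = false;
          *reg_pbl = pbl;
      }
  }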
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for setting fine-grained read and write DMA cache coherency
controls, allow specific values to be used to set the cache coherency
registers.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to perform memory allocations on the node of the device. The
original allocation of the ring structure and Tx/Rx queues allocated all
of the memory at once and then carved it up for each channel and queue.
To best ensure that we get as much memory from the NUMA node as we can,
break the channel and ring allocations into individual allocations.
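A sketch of a node-aware allocation helper built from standard kernel
interfaces (the fallback policy shown is an assumption):

  #include <linux/device.h>
  #include <linux/slab.h>

  /* Allocate a per-channel or per-ring structure on the device's NUMA node. */
  static void *alloc_node_mem(struct device *dev, size_t size)
  {
      void *mem;

      mem = kzalloc_node(size, GFP_KERNEL, dev_to_node(dev));
      if (!mem)
          mem = kzalloc(size, GFP_KERNEL);    /* fall back to any node */

      return mem;
  }

Breaking the single large allocation into one such call per channel and
ring gives the allocator the best chance of satisfying each request from
local node memory.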
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Just to be on the safe side, should the update of the timestamp registers
not complete, issue a warning rather than looping forever waiting for the
update to complete.
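The pattern is a bounded poll with a warning on timeout; roughly (the
pending-check helper, retry count, and delay are illustrative assumptions;
struct xgbe_prv_data is the driver's existing private data structure):

  #include <linux/delay.h>

  /* Wait for the timestamp register update to complete, but never loop forever. */
  static void wait_tstamp_update(struct xgbe_prv_data *pdata)
  {
      unsigned int count = 10000;

      while (--count && tstamp_update_pending(pdata))   /* hypothetical helper */
          udelay(5);

      if (!count)
          netdev_warn(pdata->netdev,
                      "timed out waiting for timestamp update\n");
  }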
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer hardware does not provide a cumulative payload length when multiple
descriptors are needed to handle the data. Once the MTU increases beyond
the size that can be handled by a single descriptor, the SKB does not get
built properly by the driver.
The driver will now calculate the size of the data buffers used by the
hardware. The first buffer of the first descriptor is for packet headers
or packet headers and data when the headers can't be split. Subsequent
descriptors in a multi-descriptor chain will not use the first buffer. The
second buffer is used by all the descriptors in the chain for payload data.
Based on whether the driver is processing the first, intermediate, or last
descriptor it can calculate the buffer usage and build the SKB properly.
Tested and verified on both old and new hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MDIO register mode is set when the device is probed. But when the
device is brought down and then back up, the MDIO register mode is
reset. Be sure to set the mode again during device startup and only change
the mode of the specified address.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The xgbe_init() routine returns a return code indicating success or
failure, but the return code is not checked. Add code to xgbe_init()
to issue a message when failures are seen and add code to check the
xgbe_init() return code.
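The caller-side change amounts to checking and reporting the result; a
sketch (the surrounding function and the call signature shown are
assumptions):

  /* Check the xgbe_init() return code instead of ignoring it. */
  static int xgbe_start_sketch(struct xgbe_prv_data *pdata)
  {
      int ret;

      ret = xgbe_init(pdata);
      if (ret) {
          dev_err(pdata->dev, "device initialization failed (%d)\n", ret);
          return ret;
      }

      return 0;
  }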
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A newer version of the hardware is using the same PCI ids for the network
device but has altered register definitions for determining the window
settings for the indirect PCS access. Add support to check for this
hardware and if found use the new register values.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GPIO support in the hardware allows for up to 16 GPIO pins, enumerated
from 0 to 15. The driver uses the wrong value (16) to validate the GPIO
pin range in the routines to set and clear the GPIO output pins. Update
the code to use the correct value (15).
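The corrected bounds check is simply (a sketch; 16 pins numbered 0
through 15):

  #include <linux/errno.h>

  /* Reject GPIO numbers outside the supported 0-15 range. */
  static int validate_gpio(unsigned int gpio)
  {
      if (gpio > 15)      /* previously compared against 16, letting pin 16 through */
          return -EINVAL;

      return 0;
  }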
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the phylib support in the kernel to communicate with and control an
MDIO attached PHY. Use the hardware's MDIO communication mechanism to
communicate with the PHY.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some versions of the amd-xgbe device are capable of reporting ECC error
information back to the driver. Add support to process, track and report
on this information.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current per channel DMA interrupt support is based on an edge
triggered interrupt that is not maskable. This results in having to call
the disable_irq/enable_irq functions in order to prevent interrupts
during napi processing. The hardware now has a way to configure the per
channel DMA interrupt so that the interrupt can be masked, which removes
the need to call disable_irq/enable_irq. This patch makes use of
this support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the reading of the Tx timestamp to account for a hardware issue
on how the fields and interrupt are cleared. The "seconds" portion of
the timestamp should be read first, followed by the "nanoseconds" portion.
Reading the "nanoseconds" portion should clear the timestamp data and the
interrupt. Because of an issue with the hardware this order is reversed
and reading the "seconds" portion actually clears the timestamp. The code
currently follows this workaround, but to guard against future versions
where this is fixed add a field to the version data to indicate if the
workaround is required or not.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a hardware issue, it is possible for interrupt events to be
incorrectly generated when performing a soft reset. To guard against
this, perform the soft reset twice.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the driver framework to separate out platform/ACPI specific code
from general code during device initialization. This will allow for the
introduction of PCI device support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx and Rx DMA channel status determination is different depending on the
version of the hardware. Update the channel status processing code to
account for the change. Also, reduce the timeout value used when stopping
the channels.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading all management counter registers as 64-bit
values. Whether the high 32 bits must be read to form a 64-bit value
is indicated in the version data.
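Conceptually the read looks like this (the register layout, with the high
word at offset +4, is an assumption for illustration):

  #include <linux/io.h>
  #include <linux/types.h>

  /* Read a management counter, optionally widening it with the high 32 bits. */
  static u64 mmc_read_counter(void __iomem *base, unsigned int reg, bool read_hi)
  {
      u64 val = readl(base + reg);

      if (read_hi)
          val |= (u64)readl(base + reg + 4) << 32;

      return val;
  }

The read_hi decision would come from the per-version data rather than
being hardcoded.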
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare the code to be able to support accessing of the PCS registers
in a new way, while maintaining the current access method. Provide a
version specific field that indicates the method to use.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare the code to be able to work with more than one type of phy by
adding additional callable functions into the phy interface and removing
phy specific settings/functions from non-phy related files.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate the FIFO across the hardware Rx queues based on the priority
of the queues, giving more FIFO resources to queues with a higher
priority. If PFC is active but not enabled for a queue, then fewer
resources can be allocated to the queue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the Rx and Tx fifos are evenly allocated between the hardware
queues of the device. As more queues are instantiated, the fifo memory
needs to be allocatable based on queue priority. This allows
higher priority queues to have more fifo memory than lower priority
queues. Prepare for this by modifying the current fifo calculation to
assign the fifo queue allocation in an array that is then used to program
the hardware.
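One way to picture the reworked calculation, shown here for the simple
even split (the 256-byte register granularity and the helper name are
assumptions):

  /* Fill an array of per-queue fifo sizes that is later programmed into the hardware. */
  static void calculate_equal_fifo(unsigned int fifo_size, unsigned int queue_count,
                                   unsigned int *fifo)
  {
      unsigned int q_fifo_size = fifo_size / queue_count;
      unsigned int i;

      /* Assumed encoding: queue size expressed in 256-byte blocks, minus one. */
      for (i = 0; i < queue_count; i++)
          fifo[i] = (q_fifo_size / 256) - 1;
  }

With the sizes staged in an array, a priority-weighted allocation can
later fill the same array without touching the code that writes the
registers.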
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the auto-negotiation interrupt handling disables the irq
instead of masking off the interrupts. This was done because the phy
library was originally used to read and write the PCS registers, which
could not be performed in interrupt context. Now that the phy library is
no longer used to read and write the PCS registers the interrupts can be
masked off in the interrupt service routine, eliminating the need to call
disable_irq/enable_irq. This also requires changing the protection mutex
to a spinlock.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check to be sure that the Rx queue fifos are empty before stopping the
Rx DMA channels.
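In outline, this is a short bounded poll of each queue's fill level
before the DMA stop (the fill-level helper, timeout, and delay are
illustrative assumptions; struct xgbe_prv_data is the driver's private
data structure):

  #include <linux/delay.h>

  /* Wait for an Rx queue fifo to drain before its DMA channel is stopped. */
  static void wait_rx_fifo_empty(struct xgbe_prv_data *pdata, unsigned int queue)
  {
      unsigned int count = 1000;

      while (--count && rx_fifo_fill_level(pdata, queue))   /* hypothetical helper */
          usleep_range(50, 100);

      if (!count)
          netdev_warn(pdata->netdev,
                      "timed out waiting for Rx queue %u fifo to empty\n", queue);
  }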
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the netdev traffic class setup is only performed when invoked
through the ndo_setup_tc interface. However, the same setup should be
performed when the dcbnl interface (ieee_setets) is invoked. Rework the
netdev traffic class setup to be invokable through either interface and
also provide the priority to traffic class mapping if available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver is checking the pfc_en field of the ieee_pfc structure to
determine whether to associate a priority with a traffic class. This is
incorrect since the pfc_en field is for determining if PFC is enabled
for a traffic class.
The association of priority to traffic class does not depend on whether
the traffic class is enabled for PFC, so remove that check. Also, the
mapping of priorities to traffic classes should be done when configuring
the traffic classes and not the PFC support, so move the priority to traffic
class association from xgbe_config_dcb_pfc to xgbe_config_dcb_tc.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the PFC flow control is enabled on all traffic classes if
one or more traffic classes request it. The PFC enable setting of the
traffic class should be used to determine whether to enable or disable
flow control for the traffic class.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the hardware is placed in promiscuous mode it will still perform
VLAN filtering and therefore may not pass all packets to the driver.
Disable all VLAN filtering when entering promiscuous mode and restore
VLAN filtering upon exit from promiscuous mode. In order to avoid adding
forward declarations, move the VLAN related functions earlier in the
file.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of the loop we test "if (!count)" but because "count--" is
a post-op, the loop will end with count set to -1. I have fixed
this by changing it to --count.
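To illustrate the off-by-one outside the driver (the wait condition is
hypothetical):

  /* "--count" decrements before the test, so count reaches 0 on timeout; with the
   * original "count--" the loop exited with count == -1 and the "if (!count)"
   * check below could never fire.
   */
  static int wait_for_ready(void)
  {
      int count = 50;

      while (--count && !hardware_ready())    /* hypothetical readiness check */
          usleep_range(10, 20);

      if (!count)
          return -EBUSY;      /* timeout is now detected correctly */

      return 0;
  }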
Fixes: c5aa9e3b81 ('amd-xgbe: Initial AMD 10GbE platform driver')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During Tx cleanup it's still possible for the descriptor data to be
read ahead of the descriptor index. A memory barrier is required between
the read of the descriptor index and the start of the Tx cleanup loop.
This allows a change to a lighter-weight barrier in the Tx transmit
routine just before updating the current descriptor index.
Since the memory barrier does result in extra overhead on arm64, keep
the previous change to not chase the current descriptor value. This
prevents the execution of the barrier for each loop performed.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/ipv6/xfrm6_output.c
net/openvswitch/flow_netlink.c
net/openvswitch/vport-gre.c
net/openvswitch/vport-vxlan.c
net/openvswitch/vport.c
net/openvswitch/vport.h
The openvswitch conflicts were overlapping changes. One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.
The xfrm6_output.c conflict was also a simplification
overlapping a bug fix.
Signed-off-by: David S. Miller <davem@davemloft.net>
The code currently uses the lightweight dma_wmb barrier before updating
the current descriptor count. Under heavy load, the Tx cleanup routine
was seeing the updated current descriptor count before the updated
descriptor information. As a result, the Tx descriptor was being cleaned
up before it was used because it was not "owned" by the hardware yet,
resulting in a Tx queue hang.
Using the wmb barrier ensures that the descriptor is updated before the
descriptor counter, preventing the Tx queue hang. For extra insurance,
the Tx cleanup routine is changed to grab the current descriptor count on
entry and uses that initial value in the processing loop rather than
trying to chase the current value.
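In outline, the two sides of the change look like this (ring and
descriptor fields are simplified placeholders):

  /* Transmit path: publish descriptor contents before advancing the count. */
  ring->rdesc[cur].desc3 |= OWN_BIT;      /* hand the descriptor to the hardware */
  wmb();                                  /* full barrier: descriptor writes before the count */
  ring->cur = cur + 1;

  /* Cleanup path: read the count once and only process up to that point. */
  cur = ring->cur;                        /* captured on entry; later updates are ignored */
  while (ring->dirty != cur) {
      release_tx_descriptor(ring, ring->dirty);   /* hypothetical helper */
      ring->dirty++;
  }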
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Tx and Rx fifo sizes can be calculated rather
than hardcoded in a switch statement. Additionally, the per-queue fifo
sizes can be calculated rather than hardcoded using if/else if statements
that can possibly underutilize the available fifo area.
Change the code to calculate the fifo sizes and the per-queue fifo sizes
to simplify the code and make best use of the available fifo.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove an unneeded semicolon at the end of a switch statement block.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running a kernel configured with CONFIG_DMA_API_DEBUG=y a warning
is issued:
  DMA-API: device driver tries to sync DMA memory it has not allocated
This warning is the result of mapping the full range of the Rx buffer
pages allocated and then performing a dma_sync_single_for_cpu against
a calculated DMA address. The proper thing to do is to use the
dma_sync_single_range_for_cpu function with a base DMA address and an offset.
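The difference between the two calls, in sketch form (the buffer fields
shown are placeholders):

  /* Before: syncing a computed address inside a larger mapping trips the
   * DMA-API debug check because that exact address was never returned by a map call.
   */
  dma_sync_single_for_cpu(dev, buf->dma_base + buf->dma_off,
                          buf->len, DMA_FROM_DEVICE);

  /* After: pass the originally mapped base address plus an explicit offset. */
  dma_sync_single_range_for_cpu(dev, buf->dma_base, buf->dma_off,
                                buf->len, DMA_FROM_DEVICE);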
Reported-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The AMD XGBE device is intended to work with a specific integrated PHY
and that PHY is not meant to be a standalone PHY for use by other
devices. As such this patch removes the phylib driver and implements
the PHY support in the amd-xgbe driver (the majority of the logic from
the phylib driver is moved into the amd-xgbe driver).
Update the driver version to 1.0.1.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the network interface message level settings for
determining whether to issue some of the driver messages. Make
use of the netif_* interface where appropriate.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add additional/extended statistics beyond what is provided by the
hardware to be reported via ethtool. The new stats focus on the
calls into ndo_start_xmit and the napi_poll routine.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently a call to configure the Rx mode (promiscuous mode, all
multicast mode, etc.) is made in xgbe_start separate from the xgbe_init
function. This call to set the Rx mode should be part of the xgbe_init
function so that calls to the init function don't have to be preceded
with calls to configure the Rx mode.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the device must be down in order to update the rx-frames
coalescing setting because the interrupt indicator is set in the
descriptor data during initialization. Allow this setting to be changed
while the device is up by moving the interrupt decision into the
descriptor reset function and base the decision on the supplied
descriptor index value.
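Schematically, the interrupt decision now lives in the descriptor reset
path and depends only on the descriptor index (the modulo rule, descriptor
layout, and flag are illustrative assumptions):

  /* Request a completion interrupt only on every rx_frames-th descriptor. */
  static void rx_desc_reset_sketch(struct xgbe_prv_data *pdata,
                                   struct rx_desc_sketch *rdesc, unsigned int index)
  {
      bool inte = ((index % pdata->rx_frames) == (pdata->rx_frames - 1));

      rdesc->control = inte ? RX_DESC_INTE : 0;   /* hypothetical field and flag */
  }

Because the decision is made each time a descriptor is reset, a new
rx-frames value can take effect without bringing the device down.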
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Tx coalescing support in the driver was a software implementation
for something lacking in the hardware. Using hrtimers, the idea was to
trigger a timer interrupt after having queued a packet for transmit.
Unfortunately, as the timer value was lowered, the timer expired before
the hardware actually performed the transmit, so it was racy and resulted
in unnecessary interrupts.
Remove the Tx coalescing support and hrtimer and replace with a Tx timer
that is used as a reclaim timer in case of inactivity.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new lighter weight memory barriers when working with the device
descriptors.
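For example, the ordering pattern around the descriptor OWN bit becomes
(descriptor layout simplified; OWN_BIT and the processing helper are
placeholders):

  /* Producer: publish descriptor contents before giving ownership to the device. */
  rdesc->desc0 = cpu_to_le32(lower_32_bits(dma_addr));
  dma_wmb();                          /* lighter than wmb(); orders coherent memory only */
  rdesc->desc3 = cpu_to_le32(OWN_BIT);

  /* Consumer: confirm ownership before trusting the rest of the descriptor. */
  if (!(le32_to_cpu(rdesc->desc3) & OWN_BIT)) {
      dma_rmb();                      /* descriptor is ours; order the following reads */
      process_descriptor(rdesc);      /* hypothetical helper */
  }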
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clarify that the queues referred to in a message when the device is
brought up are hardware queues and not necessarily related to the
Linux network queues.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>