Simple round-robin hardware TX scheduling can cause starvation of TX rings
with small packets when other TX rings have large TSO or jumbo packets.
In the simplest case, consider 2 TCP streams running in opposite
directions. The TSO TX traffic will hash to one ring and the ACKs for the
incoming data on a different TCP connection will hash to a different TX
ring. The hardware fetches one complete TSO packet (up to 64K data)
before servicing the other TX ring. When it gets to the other TX ring, it
will only fetch one packet (64-byte ACK packet in this case). After that,
it will switch back to the 1st ring filled with more TSO packets. Because
only one ACK can go out roughly every 500 usec in this case, the incoming
data rate becomes very low.
Update version to 3.125.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Default remains the same.
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
by introducing tg3_stop() that does the opposite of tg3_start(). This
function will be useful when adding the support for changing the numbe
of rx and tx rings.
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
by introducing tg3_start() that handles all initialization steps from
IRQ allocation. This function will be needed when adding support for
changing the number of rx and tx rings.
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
since the number of rings can be different.
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
irq_cnt is no longer necessarily equal to the number rx or tx rings.
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is preparation work to allow the number of RX and TX rings to be
configured separately.
Reviewed-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/team/team.c
drivers/net/usb/qmi_wwan.c
net/batman-adv/bat_iv_ogm.c
net/ipv4/fib_frontend.c
net/ipv4/route.c
net/l2tp/l2tp_netlink.c
The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.
qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.
With help from Antonio Quartulli.
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function of_phy_connect() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 51af6d7c1f.
Breaks the build with CONFIG_PCI_ATS not enabled.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit c0357e975a modified bnx2 to switch from
using ioremap/iounmap to pci_iomap/pci_iounmap. They missed a spot in the error
path of bnx2_init_one though. This patch just cleans that up.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Michael Chan <mcan@broadcom.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GCC can't see that in all non-error-return paths we do in fact
set *using_dac to something.
Add an explicit initialization to remove this warning:
drivers/net/ethernet/brocade/bna/bnad.c: In function ‘bnad_pci_probe’:
drivers/net/ethernet/brocade/bna/bnad.c:3079:5: warning: ‘using_dac’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/net/ethernet/brocade/bna/bnad.c:3233:7: note: ‘using_dac’ was declared here
Signed-off-by: David S. Miller <davem@davemloft.net>
Current VFs enumeration algorithm used in be_find_vfs does not take domain
number into the match. The match found in igb/ixgbe is more elegant and
safe.
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new common code routine to upgrade an adapter's
firmware. This routine handles all of the complexities of working with the
the existing adapter firmware in order to quiesce the adapter and uP, etc.
For an automatic upgrade it will send a HELLO command to check if cxgb4
want/can upgrade firmware, i.e. if cxgb4 is MASTER and has newer firmware
that it wants to load and call the new common code routine t4_fw_upgrade.
Note that it should not issue a RESET command after a successful firmware
upgrade.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a card had already been initialized, on reloading cxgb4 driver firmware
required an upgrade but the upgrade did not happen. In that case a mailbox
timeout would occur during T4 configuration file stuff. The fix is to let the
caller know the firmware was not upgraded so a reset would be issued before
starting the T4 config stuff.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case if user defined configuration file at /lib/firmware/cxgb4/t4-config.txt
location and also factory default configuration file written to FLASH are not
present then driver will use hardwired configuration settings.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Starting with T4 firmware version 1.3.11.0 the firmware now supports device
configuration via a Firmware Configuration File. The Firmware Configuration
File was primarily developed in order to centralize all of the configuration,
resource allocation, etc. for Unified Wire operation where multiple
Physical / Virtual Function Drivers would be using a T4 adapter simultaneously.
The Firmware Configuration file can live in three locations as shown below
in order of precedence.
1) User defined configuration file: /lib/firmware/cxgb4/t4-config.txt
2) Factory Default configuration file written to FLASH within
the manufacturing process.
3) Hardwired driver configuration.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds new enums and macros to enable T4 configuration file support. It
also removes duplicate macro definitions.
It fixes the build failure in cxgb4vf driver introduced because of old macro
definition removal.
It also performs SGE initialization based on T4 configuration file is provided
or not. If it is provided then it uses the parameters provided in it otherwise
it uses hard coded values.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements two new functions t4_mem_win_read and t4_memory_read.
These new functions can be used to read memory via the PCIE memory window.
Please note, for proper execution of these functions PCIE_MEM_ACCESS_BASE_WIN
registers must be setup correctly like how setup_memwin in the cxgb4 driver
does it.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function of_phy_connect() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MIPSsim platform is no longer supported or used. This patch
deletes the Ethernet driver.
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is primarily to address transmission timeout occurrences, when
multiple H/W Tx queues are being used concurrently. Because in
the priority scheduling mode the controller does not service the
Tx queues equally (but in ascending index order), Tx timeouts are
being triggered rightaway for a basic test with multiple simultaneous
connections like:
iperf -c <server_ip> -n 100M -P 8
resulting in kernel trace:
NETDEV WATCHDOG: eth1 (fsl-gianfar): transmit queue <X> timed out
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:255
...
and controller reset during intense traffic, and possibly further
complications.
This patch changes the default H/W Tx scheduling setting (TXSCHED)
for multi-queue devices, from priority scheduling mode to a weighted
round robin mode with equal weights for all H/W Tx queues, and
addresses the issue above.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the definition of bnx2x_tests_str_arr from static char
pointer to static const char bi-directional array. Also the
bnx2x_get_strings function is simplified.
Reported-by: Joe Perches <joe@perches.com>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent kernel changes we can now return errors on a failure to setup a
VLAN filter. This patch takes advantage of that opportunity so that we can
return either an EIO error in the case of a mailbox failure, or an EACCESS
error in the case of being denied access to the VLAN filter table by the
PF.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change fixes the ixgbevf driver so that it can correctly drop a frame
should it receive a jumbo frame.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While fixing up a patch from Alex Duyck to use q_vectors in ring containers
to update the ITR I bungled it and missed actually updating the counters
in the ring container q_vectors. This patch fixes my mistake and makes
interrupt moderation actually work.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove 'rx_ring' parameter as it is not used in ixgbevf_receive_skb
Signed-off-by: Narendra K <narendra_k@dell.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The counter is not valid unless the controller is running in IOV mode.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The VF driver was not designed to correctly handle a message timeout. As
a result it is possible for one bad message to invalidate all messages
following it until the part is reset. Instead we should copy the example
in igbvf of how to handle a mailbox event and message timeout.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We're now using isa_virt_to_bus(), and there really
isn't a generic and consistent test for whether a
platform provides this interface or not.
This driver is also for an x86-only device.
Signed-off-by: David S. Miller <davem@davemloft.net>
PTP Hardware Clock devices appear as class devices in sysfs. This patch
changes the registration API to use the parent device, clarifying the
clock's relationship to the underlying device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is meant to improve performance on systems that do not require
the DMA unmap calls. On those systems we do not need to make use of the
unmap address for Tx or the unmap length so we can drop both thereby
reducing the size of the Tx buffer info structure.
In addition I have changed the logic to check for unmap length instead of
unmap address when checking to see if a buffer needs to be unmapped from
DMA use. The reasons for this change is that on some platforms it is
possible to receive a valid DMA address of 0 from an IOMMU.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Instead of storing the RSS key as a character array we can simplify the
configuration by making it a u32 array. This allows us to just write one
value per register without any unnecessary operations to construct the
value.
This change will produce the same exact key, the only difference is that I
translated the u8 array to a u32 array which will be correctly ordered on
writes to hardware by the cpu_to_le32 operations that are built into the
writel calls.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch cleans up our RSS indirection table configuration so that we
generate the same table regardless of CPU endianness. In addition it
changes the table setup so that instead of doing a modulo based setup it is
instead a divisor based setup. The advantage to this is that we should be
able to take the Rx hash and compute the Rx queue with very little CPU
overhead if needed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that Tx cleanup is done in a do/while loop instead
of a for loop. The main motivation behind this is the fact that we should
never be invoked with a budget less than 1 so we can skip checking the
budget before processing the first descriptor.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change removes the code that was doing the NUMA allocations for the
q_vectors, rings, and ring resources. The problem is the logic used assumed
that the NUMA nodes were always interleved and that is not always the case.
At some point I hope to add this functionality back in a more controlled
manner in the future.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Due to a hardware issue, on i210 and i211 parts, the TNCRS statistic
provides an invalid value. This patch changes the update stats function
to increment the stat only for non-i210/i211 parts.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adapt the pre-existing and assigned VFs code to the ixgbe way introduced
in commit 9297127b9c.
Instead of searching the enabled VFs we use pci_num_vf to determine enabled VFs.
By comparing to which PF an assigned VF is owned it's possible to decide
whether to leave it enabled or not.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Robert Garrett <robertx.e.garrett@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull networking updates from David Miller:
"More bug fixes, nothing gets past these guys"
1) More kernel info leaks found by Mathias Krause, this time in the
IPSEC configuration layers.
2) When IPSEC policies change, we do not properly make sure that cached
routes (which could now be stale) throughout the system will be
revalidated. Fix this by generalizing the generation count
invalidation scheme used by ipv4. From Nicolas Dichtel.
3) When repairing TCP sockets, we need to allow to restore not just the
send window scale, but the receive one too. Extend the existing
interface to achieve this in a backwards compatible way. From
Andrey Vagin.
4) A fix for FCOE scatter gather feature validation erroneously caused
scatter gather to be disabled for things like AOE too. From Ed L
Cashin.
5) Several cases of mishandling of error pointers, from Mathias Krause,
Wei Yongjun, and Devendra Naga.
6) Fix gianfar build, from Richard Cochran.
7) CAP_NET_* failures should return -EPERM not -EACCES, from Zhao
Hongjiang.
8) Hardware reset fix in janz-ican3 CAN driver, from Ira W Snyder.
9) Fix oops during rmmod in ti_hecc CAN driver, from Marc Kleine-Budde.
10) The removal of the conditional compilation of the clk support code
in the stmmac driver broke things. This is because the interfaces
used are the ones that don't also perform the enable/disable of the
clk. Fix from Stefan Roese.
11) The QFQ packet scheduler can record out of range virtual start
times, resulting later in misbehavior and even crashes. Fix from
Paolo Valente.
12) If MSG_WAITALL is used with IOAT DMA under TCP, we can wedge the
receiver when the advertised receive window goes to zero. Detect
this case and force the processing of the IOAT DMA queue when it
happens to avoid getting stuck. Fix from Michal Kubecek.
13) batman-adv assumes that test_bit() returns only 0 or 1, but this is
not true for x86 (which returns -1 or 0, via the 'sbb' instruction).
Fix from Linus Lussing.
14) Fix small packet corruption in e1000, from Tushar Dave.
15) make_blackhole() in the IPSEC policy code can do one read unlock too
many, fix from Li RongQing.
16) The new tcp_try_coalesce() code introduced a bug in TCP URG
handling, fix from Eric Dumazet.
17) Fix memory leak in __netif_receive_skb() when doing zerocopy and
when hit an OOM condition. From Michael S Tsirkin.
18) netxen blindly deferences pdev->bus->self, which is not guarenteed
to be non-NULL. Fix from Nikolay Aleksandrov.
19) Fix a performance regression caused by mistakes in ipv6 checksum
validation in the bnx2x driver, fix from Michal Schmidt.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
net/stmmac: Use clk_prepare_enable and clk_disable_unprepare
net: change return values from -EACCES to -EPERM
net/irda: sh_sir: fix return value check in sh_sir_set_baudrate()
stmmac: fix return value check in stmmac_open_ext_timer()
gianfar: fix phc index build failure
ipv6: fix return value check in fib6_add()
bnx2x: remove false warning regarding interrupt number
can: ti_hecc: fix oops during rmmod
can: janz-ican3: fix support for older hardware revisions
net: do not disable sg for packets requiring no checksum
aoe: assert AoE packets marked as requiring no checksum
at91ether: return PTR_ERR if call to clk_get fails
xfrm_user: don't copy esn replay window twice for new states
xfrm_user: ensure user supplied esn replay window is valid
xfrm_user: fix info leak in copy_to_user_tmpl()
xfrm_user: fix info leak in copy_to_user_policy()
xfrm_user: fix info leak in copy_to_user_state()
xfrm_user: fix info leak in copy_to_user_auth()
net: qmi_wwan: adding Huawei E367, ZTE MF683 and Pantech P4200
tcp: restore rcv_wscale in a repair mode (v2)
...
Commit eb716c54b1 ("sunbmac: remove
unnecessary setting of skb->dev") caused the local varible 'dev'
in bigmac_init_rings to become unused. And now the compiler
warns about it.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an issue introduced by commit ID 6a81c26f
[net/stmmac: remove conditional compilation of clk code], which
switched from the internal stmmac_clk_{en}{dis}able calls to
clk_{en}{dis}able. By this, calling clk_prepare and clk_unprepare
was removed.
clk_{un}prepare is mandatory for platforms using common clock framework.
Since these drivers are used by SPEAr platform, which supports common
clock framework, add clk_{un}prepare() support for them. Otherwise
the clocks are not correctly en-/disabled and ethernet support doesn't
work.
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of error, the function clk_get() returns ERR_PTR()
and never returns NULL pointer. The NULL test in the error
handling should be replaced with IS_ERR().
dpatch engine is used to auto generated this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a build failure introduced in commit 66636287
("gianfar: Support the get_ts_info ethtool method."). Not only was a
global variable inconsistently named, but also it was not exported as
it should have been.
This fix is also needed in stable version 3.5.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since version 7.4 the FW configures in the pci config space the max
number of interrupts available to the physical function, instead of
the exact number to use.
This causes a false warning in driver when comparing the number of
configured interrupts to the number about to be used.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
This series contains updates to igb and ixgbevf.
v2: updated patch description in 04 patch (ixgbevf: scheduling while
atomic in reset hw path)
...
Akeem G. Abodunrin (1):
igb: Support to enable EEE on all eee_supported devices
Alexander Duyck (2):
igb: Remove artificial restriction on RQDPC stat reading
ixgbevf: Add support for VF API negotiation
John Fastabend (1):
ixgbevf: scheduling while atomic in reset hw path
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
we are currently returning ENODEV, as the clk_get may give a exact
error code in its returned pointer, assign it to the ret by using the
PTR_ERR function, so that the subsequent goto label will jump to the
error path and clean the driver and return the error correctly.
Signed-off-by: Devendra Naga <devendra.aaru@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings says:
====================
1. Extension to PPS/PTP to allow for PHC devices where pulses are
subject to a variable but measurable delay.
2. PPS/PTP/PHC support for Solarflare boards with a timestamping
peripheral.
3. MTD support for updating the timestamping peripheral on those boards.
4. Fix for potential over-length requests to firmware.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>