The driver was offloading the VLAN tag into the skb
any time there was a VLAN tag and the hardware stripping was
enabled. Just check to make sure it's enabled before put_tag.
Change-Id: Ife95290c06edd9a616393b38679923938b382241
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A mirror rule ID may be zero so do not return invalid parameter when the
user passes in a zero value for a rule ID.
Change-ID: I261b8c24725ce2c6ed32f859da81093dfcbe2970
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add device capability which defines if update is available and security
check is needed during update process.
Change-ID: I380787c878275e1df18b39198df3ee3666342282
Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If the PF driver reports proper support, allow the PF driver to
configure RSS on the behalf of the VF driver. This will allow for RSS
support on future hardware without changes to the VF driver.
Unfortunately, the old RSS code still needs to stay as the driver needs
to be compatible with PF drivers that don't support this interface. But
this change still simplifies the data structures a bunch and makes this
code simpler to read and maintain.
Change-ID: I0375aad40788ecdc0cb24d5cfeccf07804e69771
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
To add a little flexibility to the nvmupdate facility, this code adds the
ability to specify an AQ event opcode to wait on after the Exec_AQ request.
Change-ID: Iddbfd63c3de8df3edb9d3e90678b08989bc4946e
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A little bit of code cleanup in prep for more cloud filter work.
Change-ID: I0dc33ce0d4c207944336a07437640fef920c100c
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Under some circumstances the driver remove function may be called before
the driver is fully initialized. So we can't assume that we know where
our towel is at, or that all of the data structures are initialized.
To ensure that we don't panic, check that the vsi_res pointer is valid
before dereferencing it. Then drink beer and eat peanuts.
Change-ID: If697b4db57348e39f9538793e16aa755e3e1af03
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add support for configuring RSS on behalf of the VFs. This removes the
burden of dealing with different hardware interfaces from the VF
drivers, allowing for better future compatibility.
Change-ID: Icea75d3f37241ee8e447be5779e5abb53ddf04c0
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Looking over the documentation it turns out enabling IPIP and SIT offloads
for i40e is pretty straightforward. As such I decided to enable them with
this patch. In my testing I am seeing an improvement of 8 to 10 Gb/s
for IPIP and SIT tunnels with this offload enabled.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The feature flags list for i40e and i40evf is beginning to become pretty
massive. I plan to add another 4 or so features to these drivers and
duplicating the flags for each and every flags list is becoming a bit
repetitive.
The primary change here is that we now build our features list around
hw_encap_features. After that we assign that to vlan_features,
hw_features, and finally map that onto features. In addition we end up
throwing features onto hw_encap_features that end up having no effect such
as the Rx offloads and SCTP_CRC. However that should have no impact and
makes things a bit easier for us as hw_encap_features is one of the less
updated features maps available.
For i40evf I went through and sanity checked a few features as well.
Specifically RXCSUM was being set as a read-only feature which didn't make
much sense. I have updated things so we can clear the NETIF_F_RXCSUM flag
since that is really a software feature and not a hardware one anyway so
disabling it is just a matter of ignoring the result from the hardware.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The APIs for making this sort of configuration [e.g., via ethtool] are
already present in qede, but the current configuration flow in qed doesn't
respect it.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's some inconsistency in current logic determining whether the
link settings of a given interface can be changed; I.e., in all modes
other than the so-called `deault' mode the interfaces are forbidden from
changing the configuration - but even this rule is not applied to all
user APIs that may change the configuration.
Instead, let the core-module [qed] decide whether an interface can change
the configuration by supporting a new API function. We also revise the
current rule, allowing all interfaces to change their configurations while
laying the infrastructure for future modes where an interface would be
blocked from making such a configuration.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a getter for the interfaces private flags.
The only parameter currently supported is whether the interface is a
coupled function [required for supporting 100g].
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a difference in statsitics' names starting at qed and
propagating to qede, where egress counters indicate ranges while ingress
counters indiciate high-end.
Align all statistcs to follow the same conventions - name indicates range.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2016-04-25
This series contains updates to ixgbe and ixgbevf.
Emil provides several patches, starting with the consolidation of the
logic behind configuring spoof checking. Fixed an issue which was
causing link issues for backplane devices because x550em_a/x devices
did not have a default value for mac->ops.setup_link. Refactored the
ethtool stats to bring the logic closer to how ixgbe handles stats and
sets up per-queue stats for ixgbevf.
Mark adds a new register to wait for previous register writes to complete
before issuing a register read, which is needed when slower links are
in use. Fixed the flow control setup for x550em_a, the incorrect
fc_setup function was being used.
Don added a workaround for empty SFP+ cage crosstalk, since on some
systems the crosstalk could lead to link flap on empty SFP+ cages.
Jake converts ixgbe and ixgbevf to use the BIT() macro.
Alex Duyck adds support for partial GSO segmentation in the case of
tunnels for ixgbe and ixgbevf. Then preps for HyperV by moving the API
negotiation into mac_ops.
Arnd Bergmann provides a fix for the ARM compile warnings in linux-next
by converting the use of a udelay() to msleep().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The newly added x550em_a support causes a link failure on ARM because of
an overly long time passed into udelay():
ERROR: "__bad_udelay" [drivers/net/ethernet/intel/ixgbe/ixgbe.ko] undefined!
There are multiple variants of the ixgbe_acquire_swfw_sync_*() function,
and the other ones all use msleep(), so we can safely assume that all
callers are allowed to sleep, which makes msleep() a better replacement
than mdelay().
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 49425dfc74 ("ixgbe: Add support for x550em_a 10G MAC type")
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch moves API negotiation into mac_ops. The general idea here is
that with HyperV on the way we need to make certain that anything that will
have different versions between HyperV and a standard VF needs to be
abstracted enough so that we can have a separate function between the two
so we can avoid changes in one breaking something in the other.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for partial GSO segmentation in the case of
tunnels. Specifically with this change the driver an perform segmentation
as long as the frame either has IPv6 inner headers, or we are allowed to
mangle the IP IDs on the inner header. This is needed because we will not
be modifying any fields from the start of the start of the outer transport
header to the start of the inner transport header as we are treating them
like they are just a block of IP options.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Also cleanup a case where we're bit shifting a value into place, and use
an unsigned constant. Make use of the unsigned postfix in places where
BIT() macro is not appropriate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Make use of GENMASK instead of open coding the equivalent operation
incorrectly.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Several areas of ixgbe were written before widespread usage of the
BIT(n) macro. With the impending release of GCC 6 and its associated new
warnings, some usages such as (1 << 31) have been noted within the ixgbe
driver source. Fix these wholesale and prevent future issues by simply
using BIT macro instead of hand coded bit shifts.
Also fix a few shifts that are shifting values into place by using the
'u' prefix to indicate unsigned. It doesn't strictly matter in these
cases because we're not shifting by too large a value, but these are all
unsigned values and should be indicated as such.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It is possible on some systems that crosstalk could lead to link flap
on empty SFP+ cages. A new NVM bit was defined to let SW know it
needs to implement the work around which consists of verifying that
there is a module in the cage before acting on the LSC.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Somehow the wrong fc_setup function was used for x550em_a, so
correct that. Also set setup_link to NULL as its value is
determined later, just like it is with X550EM_x.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement per-queue statistics for packets, bytes and busy poll
specific counters.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This brings the logic closer to how we handle the stats in ixgbe and it
sets us up for introducing per-queue stats.
Use IXGBEVF_STAT and IXGBEVF_NETDEV_STAT for accessing the driver and
netdev stats respectively. This way we don't have to calculate the
stats based on register values which could lead to the counters not
being initialized properly when the interface is down.
IXGBEVF_QUEUE_STATS_LEN is set to include the number of queues.
Also some defines were renamed to use the IXGBEVF prefix.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use a new register to wait for previous register writes to complete
before issuing a register read. This is needed when slower links
are in use.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
media disconnect & connect. The second half should be added to the list
head, not to the tail. So all events are processed in normal order.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This field is used to record the RX queue index for a redirect action
passed via ring_cookie field in struct ethtool_rx_flow_spec which is
a u64 value.
For ex: after adding a filter rule to redirect to a VF using ethtool
# echo 4 > /sys/class/net/p4p1/device/sriov_numvfs
# ethtool -N p4p1 flow-type ip4 src-ip 192.168.0.1 action 0x100000000
querying for the rule shows the Action as 'Direct to queue 0'
# ethtool -n p4p1
4 RX rings available
Total 1 rules
Filter: 2045
Rule Type: Raw IPv4
Src IP addr: 192.168.0.1 mask: 0.0.0.0
Dest IP addr: 0.0.0.0 mask: 255.255.255.255
TOS: 0x0 mask: 0xff
Protocol: 0 mask: 0xff
L4 bytes: 0x0 mask: 0xffffffff
VLAN EtherType: 0x0 mask: 0xffff
VLAN: 0x0 mask: 0xffff
User-defined: 0x0 mask: 0xffffffffffffffff
Action: Direct to queue 0
With this fix, ethtool will report the right queue index even for VFs.
Action: Direct to queue 4294967296
Here 4294967296 corresponds to 0x100000000.
We need to update 'ethtool' to report the queue index as a Hex value so
that it is more user friendly and matches with the 'action' value that
is passed when adding the rule.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
X550EM_a/x did not have a default value for mac->ops.setup_link which
was causing link issues for backplane devices.
This patch sets mac->ops.setup_link to ixgbe_setup_mac_link_X540 for
X550EM_a/x which is also default for X550. This will result in
mac->ops.setup_link calling the link setup function for the respective
PHY type in case we do not need a special function to deal with it.
Reported-by: Ken Cox <jkc@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Previously the PF driver would only set VLAN spoof checking if
the VF had created VLANs. This was done by setting and checking
a counter (vlan_count) whenever a VLAN was created by the VF.
However it is possible for the vlan_count to be !=0 while there are
no VLANs assigned to the VF due to the count incrementing every
time a VLAN 0 is added on ifdown/up, which resulted in VLAN spoofing
always being set for those VFs.
This patch cleans up the logic by unconditionally setting VLAN based on
how the VF is configured (via ip link set ethX vf Y spoofchk on/off).
This change also resolves an issue where the VLAN spoofing can remain
set even after being disabled by the user due to the driver enabling
VLAN spoof checking every time a VLAN is added to the VF, but would
only allow changes in the setting if vlan_count != 0.
Also default_vf_vlan_id and vlans_enabled were removed from the
vf_data_storage structure since they are not being used in the driver.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Consolidate the logic behind configuring spoof checking:
Move the setting of the MAC, VLAN and Ethertype spoof checking into
ixgbe_ndo_set_vf_spoofchk().
Change ixgbe_set_mac_anti_spoofing() to set MAC spoofing per VF similar
to the VLAN and Ethertype functions - this allows us to call the helper
functions in ixgbe_ndo_set_vf_spoofchk() for all spoof check types and
only disable MAC spoof checking when creating MACVLAN.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Conflicts were two cases of simple overlapping changes,
nothing serious.
In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix memory leak in iwlwifi, from Matti Gottlieb.
2) Add missing registration of netfilter arp_tables into initial
namespace, from Florian Westphal.
3) Fix potential NULL deref in DecNET routing code.
4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry
Ivanov.
5) Fix dst ref counting in VRF, from David Ahern.
6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck.
7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause.
8) Ravalidate IPV6 datagram socket cached routes properly, particularly
with UDP, from Martin KaFai Lau.
9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang.
10) Fix stats typing in bcmgenet driver, from Eric Dumazet.
11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing,
from Joe Stringer.
12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown.
13) atl2 doesn't actually support scatter-gather, so don't advertise the
feature. From Ben Hucthings.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
openvswitch: use flow protocol when recalculating ipv6 checksums
Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets
atl2: Disable unimplemented scatter/gather feature
net/mlx4_en: Split SW RX dropped counter per RX ring
net/mlx4_core: Don't allow to VF change global pause settings
net/mlx4_core: Avoid repeated calls to pci enable/disable
net/mlx4_core: Implement pci_resume callback
net: phy: spi_ks8895: Don't leak references to SPI devices
net: ethernet: davinci_emac: Fix platform_data overwrite
net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable
qede: Fix single MTU sized packet from firmware GRO flow
qede: Fix setting Skb network header
qede: Fix various memory allocation error flows for fastpath
tcp: Merge tx_flags and tskey in tcp_shifted_skb
tcp: Merge tx_flags and tskey in tcp_collapse_retrans
drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
openvswitch: Orphan skbs before IPv6 defrag
Revert "Prevent NUll pointer dereference with two PHYs on cpsw"
VSOCK: Only check error on skb_recv_datagram when skb is NULL
...
Equivalent to "vxlan: break dependency with netdev drivers", don't
autoload geneve module in case the driver is loaded. Instead make the
coupling weaker by using netdevice notifiers as proxy.
Cc: Jesse Gross <jesse@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently all drivers depend and autoload the vxlan module because how
vxlan_get_rx_port is linked into them. Remove this dependency:
By using a new event type in the netdevice notifier call chain we proxy
the request from the drivers to flush and resetup the vxlan ports not
directly via function call but by the already existing netdevice
notifier call chain.
I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so.
We don't need to save those ids, as the event type field is an unsigned
long and using specialized event types for this purpose seemed to be a
more elegant way. This also comes in beneficial if in future we want to
add offloading knobs for vxlan.
Cc: Jesse Gross <jesse@kernel.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic_attach_func requires rtnl_lock to be held.
Cc: Dept-GELinuxNICDev@qlogic.com
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan_get_rx_port requires rtnl_lock to be held.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Shannon Nelson <shannon.nelson@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_en_start_port requires rtnl_lock to be held.
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
fm10k_open requires rtnl_lock to be held.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Shannon Nelson <shannon.nelson@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
be_open calls down to functions which expects rtnl lock to be held.
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Padmanabh Ratnakar <padmanabh.ratnakar@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For IPv6, if the device indicates that the checksum is correct, set
CHECKSUM_UNNECESSARY.
Reported-by: Subbarao Narahari <snarahari@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
atl2 includes NETIF_F_SG in hw_features even though it has no support
for non-linear skbs. This bug was originally harmless since the
driver does not claim to implement checksum offload and that used to
be a requirement for SG.
Now that SG and checksum offload are independent features, if you
explicitly enable SG *and* use one of the rare protocols that can use
SG without checkusm offload, this potentially leaks sensitive
information (before you notice that it just isn't working). Therefore
this obscure bug has been designated CVE-2016-2117.
Reported-by: Justin Yackoski <jyackoski@crypto-nite.com>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: ec5f061564 ("net: Kill link between CSUM and SG features.")
Signed-off-by: David S. Miller <davem@davemloft.net>
Counts the number of RX buffer allocation failures and shows it
in ethtool statistics.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move mlx5e_handle_csum and eth_type_trans to the end of
mlx5e_build_rx_skb to gain some more time before accessing
skb->data, to reduce cache misses.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bit-op operation one line before is an explicit barrier
by itself.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of netdev_alloc_skb, we use the napi_alloc_skb function
which is designated to allocate skbuff's for RX in a
channel-specific NAPI instance, and implies the IP packet alignment.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the allocation of a linear (physically continuous) MPWQE fails,
we allocate a fragmented MPWQE.
This is implemented via device's UMR (User Memory Registration)
which allows to register multiple memory fragments into ConnectX
hardware as a continuous buffer.
UMR registration is an asynchronous operation and is done via
ICO SQs.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added ICO (Internal Control Operations) SQ per channel to be used
for driver internal operations such as memory registration for
fragmented memory and nop requests upon ifconfig up.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce the feature of multi-packet WQE (RX Work Queue Element)
referred to as (MPWQE or Striding RQ), in which WQEs are larger
and serve multiple packets each.
Every WQE consists of many strides of the same size, every received
packet is aligned to a beginning of a stride and is written to
consecutive strides within a WQE.
In the regular approach, each regular WQE is big enough to be capable
of serving one received packet of any size up to MTU or 64K in case of
device LRO is enabled, making it very wasteful when dealing with
small packets or device LRO is enabled.
For its flexibility, MPWQE allows a better memory utilization
(implying improvements in CPU utilization and packet rate) as packets
consume strides according to their size, preserving the rest of
the WQE to be available for other packets.
MPWQE default configuration:
Num of WQEs = 16
Strides Per WQE = 2048
Stride Size = 64 byte
The default WQEs memory footprint went from 1024*mtu (~1.5MB) to
16 * 2048 * 64 = 2MB per ring.
However, HW LRO can now be supported at no additional cost in memory
footprint, and hence we turn it on by default and get an even better
performance.
Performance tested on ConnectX4-Lx 50G.
To isolate the feature under test, the numbers below were measured with
HW LRO turned off. We verified that the performance just improves when
LRO is turned back on.
* Netperf single TCP stream:
- BW raised by 10-15% for representative packet sizes:
default, 64B, 1024B, 1478B, 65536B.
* Netperf multi TCP stream:
- No degradation, line rate reached.
* Pktgen: packet rate raised by 2-10% for traffic of different message
sizes: 64B, 128B, 256B, 1024B, and 1500B.
* Pktgen: packet loss in bursts of small messages (64byte),
single stream:
- | num packets | packets loss before | packets loss after
| 2K | ~ 1K | 0
| 8K | ~ 6K | 0
| 16K | ~13K | 0
| 32K | ~28K | 0
| 64K | ~57K | ~24K
As expected as the driver can receive as many small packets (<=64B) as
the number of total strides in the ring (default = 2048 * 16) vs. 1024
(default ring size regardless of packets size) before this feature.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>