Introduce new rtnl UAPI that exposes a list of vlans per VF, giving
the ability for user-space application to specify it for the VF, as an
option to support 802.1ad.
We adjusted IP Link tool to support this option.
For future use cases, the new UAPI supports multiple vlans. For now we
limit the list size to a single vlan in kernel.
Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of IP Link tool.
Add a vlan protocol parameter to the ndo_set_vf_vlan callback.
We kept 802.1Q as the drivers' default vlan protocol.
Suitable ip link tool command examples:
Set vf vlan protocol 802.1ad:
ip link set eth0 vf 1 vlan 100 proto 802.1ad
Set vf to VST (802.1Q) mode:
ip link set eth0 vf 1 vlan 100 proto 802.1Q
Or by omitting the new parameter
ip link set eth0 vf 1 vlan 100
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is meant to allow for RX network flow classification to insert
and remove Rx filter by ethtool. Ethtool interface has it's own rules
manager
Show all filters:
$ ethtool -n eth0
4 RX rings available
Total 2 rules
Signed-off-by: Ruhao Gao <ruhao.gao@ni.com>
Signed-off-by: Gangfeng Huang <gangfeng.huang@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.
For an IMX6Q, this gives around 6% increased TCP receive performance,
which is cache operations bound and reduces CPU load for TCP transmit.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Properly stop the extra workqueue items and ensure that we resume
cleanly. This is better than using igb_ptp_init and igb_ptp_stop since
these functions destroy the PHC device, which will cause other problems
if we do so. Since igb_ptp_reset now re-schedules the work-queue item we
don't need an equivalent igb_ptp_resume in the resume workflow.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modify igb_ptp_init to take advantage of igb_ptp_reset, and remove
duplicated work that was occurring in both igb_ptp_reset and
igb_ptp_init.
In total, resetting the TSAUXC register, and resetting the system time
both happen in igb_ptp_reset already. igb_ptp_reset now also takes care
of starting the delayed work item for overflow checks, as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that we do have pci_request_mem_regions() and pci_release_mem_regions()
at hand, use it in the Intel ethernet drivers.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: David S. Miller <davem@davemloft.net>
This patch adds support for offloading IPXIP6 type packets that represent
either IPv4 or IPv6 encapsulated inside of an IPv6 outer IP header. In
addition with this change we should also be able to support FOU
encapsulated traffic with outer IPv6 headers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for partial GSO segmentation in the case of
tunnels. Specifically with this change the driver an perform segmentation
as long as the frame either has IPv6 inner headers, or we are allowed to
mangle the IP IDs on the inner header. This is needed because we will not
be modifying any fields from the start of the start of the outer transport
header to the start of the inner transport header as we are treating them
like they are just a block of IP options.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For bitshifts, we should make use of the BIT macro when possible, and
ensure that other bitshifts are marked as unsigned. This helps prevent
signed bitshift errors, and ensures similar style.
Make use of GENMASK and the unsigned postfix where BIT() isn't
appropriate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
a trans_start struct member exists twice:
- in struct net_device (legacy)
- in struct netdev_queue
Instead of open-coding dev->trans_start usage to obtain the current
trans_start value, use dev_trans_start() instead.
This is not exactly the same, as dev_trans_start also considers
the trans_start values of the netdev queues owned by the device
and provides the most recent one.
For legacy devices this doesn't matter as dev_trans_start can cope
with netdev trans_start values of 0 (they are ignored).
This is a prerequisite to eventual removal of dev->trans_start.
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 3eb14ea8d9 ("igb: Fix a deadlock in
igb_sriov_reinit")
It is the same as commit f468adc944 ("igb: missing rtnl_unlock in
igb_sriov_reinit()")
There is no rtnl_lock() in igb_resume before, rtnl_unlock will cause a
deadlock.
Signed-off-by: Arika Chen <arika.chen@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The Intel i211 LOM PCIe Ethernet controllers' iNVM operates as an OTP
and has no external EEPROM interface [1]. The following allows the
driver to pickup the MAC address from a device tree blob when CONFIG_OF
has been enabled.
[1]
http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html
Signed-off-by: John Holland <jotihojr@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch enables bulk free in Tx cleanup for igb and cleans up the
boolean logic in the polling routines for igb in the hopes of avoiding
any mix-ups similar to what occurred with i40e and i40evf.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We were casting the addr as __beXX and then passing it into le32_to_cpu
because the device expects the MAC address to be in network order even
though the register set is little endian. Instead of casting it as __beXX
we can just cast it as __leXX in order to maintain consistency since the
region of memory is already in little endian order as far as we are
concerned.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and arrap maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a messaged based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address less for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
The success of CMA allocation largely depends on the success of
migration and key factor of it is page reference count. Until now, page
reference is manipulated by direct calling atomic functions so we cannot
follow up who and where manipulate it. Then, it is hard to find actual
reason of CMA allocation failure. CMA allocation should be guaranteed
to succeed so finding offending place is really important.
In this patch, call sites where page reference is manipulated are
converted to introduced wrapper function. This is preparation step to
add tracepoint to each page reference manipulation function. With this
facility, we can easily find reason of CMA allocation failure. There is
no functional change in this patch.
In addition, this patch also converts reference read sites. It will
help a second step that renames page._count to something else and
prevents later attempt to direct access to it (Suggested by Andrew).
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Problem: When switching off VLAN offloading on an i350, the VLAN
interface gets unusable. For testing, set up a VLAN on an i350
and some remote machine, e.g.:
$ ip link add link eth0 name eth0.42 type vlan id 42
$ ip addr add 192.168.42.1/24 dev eth0.42
$ ip link set dev eth0.42 up
Offloading is switched on by default:
$ ethtool -k eth0 | grep vlan-offload
rx-vlan-offload: on
tx-vlan-offload: on
$ ping -c 3 -I eth0.42 192.168.42.2
[...works as usual...]
Now switch off VLAN offloading and try again:
$ ethtool -K eth0 rxvlan off
Actual changes:
rx-vlan-offload: off
tx-vlan-offload: off [requested on]
$ ping -c 3 -I eth0.42 192.168.42.2
PING 192.168.42.2 (192.168.42.2) from 192.168.42.1 eth0.42: 56(84) bytes of da
ta.
--- 192.168.42.2 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 1999ms
I can only reproduce it on an i350, the above works fine on a 82580.
While inspecting the igb source, I came across the code in igb_set_vmolr
which sets the E1000_VMOLR_STRVLAN/E1000_DVMOLR_STRVLAN flags once and
for all, and in all of the igb code there's no other place where the
STRVLAN is set or cleared. Thus, VLAN stripping is enabled in igb
unconditionally, independently of the offloading setting.
I compared that to the latest Intel igb-5.3.3.5 driver from
http://sourceforge.net/projects/e1000/ which in fact sets and clears the
STRVLAN flag independently from igb_set_vmolr in its own function
igb_set_vf_vlan_strip, depending on the vlan settings.
So I included the STRVLAN handling from the igb-5.3.3.5 driver into our
current igb driver and tested the above scenario again. This time ping
still works after switching off VLAN offloading.
Tested on i350, with and without addtional VFs, as well as on 82580
successfully.
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for generic Tx checksums to the igb driver. It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.
In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
MACLEN (maximum of 127), retrieved from:
skb_network_offset()
IPLEN (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset
The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.
I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
E1000_MRQC_ENABLE_RSS_4Q enables 4 and 8 queues depending on the part
so rename to be generic.
Similarly, E1000_MRQC_ENABLE_VMDQ_RSS_2Q has no numeric meaning so
rename to be more generic.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Override EEPROM settings for specific OEM devices.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
I210 device IPv6 autoconf test sometimes fails,
because DAD NS for link-local is not transmitted.
This packet is silently dropped.
This problem is seen only GbE environment.
igb_watchdog_task link up detection continues to the following process.
The following cases are observed:
1.PHY 1000BASE-T Status Register Remote receiver status bit is NG.
(NG status becomes OK after about 200 - 700ms)
2.In this case, the transfer packet is silently dropped.
1000BASE-T Status register
[Expected]: 0x3800 or 0x7800
[problem occurred]: 0x2800 or 0x6800
Frequency of occurrence: approx 1/10 - 1/40 observed
In order to avoid this problem,
wait until 1000BASE-T Status register "Remote receiver status OK"
After applying this patch, at least 400 runs succeed with no problems.
Signed-off-by: Takuma Ueba <t.ueba11@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There was a workaround partially implemented for the 82576 that is needed
in order for VLAN tag stripping to function correctly. The original code
had side effects that would make it so the workaround was active on all
MACs. I have updated the code so that the workaround is enabled, but
limited to the 82576, or activated if we exceed the available unicast
addresses.
The workaround has a side effect of mirroring all of the traffic outgoing
from the VFs back to the PF. As such it is not recommended to use the
82576 in promiscuous mode as it will take a performance hit, though this is
now consistent with the performance as seen on the out-of-tree igb driver.
I also limited the scope of the UTA bits all being set to only when the
VMOLR register is enabled. This should limit the effects of the UTA
register so that we don't pick up any excess traffic unless promiscuous
mode has been enabled on the PF, whereas before the PF would have ended up
in something equivalent to unicast promiscuous mode with VLAN filtering
otherwise.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we can use the bridge utility to add a FDB
entry for the PF to an igb port. By doing this we can enable the VFs to
talk to virtual ports residing on top of the PF.
In addition this should also address issues with MACVLANs trying to reside
on top of the PF as well as they would have had similar issues when added
to the PF with SR-IOV enabled.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch drops several checks that we dropped from ixgbe some ago. It
should not be possible for us to be called with either of the conditional
statements returning true so we can just drop them from the hot-path.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change fixes things so that we can fully support SR-IOV or the
recently added NTUPLE filtering while allowing support for VLAN promiscuous
mode. By making this change we are able to support possible scenarios such
as SR-IOV with the PF connected to a Linux bridge hosting other VMs.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is meant to clean-up the configuration of the VF port based VLAN
configuration. The original logic was a bit muddled and had some
undesirable side effects such as VLANs being either completely stripped
from the port or VLANs being left when they shouldn't be. The idea behind
this code is to avoid any events such as spurious spoof notifications when
we are removing one VLAN tag and replacing it with another.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we can merge the configuration of the VLVF
registers into the setting of the VFTA register. By doing this we simplify
the logic and make use of similar functionality that we have already added
for ixgbe making it easier to maintain both drivers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch makes it so that we always add VLAN 0. This is important as we
need to guarantee the PF can receive untagged frames in the case of SR-IOV
being enabled but VLAN filtering not being enabled in the kernel.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The RLPML registers already take the size of VLAN headers into account when
determining the maximum packet length. This is called out in EAS documents
for several parts including the 82576 and the i350. As such we can drop
the addition of size to the value programmed into the RLPML registers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since the igb driver is using page based receive there is no point in
limiting the Rx capabilities of the device. The driver can receive 9K
jumbo frames at all times. The only changes needed due to MTU changes are
updates for the FIFO sizes and flow-control watermarks.
Update the maximum frame size to reflect the 9.5K limitation of the
hardware, and replace all instances of max_frame_size with
MAX_JUMBO_FRAME_SIZE when referring to an Rx FIFO or frame.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Drop a bunch of hand written byte swapping code in favor of just doing the
byte swapping ourselves. The registers are little endian registers storing
a big endian value so if we read the MAC address array as little endian
then we will get the CPU registers into the proper layout.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
By the commit 72ddef0506 ("igb: Fix oops caused by missing queue
pairing"), the IGB_FLAG_QUEUE_PAIRS flag can now be set when changing the
number of queues by "ethtool -L", but it is never cleared unless the igb
driver is reloaded.
This patch clears it if queue pairing becomes unnecessary as a result of
"ethtool -L".
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If VFs are enabled (max_vfs >= 1), both max_rss_queues and
adapter->rss_queues are set to 2 in the case of e1000_82576.
In this case, IGB_FLAG_QUEUE_PAIRS is always set in the default block as a
result of fall-through, thus setting it in the e1000_82576 block is not
necessary.
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The SCTP checksum is really a CRC and is very different from the
standards 1's complement checksum that serves as the checksum
for IP protocols. This offload interface is also very different.
Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these
differences. The term CSUM should be reserved in the stack to refer
to the standard 1's complement IP checksum.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean up array_rd32 so that it uses igb_rd32 the same as rd32, per the
suggestion of Alexander Duyck, and use io_addr in more places, so that
we don't have the need to call E1000_REMOVED (which simply looks for a
null hw_addr) nearly as much.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The combined effect of commits 6423fc3416 ("igb: do not re-init SR-IOV
during probe") and ceee3450b3 ("igb: make sure SR-IOV init uses the
right number of queues") causes VFs no longer getting set up, leading
to NULL pointer dereferences due to the adapter's ->vf_data being NULL
while ->vfs_allocated_count is non-zero. The first commit not only
neglected the side effect of igb_sriov_reinit() that the second commit
tried to account for, but also that of setting IGB_FLAG_HAS_MSIX,
without which igb_enable_sriov() is effectively a no-op. Calling
igb_{,re}set_interrupt_capability() as done here seems to address this,
but I'm not sure whether this is better than sinply reverting the other
two commits.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As per Eric Dumazet's previous patches:
(see commit (24d2e4a507) - tg3: use napi_complete_done())
Quoting verbatim:
Using napi_complete_done() instead of napi_complete() allows
us to use /sys/class/net/ethX/gro_flush_timeout
GRO layer can aggregate more packets if the flush is delayed a bit,
without having to set too big coalescing parameters that impact
latencies.
</end quote>
Tested
configuration: low latency via ethtool -C ethx adaptive-rx off
rx-usecs 10 adaptive-tx off tx-usecs 15
workload: streaming rx using netperf TCP_MAERTS
igb:
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result: 941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 1176930056 1475.36 797726 16384.00 71905
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result: 941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 1175182320 50476.00 23282 16384.00 71816
i40e:
Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
always receives 87kB, even at the highest interrupt rate.
Other drivers were only compile tested.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to deprecate the use of 'struct timespec' on 32-bit
architectures, as it is will overflow in 2038. The igb
driver uses it to read the current time, and can simply
be changed to use ktime_get_real_ts64() instead.
Because of hardware limitations, there is still an overflow
in year 2106, which we cannot really avoid, but this documents
the overflow.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Reviewed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In igb_sw_init() the sequence of calls was changed from
igb_init_queue_configuration()
igb_init_interrupt_scheme()
igb_probe_vfs()
to
igb_probe_vfs()
igb_init_queue_configuration()
igb_init_interrupt_scheme()
This results in adapter->flags not having the IGB_FLAG_HAS_MSIX bit set
during igb_probe_vfs()->igb_enable_sriov(). Therefore SR-IOV does not
get enabled properly and we run into a NULL pointer if the max_vfs
module parameter is specified (adapter->vf_data does not get allocated,
crash on accessing the structure).
[ 7.419348] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 7.419367] IP: [<ffffffffa02161c6>] igb_reset+0xe6/0x5d0 [igb]
[ 7.419370] PGD 0
[ 7.419373] Oops: 0002 [#1] SMP
[ 7.419381] Modules linked in: ahci(+) libahci igb(+) i40e(+) vxlan ip6_udp_tunnel udp_tunnel megaraid_sas(+) ixgbe(+) mdio
[ 7.419385] CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 4.2.0+ #153
[ 7.419387] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 1.6.0 03/07/2013
[...]
[ 7.419431] Call Trace:
[ 7.419442] [<ffffffffa0217236>] igb_probe+0x8b6/0x1340 [igb]
[ 7.419447] [<ffffffff814c7f15>] local_pci_probe+0x45/0xa0
Prevent this by setting the IGB_FLAG_HAS_MSIX bit before calling
igb_probe_vfs(). The real interrupt capabilities will be checked during
igb_init_interrupt_scheme() so this is safe to do.
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Commit c48a11c7ad ("netvm: propagate page->pfmemalloc to skb") added
checks for page->pfmemalloc to __skb_fill_page_desc():
if (page->pfmemalloc && !page->mapping)
skb->pfmemalloc = true;
It assumes page->mapping == NULL implies that page->pfmemalloc can be
trusted. However, __delete_from_page_cache() can set set page->mapping
to NULL and leave page->index value alone. Due to being in union, a
non-zero page->index will be interpreted as true page->pfmemalloc.
So the assumption is invalid if the networking code can see such a page.
And it seems it can. We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf. There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.
The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead. We can reuse the index
again but define it to an impossible value (-1UL). This is the page
index so it should never see the value that large. Replace all direct
users of page->pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.
The information will get lost if somebody wants to use page->index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g. what SLAB and SLUB do).
[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad ("netvm: propagate page->pfmemalloc to skb")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Debugged-by: Vlastimil Babka <vbabka@suse.com>
Debugged-by: Jiri Bohac <jbohac@suse.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org> [3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes to igb_probe_vfs() could lead to the PF holding onto all
of the queues. Reorder igb_probe_vfs() to be before
gb_init_queue_configuration() and add some more error checking.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In error handling code of igb_probe, the memory adapter->shadow_vfta
allocated by kcalloc in igb_sw_init is not freed. So when register_netdev
or igb_init_i2c is failed, a memory leak will occur.
This patch adds kfree to fix it.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When igb_init_interrupt_scheme in igb_sriov_reinit is failed, the lock
acquired by rtnl_lock() is not released, which causes a deadlock.
This patch adds rtnl_unlock() in error handling to fix it.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the .remove() callback for a PF is called, SR-IOV support for the
device is disabled, which requires unbinding and removing the VFs.
The VFs may be in-use either by the host kernel or userspace, such as
assigned to a VM through vfio-pci. In this latter case, the VFs may
be removed either by shutting down the VM or hot-unplugging the
devices from the VM. Unfortunately in the case of a Windows 2012 R2
guest, hot-unplug is broken due to the ordering of the PF driver
teardown. Disabling SR-IOV prior to unregister_netdev() avoids this
issue.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>