Commit Graph

3759 Commits

Author SHA1 Message Date
Jacob Keller
40de1fad41 fm10k: split fm10k_reinit into two functions
There are several flows in the driver which perform the similar function
of tearing down software and restoring software to recover from certain
errors or PCIe events, including:

  * fm10k_reinit
  * fm10k_suspend/resume
  * fm10k_io_error_detected/fm10k_io_resume

In addition, we want to implement a .reset_notify() handler as well
which will also perform similar function.

Rework how the driver codes reset and resume flows by separating out the
reinit logic into two functions "fm10k_prepare_for_reset" and
"fm10k_handle_reset". This first step will allow us to re-use this
functionality in the similar blocks of code instead of re-coding the
same sequence of events slightly different.

The end result should be more maintainable and correct, fixing several
inconsistencies with the work flow.

The new functions expect to take the rtnl_lock() themselves, and it does
have the unfortunate side effect of having the reinit flow take then
release then take the rtnl_lock. However, this minor downside is
out weighted by the benefits of code reduction and reducing needless
difference between these flows.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:13 -07:00
Jacob Keller
94877768cf fm10k: wait for queues to drain if stop_hw() fails once
It turns out that sometimes during a reset the Tx queues will be
temporarily stuck longer than .stop_hw() expects. Work around this issue
by attempting to .stop_hw() first. If it tails, wait a number of
attempts until the Tx queues appear to be drained. After this, attempt
stop_hw() again. This ensures that we avoid waiting if we don't need to,
such as during the first initialization of a VF, and give the proper
amount of time necessary to recover from most situations. It is possible
that the hardware is actually stuck. For PFs, this is usually fixed by
a datapath reset. Unfortunately the VF cannot request a similar reset
for itself.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:12 -07:00
Jacob Keller
106ca42356 fm10k: only warn when stop_hw fails with FM10K_ERR_REQUESTS_PENDING
When stop_hw() routine fails with FM10K_ERR_REQUESTS_PENDING, this
indicates that the Tx or Rx queues did not shutdown within the time
limit. Print a more suitable message at the dev_info level instead of
dev_err.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:12 -07:00
Jacob Keller
34bad71c7c fm10k: use actual hardware registers when checking for pending Tx
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:11 -07:00
Jacob Keller
892c9e0872 fm10k: perform data path reset even when switch is not ready
A while ago, an additional check for the switch being ready was added to
reset_hw. A recent refactor accidentally made this check return an error
code on failure which caused fm10k_probe to fail when the switch wasn't
brought up first. The original reasoning for the check was to prevent
additional data path reset when the fabric wasn't ready yet. However,
there isn't a compelling reason to keep the check, as the data path
reset will restore hardware to a known good state. Remove the check and
perform the data path reset regardless of the switch manager state.

An alternative fix is to return FM10K_SUCCESS instead, and bypass the
actual data path reset. This should be fine as we will perform
a reset_hw once the switch is active. However, since data path reset
will reset many parts of the hardware it seems better to just perform
the reset regardless of switch state.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:11 -07:00
Jacob Keller
ce33624f37 fm10k: don't stop reset due to FM10K_ERR_REQUESTS_PENDING
Don't report FM10K_ERR_REQUESTS_PENDING when we fail to disable queues
within the timeout. This can occur due to a hardware Tx hang, or when
the switch ethernet fabric is resetting while we are transmitting
traffic. It can sometimes take up to 500ms before the Tx DMA engine
gives up. Instead, just skip the DMA engine check and perform
a data-path reset anyways. Add a statistic counter to keep track of the
number of resets occurring while we have pending DMA on the rings.

In order to prevent having to re-assign err to 0, re-order the
last few items of the reset_hw_pf function so that we don't perform
"return err" at the end.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:11 -07:00
Ngai-Mint Kwan
5e93cbadd3 fm10k: Reset mailbox global interrupts
When a data path reset is initiated, write control to the PCIE_GMBX is
yanked from the switch manager. The switch manager writes to this
register to clear mailbox global interrupt bits as part of its mailbox
interrupt handling routine. When the device recovers from the data path
reset and these bits are not cleared, it will prevent future mailbox
global interrupts from being triggered. Upon confirming that the device
has exited from a data path reset, clear these bits to ensure the proper
functioning of the mailbox global interrupt.

Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:10 -07:00
Jacob Keller
9d73edee59 fm10k: prevent multiple threads updating statistics
Also prevent updating stats while the interface is down. If we're
already updating stats, just return doing nothing. When we take the
device down, block stat updates until we come back up. This ensures that
we avoid tearing down rings when we're updating statistics, and prevents
updating statistics until we're up.

We can't re-use the __FM10K_DOWN for this because it wouldn't prevent
multiple threads from accessing statistics. Neither does it prevent the
case where we start updating stats and then start going down in another
thread.

The fm10k_get_stats64 is except from this, because it has a completely
different flow which does not suffer from the same issues as
fm10k_update_stats might.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:10 -07:00
Jacob Keller
b624714bc9 fm10k: avoid possible null pointer dereference in fm10k_update_stats
It's currently possible for fm10k_update_stats to be called during the
window when we go down and the rings are removed. This can result in
a null pointer dereference. In fm10k_get_stats64 we work around this by
using ACCESS_ONCE and a null pointer check inside the loop. Use this
same flow in the fm10k_update_stats to avoid the potential null pointer.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:09 -07:00
Jacob Keller
1b00c6c064 fm10k: no need to continue in fm10k_down if __FM10K_DOWN already set
Return early from fm10k_down() when we are already down, since that
means another thread is either already finished or has started going
down, so shouldn't conflict with them.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-20 15:22:09 -07:00
Guilherme G. Piccoli
7f6c553902 i40e: use valid online CPU on q_vector initialization
Currently, the q_vector initialization routine sets the affinity_mask
of a q_vector based on v_idx value. Meaning a loop iterates on v_idx,
which is an incremental value, and the cpumask is created based on
this value.

This is a problem in systems with multiple logical CPUs per core (like in
SMT scenarios). If we disable some logical CPUs, by turning SMT off for
example, we will end up with a sparse cpu_online_mask, i.e., only the first
CPU in a core is online, and incremental filling in q_vector cpumask might
lead to multiple offline CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1)  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.

This patch changes the way q_vector's affinity_mask is created: it iterates
on v_idx, but consumes the CPU index from the cpu_online_mask instead of
just using the v_idx incremental value.

No functional changes were introduced.

Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14 23:39:12 -07:00
Paolo Abeni
4b732cd4bb ixgbe: napi_poll must return the work done
Currently the function ixgbe_poll() returns 0 when it clean completely
the rx rings, but this foul budget accounting in core code.
Fix this returning the actual work done, capped to weight - 1, since
the core doesn't allow to return the full budget when the driver modifies
the napi status

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14 23:34:52 -07:00
Kiran Patil
f6bd09625b i40e: enable VSI broadcast promiscuous mode instead of adding broadcast filter
This patch sets VSI broadcast promiscuous mode during VSI add sequence
and prevents adding MAC filter if specified MAC address is broadcast.

Change-ID: Ia62251fca095bc449d0497fc44bec3a5a0136773
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14 23:26:59 -07:00
Alexander Duyck
858296c878 i40e/i40evf: Fix i40e_rx_checksum
There are a couple of issues I found in i40e_rx_checksum while doing some
recent testing.  As a result I have found the Rx checksum logic is pretty
much broken and returning that the checksum is valid for tunnels in cases
where it is not.

First the inner types are not the correct values to use to test for if a
tunnel is present or not.  In addition the inner protocol types are not a
bitmask as such performing an OR of the values doesn't make sense.  I have
instead changed the code so that the inner protocol types are used to
determine if we report CHECKSUM_UNNECESSARY or not.  For anything that does
not end in UDP, TCP, or SCTP it doesn't make much sense to report a
checksum offload since it won't contain a checksum anyway.

This leaves us with the need to set the csum_level based on some value.
For that purpose I am using the tunnel_type field.  If the tunnel type is
GRENAT or greater then this means we have a GRE or UDP tunnel with an inner
header.  In the case of GRE or UDP we will have a possible checksum present
so for this reason it should be safe to set the csum_level to 1 to indicate
that we are reporting the state of the inner header.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14 23:17:45 -07:00
Sabrina Dubroca
e5de25dce9 drivers/net: fixup comments after "Future-proof tunnel offload handlers"
Some comments weren't updated to reflect the renaming of ndo's and the
change of arguments.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-11 13:42:11 -07:00
David S. Miller
30d0844bdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx5/core/en.h
	drivers/net/ethernet/mellanox/mlx5/core/en_main.c
	drivers/net/usb/r8152.c

All three conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-06 10:35:22 -07:00
David S. Miller
435c556cde Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2016-06-29

This series contains updates and fixes to e1000e, igb, ixgbe and fm10k.  A
true smorgasbord of changes.

Jake cleans up some obscurity by not using the BIT() macro on bitshift
operation and also fixed the calculated index when looping through the
indir array.  Fixes the issue with igb's workqueue item for overflow
check from causing a surprise remove event.  The ptp_flags variable is
added to simplify the work of writing several complex MAC type checks
in the PTP code while fixing the workqueue.

Alex Duyck fixes the receive buffers alignment which should not be L1
cache aligned, but to 512 bytes instead.

Denys Vlasenko prevents a division by zero which was reported under
VMWare for e1000e.

Amritha fixes an issue where filters in a child hash table must be
cleared from the hardware before delete the filter links in ixgbe.

Bhaktipriya Shridhar simply replaces the deprecated create_workqueue()
with alloc_workqueue() for fm10k.

Tony corrects ixgbe ethtool reporting to show x550 supports hardware
timestamping of all packets.

Emil fixes an issue where MAC-VLANs on the VF fail to pass traffic due
to spoofed packets.

Andrew Lunn increases performance on some systems where syncing a buffer
for DMA is expensive.  So rather than sync the whole 2K receive buffer,
only synchronize the length of the frame.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 09:29:07 -04:00
David S. Miller
ee58b57100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:03:36 -04:00
Andrew Lunn
64f2525ca4 igb: Only DMA sync frame length
On some platforms, syncing a buffer for DMA is expensive. Rather than
sync the whole 2K receive buffer, only synchronise the length of the
frame, which will typically be the MTU, or a much smaller TCP ACK.

For an IMX6Q, this gives around 6% increased TCP receive performance,
which is cache operations bound and reduces CPU load for TCP transmit.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 13:59:24 -07:00
Emil Tantilov
581e0c7df9 ixgbe: fix spoofed packets with macvlans
When setting spoofing, both VLAN and MAC need to be set together.
This change resolves an issue where MAC-VLANs on the VF fail to pass
traffic due to spoofed packets.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 13:06:31 -07:00
Tony Nguyen
918b89e77f ixgbe: Correct reporting of timestamping for x550
Update ixgbe_ethtool_get_ts_info() to show that x550 supports hardware
timestamping of all packets.

Reported-by: Guy Harris <guy@alum.mit.edu>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 12:57:19 -07:00
Bhaktipriya Shridhar
0a38c17a21 fm10k: Remove create_workqueue
alloc_workqueue replaces deprecated create_workqueue().

A dedicated workqueue has been used since the workitem (viz
fm10k_service_task, which manages and runs other subtasks) is involved in
normal device operation and requires forward progress under memory
pressure.

create_workqueue has been replaced with alloc_workqueue with max_active
as 0 since there is no need for throttling the number of active work
items.

Since network devices may be used in memory reclaim path,
WQ_MEM_RECLAIM has been set to guarantee forward progress.

flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue
becomes empty. Hence the call to flush_workqueue() has been dropped.

Signed-off-by: Bhaktipriya Shridhar <bhaktipriya96@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 11:18:36 -07:00
Jacob Keller
8646f7b4cd igb: call igb_ptp_suspend during suspend/resume cycle
Properly stop the extra workqueue items and ensure that we resume
cleanly. This is better than using igb_ptp_init and igb_ptp_stop since
these functions destroy the PHC device, which will cause other problems
if we do so. Since igb_ptp_reset now re-schedules the work-queue item we
don't need an equivalent igb_ptp_resume in the resume workflow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 11:14:31 -07:00
Jacob Keller
e3f2350de8 igb: implement igb_ptp_suspend
Make igb_ptp_stop take advantage of this new function to reduce code
duplication.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 11:00:22 -07:00
Jacob Keller
4f3ce71bb8 igb: re-use igb_ptp_reset in igb_ptp_init
Modify igb_ptp_init to take advantage of igb_ptp_reset, and remove
duplicated work that was occurring in both igb_ptp_reset and
igb_ptp_init.

In total, resetting the TSAUXC register, and resetting the system time
both happen in igb_ptp_reset already. igb_ptp_reset now also takes care
of starting the delayed work item for overflow checks, as well.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 10:56:11 -07:00
Jacob Keller
63737166a0 igb: introduce IGB_PTP_OVERFLOW_CHECK flag
Don't continue to use complex MAC type checks for handling various cases
where we have overflow check code. Make this code more obvious by
introducing a flag which is enabled for hardware that needs these
checks.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 10:51:34 -07:00
Jacob Keller
462f118882 igb: introduce ptp_flags variable and use it to replace IGB_FLAG_PTP
Upcoming patches will introduce new PTP specific flags. To avoid
cluttering the normal flags variable, introduce PTP specific "ptp_flags"
variable for this purpose, and move IGB_FLAG_PTP to become
IGB_PTP_ENABLED.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 10:48:07 -07:00
Amritha Nambiar
12746fd21e ixgbe: Error handler for duplicate filter locations in hardware for cls_u32 offloads
For u32 classifier filters, avoid overwriting existing filter
in a hardware location without removing it first, to clean up
inconsistencies due to duplicate values for filter location.

Verified with the following filters:

Create child hash tables:
	handle 1: u32 divisor 1
	handle 2: u32 divisor 1

Link to the child hash table from parent hash table:
	handle 800:0:11 u32 ht 800: link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat \
	match ip protocol 6 ff match ip dst 15.0.0.1/32

	handle 800:0:12 u32 ht 800: link 2: \
	offset at 0 mask 0f00 shift 6 plus 0 eat \
	match ip protocol 17 ff match ip dst 16.0.0.1/32

Add filter into child hash table:
	handle 1:0:3 u32 ht 1: \
	match tcp src 22 ffff action drop

Add another filter to the same location:
	handle 2:0:3 u32 ht 2: \
	match tcp src 33 ffff action drop

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 10:44:02 -07:00
Amritha Nambiar
1ecedc926b ixgbe: Fix deleting link filters for cls_u32 offloads
On deleting filters which are links to a child hash table, the filters
in the child hash table must be cleared from the hardware if there
is no link between the parent and child hash table.

Verified with the following filters:

Create a child hash table:
	handle 1: u32 divisor 1

Link to the child hash table from parent hash table:
	handle 800:0:10 u32 ht 800: link 1: \
	offset at 0 mask 0f00 shift 6 plus 0 eat \
	match ip protocol 6 ff match ip dst 15.0.0.1/32

Add filters into child hash table:
	handle 1:0:2 u32 ht 1: \
	match tcp src 22 ffff action drop
        handle 1:0:3 u32 ht 1: \
        match tcp src 33 ffff action drop

Delete link filter from parent hash table:
	handle 800:0:10 u32

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 10:05:24 -07:00
Denys Vlasenko
3d05b15b03 e1000e: prevent division by zero if TIMINCA is zero
Users report that under VMWare, er32(TIMINCA) returns zero.
This causes division by zero at init time as follows:

 ==>       incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
           for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
                   /* latch SYSTIMH on read of SYSTIML */
                   systim_next = (cycle_t)er32(SYSTIML);
                   systim_next |= (cycle_t)er32(SYSTIMH) << 32;

                   time_delta = systim_next - systim;
                   temp = time_delta;
 ====>             rem = do_div(temp, incvalue);

This change makes kernel survive this, and users report that
NIC does work after this change.

Since on real hardware incvalue is never zero, this should not affect
real hardware use case.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 10:00:22 -07:00
Jacob Keller
34875887f3 fm10k: fix incorrect index calculation in fm10k_write_reta
The index calculated when looping through the indir array passed to
fm10k_write_reta was incorrectly calculated as the first part i needs to
be multiplied by 4.

Fixes: 0cfea7a65738 ("fm10k: fix possible null pointer deref after kcalloc", 2016-04-13)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 09:53:36 -07:00
Alexander Duyck
fb5677aa26 fm10k: Align Rx buffers to 512B blocks
While reviewing the i40e driver changes to support page based receive I
realized that I had overlooked the fact that the fm10k hardware required a
512 byte alignment for Rx buffers.  This patch is meant to address that by
changing the alignment for Rx buffers to 512 bytes instead of allowing it
to be L1 cache aligned.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 09:46:17 -07:00
Jacob Keller
124579de46 fm10k: don't use BIT() macro where the value isn't a bitmask
The FM10K_MAX_DATA_PER_TXD is really just using a bitshift as a power of
2 operation in an efficient manner. We shouldn't represent this as a BIT()
because that obscures the intention of the operation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 09:38:59 -07:00
Xin Long
b3a3c5176c ixgbevf: ixgbevf_write/read_posted_mbx should use IXGBE_ERR_MBX to initialize ret_val
Now ixgbevf_write/read_posted_mbx use -IXGBE_ERR_MBX as the initiative
return value, but it's incorrect, cause in ixgbevf_vlan_rx_add_vid(),
it use err == IXGBE_ERR_MBX, the err returned from mac.ops.set_vfta,
and in ixgbevf_set_vfta_vf, it return from write/read_posted. so we
should initialize err with IXGBE_ERR_MBX, instead of -IXGBE_ERR_MBX.

With this fix, the other functions that called it also can work well,
cause they only care about if err is 0 or not.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 09:18:06 -07:00
Jarod Wilson
838086414b e1000e: keep Rx/Tx HW_VLAN_CTAG in sync
The bit in the e1000 driver that mentions explicitly that the hardware
has no support for separate RX/TX VLAN accel toggling rings true for
e1000e as well, and thus both NETIF_F_HW_VLAN_CTAG_RX and
NETIF_F_HW_VLAN_CTAG_TX need to be kept in sync.

Revert a portion of commit 889ad45666 ("e1000e: keep VLAN interfaces
functional after rxvlan off") since keeping the bits in sync resolves
the original issue.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-29 09:10:17 -07:00
Jarod Wilson
889ad45666 e1000e: keep VLAN interfaces functional after rxvlan off
I've got a bug report about an e1000e interface, where a VLAN interface is
set up on top of it:

$ ip link add link ens1f0 name ens1f0.99 type vlan id 99
$ ip link set ens1f0 up
$ ip link set ens1f0.99 up
$ ip addr add 192.168.99.92 dev ens1f0.99

At this point, I can ping another host on vlan 99, ip 192.168.99.91.
However, if I do the following:

$ ethtool -K ens1f0 rxvlan off

Then no traffic passes on ens1f0.99. It comes back if I toggle rxvlan on
again. I'm not sure if this is actually intended behavior, or if there's a
lack of software VLAN stripping fallback, or what, but things continue to
work if I simply don't call e1000e_vlan_strip_disable() if there are
active VLANs (plagiarizing a function from the e1000 driver here) on the
interface.

Also slipped a related-ish fix to the kerneldoc text for
e1000e_vlan_strip_disable here...

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-29 07:39:48 -04:00
Neerav Parikh
85a1aab79c i40e: Don't notify client(s) for DCB changes on all VSIs
When LLDP/DCBX change happens the i40e driver code flow tried to
notify the client(s) for each of the PF VSIs. This resulted into
kernel panic on the first VSI that didn't have any netdev
associated to it.

The DCB change notification to the client(s) should be done only
once for the PF/LAN VSI where the client(s) instances have been
added to. Also, move the notification call after the PF driver has
made changes related to the updated DCB configuration.

Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
Tested-by: Ronald J Bynoe <ronald.j.bynoe@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 16:22:28 -07:00
Tushar Dave
a70e407f6d i40e: Fix errors resulted while turning off TSO
On systems with 128 CPUs, turning off TSO results in errors,

i40e 0000:03:00.0: failed to get tracking for 1 vectors for VSI 400, err=-12
i40e 0000:03:00.0: Couldn't create FDir VSI
i40e 0000:03:00.0: i40e_ptp_init: PTP not supported on eth0
i40e 0000:03:00.0: couldn't add VEB, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOENT
i40e 0000:03:00.0: rebuild of switch failed: -1, will try to set up simple PF connection
i40e 0000:03:00.0 eth0: adding 00:10:e0:8a:24:b6 vid=0

Enabling FD_SB without checking availability of MSI-X vector is the
root cause. This change adds necessary check.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 16:21:26 -07:00
Bimmy Pujari
0706195802 i40e/i40evf: Bump version from 1.5.16 to 1.6.4
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 16:14:30 -07:00
Shannon Nelson
2d1de8283f i40e: add VSI info to macaddr messages
Since the macaddr add and delete happens asynchronously, error
messages don't easily get associated to the actual request. Here
we add a bit of information to the error messages to help
determine the source of the error.

Change-ID: Id2d6df5287141c3579677d72d8bd21122823d79f
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 16:10:58 -07:00
Mitch Williams
5bc160319f i40e: set default VSI without a reset
Remove the need for a reset when the device enters limited promiscuous
mode. This was causing heartburn for people who were using VFs and
bridging, since this would require all of the VFs to undergo a reset
each time the PF changed its promiscuity.

Change-ID: I0a83495c5e4d68112bbc7a7a076d20fa8dd3b61c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 16:06:50 -07:00
Mitch Williams
63590b6129 i40evf: always activate correct MAC address filter
Always add MAC address at the tail of the MAC filter list. Since the
device's "real" MAC address is added first, it will always be at the
beginning of the list. This prevents an issue where the "real" MAC
filter might not get added if too many other filters are added before
bringing the interface up.

Change-ID: I34a8aeebeb0cb87a44b24118adc4176c7b943c1c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 16:02:48 -07:00
Catherine Sullivan
7d64402f5a i40e: Fix RSS to not be limited by the number of CPUs
Limiting qcount to pf->num_lan_msix, effectively limits the RSS queues
to only use the number of CPUs, and ignore all other queues. We don't
want to do this. If the user has changed the RSS settings to use more
queues then CPUS, we want to trust they know what they are doing and
let them. More importantly, if we tell them that is what we did, we want
to actually do it and allow traffic into all of the queues we have
allocated. This does not change the default setting to initially
allocate only the number of CPUS of queue pairs.

Change-ID: Ie941a96e806e4bcd016addb4e17affb46770ada5
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:58:36 -07:00
Avinash Dayanand
01a7a9fef4 i40e: Removing unnecessary code which caused supported link mode bug
Removing this code which wasn't allowing 100BaseT to show up in the supported
link modes for 10GBaseT PHYs.

Change-ID: Iada2eafa7ef6b4bac9a2a1380ff533ae5de51e1d
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:56:37 -07:00
Serey Kong
6536227d1d i40e: fix missing DA cable check
When a Direct Attach (DA) cable is used, if the i40e_set_settings
function is called it would return an error. Add the DA type so
the function won't fail.

Change-ID: I2b802f27a5d91cfefa72fd1f852acb4d74647a8e
Signed-off-by: Serey Kong <serey.kong@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:51:54 -07:00
Greg Rose
059ff69b5f i40e: Save PCI state before suspend
The i40e_suspend() function was failing to save PCI state
and this would result in a kernel stack trace from a WARN_ONCE in the
pci_legacy_suspend() function.

Add a call to pci_save_state() to fix that problem.

Change-ID: I4736e62bb660966bd208cc8af617a14cb07fc4bd
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:43:39 -07:00
Greg Rose
b33d3b7321 i40e: Clean up MSIX IRQs before suspend
The i40e_suspend() function calls another function that preps the device
for the power save and resume by freeing all the Tx/Rx resources and
interrupts but that function does not free the "other" causes interrupt
vector and IRQ. It also fails to call synchronize_irq() before freeing
the IRQ vectors.  This sometimes may result in some AER errors on those
systems with that PCIe error reporting feature enabled.

Call synchronize_irq() before freeing IRQ vectors and explicitly free
the other causes interrupt resources and shut down that MSIX interrupt.

Change-ID: Ib88e4536756518a352446da0232189716618ad81
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:37:16 -07:00
Mitch Williams
0e8d95f896 i40evf: don't overflow buffer
If the user adds an obscene amount of MAC addresses, the driver will run
into the situation where it has too many address requests to fit into a
single PF message. The driver checks for this case, and calculates the
maximum number of messages that it can send. Then it completely ignores
this count and overflows the buffer.

Fix this by checking the address count and bailing out of the loop at
the appropriate time.

Change-ID: If8dcbb04602c75941dc0cd8309065e1de9ca791c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:31:50 -07:00
Catherine Sullivan
f980d445e5 i40e: Add a call to set the client interface down
We were failing to set the client interface down when we put the VSI
down. Add this call so that the client doesn't get an open called with
no close.

Also remove an un-needed delay. The VF should not be affected at all by
i40e_down.

Change-ID: I1135dffef534bf84e6fed57cf51bcf590e6cfaf7
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:25:36 -07:00
Mitch Williams
bb36071721 i40e: write HENA for VFs
Now that VF RSS is configured by the PF driver, it needs to set the RSS
Hash Enable registers by default. Without this, no packets will be
hashed and they'll all end up on queue 0.

Change-ID: I38e425f40ddb81e3b19a951cfbb939fa5b1123f1
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:19:40 -07:00
Mitch Williams
3e25a8f31a i40e: add hw struct local variable
This function uses the i40e_hw struct all over the place, so why doesn't
it keep a pointer to the struct? Add this pointer as a local variable
and use it consistently throughout the function.

Change-ID: I10eb688fe40909433fcb8ac7ac891cef67445d72
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:15:47 -07:00
Mitch Williams
fb70fabad8 i40e: add functions to control default VSI
Add functions to enable and disable default VSI on a VEB. This allows
for configuration of limited promiscuous mode specifically for bridging
purposes.

Change-ID: I0cc5bd68b31c500fdff4d47e1f15d50d2739faf4
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-06-27 15:08:28 -07:00
Johannes Thumshirn
56d766d64c ethernet/intel: Use pci_(request|release)_mem_regions
Now that we do have pci_request_mem_regions() and pci_release_mem_regions()
at hand, use it in the Intel ethernet drivers.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: David S. Miller <davem@davemloft.net>
2016-06-23 11:48:58 -05:00
Alexander Duyck
b3a49557d5 ixgbe: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port
This change replaces the network device operations for adding or removing a
VXLAN port with operations that are more generically defined to be used for
any UDP offload port but provide a type.  As such by just adding a line to
verify that the offload type is VXLAN we can maintain the same
functionality.

In addition I updated the socket address family check so that instead of
excluding IPv6 we instead abort of type is not IPv4.  This makes much more
sense as we should only be supporting IPv4 outer addresses on this
hardware.

The last change is that I pulled the rtnl_lock/unlock into the conditional
statement for IXGBE_FLAG2_VXLAN_REREG_NEEDED.  The motivation behind this
is to avoid unneeded bouncing of the mutex which will just slow down the
handling of this call anyway.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:30 -07:00
Alexander Duyck
06a5f7f167 i40e: Move all UDP port notifiers to single function
This patch goes through and combines the notifiers for VXLAN and GENEVE
into a single function for each action.  So there is now one combined
function for getting ports, one for adding the ports, and one for deleting
the ports.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:30 -07:00
Alexander Duyck
f174cdbe5b fm10k: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port
This change replaces the network device operations for adding or removing a
VXLAN port with operations that are more generically defined to be used for
any UDP offload port but provide a type.  As such by just adding a line to
verify that the offload type if VXLAN we can maintain the same
functionality.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:30 -07:00
Alexander Duyck
bf2d1df395 intel: Add support for IPv6 IP-in-IP offload
This patch adds support for offloading IPXIP6 type packets that represent
either IPv4 or IPv6 encapsulated inside of an IPv6 outer IP header.  In
addition with this change we should also be able to support FOU
encapsulated traffic with outer IPv6 headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 19:25:52 -04:00
Tom Herbert
7e13318daa net: define gso types for IPx over IPv4 and IPv6
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 18:03:15 -04:00
Alexander Duyck
5eee87cd51 ixgbe: Fix VLAN features error
It looks like at some point I somehow transposed the location of setting
the VLAN features in netdev->features and the configuration of the
vlan_features.  As a result the driver is now generating a warning about
vlan_features being setup incorrectly.

This patch corrects that by placing the update of netdev->features to
include the VLAN features so that it is after the point where we write
netdev->features into netdev->vlan_features.

Fixes: b83e30104b ("ixgbe/ixgbevf: Add support for GSO partial")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-16 16:56:38 -07:00
Emil Tantilov
11f2b494bc ixgbe: use correct mask when enabling sriov
Swap the parameters in GENMASK in order to generate the correct mask.

This change fixes Tx hangs when enabling SRIOV.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-16 16:49:36 -07:00
Dan Carpenter
1c306f7f62 i40e: fix an uninitialized variable bug
We removed this initialization but it is required.  Let's put it back.

Fixes: 895106a577 ('i40e: trivial fixes')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-14 00:21:51 -07:00
Bimmy Pujari
c74dff1aaa i40e: Bump version from 1.5.10 to 1.5.16
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-14 00:17:19 -07:00
Mitch Williams
d96a83def2 i40e: don't add broadcast filter for VFs
Now that all VSIs are configured to receive broadcasts as default, we
don't need to add a filter. This eliminates an annoying but harmless
error message each time VFs are created or reset.

Change-ID: I4cd6339684df45b0d2722133eeb84c14fa93ea19
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-14 00:11:48 -07:00
Mitch Williams
a876c3ba59 i40e/i40evf: properly report Rx packet hash
This logic is inverted. If the RXHASH flag is set, then we should go
ahead and call skb_set_hash.

Change-ID: Ib2e30356dced1d3e939c8061ab6ad5bd94197e7c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-14 00:07:40 -07:00
Ashish Shah
4b28cdba48 i40e: set context to use VSI RSS LUT for SR-IOV
For the SR-IOV VSIs, when the queue filtering section is valid,
the RSS LUT needs to be set to use the VSI specific lookup table
(otherwise it will use the PF RSS LUT table).

Change-ID: Ia9377cc818078238a75c3bdeade1b593a91b3480
Signed-off-by: Ashish Shah <ashish.n.shah@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-14 00:00:24 -07:00
Akeem G Abodunrin
73df8c9e3e i40e: Correct UDP packet header for non_tunnel-ipv6
This patch corrects Rx ptype payload layer for non_tunneled ipv6. It
should be layer 4 for UDP, instead of layer 3.

Change-ID: I9382e4458ab3c4e58f6d2e9f195d5d4ee513805e
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 23:56:13 -07:00
Jacob Keller
c420815d12 i40e: change Rx hang message into a WARN_ONCE
Use WARN_ONCE in order to highlight the issue, but don't display
a warning every time. The user should be able to see the ethtool counter
we created if necessary to see how often it is occurring.

Change-ID: I40c4ea159819b64a7d33b7f5716749089791533a
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 23:44:59 -07:00
Catherine Sullivan
06566e5dd4 i40e: Refactor ethtool get_settings
Previously we were only looking at the FW supported PHY types if link
was down, because we want to be more specific when link is up. This
refactor changes this. When link is down, we still rely on the FW
supported PHY types, but when link is up, we select the possible
supported link modes from what we know about the current PHY type, and
AND that with the FW supported PHY types.

Change-ID: Ice5dad83f2a17932b0b8b59f07439696ad6aa013
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 23:32:15 -07:00
Mitch Williams
eee4172abc i40e: lie to the VF
If an untrusted VF attempts to configure promiscuous mode, log a message
pointing out its naughty behavior. But then, instead of returning an
error to the offender, just lie to it and say everything's OK. It will
continue on its way, thinking it's in promiscuous mode, but receiving no
packets except its own.

Change-ID: I63369215b1720f3c531eedfc06af86ff8c0e3dc8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 23:23:19 -07:00
Anjali Singhai Jain
b556989230 i40e: Add vf-true-promisc-support priv flag
This patch adds priv-flag knob to configure global true promisc
support. With this patch the user can decide the flavor of
promiscuous that the VFs will see when promiscuous mode is enabled
on the interface. Since this a global setting for the whole device,
the priv-flag is exposed only on the first PF of the device.

The default is true promisc support is off, which means the promisc
mode for the VF will be limited/defport mode.

For the PF, we still will be in limited promisc unless in MFP mode
irrespective of the flavor picked through this knob.

Usage:
On PF0
ethtool --show-priv-flags p261p1
Private flags for p261p1:
MFP                    : off
LinkPolling            : off
flow-director-atr      : on
veb-stats              : off
hw-atr-eviction        : off
vf-true-promisc-support: off

to enable setting true promisc
ethtool --set-priv-flags p261p1 vf-true-promisc-support on

At this point if the VF is set to trust and promisc is enabled
on the VF through
ip link set ... promisc on
The VF/VFs will be able to see ALL ingress traffic

Change-Id: I8fac4b6eb1af9ca77b5376b79c50bdce5055bd94
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 22:48:46 -07:00
Shannon Nelson
f3d5849756 i40e: Implement the API function for aq_set_switch_config
Add the support code for calling the AdminQ API call aq_set_switch_config

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 22:37:02 -07:00
Anjali Singhai Jain
f42a5c74da i40e: Add allmulti support for the VF
This patch enables a feature to enable/disable all multicast
for a trusted VF.

Change-Id: I926eba7f8850c8d40f8ad7e08bbe4056bbd3985f
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 22:31:42 -07:00
Kevin Scott
06c0e39bbe i40e: Add support for disabling all link and change bits needed for PHY interactions
Add flag to tell firmware to disable link on all ports.

This patch changes the bits set for telling firmware the PHY needs
to be modified by driver.  Without this patch, the setting will only
set that mode for the current port on the device.  Because the
MDIO interface is common for the copper device. The command needs to
set the mode for all ports.

Change-ID: I8baa7da91d384291ac95b41ae1a516604f8eb67f
Signed-off-by: Kevin Scott <kevin.c.scott@intel.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 21:36:59 -07:00
Jacob Keller
aa524b66c5 e1000e: don't modify SYSTIM registers during SIOCSHWTSTAMP ioctl
The e1000e_config_hwtstamp function was incorrectly resetting the SYSTIM
registers every time the ioctl was being run. If you happened to be
running ptp4l and lost the PTP connect (removing cable, or blocking the
UDP traffic for example), then ptp4l will eventually perform a restart
which involves re-requesting timestamp settings. In e1000e this has the
unfortunate and incorrect result of resetting SYSTIME to the kernel
time. Since kernel time is usually in UTC, and PTP time is in TAI, this
results in the leap second being re-applied.

Fix this by extracting the SYSTIME reset out into its own function,
e1000e_ptp_reset, which we call during reset to restore the hardware
registers. This function will (a) restart the timecounter based on the
new system time, (b) restore the previous PPB setting, and (c) restore
the previous hwtstamp settings.

In order to perform (b), I had to modify the adjfreq ptp function
pointer to store the old delta each time it is called. This also has the
side effect of restoring the correct base timinca register correctly.
The driver does not need to explicitly zero the ptp_delta variable since
the entire adapter structure comes zero-initialized.

Reported-by: Brian Walsh <brian@walsh.ws>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:30:44 -07:00
Alexander Duyck
e10715d3e9 igb/igbvf: Add support for GSO partial
This patch adds support for partial GSO segmentation in the case of
tunnels.  Specifically with this change the driver an perform segmentation
as long as the frame either has IPv6 inner headers, or we are allowed to
mangle the IP IDs on the inner header.  This is needed because we will not
be modifying any fields from the start of the start of the outer transport
header to the start of the inner transport header as we are treating them
like they are just a block of IP options.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:26:37 -07:00
Jacob Keller
942c711206 e1000e: mark shifted values as unsigned
The E1000_ICH_NVM_SIG_MASK value is shifted, out to the 31st bit, which
is the signed bit for signed constants. Mark these values as unsigned to
prevent compiler warnings and issues on platforms which a different
signed bit implementation.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:19:05 -07:00
Jacob Keller
18dd239207 e1000e: use BIT() macro for bit defines
This prevents signed bitshift issues when the shift would overwrite the
signed bit, and prevents making this mistake in the future when copying
and modifying code.

Use GENMASK or the unsigned postfix for cases which aren't suitable for
BIT() macro.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:15:36 -07:00
Jacob Keller
0ed2dbf4f4 igbvf: use BIT() macro instead of shifts
To prevent signed bitshift issues, and improve code readability, use the
BIT() macro. Also make use of GENMASK or the unsigned postfix where this
is more appropriate than BIT()

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:12:03 -07:00
Jacob Keller
12b28b4108 igbvf: remove unused variable and dead code
The variable rdlen is set but never used, and thus setting it is dead
code. Remove it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:06:33 -07:00
Nathan Sullivan
3f544d2a4d igb: adjust PTP timestamps for Tx/Rx latency
Table 7-62 on page 338 of the i210 datasheet lists TX and RX latencies
for the various speeds the chip supports.  To give better PTP timestamp
accuracy, adjust the timestamps by the amounts Intel gives based on
current link speed.

Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 15:02:08 -07:00
Denys Vlasenko
ab507c9a54 e1000e: e1000e_cyclecounter_read(): do overflow check only if needed
SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0]

er32(SYSTIML) are probably moderately expensive (they are pci bus reads).
Can we avoid one of them? Yes, we can.

If the SYSTIML value we see is smaller than 0xff000000, the overflow
into SYSTIMH would require at least two increments.

We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order.

Even if one increment happens between them, the overflow into SYSTIMH
is impossible, and we can avoid doing another er32(SYSTIML) read
and overflow check.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:56:35 -07:00
Denys Vlasenko
a07fd74d5e e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check
If two consecutive reads of the counter are the same, it is also
not an overflow.  "systimel_1 < systimel_2" should be
"systimel_1 <= systimel_2".

Before the patch, we could perform an *erroneous* correction:

Let's say that systimel_1 == systimel_2 == 0xffffffff.
"systimel_1 < systimel_2" is false, we think it's an overflow,
we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented,
and use "(systimeh << 32) + systimel_2" value which is 2^32 too large.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: intel-wired-lan@lists.osuosl.org
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:52:31 -07:00
Denys Vlasenko
fb5277f2c2 e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64
"incvalue" variable holds a result of "er32(TIMINCA) &
E1000_TIMINCA_INCVALUE_MASK" and used in "do_div(temp, incvalue)"
as a divisor.

Thus, "u64 incvalue" declaration is probably a mistake.
Even though it seems to be a harmless one, let's fix it.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:46:45 -07:00
Jacob Keller
8008f68cb8 igb: make igb_update_pf_vlvf static
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:39:59 -07:00
Jacob Keller
a51d8c217b igb: use BIT() macro or unsigned prefix
For bitshifts, we should make use of the BIT macro when possible, and
ensure that other bitshifts are marked as unsigned. This helps prevent
signed bitshift errors, and ensures similar style.

Make use of GENMASK and the unsigned postfix where BIT() isn't
appropriate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:39:47 -07:00
Brian Walsh
847042a6a5 e1000e: Cleanup consistency in ret_val variable usage
Fixed the file to use a consistent ret_val for return value checking.

Signed-off-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:30:40 -07:00
Steve Shih
e11f303e3d e1000e: fix ethtool autoneg off for non-copper
This patch fixes the issues for disabling auto-negotiation and forcing
speed and duplex settings for the non-copper media.

For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID for
eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent e1000_set_settings
call would not fail with -EOPNOTSUPP.

e1000_set_spd_dplx should not automatically turn autoneg back on for forced
1000 Mbps full duplex settings for non-copper media.

Cc: xe-kernel@external.cisco.com
Cc: Daniel Walker <dwalker@fifo99.com>
Signed-off-by: Steve Shih <sshih@cisco.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-13 14:23:37 -07:00
Julia Lawall
3949c4ac8c i40e: constify i40e_client_ops structure
The i40e_client_ops structure is never modified, so declare it as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 23:32:59 -07:00
Arnd Bergmann
ce927db487 i40e: fix misleading indentation
Newly added code in i40e_vc_config_promiscuous_mode_msg() is indented
in a way that gcc rightly complains about:

drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function 'i40e_vc_config_promiscuous_mode_msg':
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1543:4: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
    if (f->vlan >= 0 && f->vlan <= I40E_MAX_VLANID)
    ^~
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1550:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
     aq_err = pf->hw.aq.asq_last_status;

From the context, it looks like the aq_err assignment was meant to be
inside of the conditional expression, so I'm adding the appropriate
curly braces now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5676a8b9cd ("i40e: Add VF promiscuous mode driver support")
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 23:25:34 -07:00
Jesse Brandeburg
147e81ec75 i40e: Test memory before ethtool alloc succeeds
When testing on systems with very limited amounts of RAM, a bug was
found where, while changing the number of descriptors using ethtool,
the driver didn't test the limits of system memory before permanently
assuming it would be able to get receive buffer memory.

Work around this issue by pre-allocation of the receive buffer
memory, in the "ghost" ring, which is then used during reinit
using the new ring length.

Change-Id: I92d7a5fb59a6c884b2efdd1ec652845f101c3359
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 23:17:07 -07:00
Mitch Williams
b163098ea1 i40evf: Allocate Rx buffers properly
Allocate the correct number of RX buffers, and don't fiddle with
next_to_use. The common RX code handles all of this. This fixes a memory
leak of one page each time the driver is opened.

Change-Id: Id06eca353086e084921f047acad28c14745684ee
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 23:07:30 -07:00
Jesse Brandeburg
bec60fc42b i40e/i40evf: Remove unused hardware receive descriptor code
The hardware supports a 16 byte descriptor for receive, but the
driver was never using it in production.  There was no performance
benefit to the real driver of 16 byte descriptors, so drop a whole
lot of complexity while getting rid of the code.

Also since the previous patch made us use no-split mode all the
time, drop any support in the driver for any other value in dtype
and assume it is always zero (aka no-split).

Hooray for code removal!

Change-ID: I2257e902e4dad84a07b94db6d2e6f4ce69b27bc0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 22:59:54 -07:00
Jesse Brandeburg
ab9ad98eb5 i40evf: refactor receive routine
This is part 2 of the Rx refactor series, just including
changes to i40evf.

This refactor aligns the receive routine with the one in
ixgbe which was highly optimized.  This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.

In order to do this:
- consolidate the receive path into a single function that doesn't
  use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove VF ethtool control over packet split
- remove priv_flags interface since it is unused

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 22:42:58 -07:00
Jesse Brandeburg
19b85e677d i40evf: Drop packet split receive routine
As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.

Some of the split related context set up code stays in
i40e_virtchnl_pf.c in case an older VF driver tries to load
and still wants to use packet split.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 22:31:23 -07:00
Jesse Brandeburg
1a557afc4d i40e: Refactor receive routine
This is part 1 of the Rx refactor series, just including
changes to i40e.

This refactor aligns the receive routine with the one in
ixgbe which was highly optimized.  This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.

In order to do this:
- consolidate the receive path into a single function that doesn't
  use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove ethtool control over packet split

Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 21:53:16 -07:00
Jesse Brandeburg
04b3b77981 i40e/i40evf: Remove reference to ring->dtype
As part of the rx-refactor, the dtype variable in the i40e_ring
struct is no longer used, so remove it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 18:59:23 -07:00
Jesse Brandeburg
b32bfa1724 i40e: Drop packet split receive routine
As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 18:52:06 -07:00
Jesse Brandeburg
f8a952cb40 i40e/i40evf: Refactor tunnel interpretation
Refactor the interpretation of a tunnel.  This removes
some code and lets us start using the hardware's parsing.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 18:24:10 -07:00
David S. Miller
aa8a8b05ad Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2016-05-04

This series contains updates to ixgbe, ixgbevf and traffic class helpers.

Sridhar adds helper functions to the tc_mirred header to access tcf_mirred
information and then implements them for ixgbe to enable redirection to
a SRIOV VF or an offloaded MACVLAN device queue via tc 'mirred' action.

Amritha adds support to set filters with multiple header fields (L3,L4)
to match on.

KY Srinivasan from Microsoft add Hyper-V support into ixgbevf.

Emil adds 82599 sub-device IDs that were missing from the list of parts
that support WoL.  Then simplified the logic we use to determine WoL
support by reading the EEPROM bits for MACs X540 and newer.

Preethi cleaned up duplicate and unused device IDs.  Fixed our ethtool
stat reporting where we were ignoring higher 32 bits of stats registers,
so fill out 64 bit stat values into two 32 bit words.

Babu Moger from Oracle improves VF performance issues on SPARC.

Alex Duyck cleans up some of the Hyper-V implementation from KY so that
we can just use function pointers instead of having to identify if a
given VF is running on a Linux or Windows PF.

Usha makes sure that DCB and FCoE is disabled for X550EM_x/a MACs and
cleans up the DCB initialization in the process.

Tony cleans up the API for ixgbevf_update_xcast_mode() so we do not
have to pass in the netdev parameter, since it was never used in the
function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 17:13:34 -04:00
Florian Westphal
9b36627ace net: remove dev->trans_start
previous patches removed all direct accesses to dev->trans_start,
so change the netif_trans_update helper to update trans_start of
netdev queue 0 instead and then remove trans_start from struct net_device.

AFAICS a lot of the netif_trans_update() invocations are now useless
because they occur in ndo_start_xmit and driver doesn't set LLTX
(i.e. stack already took care of the update).

As I can't test any of them it seems better to just leave them alone.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 14:16:50 -04:00