Commit Graph

70243 Commits

Author SHA1 Message Date
Jacob Keller
759dc4a7e6 i40e: initialize our affinity_mask based on cpu_possible_mask
On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.

This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask setup for the IRQ.

The root problem is a mismatched affinity mask value. So lets initialize
the value to cpu_possible_mask instead. This ensures that prior to the
first time we get an IRQ affinity notification we'll have the mask set
to include every possible CPU.

We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the later might change
after we've made a copy.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:09:03 -07:00
Jacob Keller
9254c0e34e i40e: move enabling icr0 into i40e_update_enable_itr
If we don't have MSI-X enabled, we handle interrupts on all icr0. This
is a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll easier to
read about.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:07:13 -07:00
Jacob Keller
ba4460d45a i40e: remove workaround for resetting XPS
Since commit 3ffa037d7f ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom
empty CPU mask.

This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.

Second, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:06:02 -07:00
Carolyn Wyborny
19279235be i40e: Fix for unused value issue found by static analysis
This patch fixes an issue where an error return value is
set, but without an immediate exit, the value can be overwritten
by the following code execution.  The condition  at this point
is not fatal, so remove the error assignment and comment the
intent for future code maintainers

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:02:16 -07:00
Mariusz Stachura
68e49702a1 i40e: 25G FEC status improvements
This patch improves the system log message. The log message will
be expanded to include the FEC mode the FW requested before link
was established.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:01:03 -07:00
Mariusz Stachura
8774370d26 i40e/i40evf: support for VF VLAN tag stripping control
This patch gives VF capability to control VLAN tag stripping via
ethtool. As rx-vlan-offload was fixed before, now the VF is able to
change it using "ethtool --offload <IF> rxvlan on/off" settings.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:47:43 -07:00
Jacob Keller
8c9eb350aa i40e: force VMDQ device name truncation
In new versions of GCC since 7.x a new warning exists which warns when
a string is truncated before all of the format can be completed.

When we setup VMDQ netdev names we are copying a pre-existing interface
name which could be up to 15 characters in length. Since we also add
4 bytes, v, the literal %, the d and a \0 null, we would overrun the
available size unless snprintf truncated for us.

The snprintf call will of course truncate on the end, so lets instead
modify the code to force truncation of the copied netdev name by
4 characters, to create enough space for the 4 bytes we're adding.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:44:04 -07:00
Jacob Keller
696ac80aa1 i40evf: fix possible snprintf truncation of q_vector->name
The q_vector names are based on the interface name with a driver prefix,
the type of q_vector setup, and the queue number. We previously set the
size of this variable to IFNAMSIZ + 9, which is incorrect, because we
actually include a minimum of 14 characters extra beyond the interface
name size.

New versions of GCC since 7 include a new warning that detects this
possible truncation and complains. We can fix this by increasing the
size in case our interface name is too large to avoid truncation. We
don't need to go beyond 14 because the compiler is smart enough to
realize our values can never exceed size of 1. We do go up to 15 here
because possible future changes may increase the number of queues beyond
one digit.

While we are here, also change some variables to be unsigned (since they
are never negative) and stop using an extra unnecessary %s format
specifier.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:58 -07:00
Akeem G Abodunrin
e53b382f3a i40e: Use correct flag to enable egress traffic for unicast promisc
Albeit, we usually set true promiscuous mode for both multicast and
unicast at the same time - however, it is possible to set it
individually, so using allmulti flag which is only for allmulticast might
caused unwanted behavior in mirroring egress traffic promiscuous for
unicast in VF.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:53 -07:00
Jacob Keller
b5d5504aa1 i40e: prevent snprintf format specifier truncation
Increase the size of the prefix buffer so that it can hold enough
characters for every possible input. Although 20 is enough for all
expected inputs, it is possible for the values to be larger than
expected, resulting in a possibly truncated string. Additionally, lets
use sizeof(prefix) in order to ensure we use the correct size if we need
to change the array length in the future.

New versions of GCC starting at 7 now include warnings to prevent
truncation unless you handle the return code. At most 27 bytes can be
written here, so lets just increase the buffer size even if for all
expected hw->bus.* values we only needed 20.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:41 -07:00
Mariusz Stachura
ed601f6601 i40e: Store the requested FEC information
Store information about FEC modes, that were requested. It will be used
in printing link status information function and this way there is no
need to call admin queue there.

Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:43:34 -07:00
Sudheer Mogilappagari
167d52edc4 i40e: Update state variable for adminq subtask
During NVM update, state machine gets into unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This fix updates the state variables so that adminq_subtask will have
accurate information whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 15:42:53 -07:00
Antoine Ténart
ec15ecdee5 net: mvpp2: fix the packet size configuration for 10G
The MVPP22_XLG_CTRL1_FRAMESIZELIMIT define is used as an offset, but is
defined as BIT(0). Updated its name to contains "OFFS" as in offset and
fix its value using the offset value, 0.

Reported-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
Fixes: 76eb1b1de5 ("net: mvpp2: set maximum packet size for 10G ports")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25 20:10:42 -07:00
David S. Miller
49107fcbf4 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-08-25

This series contains updates to i40e and i40evf only.

Mitch adjusts the max packet size to account for two VLAN tags.

Sudheer provides a fix to ensure that the watchdog timer is scheduled
immediately after admin queue operations are scheduled in i40evf_down().
Fixes an issue by adding locking around the admin queue command and
update of state variables so that adminq_subtask will have the accurate
information whenever it gets scheduled.

Anjali fixes a bug where the PF flag setup should happen before the VMDq
RSS queue count is initialized for VMDq VSI to get the right number of
queues for RSS in the case of x722 devices.  Fixed a problem with the
hardware ATR eviction feature where the NVM setting was incorrect.

Jake separates the flags into two types, hw_features and flags.  The
hw_features flags contain a set of features which are enabled at init
time and will not contain feature flags that can be toggled.  Everything
else will remain in the flags variable, and can be modified anytime
during run time.  We should not be directly copying a cpumask_t, since
it is bitmap and might not be copied correctly, so use cpumask_copy()
instead.

Stefan Assmann makes vf _offload_flags more "generic" by renaming it to
vf_cap_flags, which allows other capabilities besides offloading to be
added.

Alan makes it such that if adaptive-rx/tx is enabled, the user cannot
make any manual adjustments to interrupt moderation.  Also makes it so
that if ITR is disabled by adaptive-rx/tx is then enabled, ITR will be
re-enabled.

v2: Dropped patches #1 & #8 from the original patch series submission,
    while Jesse and Jake re-work their patches based on feedback from
    David Miller.  Also removed the duplicate patch 3 that was
    accidentally sent out twice in the previous submission.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25 19:39:58 -07:00
Simon Horman
6abd224b25 nfp: add basic SR-IOV ndo functions to representors
Add basic ndo_set/get_vf to support SR-IOV on all types
of port representors.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25 19:24:58 -07:00
Pablo Cascón
25528d90f5 nfp: add basic SR-IOV ndo functions
Add basic ndo_set/get_vf to support SR-IOV.

VF to egress phy static mapping by now.

Use vfcfg ABI version 2 to write the info to the FW and collect
the return value from the mailbox.

Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Jimmy Kizito <jimmy.kizito@netronome.com>
Signed-off-by: Rami Tomer <rami.tomer@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25 19:24:58 -07:00
Dan Carpenter
7d8697afae hinic: skb_pad() frees on error
The skb_pad() function frees the skb on error, so this code has a double
free.

Fixes: 00e57a6d4a ("net-next/hinic: Add Tx operation")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-25 17:13:04 -07:00
Sudheer Mogilappagari
2bf01935ec i40e: synchronize nvmupdate command and adminq subtask
During NVM update, state machine gets into unrecoverable state because
i40e_clean_adminq_subtask can get scheduled after the admin queue
command but before other state variables are updated. This causes
incorrect input to i40e_nvmupd_check_wait_event and state transitions
don't happen.

This issue existed before but surfaced after commit 373149fc99
("i40e: Decrease the scope of rtnl lock")

This fix adds locking around admin queue command and update of
state variables so that adminq_subtask will have accurate information
whenever it gets scheduled.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:52 -07:00
Alan Brady
06b2decd92 i40e: prevent changing ITR if adaptive-rx/tx enabled
Currently the driver allows the user to change (or even disable)
interrupt moderation if adaptive-rx/tx is enabled when this should
not be the case.

Adaptive RX/TX will not respect the user's ITR settings so
allowing the user to change it is weird.  This bug would also
allow the user to disable interrupt moderation with adaptive-rx/tx
enabled which doesn't make much sense either.

This patch makes it such that if adaptive-rx/tx is enabled, the user
cannot make any manual adjustments to interrupt moderation.  It also
makes it so that if ITR is disabled but adaptive-rx/tx is then
enabled, ITR will be re-enabled.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:46 -07:00
Jacob Keller
7e4d01e7d3 i40e: use cpumask_copy instead of direct assignment
According to the header file cpumask.h, we shouldn't be directly copying
a cpumask_t, since its a bitmap and might not be copied correctly. Lets
use the provided cpumask_copy() function instead.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:42 -07:00
Alan Brady
f0db789287 i40evf: use netdev variable in reset task
If we're going to bother initializing a variable to reference it we might
as well use it.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:38 -07:00
Stefan Assmann
fbb113f773 i40e/i40evf: rename vf_offload_flags to vf_cap_flags in struct virtchnl_vf_resource
The current name of vf_offload_flags indicates that the bitmap is
limited to offload related features. Make this more generic by renaming
it to vf_cap_flags, which allows for other capabilities besides
offloading to be added.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:52:29 -07:00
Jacob Keller
fcf6cfc8a6 i40e: move check for avoiding VID=0 filters into i40e_vsi_add_vlan
In i40e_vsi_add_vlan we treat attempting to add VID=0 as an error,
because it does not do what the caller might expect. We already special
case VID=0 in i40e_vlan_rx_add_vid so that we avoid this error when
adding the VLAN.

This special casing is necessary so that we do not add the VLAN=0 filter
since we don't want to stop receiving untagged traffic. Unfortunately,
not all callers of i40e_vsi_add_vlan are aware of this, including when
we add VLANs from a VF device.

Rather than special casing every single caller of i40e_vsi_add_vlan,
lets just move this check internally. This makes the code simpler
because the caller does not need to be aware of how VLAN=0 is special,
and we don't forget to add this check in new places.

This fixes a harmless error message displaying when adding a VLAN from
within a VF. The message was meaningless but there is no reason to
confuse end users and system administrators, and this is now avoided.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:46:45 -07:00
Jacob Keller
841c950d67 i40e/i40evf: use cmpxchg64 when updating private flags in ethtool
When a user gives an invalid command to change a private flag which is
not supported, either because it is read-only, or the device is not
capable of the feature, we simply ignore the request.

A naive solution would simply be to report error codes when one of the
flags was not supported. However, this causes problems because it makes
the operation not atomic. If a user requests multiple private flags
together at once we could end up changing one before failing at the
second flag.

We can do a bit better if we instead update a temporary copy of the
flags variable in the loop, and then copy it into place after. If we
aren't careful this has the pitfall of potentially silently overwriting
any changes caused by other threads.

Avoid this by using cmpxchg64 which will compare and swap the flags
variable only if it currently matched the old value. We'll report
-EAGAIN in the (hopefully rare!) case where the cmpxchg64 fails.

This ensures that we can properly report when flags are not supported in
an atomic fashion without the risk of overwriting other threads changes.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:46:34 -07:00
Anjali Singhai Jain
10a955ff62 i40e: Detect ATR HW Evict NVM issue and disable the feature
This patch fixes a problem with the HW ATR eviction feature where the
NVM setting was incorrect.  This patch detects the issue on X720
adapters and disables the feature if the NVM setting is incorrect.

Without this patch, HW ATR Evict feature does not work on broken NVMs
and is not detected either.  If the HW ATR Evict feature is disabled
the SW Eviction feature will take effect.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:46:26 -07:00
Jacob Keller
28921a0c2f i40e: remove workaround for Open Firmware MAC address
Since commit b499ffb0a2 ("i40e: Look up MAC address in Open Firmware
or IDPROM"), we've had support for obtaining the MAC address
form Open Firmware or IDPROM.

This code relied on sending the Open Firmware address directly to the
device firmware instead of relying on our MAC/VLAN filter list. Thus,
a work around was introduced in commit b1b15df592 ("i40e: Explicitly
write platform-specific mac address after PF reset")

We refactored the Open Firmware address enablement code in the ill-named
commit 41c4c2b50d ("i40e: allow look-up of MAC address from Open
Firmware or IDPROM")

Since this refactor, we no longer even set I40E_FLAG_PF_MAC. Further, we
don't need this work around, because we actually store the MAC address
as part of the MAC/VLAN filter hash. Thus, we will restore the address
correctly upon reset.

The refactor above failed to revert the workaround, so do that now.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:46:20 -07:00
Jacob Keller
d36e41dc78 i40e: separate hw_features from runtime changing flags
The number of flags found in pf->flags has grown quite large, and there
are a lot of different types of flags. Most of the flags are simply
hardware features which are enabled on some firmware or some MAC types.
Other flags are dynamic run-time flags which enable or disable certain
features of the driver.

Separate these two types of flags into pf->hw_features and pf->flags.
The hw_features list will contain a set of features which are enabled at
init time. This will not contain toggles or otherwise dynamically
changing features. These flags should not need atomic protections, as
they will be set once during init and then be essentially read only.

Everything else will remain in the flags variable. These flags may be
modified at any time during run time. A future patch may wish to convert
these flags into set_bit/clear_bit/test_bit or similar approach to
ensure atomic correctness.

The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but
currently is used by ethtool in the private flags settings, and thus has
been left as part of flags.

Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the
hw_features but this patch has not tried to untangle it yet.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:46:15 -07:00
Anjali Singhai Jain
5a433199bf i40e: Fix a bug with VMDq RSS queue allocation
The X722 pf flag setup should happen before the VMDq RSS queue count is
initialized for VMDq VSI to get the right number of queues for RSS in
case of X722 devices.

Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:46:06 -07:00
Sudheer Mogilappagari
fe2647ab0c i40evf: prevent VF close returning before state transitions to DOWN
Currently i40evf_close() can return before state transitions to
__I40EVF_DOWN because of the latency involved in processing and
receiving response from PF driver and scheduling of VF watchdog_task.
Due to this inconsistency an immediate call to i40evf_open() fails
because state is still DOWN_PENDING.

When a VF interface is in up state and we try to add it as slave,
The bonding driver calls dev_close() and dev_open() in short duration
resulting in dev_open returning error. The ifenslave command needs
to be run again for dev_open to succeed.

This fix ensures that watchdog timer is scheduled immediately after
admin queue operations are scheduled in i40evf_down(). In addition a
wait condition is added at the end of i40evf_close so that function
wont return when state is still DOWN_PENDING. The timeout value is
chosen after some profiling and includes some buffer.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:45:55 -07:00
Mitch Williams
1e3a5fd5c0 i40e/i40evf: adjust packet size to account for double VLANs
Now that the kernel supports double VLAN tags, we should at least play
nice. Adjust the max packet size to account for two VLAN tags, not just
one.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-25 14:45:28 -07:00
Haiyang Zhang
c6f71c418f hv_netvsc: Fix rndis_filter_close error during netvsc_remove
We now remove rndis filter before unregister_netdev(), which calls
device close. It involves closing rndis filter already removed.

This patch fixes this error.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:55:59 -07:00
David S. Miller
0cf3f4c37d mlx5-updates-2017-08-24
This series includes updates to mlx5 core driver.
 
 From Gal and Saeed, three cleanup patches.
 From Matan, Low level flow steering improvements and optimizations,
  - Use more efficient data structures for flow steering objects handling.
  - Add tracepoints to flow steering operations.
  - Overall these patches improve flow steering rule insertion rate by a
    factor of seven in large scales (~50K rules or more).
 
 -Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZntEwAAoJEEg/ir3gV/o+ghgIAJ5UBPWvZspnbQJHBopsJh47
 d4qt4HrcxxoA07d7QflGSmzqqvoX87eo6mVMQ/WkB+0D8KxggXYr75EOk4lQeYYo
 kiZ+4GdR6UaeQMhykcThKUyEpv60/8wLmXaHvhdWOaVsmzAFwQK0u5HGJlW14lzx
 LHvJGWG377zu+SdpR6wNDrwaHhk2B4Azqb5bomiGTPCg1RdZv3i37/hbF00X9GHB
 ZzPg3Mc5RQvF1fu9H35x4f15pturmMbtuGzmR2oKHMmNS2XQd6lFFlXfQxVUxtdg
 hvAj7RYFrmY1fAPp9cMZbB5ibKkFUFE6idebfrTIrVQrbxv9o0nwRvZTB4lbe9U=
 =hpBO
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2017-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-08-24

This series includes updates to mlx5 core driver.

From Gal and Saeed, three cleanup patches.
From Matan, Low level flow steering improvements and optimizations,
 - Use more efficient data structures for flow steering objects handling.
 - Add tracepoints to flow steering operations.
 - Overall these patches improve flow steering rule insertion rate by a
   factor of seven in large scales (~50K rules or more).

====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:49:56 -07:00
Dan Carpenter
256fbe1112 hinic: uninitialized variable in hinic_api_cmd_init()
We never set the error code in this function.

Fixes: eabf0fad81 ("net-next/hinic: Initialize api cmd resources")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:47:11 -07:00
Florian Fainelli
43cee2d246 net: mv643xx_eth: Be drop monitor friendly
txq_reclaim() does the normal transmit queue reclamation and
rxq_deinit() does the RX ring cleanup, none of these are packet drops,
so use dev_consume_skb() for both locations.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 21:27:09 -07:00
Florian Fainelli
1e9d8e7ad3 tg3: Be drop monitor friendly
tg3_tx() does the normal packet TX completion,
tigon3_dma_hwbug_workaround() and tg3_tso_bug() both need to allocate a
new SKB that is suitable to workaround HW bugs, and finally
tg3_free_rings() is doing ring cleanup. Use dev_consume_skb_any() for
these 3 locations to be SKB drop monitor friendly.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 18:52:32 -07:00
Bhumika Goyal
39a7e58924 net/mlx5e: make mlx5e_profile const
Make this const as it is only passed as an argument to the function
mlx5e_create_netdev and the corresponding argument is of type const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 12:33:31 -07:00
Bhumika Goyal
3f2c5fb2d8 net/mlx4_core: make mlx4_profile const
Make these const as they are only used in a copy operation.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 12:33:31 -07:00
Jesper Dangaard Brouer
2886447dc5 ixgbe: use return codes from ndo_xdp_xmit that are distinguishable
For XDP_REDIRECT the use of return code -EINVAL is confusing, as it is
used in three different cases.  (1) When the index or ifindex lookup
fails, and in the ixgbe driver (2) when link is down and (3) when XDP
have not been enabled.

The return code can be picked up by the tracepoint xdp:xdp_redirect
for diagnosing why XDP_REDIRECT isn't working.  Thus, there is a need
different return codes to tell the issues apart.

I'm considering using a specific err-code scheme for XDP_REDIRECT
instead of using these errno codes.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 11:59:37 -07:00
Arkadi Sharshevsky
a481d71323 mlxsw: spectrum_dpipe: Add support for controlling neighbor counters
Add support for controlling neighbor counters via dpipe.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
a86f030915 mlxsw: spectrum_dpipe: Add support for IPv4 host table dump
Add support for IPv4 host table dump.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
7cfcbc7591 mlxsw: spectrum_router: Add support for setting counters on neighbors
Add support for setting counters on neighbors based on dpipe's host table
counter status. This patch also adds the ability for getting the counter
value, which will be used by the dpipe host table implementation in the
next patches.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
6bba7e20da mlxsw: reg: Make flow counter set type enum to be shared
This is done as a preparation before introducing support for neighbor
counters. The flow counter's type enum is used by many registers, yet,
until now it was used only by mgpc and thus it was private. This patch
updates the namespace for more generic usage.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
6aecb36bc0 mlxsw: spectrum_dpipe: Add IPv4 host table initial support
Add IPv4 host table initial support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
7e57ae9fc5 mlxsw: spectrum_dpipe: Fix label name
Change label name for case of erif table init failure.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
f17cc84d1c mlxsw: spectrum_router: Add helpers for neighbor access
This is done as a preparation before introducing the ability to dump the
host table via dpipe, and to count the table size. The mlxsw's neighbor
representative struct stays private to the router module.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
3580732448 devlink: Move dpipe entry clear function into devlink
The entry clear routine can be shared between the drivers, thus it is
moved inside devlink.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
ffd3cdccf2 devlink: Add support for dynamic table size
Up until now the dpipe table's size was static and known at registration
time. The host table does not have constant size and it is resized in
dynamic manner. In order to support this behavior the size is changed
to be obtained dynamically via an op.

This patch also adjust the current dpipe table for the new API.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Arkadi Sharshevsky
23ca5ec3af mlxsw: spectrum_dpipe: Fix erif table op name space
Fix ERIF's table operations name space.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-24 09:33:16 -07:00
Matan Barak
4c03e69ab1 net/mlx5: Add tracepoints
Add a tracepoint infrastructure for mlx5_core driver.
Implemented flow steering tracepoints:
1. Add flow group
2. Remove flow group
3. Add flow table entry
4. Remove flow table entry
5. Add flow table rule
6. Remove flow table rule

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00
Matan Barak
693c6883bb net/mlx5: Add hash table for flow groups in flow table
When adding a flow table entry (fte) to a flow table (ft), we first
need to find its flow group (fg). Currently, this is done by
traversing a linear list of all flow groups in the flow table.
Furthermore, since multiple flow groups which correspond to the same
fte mask may exist in the same ft, we can't just stop at the first
match. Converting the linear list to rhltable in order to speed things
up.

The last four patches increases the steering rules update rate by a
factor of more than 7 (for insertion of 50K steering rules).

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-24 16:02:58 +03:00