Commit Graph

26391 Commits

Author SHA1 Message Date
Ido Schimmel
a2d2a20553 mlxsw: spectrum: Replace hard-coded default VID with a define
Subsequent patches are going to replace the current default VID (1) with
VLAN_N_VID - 1 (4095).

Prepare for this conversion by replacing the hard-coded '1' with a
define.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
f40be47a3e mlxsw: spectrum_router: Do not force specific configuration order
In symmetric routing, the only two members in the VLAN corresponding to
the L3 VNI are the router port and the VXLAN tunnel.

In case the VXLAN device is already enslaved to the bridge and only
later the VLAN interface is configured, the tunnel will not be
offloaded.

The reason for this is that when the router interface (RIF)
corresponding to the VLAN interface is configured, it calls the core
fid_get() API which does not check if NVE should be enabled on the FID.

Instead, call into the bridge code which will check if NVE should be
enabled on the FID.

This effectively means that the same code path is used to retrieve a FID
when either a local port or a router port joins the FID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
David S. Miller
6eea2db210 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-12-20

This series contains updates to e100, igb, ixgbe, i40e and ice drivers.

I replaced spinlocks for mutex locks to reduce the latency on CPU0 for
igb when updating the statistics.  This work was based off a patch
provided by Jan Jablonsky, which was against an older version of the igb
driver.

Jesus adjusts the receive packet buffer size from 32K to 30K when
running in QAV mode, to stay within 60K for total packet buffer size for
igb.

Vinicius adds igb kernel documentation regarding the CBS algorithm and
its implementation in the i210 family of NICs.

YueHaibing from Huawei fixed the e100 driver that was potentially
passing a NULL pointer, so use the kernel macro IS_ERR_OR_NULL()
instead.

Konstantin Khorenko fixes i40e where we were not setting up the
neigh_priv_len in our net_device, which caused the driver to read beyond
the neighbor entry allocated memory.

Miroslav Lichvar extends the PTP gettime() to read the system clock by
adding support for PTP_SYS_OFFSET_EXTENDED ioctl in i40e.

Young Xiao fixed the ice driver to only enable NAPI on q_vectors that
actually have transmit and receive rings.

Kai-Heng Feng fixes an igb issue that when placed in suspend mode, the
NIC does not wake up when a cable is plugged in.  This was due to the
driver not setting PME during runtime suspend.

Stephen Douthit enables the ixgbe driver allow DSA devices to use the
MII interface to talk to switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:34:30 -08:00
Jason Gunthorpe
ed50edfb72 Merge branch 'mlx5-next' into rdma.git
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5 updates taken for dependencies on following patches.

* branche 'mlx5-next': (23 commits)
  IB/mlx5: Introduce uid as part of alloc/dealloc transport domain
  net/mlx5: Add shared Q counter bits
  net/mlx5: Continue driver initialization despite debugfs failure
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c
  net/mlx5: Remove the get protocol device interface entry
  net/mlx5: Support extended destination format in flow steering command
  net/mlx5: E-Switch, Change vhca id valid bool field to bit flag
  net/mlx5: Introduce extended destination fields
  net/mlx5: Revise gre and nvgre key formats
  net/mlx5: Add monitor commands layout and event data
  net/mlx5: Add support for plugged-disabled cable status in PME
  net/mlx5: Add support for PCIe power slot exceeded error in PME
  net/mlx5: Rework handling of port module events
  net/mlx5: Move flow counters data structures from flow steering header
  ...
2018-12-20 13:24:50 -07:00
Steve Douthit
643bae17fd ixgbe: use mii_bus to handle MII related ioctls
Use the mii_bus callbacks to address the entire clause 22/45 address
space.  Enables userspace to poke switch registers instead of a single
PHY address.

The ixgbe firmware may be polling PHYs in a way that is not protected by
the mii_bus lock.  This isn't new behavior, but as Andrew Lunn pointed
out there are more addresses available for conflicts.

Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 12:22:39 -08:00
Steve Douthit
8fa10ef012 ixgbe: register a mdiobus
Most dsa devices expect a 'struct mii_bus' pointer to talk to switches
via the MII interface.

While this works for dsa devices, it will not work safely with Linux
PHYs in all configurations since the firmware of the ixgbe device may
be polling some PHY addresses in the background.

Signed-off-by: Stephen Douthit <stephend@silicom-usa.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 12:19:11 -08:00
Kai-Heng Feng
1fb3a7a75e igb: Fix an issue that PME is not enabled during runtime suspend
I210 ethernet card doesn't wakeup when a cable gets plugged. It's
because its PME is not set.

Since commit 42eca23021 ("PCI: Don't touch card regs after runtime
suspend D3"), if the PCI state is saved, pci_pm_runtime_suspend() stops
calling pci_finish_runtime_suspend(), which enables the PCI PME.

To fix the issue, let's not to save PCI states when it's runtime
suspend, to let the PCI subsystem enables PME.

Fixes: 42eca23021 ("PCI: Don't touch card regs after runtime suspend D3")
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 12:14:23 -08:00
Young Xiao
eec903769b ice: Do not enable NAPI on q_vectors that have no rings
If ice driver has q_vectors w/ active NAPI that has no rings,
then this will result in a divide by zero error. To correct it
I am updating the driver code so that we only support NAPI on
q_vectors that have 1 or more rings allocated to them.

See commit 13a8cd191a ("i40e: Do not enable NAPI on q_vectors
that have no rings") for detail.

Signed-off-by: Young Xiao <YangX92@hotmail.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 12:10:24 -08:00
Miroslav Lichvar
9a2d57a7a0 i40e: extend PTP gettime function to read system clock
This adds support for the PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 12:06:35 -08:00
Konstantin Khorenko
31389b53b3 i40e: define proper net_device::neigh_priv_len
Out of bound read reported by KASan.

i40iw_net_event() reads unconditionally 16 bytes from
neigh->primary_key while the memory allocated for
"neighbour" struct is evaluated in neigh_alloc() as

  tbl->entry_size + dev->neigh_priv_len

where "dev" is a net_device.

But the driver does not setup dev->neigh_priv_len and
we read beyond the neigh entry allocated memory,
so the patch in the next mail fixes this.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 12:02:26 -08:00
YueHaibing
cd0d465bb6 e100: Fix passing zero to 'PTR_ERR' warning in e100_load_ucode_wait
Fix a static code checker warning:
drivers/net/ethernet/intel/e100.c:1349
 e100_load_ucode_wait() warn: passing zero to 'PTR_ERR'

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 11:54:27 -08:00
David S. Miller
2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Jesus Sanchez-Palencia
6f9ae17530 igb: Change RXPBSIZE size when setting Qav mode
Section 4.5.9 of the datasheet says that the total size of all packet
buffers combined (TxPB 0 + 1 + 2 + 3 + RxPB + BMC2OS + OS2BMC) must not
exceed 60KB. Today we are configuring a total of 62KB, so reduce the
RxPB from 32KB to 30KB in order to respect that.

The choice of changing RxPBSIZE here is mainly because it seems more
correct to give more priority to the transmit packet buffers over the
receiver ones when running in Qav mode. Also, the BMC2OS and OS2BMC
sizes are already too short.

Signed-off-by: Jesus Sanchez-Palencia <jesus.s.palencia@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 11:45:10 -08:00
Jeff Kirsher
59361316af igb: reduce CPU0 latency when updating statistics
This change is based off of the work and suggestion of Jan Jablonsky
<jan.jablonsky@thalesgroup.com>.

The Watchdog workqueue in igb driver is scheduled every 2s for each
network interface. That includes updating a statistics protected by
spinlock. Function igb_update_stats in this case will be protected
against preemption. According to number of a statistics registers
(cca 60), processing this function might cause additional cpu load
 on CPU0.

In case of statistics spinlock may be replaced with mutex, which
reduce latency on CPU0.

CC: Bernhard Kaindl  <bernhard.kaindl@thalesgroup.com>
CC: Jan Jablonsky <jan.jablonsky@thalesgroup.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-20 11:02:06 -08:00
Jakub Kicinski
4987eaccd2 nfp: bpf: optimize codegen for JSET with a constant
The top word of the constant can only have bits set if sign
extension set it to all-1, therefore we don't really have to
mask the top half of the register.  We can just OR it into
the result as is.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-20 17:28:29 +01:00
Jakub Kicinski
6e774845b3 nfp: bpf: remove the trivial JSET optimization
The verifier will now understand the JSET instruction, so don't
mark the dead branch in the JIT as noop.  We won't generate any
code, anyway.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-20 17:28:28 +01:00
Michael Chan
0c2ff8d796 bnxt_en: Adjust default RX coalescing ticks to 10 us.
For a little better performance on faster machines and faster link
speeds.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Venkat Duvvuru
abd43a1352 bnxt_en: Support for 64-bit flow handle.
Older firmware only supports 16-bit flow handle, because of which the
number of flows that can be offloaded can’t scale beyond a point.
Newer firmware supports 64-bit flow handle enabling the host to scale
upto millions of flows. With the new 64-bit flow handle support, driver
has to query flow stats in a different way compared to the older approach.

This patch adds support for 64-bit flow handle and new way to query
flow stats.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Michael Chan
cf6daed098 bnxt_en: Increase context memory allocations on 57500 chips for RDMA.
If RDMA is supported on the 57500 chip, increase context memory
allocations for the resources used by RDMA.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Michael Chan
08fe9d1816 bnxt_en: Add Level 2 context memory paging support.
Add the new functions bnxt_alloc_ctx_pg_tbls()/bnxt_free_ctx_pg_tbls()
to allocate and free pages for context memory.  The new functions
will handle the different levels of paging support and allocate/free
the pages accordingly using the existing functions.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Michael Chan
4f49b2b8d4 bnxt_en: Enhance bnxt_alloc_ring()/bnxt_free_ring().
To support level 2 context page memory structures, enhance the
bnxt_ring_mem_info structure with a "depth" field to specify the page
level and add a flag to specify using full pages for L1 and L2 page
tables.  This is needed to support RDMA functionality on 57500 chips
since RDMA requires more context memory.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Venkat Duvvuru
760b6d3341 bnxt_en: Add support for 2nd firmware message channel.
Earlier, some of the firmware commands (ex: CFA_FLOW_*) which are processed
by KONG processor were sent to the CHIMP processor from the host. This
approach was taken as there was no direct message channel to KONG.
CHIMP in turn used to send them to KONG. Newer firmware supports a new
message channel which the host can send messages directly to the KONG
processor.

This patch adds support for required changes needed in the driver
to support direct KONG message channel.  This speeds up flow related
messages sent to the firmware for CLS_FLOWER offload.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Venkat Duvvuru
5c209fc821 bnxt_en: Introduce bnxt_get_hwrm_resp_addr & bnxt_get_hwrm_seq_id routines.
These routines will be enhanced in the subsequent patch to
return the 2nd firmware comm. channel's hwrm response address &
sequence id respectively.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:16 -08:00
Venkat Duvvuru
89455017fb bnxt_en: Avoid arithmetic on void * pointer.
Typecast hwrm_cmd_resp_addr to (u8 *) from (void *) before doing
arithmetic.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:15 -08:00
Venkat Duvvuru
2e9ee39877 bnxt_en: Use macros for firmware message doorbell offsets.
In preparation for adding a 2nd communication channel to firmware.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:15 -08:00
Venkat Duvvuru
fc718bb2d1 bnxt_en: Set hwrm_intr_seq_id value to its inverted value.
Set hwrm_intr_seq_id value to its inverted value instead of
HWRM_SEQ_INVALID, when an hwrm completion of type
CMPL_BASE_TYPE_HWRM_DONE is received. This will enable us to use
the complete 16-bit sequence ID space.

Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:15 -08:00
Michael Chan
3322479e6d bnxt_en: Update firmware interface spec. to 1.10.0.33.
The major changes are in the flow offload firmware APIs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 08:26:15 -08:00
Aviv Heller
a64917446e net/mlx5: Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
If CONFIG_MLX5_ESWITCH is not defined, test for SR-IOV being disabled,
instead of calling e-switch LAG prereq routine.

Since LAG with SRIOV is allowed only when switchdev mode is on.

Fixes: eff849b2c6 ("net/mlx5: Allow/disallow LAG according to pre-req only")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:03 -08:00
Aviv Heller
0a5b589111 net/mlx5: Fix query_nic_sys_image_guid() error during init
vport system image guid should be queried using vport nic API for
Ethernet ports, and vport hca API for Infiniband ports.

Fixes: fadd59fc50 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:03 -08:00
Eli Britstein
e32ee6c78e net/mlx5e: Support tunnel encap over tagged Ethernet
Generate encap header depending on the routed device to support
native/tagged Ethernet header.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:03 -08:00
Eli Britstein
aa331450b8 net/mlx5e: Support VLAN encap ETH header generation
Support generation of native or tagged Ethernet header for encap
header, depending on provided net device.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:03 -08:00
Eli Britstein
c7bcb277bd net/mlx5e: Re-order route and encap header memory allocation
Change the order to first route IPv4/6 and return if error. Only after
successful route continue to allocate an encap header, with no
functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:02 -08:00
Eli Britstein
05ada1adb6 net/mlx5e: Tunnel encap ETH header helper function
In tunnel encap we prepare the encap header for IPv4/6 cases, in two
separate functions. For ETH header generation the code is almost
duplicated.

Move the ETH header generation code from IPv4/6 functions to a helper
function, with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:02 -08:00
Eli Britstein
b168cff0b9 net/mlx5e: Fail attempt to offload e-switch TC encap flows with vlan on underlay
Currently we don't support nor fail attempts to offload encap flows routed
to vlan device on the underlay network. We wrongly consider a vlan underlay
device to be on the same e-switch b/c the switchdev ID is retrieved recursively.

Add explicit check for that and fail such attempts.

Also align to a more strict check for the ingress and the underlay devices
to practically be on the same eswitch.

Fixes: ce99f6b97f ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Fixes: 3e621b19b0 ('net/mlx5e: Support TC encapsulation offloads with upper devices')
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:02 -08:00
Eli Britstein
442e1228cb net/mlx5e: Tunnel routing output devs helper function
For tunnel we determine the output devs for IPv4/6 cases, in two
separate functions, with a duplicated code.

Move that code from IPv4/6 functions to a helper function, with no
functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:01 -08:00
Eli Britstein
a0646c88ed net/mlx5e: Fail attempt to offload e-switch TC flows with egress upper devices
We use the switchdev parent HW id helper to identify if the mirred device
shares the same ASIC/port with the ingress device. This can get us wrong
in the presence of upper devices such as vlan or bridge set over the HW
devices (VF or uplink representors), b/c the switchdev ID is retrieved
recursively.

To fail offload attempts in such cases, we condition the check on the
egress device to have not only the same switchdev ID but also the relevant
mlx5 netdev ops.

Fixes: 03a9d11e6e ('net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads')
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:01 -08:00
Or Gerlitz
1ee4457c5c net/mlx5e: Allow vlans on e-switch uplink reps
There are cases (e.g tunneling with vlan on underlay and potentially
more) where this makes sense, so allow that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:01 -08:00
Gavi Teitz
4c8fb2986d net/mlx5e: Increase VF representors' SQ size to 128
The default size for the VF representors' SQ was too small to handle high
packet rates. Doubling the size from 64 to 128 drastically improves the
packet rate under stress (by about 50%), whereas increasing the size
beyond 128 has not shown to make any further difference.

The impact of the SQ size was measured with UDP traffic, in the following
topology: TG <-> PF <-> TC forwarding <-> VF representor <-> VF in VM
over a single core processing bi-directional traffic, with the following
results:

                                  SQ size of 64:     SQ size of 128:
Packet rate for 64B UDP packets:    860 [Kpps]         1280 [Kpps]
Packet rate for 114B VxLan
encapsulated UDP packets:           320 [Kpps]          500 [Kpps]

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:00 -08:00
Miroslav Lichvar
4a0475d57a mlx5: extend PTP gettime function to read system clock
Read the system time right before and immediately after reading the low
register of the internal timer. This adds support for the
PTP_SYS_OFFSET_EXTENDED ioctl.

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:00 -08:00
Miroslav Lichvar
5d8678365c mlx5: update timecounter at least twice per counter overflow
The timecounter needs to be updated at least once in half of the
cyclecounter interval to prevent timecounter_cyc2time() interpreting a
new timestamp as an old value and causing a backward jump.

This would be an issue if the timecounter multiplier was so small that
the update interval would not be limited by the 64-bit overflow in
multiplication.

Shorten the calculated interval to make sure the timecounter is updated
in time even when the system clock is slowed down by up to 10%, the
multiplier is increased by up to 10%, and the scheduled overflow check
is late by 15%.

Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ariel Levkovich <lariel@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:00 -08:00
Peng Li
1154bb26c8 net: hns3: remove redundant variable initialization
This patch removes the redundant variable initialization,
as driver will devm_kzalloc to set value to hdev soon.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:59 -08:00
Peng Li
31a16f99e0 net: hns3: fix the descriptor index when get rss type
Driver gets rss information from the last descriptor of the packet.
When driver handle the rss type, ring->next_to_clean indicates the
first descriptor of next packet.

This patch fix the descriptor index with "ring->next_to_clean - 1".

Fixes: 232fc64b6e ("net: hns3: Add HW RSS hash information to RX skb")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:59 -08:00
Jian Shen
8edc2285b7 net: hns3: don't restore rules when flow director is disabled
When user disables flow director, all the rules will be disabled. But
when reset happens, it will restore all the rules again. It's not
reasonable. This patch fixes it by add flow director status check before
restore fules.

Fixes: 6871af29b3 ("net: hns3: Add reset handle for flow director")
Fixes: c17852a893 ("net: hns3: Add support for enable/disable flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:59 -08:00
Jian Shen
0285dbae5d net: hns3: fix vf id check issue when add flow director rule
When add flow director fule for vf, the vf id is used as array
subscript before valid checking, which may cause memory overflow.

Fixes: dd74f815dd ("net: hns3: Add support for rule add/delete for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:58 -08:00
Huazhong Tan
39cfbc9c4f net: hns3: reset tqp while doing DOWN operation
While doing DOWN operation, the driver will reclaim the memory which has
already used for TX. If the hardware is processing this memory, it will
cause a RCB error to the hardware. According the hardware's description,
the driver should reset the tqp before reclaim the memory during DOWN.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:58 -08:00
Jian Shen
75edb61086 net: hns3: add max vector number check for pf
Each pf supports max 64 vectors and 128 tqps. For 2p/4p core scenario,
there may be more than 64 cpus online. So the result of min_t(u16,
num_Online_cpus(), tqp_num) may be more than 64. This patch adds check
for the vector number.

Fixes: dd38c72604 ("net: hns3: fix for coalesce configuration lost during reset")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:58 -08:00
Peng Li
1b7d7b0581 net: hns3: fix a bug caused by udelay
udelay() in driver may always occupancy processor. If there is only
one cpu in system, the VF driver may initialize fail when insmod
PF and VF driver in the same system. This patch use msleep() to free
cpu when VF wait PF message.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:58 -08:00
Jian Shen
a298797532 net: hns3: change default tc state to close
In original codes, default tc value is set to the max tc. It's more
reasonable to close tc by changing default tc value to 1. Users can
enable it with lldp tool when they want to use tc.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:58 -08:00
Jian Shen
8cdb992f0d net: hns3: refine the handle for hns3_nic_net_open/stop()
When triggering nic down, there is a time window between bringing down
the protocol stack and stopping the work task. If the net is up in the
time window, it may bring up the protocol stack again.

This patch fixes it by stop the work task at the beginning of
hns3_nic_net_stop(). To keep symmetrical, start the work task at the
end of hns3_nic_net_open().

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 23:47:58 -08:00
Antoine Tenart
1b451fb205 net: mvpp2: fix the phylink mode validation
The mvpp2_phylink_validate() sets all modes that are supported by a
given PPv2 port. An mistake made the 10000baseT_Full mode being
advertised in some cases when a port wasn't configured to perform at
10G. This patch fixes this.

Fixes: d97c9f4ab0 ("net: mvpp2: 1000baseX support")
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 16:38:35 -08:00
Biao Huang
22a3a5403b net-next: stmmac: dwmac-mediatek: remove fine-tune property
1. remove fine-tune property and related setting to simplify
the timing adjustment flow.
2. set timing value according to the value from device tree,
and will not care whether PHY insert internal delay.

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 16:24:58 -08:00
Bryan Whitehead
e0e587878f lan743x: Remove MAC Reset from initialization
The MAC Reset was noticed to erase important EEPROM settings.
It is also unnecessary since a chip wide reset was done earlier
in initialization, and that reset preserves EEPROM settings.

There for this patch removes the unnecessary MAC specific reset.

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 15:48:11 -08:00
Alaa Hleihel
4765420439 net/mlx5e: Remove the false indication of software timestamping support
mlx5 driver falsely advertises support of software timestamping.
Fix it by removing the false indication.

Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-19 13:31:16 -08:00
Yuval Avnery
f033788914 net/mlx5: Typo fix in del_sw_hw_rule
Expression terminated with "," instead of ";", resulted in
set_fte getting bad value for modify_enable_mask field.

Fixes: bd5251dbf1 ("net/mlx5_core: Introduce flow steering destination of type counter")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-19 13:31:16 -08:00
Tariq Toukan
bfc698254b net/mlx5e: RX, Fix wrong early return in receive queue poll
When the completion queue of the RQ is empty, do not immediately return.
If left-over decompressed CQEs (from the previous cycle) were processed,
need to go to the finalization part of the poll function.

Bug exists only when CQE compression is turned ON.

This solves the following issue:
mlx5_core 0000:82:00.1: mlx5_eq_int:544:(pid 0): CQ error on CQN 0xc08, syndrome 0x1
mlx5_core 0000:82:00.1 p4p2: mlx5e_cq_error_event: cqn=0x000c08 event=0x04

Fixes: 4b7dfc9925 ("net/mlx5e: Early-return on empty completion queues")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-19 13:31:16 -08:00
Ido Schimmel
b61cd7c6f9 mlxsw: spectrum_router: Hold a reference on RIF's netdev
Previous patches tried to make RIF deletion more robust and avoid
use-after-free situations.

As another precaution, hold a reference on a RIF's netdev and release it
when the RIF is deleted.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
965fa8e600 mlxsw: spectrum_router: Make RIF deletion more robust
In the past we had multiple instances where RIFs were not properly
deleted.

One of the reasons for leaking a RIF was that at the time when IP
addresses were flushed from the respective netdev (prompting the
destruction of the RIF), the netdev was no longer a mlxsw upper. This
caused the inet{,6}addr notification blocks to ignore the NETDEV_DOWN
event and leak the RIF.

Instead of checking whether the netdev is our upper when an IP address
is removed, we can instead check if the netdev has a RIF configured.

To look up a RIF we need to access mlxsw private data, so the patch
stores the notification blocks inside a mlxsw struct. This then allows
us to use container_of() and extract the required private data.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
21ffedb6db mlxsw: spectrum_router: Propagate 'struct mlxsw_sp' further
Next patch is going to make RIF deletion more robust by removing
reliance on fragile mlxsw_sp_lower_get(). This is because a netdev is
not necessarily our upper anymore when its IP addresses are flushed.

The inet{,6}addr notification blocks are going to resolve 'struct
mlxsw_sp' using container_of(), but the functions they call still use
mlxsw_sp_lower_get().

As a preparation for the next patch, propagate 'struct mlxsw_sp' down to
the functions called from the notification blocks and remove reliance on
mlxsw_sp_lower_get().

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
be2d6f421f mlxsw: spectrum: Properly cleanup LAG uppers when removing port from LAG
When a LAG device or a VLAN device on top of it is enslaved to a bridge,
the driver propagates the CHANGEUPPER event to the LAG's slaves.

This causes each physical port to increase the reference count of the
internal representation of the bridge port by calling
mlxsw_sp_port_bridge_join().

However, when a port is removed from a LAG, the corresponding leave()
function is not called and the reference count is not decremented. This
leads to ugly hacks such as mlxsw_sp_bridge_port_should_destroy() that
try to understand if the bridge port should be destroyed even when its
reference count is not 0.

Instead, make sure that when a port is unlinked from a LAG it would see
the same events as if the LAG (or its uppers) were unlinked from a
bridge.

The above is achieved by walking the LAG's uppers when a port is
unlinked and calling mlxsw_sp_port_bridge_leave() for each upper that is
enslaved to a bridge.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
635c8c8bba mlxsw: spectrum: Remove reference count from VLAN entries
Commit b3529af6bb ("spectrum: Reference count VLAN entries") started
reference counting port-VLAN entries in a similar fashion to the 8021q
driver.

However, this is not actually needed and only complicates things.
Instead, the driver should forbid the creation of a VLAN on a port if
this VLAN already exists. This would also solve the issue fixed by the
mentioned commit.

Therefore, remove the get()/put() API and use create()/destroy()
instead.

One place that needs special attention is VLAN addition in a VLAN-aware
bridge via switchdev operations. In case the VLAN flags (e.g., 'pvid')
are toggled, then the VLAN entry already exists. To prevent the driver
from wrongly returning EEXIST, the driver is changed to check in the
prepare phase whether the entry already exists and only returns an error
in case it is not associated with the correct bridge port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
e149113a74 mlxsw: spectrum: Handle VLAN device unlinking
In commit 993107fea5 ("mlxsw: spectrum_switchdev: Fix VLAN device
deletion via ioctl") I fixed a bug caused by the fact that the driver
views differently the deletion of a VLAN device when it is deleted via
an ioctl and netlink.

Instead of relying on a specific order of events (device being
unregistered vs. VLAN filter being updated), simply make sure that the
driver performs the necessary cleanup when the VLAN device is unlinked,
which always happens before the other two events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
f1d7c33d6a mlxsw: spectrum_fid: Remove unused function
This function is no longer used. Remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
32fd4b49a3 mlxsw: spectrum_router: Do not destroy RIFs based on FID's reference count
Currently, when a RIF is constructed on top of a FID, the RIF increments
the FID's reference count and the RIF is destroyed when the FID's
reference count drops to 1. This effectively means that when no local
ports are member in the FID, the FID is destroyed regardless if the
router port is a member in the FID or not.

The above can lead to the unexpected behavior in which routes using a
VLAN interface as their nexthop device are no longer offloaded after the
last local port leaves the corresponding VLAN (FID).

Example:
# ip -4 route show dev br0.10
192.0.2.0/24 proto kernel scope link src 192.0.2.1 offload
# bridge vlan del vid 10 dev swp3
# ip -4 route show dev br0.10
192.0.2.0/24 proto kernel scope link src 192.0.2.1

After the patch, the route is offloaded before and after the VLAN is
removed from local port 'swp3', as the RIF corresponding to 'br0.10'
continues to exists.

In order to remove RIFs' reliance on the underlying FID's reference
count, we need to add a reference count to sub-port RIFs, which are RIFs
that correspond to physical ports and their uppers (e.g., LAG devices).

In this case, each {Port, VID} ('struct mlxsw_sp_port_vlan') needs to
hold a reference on the RIF. For example:

                       bond0.10
                          |
                        bond0
                          |
                      +-------+
                      |       |
                    swp1    swp2

Both {Port 1, VID 10} and {Port 2, VID 10} will hold a reference on the
RIF corresponding to 'bond0.10'. When the last reference is dropped, the
RIF will be destroyed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Ido Schimmel
927d0ef10a mlxsw: spectrum: Sanitize VLAN interface's uppers
Currently, only VRF and macvlan uppers are supported on top of VLAN
device configured over a bridge, so make sure the driver forbids other
uppers.

Note that enslavement to a VRF is handled earlier in the notification
block, so there is no need to check for a VRF upper here.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 12:28:07 -08:00
Michael Chan
84404d5fd5 bnxt_en: Fix ethtool self-test loopback.
The current code has 2 problems.  It assumes that the RX ring for
the loopback packet is combined with the TX ring.  This is not
true if the ethtool channels are set to non-combined mode.  The
second problem is that it won't work on 57500 chips without
adjusting the logic to get the proper completion ring (cpr) pointer.
Fix both issues by locating the proper cpr pointer through the RX
ring.

Fixes: e44758b78a ("bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:31:09 -08:00
Florian Westphal
a84e3f5333 xfrm: prefer secpath_set over secpath_dup
secpath_set is a wrapper for secpath_dup that will not perform
an allocation if the secpath attached to the skb has a reference count
of one, i.e., it doesn't need to be COW'ed.

Also, secpath_dup doesn't attach the secpath to the skb, it leaves
this to the caller.

Use secpath_set in places that immediately assign the return value to
skb.

This allows to remove skb->sp without touching these spots again.

secpath_dup can eventually be removed in followup patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:38 -08:00
Florian Westphal
6362a6a040 drivers: net: ethernet: mellanox: use skb_sec_path helper
Will avoid touching this when sp pointer is removed from sk_buff struct.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
Florian Westphal
2fdb435bc0 drivers: net: intel: use secpath helpers in more places
Use skb_sec_path and secpath_exists helpers where possible.
This reduces noise in followup patch that removes skb->sp pointer.

v2: no changes, preseve acks from v1.

Acked-by: Shannon Nelson <shannon.lee.nelson@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 11:21:37 -08:00
Ioana Radulescu
610febc68a dpaa2-eth: Add QBMAN related stats
Add statistics for pending frames in Rx/Tx conf FQs and
number of buffers in pool. Available through ethtool -S.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-19 10:37:22 -08:00
Heiner Kallweit
33f18c96af net: ethernet: don't set phylib state CHANGELINK in drivers
After phy_start() phylib takes care of all needed actions, including
aneg settings and checking link state. There's no need to set state
PHY_CHANGELINK in drivers.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 22:07:20 -08:00
Colin Ian King
f7db2beb4c vxge: ensure data0 is initialized in when fetching firmware version information
Currently variable data0 is not being initialized so a garbage value is
being passed to vxge_hw_vpath_fw_api and this value is being written to
the rts_access_steer_data0 register.  There are other occurrances where
data0 is being initialized to zero (e.g. in function
vxge_hw_upgrade_read_version) so I think it makes sense to ensure data0
is initialized likewise to 0.

Detected by CoverityScan, CID#140696 ("Uninitialized scalar variable")

Fixes: 8424e00dfd ("vxge: serialize access to steering control register")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 22:00:40 -08:00
Bryan Whitehead
0db7d253e9 lan743x: Expand phy search for LAN7431
The LAN7431 uses an external phy, and it can be found anywhere in
the phy address space. This patch uses phy address 1 for LAN7430
only. And searches all addresses otherwise.

Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 21:35:56 -08:00
David S. Miller
6c86bc2342 mlx5-uplink-rep-2018-12-15
Or Gerlitz says:
 
 This series is essentially a cleanup to align with the rest of the NIC
 switchdev drivers and make us
 more robust and clear/n: currently the PF netdev serves as the mlx5
 e-switch uplink netdev
 representor when going into switchdev mode and back as plain NIC
 netdev when going out.
 This causes some irregularities and misc troubles.
 
 Move to use dedicated uplink rep, as we have for the VF vports.
 
 The uplink rep netdev does has sysfs link and supports the sriov vf
 mac ndo, these two are in
 use by libvirt and other orchestrators, It also has richer ethtool
 support to allow controlling the
 port link & mtu along with supporting dcb and plugging into the mlx5
 lag logic.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcF/MBAAoJEEg/ir3gV/o+apkIAKVeB/ha8qiY+9ntA9B8HVxO
 cYb+fv4+DuYjTg/nN5QdYSAisN4SwFDA5ukz/ckr6XwV65EgTjAtlon/fHg/rDyz
 YT3dJRP9yjNXKxRvxa0K4PmTasmXGXzaJzsXYemHsfCDdnkNknZsqA+yoNe+2c40
 HoNsdy/Pvr18PjzYtpX0BiuSjH623daBmdFH0o+rwuwsApkAUlBJfosTg/7xyU7v
 TAjy8S1HCloJ7OAjOlzXVmYr65sAlQoFr860vLDQ7Og7JemxjSXQ+AdNb6mSAvZ5
 LBEvCA3G5tb4OkFX+1i/alxOcuG8/FoO+O3ZQWve1DNxoS76po851bgHXIdZCPg=
 =pqTq
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-uplink-rep-2018-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed:

====================
mlx5-uplink-rep-2018-12-15

Or Gerlitz says:

This series is essentially a cleanup to align with the rest of the NIC
switchdev drivers and make us
more robust and clear/n: currently the PF netdev serves as the mlx5
e-switch uplink netdev
representor when going into switchdev mode and back as plain NIC
netdev when going out.
This causes some irregularities and misc troubles.

Move to use dedicated uplink rep, as we have for the VF vports.

The uplink rep netdev does has sysfs link and supports the sriov vf
mac ndo, these two are in
use by libvirt and other orchestrators, It also has richer ethtool
support to allow controlling the
port link & mtu along with supporting dcb and plugging into the mlx5
lag logic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 16:44:45 -08:00
Anssi Hannula
6e0af29806 net: macb: add missing barriers when reading descriptors
When reading buffer descriptors on RX or on TX completion, an
RX_USED/TX_USED bit is checked first to ensure that the descriptors have
been populated, i.e. the ownership has been transferred. However, there
are no memory barriers to ensure that the data protected by the
RX_USED/TX_USED bit is up-to-date with respect to that bit.

Specifically:

- TX timestamp descriptors may be loaded before ctrl is loaded for the
  TX_USED check, which is racy as the descriptors may be updated between
  the loads, causing old timestamp descriptor data to be used.

- RX ctrl may be loaded before addr is loaded for the RX_USED check,
  which is racy as a new frame may be written between the loads, causing
  old ctrl descriptor data to be used.
  This issue exists for both macb_rx() and gem_rx() variants.

Fix the races by adding DMA read memory barriers on those paths and
reordering the reads in macb_rx().

I have not observed any actual problems in practice caused by these
being missing, though.

Tested on a ZynqMP based system.

Fixes: 89e5785fc8 ("[PATCH] Atmel MACB ethernet driver")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 16:17:48 -08:00
Anssi Hannula
8159ecab0d net: macb: fix dropped RX frames due to a race
Bit RX_USED set to 0 in the address field allows the controller to write
data to the receive buffer descriptor.

The driver does not ensure the ctrl field is ready (cleared) when the
controller sees the RX_USED=0 written by the driver. The ctrl field might
only be cleared after the controller has already updated it according to
a newly received frame, causing the frame to be discarded in gem_rx() due
to unexpected ctrl field contents.

A message is logged when the above scenario occurs:

  macb ff0b0000.ethernet eth0: not whole frame pointed by descriptor

Fix the issue by ensuring that when the controller sees RX_USED=0 the
ctrl field is already cleared.

This issue was observed on a ZynqMP based system.

Fixes: 4df95131ea ("net/macb: change RX path for GEM")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 16:17:48 -08:00
Anssi Hannula
e100a897bf net: macb: fix random memory corruption on RX with 64-bit DMA
64-bit DMA addresses are split in upper and lower halves that are
written in separate fields on GEM. For RX, bit 0 of the address is used
as the ownership bit (RX_USED). When the RX_USED bit is unset the
controller is allowed to write data to the buffer.

The driver does not guarantee that the controller already sees the upper
half when the RX_USED bit is cleared, possibly resulting in the
controller writing an incoming frame to an address with an incorrect
upper half and therefore possibly corrupting unrelated system memory.

Fix that by adding the necessary DMA memory barrier between the writes.

This corruption was observed on a ZynqMP based system.

Fixes: fff8019a08 ("net: macb: Add 64 bit addressing support for GEM")
Signed-off-by: Anssi Hannula <anssi.hannula@bitwise.fi>
Acked-by: Harini Katakam <harini.katakam@xilinx.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 16:17:48 -08:00
Claudiu Beznea
4298388574 net: macb: restart tx after tx used bit read
On some platforms (currently detected only on SAMA5D4) TX might stuck
even the pachets are still present in DMA memories and TX start was
issued for them. This happens due to race condition between MACB driver
updating next TX buffer descriptor to be used and IP reading the same
descriptor. In such a case, the "TX USED BIT READ" interrupt is asserted.
GEM/MACB user guide specifies that if a "TX USED BIT READ" interrupt
is asserted TX must be restarted. Restart TX if used bit is read and
packets are present in software TX queue. Packets are removed from software
TX queue if TX was successful for them (see macb_tx_interrupt()).

Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 15:57:07 -08:00
YueHaibing
8937388acb qlcnic: remove set but not used variables 'op, cmd_op'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1070:5: warning:
 variable 'op' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c:1342:5: warning:
 variable 'cmd_op' set but not used [-Wunused-but-set-variable]

'op' never used since introduction in commit 7cb03b2347 ("qlcnic:
Support VF-PF communication channel commands.")
'cmd_op' not used since commit 6226204bcf ("qlcnic: Fix operation
type and command type.")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 15:49:46 -08:00
Dan Carpenter
b26322d2ac net: stmmac: Fix an error code in probe()
The function should return an error if create_singlethread_workqueue()
fails.

Fixes: 34877a15f7 ("net: stmmac: Rework and fix TX Timeout code")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 15:45:28 -08:00
Dan Carpenter
f07d427689 qed: Fix an error code qed_ll2_start_xmit()
We accidentally deleted the code to set "rc = -ENOMEM;" and this patch
adds it back.

Fixes: d2201a2159 ("qed: No need for LL2 frags indication")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 15:43:44 -08:00
Heiner Kallweit
2429f13870 net: fec: remove workaround to restart phylib state machine on MDIO timeout
There's a workaround to restart the phylib state machine in case of a
MDIO access timeout. Seems it was introduced to deal with the
consequences of a too small MDIO timeout. See also commit message of
c3b084c24c ("net: fec: Adjust ENET MDIO timeouts") which increased
the timeout value later. Due to the later timeout value fix it seems
to be safe to remove the workaround.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 15:01:55 -08:00
Antoine Tenart
0067917720 net: mvpp2: 10G modes aren't supported on all ports
The mvpp2_phylink_validate() function sets all modes that are
supported by a given PPv2 port. A recent change made all ports to
advertise they support 10G modes in certain cases. This is not true,
as only the port #0 can do so. This patch fixes it.

Fixes: 01b3fd5ac9 ("net: mvpp2: fix detection of 10G SFP modules")
Cc: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 14:48:15 -08:00
Yunsheng Lin
af854724e5 net: hns3: fix a SSU buffer checking bug
When caculating the SSU buffer, it first allocate tx and
rx private buffer, then the remaining buffer is for rx
shared buffer. The remaining buffer size should be at
least bigger than or equal to the shared_std, which is the
minimum shared buffer size required by the driver, but
currently if the remaining buffer size is equal to the
shared_std, it returns failure, which causes SSU buffer
allocation failure problem.

This patch fixes this problem by rounding up shared_std before
checking the the remaining buffer size bigger than or equal to
the shared_std.

Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Yunsheng Lin
b9a400ac29 net: hns3: aligning buffer size in SSU to 256 bytes
The hardware expects the buffer size set to SSU is aligned to
256 bytes, this patch aligns the buffer size to 256 byte using
roundup or rounddown function.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Yunsheng Lin
368686be23 net: hns3: getting tx and dv buffer size through firmware
This patch adds support of getting tx and dv buffer size through
firmware, because different version of hardware requires different
size of tx and dv buffer.

This patch also add dv_buf_size to tc' private buffer size even if
pfc is not enable for the tc.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Peng Li
0ad5ea5dbd net: hns3: synchronize speed and duplex from phy when phy link up
Driver calls phy_connect_direct and registers hclge_mac_adjust_link
to synchronize mac speed and duplex from phy. It is better to
synchronize mac speed and duplex from phy when phy link up.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Fuyun Liang
8362089d78 net: hns3: remove 1000M/half support of phy
Our phy does not support 1000M/half, this patch removes 1000M/half from
PHY_SUPPORTED_FEATURES.

Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Peng Li
7445565cd0 net: hns3: update coalesce param per second
coalesce param updates every 100 napi times, it may update a little
late if ping test after a high rate flow, may over napi poll is called
100 times as ping test sends packets every second.

This patch updates coalesce param every second, instead with every
100 napi times. It can not update the param 100% in time, but the
lag time is very short.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Huazhong Tan
ae6017a711 net: hns3: fix incomplete uninitialization of IRQ in the hns3_nic_uninit_vector_data()
In the hns3_nic_uninit_vector_data(), the procedure of uninitializing
the tqp_vector's IRQ has not set affinity_notify to NULL and changes
its init flag. This patch fixes it. And for simplificaton, local
variable tqp_vector is used instead of priv->tqp_vector[i].

Fixes: 424eb834a9 ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Huazhong Tan
b51c366df7 net: hns3: remove unnecessary configuration recapture while resetting
When doing reset, it is unnecessary to get the hardware's default
configuration again, otherwise, the user's configuration will be
overwritten.

Fixes: 4ed340ab8f ("net: hns3: Add reset process in hclge_main")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Huazhong Tan
b644a8d4cb net: hns3: update some variables while hclge_reset()/hclgevf_reset() done
When hclge_reset() completes successfully, it should update the
last_reset_time, set reset_fail_cnt to 0, and set reset_type of
hnae3_ae_dev to HNAE3_NONE_RESET.

Also when hclgevf_reset() completes successfully, it should update
the last_reset_time, and set reset_type of hnae3_ae_dev to
HNAE3_NONE_RESET.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Huazhong Tan
531eba0fe2 net: hns3: fix napi_disable not return problem
While doing DOWN, the calling of napi_disable() may not return, since the
napi_complete() in the hns3_nic_common_poll() will never be called when
HNS3_NIC_STATE_DOWN is set. So we need to call napi_complete() before
checking HNS3_NIC_STETE_DOWN.

Fixes: ff0699e04b ("net: hns3: stop napi polling when HNS3_NIC_STATE_DOWN is set")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Huazhong Tan
e3338205f0 net: hns3: uninitialize pci in the hclgevf_uninit
In the hclgevf_pci_reset(), it only uninitialize and initialize
the msi, so if the initialization fails, hclgevf_uninit_hdev()
does not need to uninitialize the msi, but needs to uninitialize
the pci, otherwise it will cause pci resource not free.

Fixes: 862d969a3a ("net: hns3: do VF's pci re-initialization while PF doing FLR")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:01 -08:00
Huazhong Tan
cda69d2445 net: hns3: fix error handling int the hns3_get_vector_ring_chain
When hns3_get_vector_ring_chain() failed in the
hns3_nic_init_vector_data(), it should do the error handling instead
of return directly.

Also, cur_chain should be freed instead of chain and head->next should
be set to NULL in error handling of hns3_get_vector_ring_chain.

This patch fixes them.

Fixes: 73b907a083 ("net: hns3: bugfix for buffer not free problem during resetting")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 12:01:00 -08:00
Ido Schimmel
5edb7e8bd5 mlxsw: spectrum_nve: Fix memory leak upon driver reload
The pointer was NULLed before freeing the memory, resulting in a memory
leak. Trace from kmemleak:

unreferenced object 0xffff88820ae36528 (size 512):
  comm "devlink", pid 5374, jiffies 4295354033 (age 10829.296s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000a43f5195>] kmem_cache_alloc_trace+0x1be/0x330
    [<00000000312f8140>] mlxsw_sp_nve_init+0xcb/0x1ae0
    [<0000000009201d22>] mlxsw_sp_init+0x1382/0x2690
    [<000000007227d877>] mlxsw_sp1_init+0x1b5/0x260
    [<000000004a16feec>] __mlxsw_core_bus_device_register+0x776/0x1360
    [<0000000070ab954c>] mlxsw_devlink_core_bus_device_reload+0x129/0x220
    [<00000000432313d5>] devlink_nl_cmd_reload+0x119/0x1e0
    [<000000003821a06b>] genl_family_rcv_msg+0x813/0x1150
    [<00000000d54d04c0>] genl_rcv_msg+0xd1/0x180
    [<0000000040543d12>] netlink_rcv_skb+0x152/0x3c0
    [<00000000efc4eae8>] genl_rcv+0x2d/0x40
    [<00000000ea645603>] netlink_unicast+0x52f/0x740
    [<00000000641fca1a>] netlink_sendmsg+0x9c7/0xf50
    [<00000000fed4a4b8>] sock_sendmsg+0xbe/0x120
    [<00000000d85795a9>] __sys_sendto+0x397/0x620
    [<00000000c5f84622>] __x64_sys_sendto+0xe6/0x1a0

Fixes: 6e6030bd54 ("mlxsw: spectrum_nve: Implement common NVE core")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 09:17:39 -08:00
Ido Schimmel
5d5043917a mlxsw: spectrum: Add trap for decapsulated ARP packets
After a packet was decapsulated it is classified to the relevant FID
based on its VNI and undergoes L2 forwarding.

Unlike regular (non-encapsulated) ARP packets, Spectrum does not trap
decapsulated ARP packets during L2 forwarding and instead can only trap
such packets in the underlay router during decapsulation.

Add this missing packet trap, which is required for VXLAN routing when
the MAC of the target host is not known.

Fixes: b02597d513 ("mlxsw: spectrum: Add NVE packet traps")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 09:17:38 -08:00
Shalom Toledo
cf0b70e71b mlxsw: core: Increase timeout during firmware flash process
During the firmware flash process, some of the EMADs get timed out, which
causes the driver to send them again with a limit of 5 retries. There are
some situations in which 5 retries is not enough and the EMAD access fails.
If the failed EMAD was related to the flashing process, the driver fails
the flashing.

The reason for these timeouts during firmware flashing is cache misses in
the CPU running the firmware. In case the CPU needs to fetch instructions
from the flash when a firmware is flashed, it needs to wait for the
flashing to complete. Since flashing takes time, it is possible for pending
EMADs to timeout.

Fix by increasing EMADs' timeout while flashing firmware.

Fixes: ce6ef68f43 ("mlxsw: spectrum: Implement the ethtool flash_device callback")
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-18 09:17:38 -08:00
John Hurley
b12c97d45c nfp: flower: fix cb_ident duplicate in indirect block register
Previously the identifier used for indirect block callback registry and
for block rule cb registry (when done via indirect blocks) was the pointer
to the netdev we were interested in receiving updates on. This worked fine
if a single app existed that registered one callback per netdev of
interest. However, if multiple cards are in place and, in turn, multiple
apps, then each app may register the same callback with the same
identifier to both the netdev's indirect block cb list and to a block's cb
list. This can lead to EEXIST errors and/or incorrect cb deletions.

Prevent this conflict by using the app pointer as the identifier for
netdev indirect block cb registry, allowing each app to register a unique
callback per netdev. For block cb registry, the same app may register
multiple cbs to the same block if using TC shared blocks. Instead of the
app, use the pointer to the allocated cb_priv data as the identifier here.
This means that there can be a unique block callback for each app/netdev
combo.

Fixes: 3166dd07a9 ("nfp: flower: offload tunnel decap rules via indirect TC blocks")
Reported-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:34:12 -08:00
Shalom Toledo
d1675a1602 mlxsw: spectrum: Update the supported firmware to version 13.1910.622
This new firmware contains:
 * New packet traps for discarded packets
 * Secure firmware flash bug fix
 * Fence mechanism bug fix
 * TCAM RMA bug fix

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:32:29 -08:00
Vasundhara Volam
56d3746247 bnxt_en: query force speeds before disabling autoneg mode.
With autoneg enabled, PHY loopback test fails. To disable autoneg,
driver needs to send a valid forced speed to FW. FW is not sending
async event for invalid speeds. To fix this, query forced speeds
and send the correct speed when disabling autoneg mode.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
fd3ab1c70e bnxt_en: Do not free port statistics buffer when device is down.
Port statistics which include RDMA counters are useful even when the
netdevice is down.  Do not free the port statistics DMA buffers
when the netdevice is down.  This is keep the snapshot of the port
statistics and counters will just continue counting when the
netdevice goes back up.

Split the bnxt_free_stats() function into 2 functions.  The port
statistics buffers will only be freed when the netdevice is
removed.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
b8875ca356 bnxt_en: Save ring statistics before reset.
With the current driver, the statistics reported by .ndo_get_stats64()
are reset when the device goes down.  Store a snapshot of the
rtnl_link_stats64 before shutdown.  This snapshot is added to the
current counters in .ndo_get_stats64() so that the counters will not
get reset when the device is down.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam
7c675421af bnxt_en: Return linux standard errors in bnxt_ethtool.c
Currently firmware specific errors are returned directly in flash_device
and reset ethtool hooks. Modify it to return linux standard errors
to userspace when flashing operations fail.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
24654f095e bnxt_en: Don't set ETS on unused TCs.
Currently, the code allows ETS bandwidth weight 0 to be set on unused TCs.
We should not set any DCB parameters on unused TCs at all.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
e37fed7903 bnxt_en: Add ethtool -S priority counters.
Display the CoS counters as additional priority counters by looking up
the priority to CoS queue mapping.  If the TX extended port statistics
block size returned by firmware is big enough to cover the CoS counters,
then we will display the new priority counters.  We call firmware to get
the up-to-date pri2cos mapping to convert the CoS counters to
priority counters.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
b16b689186 bnxt_en: Add SR-IOV support for 57500 chips.
There are some minor differences when assigning VF resources on the
new chips.  The MSIX (NQ) resource has to be assigned and ring group
is not needed on the new chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
36d65be9a8 bnxt_en: Disable MSIX before re-reserving NQs/CMPL rings.
When bringing up a device, the code checks to see if the number of
MSIX has changed.  pci_disable_msix() should be called first before
changing the number of reserved NQs/CMPL rings.  This ensures that
the MSIX vectors associated with the NQs/CMPL rings are still
properly mapped when pci_disable_msix() masks the vectors.

This patch will prevent errors when RDMA support is added for the new
57500 chips.  When the RDMA driver shuts down, the number of NQs is
decreased and we must use the new sequence to prevent MSIX errors.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam
780baad44f bnxt_en: Reserve 1 stat_ctx for RDMA driver.
bnxt_en requires same number of stat_ctxs as CP rings but RDMA
requires only 1 stat_ctx.  Also add a new parameter resv_stat_ctxs
to better keep track of stat_ctxs reserved including resources used
by RDMA.  Add a stat_ctxs parameter to all the relevant resource
reservation functions so we can reserve the correct number of
stat_ctxs.

Prior to this patch, we were not reserving the extra stat_ctx for
RDMA and RDMA would not work on the new 57500 chips.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam
f4e896142d bnxt_en: Do not modify max_stat_ctxs after RDMA driver requests/frees stat_ctxs
Calling bnxt_set_max_func_stat_ctxs() to modify max stat_ctxs requested
or freed by the RDMA driver is wrong. After introducing reservation of
resources recently, the driver has to keep track of all stat_ctxs
including the ones used by the RDMA driver.  This will provide a better
foundation for accurate accounting of the stat_ctxs.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Vasundhara Volam
c027c6b4e9 bnxt_en: get rid of num_stat_ctxs variable
For bnxt_en driver, stat_ctxs created will always be same as
cp_nr_rings. Remove extra variable that duplicates the value.
Also introduce bnxt_get_avail_stat_ctxs_for_en() helper to get
available stat_ctxs and bnxt_get_ulp_stat_ctxs() helper to return
number of stat_ctxs used by RDMA.

Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
e916b0815a bnxt_en: Add bnxt_get_avail_cp_rings_for_en() helper function.
The available CP rings are calculated differently on the new 57500
chips, so add this helper to do this calculation correctly.  The
VFs will be assigned these available CP rings.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:53 -08:00
Michael Chan
f7588cd893 bnxt_en: Store the maximum NQs available on the PF.
The PF has a pool of NQs and MSIX vectors assigned to it based on
NVRAM configurations.  The number of usable MSIX vectors on the PF
is the minimum of the NQs and MSIX vectors.  Any excess NQs without
associated MSIX may be used for the VFs, so we need to store this
max_nqs value.  max_nqs minus the NQs used by the PF will be the
available NQs for the VFs.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 23:08:52 -08:00
Leon Romanovsky
199fa087dc net/mlx5: Continue driver initialization despite debugfs failure
The failure to create debugfs entry is unpleasant event, but not enough
to abort drier initialization. Align the mlx5_core code to debugfs design
and continue execution whenever debugfs_create_dir() successes or not.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 14:35:25 -08:00
Joakim Tjernlund
a28777f250 ucc_geth: Add change_carrier() for Fixed PHYs
This allows to control carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Joakim Tjernlund
6211d46713 gianfar: Add change_carrier() for Fixed PHYs
This allows to control carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Joakim Tjernlund
6e8b0ff1ba dpaa_eth: Add change_carrier() for Fixed PHYs
This allows to control carrier from /sys/class/net/ethX/carrier
for Fixed PHYs.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:24:32 -08:00
Eric Dumazet
4beaacc6fe net/mlx4_en: remove fallback after kzalloc_node()
kzalloc_node(..., GFP_KERNEL, node) will attempt to allocate
memory as close as possible to the node.

There is no need to fallback to kzalloc() if this has failed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-17 11:18:02 -08:00
Or Gerlitz
ff9b85de5d net/mlx5e: Add some ethtool port control entries to the uplink rep netdev
Some of the ethtool entries to control the port should be supported by
the uplink rep netdev in switchdev mode, add them.

While here, add also the get/set coalesce entries for all reps.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:29 -08:00
Or Gerlitz
371289b61a net/mlx5e: Expose ethtool pause and link functions to mlx5e callers
Towards supporting set/get of global pause for the port
and get of the port link ksetting from the uplink representor,
expose the relevant entries to other mlx5 callers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:28 -08:00
Or Gerlitz
073caf5088 net/mlx5e: Add sriov and udp tunnel ndo support for the uplink rep
Some of the sriov ndo calls are needed also on the switchdev mode -
e.g setup VF mac and reading vport stats. Add them to the uplink rep
netdev ops. Same for the UDP tunnel ones, need them there to identify
offloaded udp tunnel ports.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:28 -08:00
Or Gerlitz
b36cdb42ad net/mlx5e: Handle port mtu/link, dcb and lag for uplink reps
Take care of setup/teardown for the port link, dcb, lag as well as
dealing with port mtu and carrier for e-switch uplink representors.

This is achieved by adding a dedicated profile instance for uplink
representors which includes the enable/disable and more profile routines
which are invoked by the general mlx5e code for netdev attach/detach.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:28 -08:00
Or Gerlitz
aec002f6f8 net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode
Now, when we have a dedicated uplink representor, the netdev instance
set over the esw manager vport (PF) is of no-use. As such, remove it
once we're on switchdev mode and get it back to life when off switchdev.

This is done by reloading the Ethernet interface as well (we already
do that for the IB interface) from the eswitch code while going in/out
of switchdev mode.

The Eth add/remove entries are modified to act differently when called in
switchdev mode. In this case we only deal with registration of the eth
vport representors. The rep netdevices are created from the eswitch call
to load the registered eth representors.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:27 -08:00
Or Gerlitz
13e509a4c1 net/mlx5e: Remove leftover code from the PF netdev being uplink rep
Remove some last leftovers from using the PF netdev as
the e-switch uplink representor.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:27 -08:00
Or Gerlitz
d9ee0491c2 net/mlx5e: Use dedicated uplink vport netdev representor
Currently, when running in sriov switchdev mode, we are using the PF
netdevice as the uplink representor, this is problematic from few aspects:

- will break when the PF isn't eswitch manager (e.g smart NIC env)
- misalignment with other NIC switchdev drivers
- makes us have and maintain special code, hurts the driver quality/robustness
- which in turn opens the door for future bugs

As of each and all of the above, we move to have a dedicated netdev representor
for the uplink vport in a similar manner done for for the VF vports.

This includes the following:

1. have an uplink rep netdev as we have for VF reps
2. all reps use same load/unload functions
3. HW stats for uplink based on physical port counters and not vport counters
4. link state for the uplink managed through PAOS and not vport state
5. the uplink rep has sysfs link to the PF PCI function && uses the PF MAC address

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:27 -08:00
Or Gerlitz
025380b20d net/mlx5e: Use single argument for the esw representor build params helper
This is prep step towards adding dedicated uplink representor.

The patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:26 -08:00
Or Gerlitz
915fe1a0d9 net/mlx5: E-Switch, Remove redundant reloading of the IB interface
The reload of the IB interface done on the offloads stop call is
redundant b/c we do that on mlx5_eswitch_disable_sriov(), remove it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-17 11:03:26 -08:00
Nir Dotan
03ce5bd187 mlxsw: reg: Activate Bloom filter
Now that mlxsw driver handles all aspects of updating
the Bloom filter mechanism, set bf_bypass value to false
and allow HW to use Bloom filter.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan
dd97d85f1e mlxsw: spectrum_acl: Set master RP index on transition to eRP
Bloom filter is updated on transitions from a single rule pattern,
also called master RP, to eRP table and vice versa. Since rules are
being written to or deleted from the Bloom filter on such transitions,
it is not required to keep the same eRP bank ID for the master RP.

Change master RP index assignment so it will be assigned with zero.
This is consistent with the assignment of the first available spot
that is used for allocating eRP's indices.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan
135fd95728 mlxsw: spectrum_acl: Update Bloom filter on eRP transitions
Bloom filter update is required only for rules which reside on an
eRP. When the region has only a single rule pattern then eRP table
is not used, however insertion of another pattern would trigger a
move to an active eRP table so it is imperative to update the Bloom
filter with all previously configured rules.

Add a method that updates Bloom filter entries for all rules
currently configured in the region, on the event of a transition
from master mask to eRP, or vice versa. For that purpose, maintain
a list of all A-TCAM rules within mlxsw_sp_acl_atcam_region.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan
8c81b7438b mlxsw: spectrum_acl: Set A-TCAM rules in Bloom filter
Add calls to eRP module for updating Bloom filter when a rule is
added or removed from the A-TCAM. eRP module will update the Bloom
filter only for cases in which the region has an active eRP table.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan
f5a2852ed0 mlxsw: spectrum_acl: Add Bloom filter update
Add Bloom filter update for rule insertion and rule removal scenarios.
This is done within eRP module in order to assure that Bloom filter
updates are done only for rules which are part of an eRP, as HW does not
consult Bloom filter for entries when there is a single (master) mask in
the region.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan
7585cacdb9 mlxsw: spectrum_acl: Add Bloom filter handling
Spectrum-2 HW uses Bloom filter in order to skip lookups on specific
eRPs. It uses crc-16-Msbit-first calculation over a specific layout
of a rule's key fields combined with eRP ID as well as region ID.
Per potential lookup, iff the Bloom filter entry of the calculated
index is empty, then the lookup can be skipped. Hence, the mlxsw
driver should update the Bloom filter entry per each rule insertion
or deletion when rules are part of an eRP.

Add functions for adding and deleting entries in the Bloom filter.
In order to do so also add crc-16 computation based on the specific
Spectrum-2 polynomial and a function for encoding the crc-16 input
in the manner dictated by HW implementation.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:34 -08:00
Nir Dotan
0487cfba86 mlxsw: spectrum_acl: Introduce Bloom filter
Lay the foundations for Bloom filter handling. Introduce a new file for
Bloom filter actions.

Add struct mlxsw_sp_acl_bf to struct mlxsw_sp_acl_erp_core and initialize
the Bloom filter data structure. Also take care of proper destruction when
terminating.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:33 -08:00
Nir Dotan
944068582f mlxsw: resources: Add Spectrum-2 Bloom filter resource
Add the maximum Bloom filter logarithmic size per eRP table bank.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:33 -08:00
Nir Dotan
418089a850 mlxsw: reg: Add Policy Engine Algorithmic Bloom Filter Entries Register
Bloom filter is a bit vector which allows the HW a fast lookup on a
small size bit vector, that may reduce the number of lookups on the
A-TCAM memory. PEABFE register allows setting values to the bits of
the bit vector mentioned above.
Add the register to be later used in A-TCAM optimizations.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 15:20:33 -08:00
Jakub Kicinski
036b9e7cae nfp: abm: allow to opt-out of RED offload
FW team asks to be able to not support RED even if NIC is capable
of buffering for testing and experimentation.  Add an opt-out flag.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:41:42 -08:00
Marcin Wojtas
e735fd55b9 net: mvneta: fix operation for 64K PAGE_SIZE
Recent changes in the mvneta driver reworked allocation
and handling of the ingress buffers to use entire pages.
Apart from that in SW BM scenario the HW must be informed
via PRXDQS about the biggest possible incoming buffer
that can be propagated by RX descriptors.

The BufferSize field was filled according to the MTU-dependent
pkt_size value. Later change to PAGE_SIZE broke RX operation
when usin 64K pages, as the field is simply too small.

This patch conditionally limits the value passed to the BufferSize
of the PRXDQS register, depending on the PAGE_SIZE used.
On the occasion remove now unused frag_size field of the mvneta_port
structure.

Fixes: 562e2f467e ("net: mvneta: Improve the buffer allocation method for SWBM")
Signed-off-by: Marcin Wojtas <mw@semihalf.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:39:59 -08:00
Yonglong Liu
6adafc356e net: hns: Fix ping failed when use net bridge and send multicast
Create a net bridge, add eth and vnet to the bridge. The vnet is used
by a virtual machine. When ping the virtual machine from the outside
host and the virtual machine send multicast at the same time, the ping
package will lost.

The multicast package send to the eth, eth will send it to the bridge too,
and the bridge learn the mac of eth. When outside host ping the virtual
mechine, it will match the promisc entry of the eth which is not expected,
and the bridge send it to eth not to vnet, cause ping lost.

So this patch change promisc tcam entry position to the END of 512 tcam
entries, which indicate lower priority. And separate one promisc entry to
two: mc & uc, to avoid package match the wrong tcam entry.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:32 -08:00
Yonglong Liu
726ae5c9e5 net: hns: Add mac pcs config when enable|disable mac
In some case, when mac enable|disable and adjust link, may cause hard to
link(or abnormal) between mac and phy. This patch adds the code for rx PCS
to avoid this bug.

Disable the rx PCS when driver disable the gmac, and enable the rx PCS
when driver enable the mac.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:32 -08:00
Yonglong Liu
7e74a19ca5 net: hns: Fix ntuple-filters status error.
The ntuple-filters features is forced on by chip.
But it shows "ntuple-filters: off [fixed]" when use ethtool.
This patch make it correct with "ntuple-filters: on [fixed]".

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:32 -08:00
Yonglong Liu
a57275d355 net: hns: Avoid net reset caused by pause frames storm
There will be a large number of MAC pause frames on the net,
which caused tx timeout of net device. And then the net device
was reset to try to recover it. So that is not useful, and will
cause some other problems.

So need doubled ndev->watchdog_timeo if device watchdog occurred
until watchdog_timeo up to 40s and then try resetting to recover
it.

When collecting dfx information such as hardware registers when tx timeout.
Some registers for count were cleared when read. So need move this task
before update net state which also read the count registers.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:32 -08:00
Yonglong Liu
c82bd077e1 net: hns: Free irq when exit from abnormal branch
1.In "hns_nic_init_irq", if request irq fail at index i,
  the function return directly without releasing irq resources
  that already requested.

2.In "hns_nic_net_up" after "hns_nic_init_irq",
  if exceptional branch occurs, irqs that already requested
  are not release.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:32 -08:00
Yonglong Liu
31f6b61d81 net: hns: Clean rx fbd when ae stopped.
If there are packets in hardware when changing the speed or duplex,
it may cause hardware hang up.

This patch adds the code to wait rx fbd clean up when ae stopped.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:32 -08:00
Yonglong Liu
5778b13b64 net: hns: Fixed bug that netdev was opened twice
After resetting dsaf to try to repair chip error such as ecc error,
the net device will be open if net interface is up. But at this time
if there is the users set the net device up with the command ifconfig,
the net device will be opened twice consecutively.

Function napi_enable was called when open device. And Kernel panic will
be occurred if it was called twice consecutively. Such as follow:
static inline void napi_enable(struct napi_struct *n)
{
         BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state));
         smp_mb__before_clear_bit();
         clear_bit(NAPI_STATE_SCHED, &n->state);
}

[37255.571996] Kernel panic - not syncing: BUG!
[37255.595234] Call trace:
[37255.597694] [<ffff80000008ab48>] dump_backtrace+0x0/0x1a0
[37255.603114] [<ffff80000008ad08>] show_stack+0x20/0x28
[37255.608187] [<ffff8000009c4944>] dump_stack+0x98/0xb8
[37255.613258] [<ffff8000009c149c>] panic+0x10c/0x26c
[37255.618070] [<ffff80000070f134>] hns_nic_net_up+0x30c/0x4e0
[37255.623664] [<ffff80000070f39c>] hns_nic_net_open+0x94/0x12c
[37255.629346] [<ffff80000084be78>] __dev_open+0xf4/0x168
[37255.634504] [<ffff80000084c1ac>] __dev_change_flags+0x98/0x15c
[37255.640359] [<ffff80000084c29c>] dev_change_flags+0x2c/0x68
[37255.769580] [<ffff8000008dc400>] devinet_ioctl+0x650/0x704
[37255.775086] [<ffff8000008ddc38>] inet_ioctl+0x98/0xb4
[37255.780159] [<ffff800000827b7c>] sock_do_ioctl+0x44/0x84
[37255.785490] [<ffff800000828e04>] sock_ioctl+0x248/0x30c
[37255.790737] [<ffff80000026dc6c>] do_vfs_ioctl+0x480/0x618
[37255.796156] [<ffff80000026de94>] SyS_ioctl+0x90/0xa4
[37255.801139] SMP: stopping secondary CPUs
[37255.805079] kbox: catch panic event.
[37255.809586] collected_len = 128928, LOG_BUF_LEN_LOCAL = 131072
[37255.816103] flush cache 0xffff80003f000000  size 0x800000
[37255.822192] flush cache 0xffff80003f000000  size 0x800000
[37255.828289] flush cache 0xffff80003f000000  size 0x800000
[37255.834378] kbox: no notify die func register. no need to notify
[37255.840413] ---[ end Kernel panic - not syncing: BUG!

This patchset fix this bug according to the flag NIC_STATE_DOWN.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:31 -08:00
Yonglong Liu
4ad26f117b net: hns: Some registers use wrong address according to the datasheet.
According to the hip06 datasheet:
1.Six registers use wrong address:
  RCB_COM_SF_CFG_INTMASK_RING
  RCB_COM_SF_CFG_RING_STS
  RCB_COM_SF_CFG_RING
  RCB_COM_SF_CFG_INTMASK_BD
  RCB_COM_SF_CFG_BD_RINT_STS
  DSAF_INODE_VC1_IN_PKT_NUM_0_REG
2.The offset of DSAF_INODE_VC1_IN_PKT_NUM_0_REG should be
  0x103C + 0x80 * all_chn_num
3.The offset to show the value of DSAF_INODE_IN_DATA_STP_DISC_0_REG
  is wrong, so the value of DSAF_INODE_SW_VLAN_TAG_DISC_0_REG will be
  overwrite

These registers are only used in "ethtool -d", so that did not cause ndev
to misfunction.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:31 -08:00
Yonglong Liu
308c6cafde net: hns: All ports can not work when insmod hns ko after rmmod.
There are two test cases:
1. Remove the 4 modules:hns_enet_drv/hns_dsaf/hnae/hns_mdio,
   and install them again, must use "ifconfig down/ifconfig up"
   command pair to bring port to work.

   This patch calls phy_stop function when init phy to fix this bug.

2. Remove the 2 modules:hns_enet_drv/hns_dsaf, and install them again,
   all ports can not use anymore, because of the phy devices register
   failed(phy devices already exists).

   Phy devices are registered when hns_dsaf installed, this patch
   removes them when hns_dsaf removed.

The two cases are sometimes related, fixing the second case also requires
fixing the first case, so fix them together.

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:31 -08:00
Yonglong Liu
4e1d4be681 net: hns: Incorrect offset address used for some registers.
According to the hip06 Datasheet:
1. The offset of INGRESS_SW_VLAN_TAG_DISC should be 0x1A00+4*all_chn_num
2. The offset of INGRESS_IN_DATA_STP_DISC should be 0x1A50+4*all_chn_num

Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-16 12:07:31 -08:00
David S. Miller
63de273f34 mlx5e-updates-2018-12-14 (VF Lag)
From Aviv Heller,
 
 Subsequent patches introduce VF LAG, which provdies load-balancing and
 high-availability capabilities for VFs associated with different
 physical ports of the same Connect-X card.
 
 This series consists of the following:
  - mlx5 devcom, driver infrastructure that facilitates operations that involve
    both core devices (physical functions) of the same card, to synchronize and
    communicate between two driver instances of the same card.
  - Infrastructure for TC rule duplication.
  - Changes to LAG logic to enable its use when SR-IOV is enabled
  - PFs in switchdev mode is the only mode currently supported.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcFCCzAAoJEEg/ir3gV/o+VzIH/3xOcSuDIB7rM/mNoMtXVvvu
 ZBZGUlUh7PoX/+fLjKYuQhosCvx1kpecPeNTa/R4yFy3kUl70CsKT6Cqtw2qGiPU
 2hfr75MobM5nMwOU1QDXYyfSaXZsfUj25DE25OSvtMygR6YjeU0Lm4peWBnd9MD0
 Oj5lilggS6hiNyCTI0s1D4tR0Uh2roj2tAGW7JRkd1Et34u9KZbBitCWvYkTtqcp
 XauuCdjlDNZenOg0Lue9fAc4Jo9AVEaB2w/4dZDsarZyvabQ4DxVrmrVOgLmWBjc
 Tr10JCuBvgdX/66JikJQ08Fkx/pHGby6jxcI4ux3QEoHe4/umm3Te0+5/pIxe58=
 =wDuV
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-updates-2018-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-14 (VF Lag)

From Aviv Heller,

Subsequent patches introduce VF LAG, which provdies load-balancing and
high-availability capabilities for VFs associated with different
physical ports of the same Connect-X card.

This series consists of the following:
 - mlx5 devcom, driver infrastructure that facilitates operations that involve
   both core devices (physical functions) of the same card, to synchronize and
   communicate between two driver instances of the same card.
 - Infrastructure for TC rule duplication.
 - Changes to LAG logic to enable its use when SR-IOV is enabled
 - PFs in switchdev mode is the only mode currently supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:29:56 -08:00
Ilias Apalodimas
35e07d2347 net: socionext: remove mmio reads on Tx
Currently the driver issues 2 mmio reads to figure out the number of
transmitted packets and clean them. We can get rid of the expensive
reads since BIT 31 of the Tx descriptor can be used for that.
We can also remove the budget counting of Tx completions since all of
the descriptors are not deliberately processed.

Performance numbers using pktgen are:
size  pre-patch(pps)  post-patch(pps)
64       362483           427916
128      358315           411686
256      352725           389683
512      215675           216464
1024     113812           114442

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:15:19 -08:00
Ilias Apalodimas
17a12eaaf0 net: socionext: correctly recover txq after being full
Running pktgen with packets sizes > 512b ends up in the interface Txq
getting stuck.
"netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!"
appears on dmesg but the interface never recovers. It requires an
ifconfig down/up to make the interface usable again.

The reason that triggers this, is a race condition between
.ndo_start_xmit and the napi completion. The available budget is
calculated first and indicates the queue is full. Due to a costly
netif_err() the queue is not stopped in time while the napi completion
runs, clears the irq and frees up descriptors, thus the queue never wakes
up again.

Fix this by moving the print after stopping the queue, make the print
ratelimited, add barriers and check for cleaned descriptors..

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 13:15:19 -08:00
Heiner Kallweit
e782410ed2 r8169: improve spurious interrupt detection
Improve detection of spurious interrupts by checking against the
interrupt mask as currently set in the chip.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 11:22:48 -08:00
Yangtao Li
b09026c691 cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()
We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define
such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the
DEFINE_SHOW_ATTRIBUTE macro to simplify some code.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 11:21:22 -08:00
liuzhongzhu
82e00b86a5 net: hns3: Add "tm map" status information query function
This patch prints dcb register status  information by module.

debugfs command:
root@(none)# echo dump tm map 100 > cmd
queue_id | qset_id | pri_id | tc_id
0100     | 0065    | 08     | 00
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
0c29d1912b net: hns3: Add "queue map" information query function
This patch prints queue map information.

debugfs command:
echo dump queue map > cmd

Sample Command:
root@(none)# echo queue map > cmd
 local queue id | global queue id | vector id
          0              32             769
          1              33             770
          2              34             771
          3              35             772
          4              36             773
          5              37             774
          6              38             775
          7              39             776
          8              40             777
          9              41             778
         10              42             779
         11              43             780
         12              44             781
         13              45             782
         14              46             783
         15              47             784
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
c0ebebb9cc net: hns3: Add "dcb register" status information query function
This patch prints dcb register status  information by module.

debugfs command:
root@(none)# echo dump reg dcb > cmd
 roce_qset_mask: 0x0
 nic_qs_mask: 0x0
 qs_shaping_pass: 0x0
 qs_bp_sts: 0x0
 pri_mask: 0x0
 pri_cshaping_pass: 0x0
 pri_pshaping_pass: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
27cf979a15 net: hns3: Add "status register" information query function
This patch prints status register information by module.

debugfs command:
echo dump reg [mode name] > cmd

Sample Command:
root@(none)# echo dump reg bios common > cmd
 BP_CPU_STATE: 0x0
 DFX_MSIX_INFO_NIC_0: 0xc000
 DFX_MSIX_INFO_NIC_1: 0xf
 DFX_MSIX_INFO_NIC_2: 0x2
 DFX_MSIX_INFO_NIC_3: 0x2
 DFX_MSIX_INFO_ROC_0: 0xc000
 DFX_MSIX_INFO_ROC_1: 0x0
 DFX_MSIX_INFO_ROC_2: 0x0
 DFX_MSIX_INFO_ROC_3: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
7737f1fbb5 net: hns3: Add "manager table" information query function
This patch prints manager table information.

debugfs command:
echo dump mng tbl > cmd

Sample Command:
root@(none)# echo dump mng tbl > cmd
 entry|mac_addr         |mask|ether|mask|vlan|mask|i_map|i_dir|e_type
 00   |01:00:5e:00:00:01|0   |00000|0   |0000|0   |00   |00   |0
 01   |c2:f1:c5:82:68:17|0   |00000|0   |0000|0   |00   |00   |0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:18 -08:00
liuzhongzhu
122bedc56a net: hns3: Add "bd info" query function
This patch prints Sending and receiving
package descriptor information.

debugfs command:
echo dump bd info 1 > cmd

Sample Command:
root@(none)# echo bd info 1 > cmd
hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0
hns3 0000:7d:00.0: (TX) addr: 0x0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)send_size: 0
hns3 0000:7d:00.0: (TX)vlan_tso: 0
hns3 0000:7d:00.0: (TX)l2_len: 0
hns3 0000:7d:00.0: (TX)l3_len: 0
hns3 0000:7d:00.0: (TX)l4_len: 0
hns3 0000:7d:00.0: (TX)vlan_tag: 0
hns3 0000:7d:00.0: (TX)tv: 0
hns3 0000:7d:00.0: (TX)vlan_msec: 0
hns3 0000:7d:00.0: (TX)ol2_len: 0
hns3 0000:7d:00.0: (TX)ol3_len: 0
hns3 0000:7d:00.0: (TX)ol4_len: 0
hns3 0000:7d:00.0: (TX)paylen: 0
hns3 0000:7d:00.0: (TX)vld_ra_ri: 0
hns3 0000:7d:00.0: (TX)mss: 0
hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120
hns3 0000:7d:00.0: (RX)addr: 0xffee7000
hns3 0000:7d:00.0: (RX)pkt_len: 0
hns3 0000:7d:00.0: (RX)size: 0
hns3 0000:7d:00.0: (RX)rss_hash: 0
hns3 0000:7d:00.0: (RX)fd_id: 0
hns3 0000:7d:00.0: (RX)vlan_tag: 0
hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0
hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0
hns3 0000:7d:00.0: (RX)bd_base_info: 0

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15 10:54:17 -08:00
Arnd Bergmann
51367e423c w90p910_ether: remove incorrect __init annotation
The get_mac_address() function is normally inline, but when it is
not, we get a warning that this configuration is broken:

WARNING: vmlinux.o(.text+0x4aff00): Section mismatch in reference from the function w90p910_ether_setup() to the function .init.text:get_mac_address()
The function w90p910_ether_setup() references
the function __init get_mac_address().
This is often because w90p910_ether_setup lacks a __init

Remove the __init to make it always do the right thing.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 14:42:51 -08:00
Atul Gupta
848dd1c1cb crypto/chelsio/chtls: macro correction in tx path
corrected macro used in tx path. removed redundant hdrlen
and check for !page in chtls_sendmsg

Signed-off-by: Atul Gupta <atul.gupta@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:39:39 -08:00
Wen Yang
390de19404 net/ibmvnic: Remove tests of member address
The driver was checking for non-NULL address.
- adapter->napi[i]

This is pointless as these will be always non-NULL, since the
'dapter->napi' is allocated in init_napi().
It is safe to get rid of useless checks for addresses to fix the
coccinelle warning:
>>drivers/net/ethernet/ibm/ibmvnic.c: test of a variable/field address
Since such statements always return true, they are redundant.

Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: Paul Mackerras <paulus@samba.org>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Thomas Falcon <tlfalcon@linux.ibm.com>
CC: John Allen <jallen@linux.ibm.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: linuxppc-dev@lists.ozlabs.org
CC: netdev@vger.kernel.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:36:38 -08:00
Nathan Chancellor
2ab4c3426c drivers: net: xgene: Remove unnecessary forward declarations
Clang warns:

drivers/net/ethernet/apm/xgene/xgene_enet_main.c:33:36: warning:
tentative array definition assumed to have one element
static const struct acpi_device_id xgene_enet_acpi_match[];
                                   ^
1 warning generated.

Both xgene_enet_acpi_match and xgene_enet_of_match are defined before
their uses at the bottom of the file so this is unnecessary. When
CONFIG_ACPI is disabled, ACPI_PTR becomes NULL so xgene_enet_acpi_match
doesn't need to be defined.

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:35:27 -08:00
Aviv Heller
9582466640 net/mlx5: Handle LAG FW commands failure gracefully
When create_lag or destroy_lag FW commands fail, display an appropriate
error message, and try to recover, if possible.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:56 -08:00
Aviv Heller
7c34ec19e1 net/mlx5: Make RoCE and SR-IOV LAG modes explicit
With the introduction of SR-IOV LAG, checking whether LAG is active
is no longer good enough, since RoCE and SR-IOV LAG each entails
different behavior by both the core and infiniband drivers.

This patch introduces facilities to discern LAG type, in addition to
mlx5_lag_is_active(). These are implemented in such a way as to allow
more complex mode combinations in the future.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:55 -08:00
Aviv Heller
292612d68c net/mlx5: Rename mlx5_lag_is_bonded() to __mlx5_lag_is_active()
The new name better represents the function's aim, and sets a precedent
for a '__' prefix for internal, non-locked versions of LAG APIs.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou
8aaca1976e net/mlx5: Allow co-enablement of uplink LAG and SRIOV
Enable setting uplink LAG if sriov is enabled on both ports in switchdev
mode.

Once the sriov mode is changed from switchdev for any of the ports, the
LAG instance is disabled.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou
eff849b2c6 net/mlx5: Allow/disallow LAG according to pre-req only
Remove the lag forbid/allow functions, change the lag prereq check to
run in the do-bond logic, so every change in the prereq state will
cause LAG to be disabled/enabled accordingly after the next do-bond run.

Add lag update function, so every component which changes the prereq
state and want the LAG to re-calc the conditions can call the update
function.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:54 -08:00
Rabie Loulou
3b5ff59fd8 net/mlx5: Adjustments for the activate LAG logic to run under sriov
When HW lag is set/unset, roce must not be enabled on the port, as such
we wrap such changes with roce enable/disable either directly or through
re-creation of IB device.

Currently, lag and sriov are mutually exclusive, so by definition this
code doesn't run under sriov.

Towards changing this exclusion, we need to make sure that roce will not
be enabled on the eswitch manager port under sriov since this is
requirement of the switchdev mode.

We are going strict here and avoiding this all together under sriov.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Aviv Heller
1418ddd96a net/mlx5e: Duplicate offloaded TC eswitch rules under uplink LAG
Under uplink LAG, packets that match a flow might arrive on both uplink
ports and transmitted through both as part of supporting aggregation and
high-availability.

When the netdevs representing the uplinks are set into LAG (bonding,
teaming), duplicate the TC flow offloading into each of the per-uplink
e-switches.

Duplication is not required if the source is the bond device, since in
this case it is assumed that the bond and the uplink netdevs share the
same TC block, and thus duplication will occur naturally by the stack.

Note that under encapsulation scheme, both flows will use the same
neighbour and hence both will contribute to the last-used feedback
towards the stack.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Rabie Loulou
7ba58ba7ba net/mlx5e: Offload TC e-switch rules with egress LAG device
When parsing TC FDB actions, if the egress device is a bond/team
net-device which enslaved the uplink representor of the e-switch,
use the uplink representor as the destination in the HW rule.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:53 -08:00
Rabie Loulou
491c37e49b net/mlx5e: In case of LAG, one switch parent id is used for all representors
When the uplink representors are put into lag, set all the
representors (VFs and uplinks) of the same NIC to return the same
switchdev id.

Currently, the route lookup code on the encapsulation offload path
assumes that same switchdev id for the source and dest devices means
that the dest is also mlx5 HW netdev. This doesn't hold anymore when we
align the switchdev Id of the uplinks to be same, which in turn causes
the bond/team to return that id to the caller. As such, enhance the
relevant check to take into account the uplink lag case.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Shahar Klein
f9392795e2 net/mlx5e: Enhance flow counter scheme for offloaded TC eswitch rules
Assign a counter dev attribute according to device capability and use
it for management of counters related to offloaded eswitch TC flows.

With upcoming support for uplink LAG, we have two HW rules per one
logical SW (TC) rule. Although the HW supports attaching one counter
to multiple rules, we are allocating counter per HW rule.

We need this separation for two reasons:

1. "flow eswitch" counter affinity HW require the counter to be
allocated on the device where the eswitch rule is set.

2. for some use-cases (multi-path routing) each HW flow relates to
different neighbour, hence our neigh update logic must have a per-rule
HW accountant in order to provide the proper feedback to the kernel.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Roi Dayan
04de7dda73 net/mlx5e: Infrastructure for duplicated offloading of TC flows
Under uplink LAG or multipath schemes, traffic that matches one flow
might arrive on both uplink ports and transmitted through both
as part of supporting aggregation and high-availability.

To cope with the fact that the SW model might use logical SW port
(e.g uplink team or bond) but we have two HW ports with e-switch on
each, there are cases where in order to offload a SW TC rule we
need to duplicate it to two HW flows.

Since each HW rule has its own counter we also aggregate the counter
of both rules when a flow stats query is executed from user-space.

Introduce the changes for the different elements (add/delete/stats),
currently nothing is duplicated.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:52 -08:00
Roi Dayan
ac004b8321 net/mlx5e: E-Switch, Add peer miss rules
In the sriov offloads mode, packets that are not matched by any
other rule are sent towards the e-switch vport manager for further
processing.

Under upcoming patches (e.g for uplink LAG), packets sent from VF
vports belonging to esw0 (e-switch related to PF0) might end up in
esw1 (e-switch related to PF1) due to muxing logic applied by the
FW.

In such a case we still want the missed packet to be sent to the
"base" esw manager vport in order to present the control plane a
consistent view of the source (VF reresentor) port.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:51 -08:00
Aviv Heller
fadd59fc50 net/mlx5: Introduce inter-device communication mechanism
This introduces devcom, a generic mechanism for performing operations
on both physical functions of the same Connect-X card.

The first user of this API is merged eswitch, which will be introduced
in subsequent patches.

Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 13:28:51 -08:00
Arnd Bergmann
2aa55dccf8 hns3: prevent building without CONFIG_INET
We now get a link failure when CONFIG_INET is disabled, since
tcp_gro_complete is unavailable:

drivers/net/ethernet/hisilicon/hns3/hns3_enet.o: In function `hns3_set_gro_param':
hns3_enet.c:(.text+0x230c): undefined reference to `tcp_gro_complete'

Add an explicit CONFIG_INET dependency here to avoid the broken
configuration.

Fixes: a6d53b97a2 ("net: hns3: Adds GRO params to SKB for the stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14 13:17:50 -08:00
Saeed Mahameed
64e4cf0dab Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5-next shared branch with rdma subtree to avoid mlx5 rdma v.s. netdev
conflicts.

Highlights:
1) Lag refactroing and flow counter affinity bits.
2) mlx5 core cleanups

By Roi Dayan (2) and others
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 11:15:25 -08:00
Shahar Klein
4c283e6155 net/mlx5: Fold the modify lag code into function
Handle the code of modifying the lag affinity within a separate function.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Roi Dayan
3cfe432e1b net/mlx5: Add lag affinity info to log
Report the initial LAG port affinity upon LAG creation.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Roi Dayan
8252cf728c net/mlx5: Split the activate lag function into two routines
Split the activate lag function in order to be symmetric with
the deactivate lag function.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-14 09:58:57 -08:00
Sudarsana Reddy Kalluru
c3db8d5310 qed: Fix command number mismatch between driver and the mfw
The value for OEM_CFG_UPDATE command differs between driver and the
Management firmware (mfw). Fix this gap with adding a reserved field.

Fixes: cac6f69154 ("qed: Add support for Unified Fabric Port.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 21:48:14 -08:00
David S. Miller
38ed22351c mlx5-fixes-2018-12-13
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcEiXXAAoJEEg/ir3gV/o+xxQH/jGS6GQrty9puL40sXKE4ryK
 GKVSwbKLnu4WatPcLe/CGxJwRoCVUUKfsbprsuy+oYxhUd5M5h6qu215kZgsfdDC
 CfxznbEj3o6mN+ZyZjL+F2gEG6e96bpkKvayb2a9YYQaY/9+V6CMKh3U5jHI+Q2V
 D4ToCjWbMXkQ0jsevfpV9W2T0DNfXIa5QGvsvH+D+PoLF/ke/YALZd+yJT0Cm9N2
 /vp0qvnHnZO5YavEThN2Lh8f9AY/pAwuGgTrPyddfkZC/6QLH+c8948gF14V+03l
 Nz78uSpurQ97X511UBcabSQB0xTCHSAw3/SKCVSxFrpMvkhj4mTbjohwc567Gy4=
 =YxUz
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-fixes-2018-12-13

Subject: [pull request][net 0/9] Mellanox, mlx5 fixes 2018-12-13
Saeed Mahameed says:

====================
This series introduces some fixes to the mlx5 core and mlx5e netdevice
driver.

=======
Conflict with net-next: When merged with net-next this series will
cause a moderate conflict:

1) in drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (2 hunks)
Take hunks from net only and just replace *attr->mirror_count to *attr->split_count
1.1) there is one more instance of slow_attr->mirror_count to be replaced
with slow_attr->split_count, it doesn't appear in the conflict, it will
cause a compilation error if left out.
2) in mlx5_ifc.h, take hunks only from net.

Example for the merge resolution can be found at:
https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=merge/mlx5-fixes&id=48830adf29804d85d77ed8a251d625db0eb5b8a8
branch merge/mlx5-fixes of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
(I simply merged this pull request tag into net-next and resolved the conflict)

I don't know if it's ok with you, but to save your time, you can just:
git pull git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux merge/mlx5-fixes
Into net-next, before your next net merge, and you will have a clean
merge of net into net-next (at least for mlx5 files).
======

Please pull and let me know if there's any problem.

For -stable v4.18
338d615be484 ('net/mlx5e: Cancel DIM work on close SQ')
91f40f9904ad ('net/mlx5e: RX, Verify MPWQE stride size is in range')

For -stable v4.19
c5c7e1c41bbe ('net/mlx5e: Remove unused UDP GSO remaining counter')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 19:21:53 -08:00
Petr Machata
74bc993974 mlxsw: spectrum_router: Veto unsupported RIF MAC addresses
On NETDEV_PRE_CHANGEADDR, if the change is related to a RIF interface,
verify that it satisfies the criterion that all RIF interfaces have the
same MAC address prefix, as indicated by mlxsw_sp.mac_mask.

Additionally, besides explicit address changes, check that the address
of an interface for which a RIF is about to be added matches the
required pattern as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata
9329b8162b mlxsw: spectrum: Add mlxsw_sp.mac_mask
The Spectrum hardware demands that all router interfaces in the system
have the same first 38 resp. 36 bits of MAC address: the former limit
holds on Spectrum, the latter on Spectrum-2. Add a field that refers to
the required prefix mask and initialize in mlxsw_sp1_init() and
mlxsw_sp2_init().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Petr Machata
9735f2d2fe mlxsw: spectrum_router: Generalize mlxsw_sp_netdevice_router_port_event()
Prepare mlxsw_sp_netdevice_router_port_event() for handling of
NETDEV_PRE_CHANGEADDR. Split out the part that deals with the actual
changes and call it for the two events currently handled.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-13 18:41:39 -08:00
Tal Gilboa
fa2bf86bab net/mlx5e: Cancel DIM work on close SQ
TXQ SQ closure is followed by closing the corresponding CQ. A pending
DIM work would try to modify the now non-existing CQ.
This would trigger an error:
[85535.835926] mlx5_core 0000:af:00.0: mlx5_cmd_check:769:(pid 124399):
MODIFY_CQ(0x403) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1d7771)

Fix by making sure to cancel any pending DIM work before destroying the SQ.

Fixes: cbce4f4447 ("net/mlx5e: Enable adaptive-TX moderation")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Mikhael Goikhman
d13b224f43 net/mlx5e: Remove unused UDP GSO remaining counter
Remove tx_udp_seg_rem counter from ethtool output, as it is no longer
being updated in the driver's data flow.

Fixes: 3f44899ef2 ("net/mlx5e: Use PARTIAL_GSO for UDP segmentation")
Signed-off-by: Mikhael Goikhman <migo@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Or Gerlitz
61c806dafe net/mlx5e: Avoid encap flows deletion attempt the 1st time a neigh is resolved
Currently, we are deleting offloaded encap flows in case the relevant neigh
becomes unconnected while the encap is valid (a sign that it used to be
connected), or if the curr neigh mac is different from the cached mac
(a sign that the remote side changed their mac).

The 2nd check also applies when the neigh becomes connected on the 1st
time (we start with zero mac). Before the offending commit, the deleting
handler was practically no op, as no flows were offloaded. But since
that commit, we offload neigh-less encap flows to slow path.

Under mirroring scheme, we go into the delete handler, attempt to unoffload a
mirror rule which was never set (as we were offloading to slow path) and crash.

Fix that by calling the delete handler only when the encap is valid,
which covers both cases mentioned above.

Fixes: 5dbe906ff1 ('net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Or Gerlitz
154e62abe9 net/mlx5e: Properly initialize flow attributes for slow path eswitch rule deletion
When a neighbour is resolved, we delete the goto slow path rule from HW.

The eswitch flow attributes where not properly initialized on that case,
hence we mess up the eswitch refcounts for chain zero (the default one).

Fix that along with making sure to use semicolons and not commas on that code;

Fixes: 5dbe906ff1 ('net/mlx5e: Use a slow path rule instead if vxlan neighbour isn't available')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Or Gerlitz
d14f6f2a84 net/mlx5e: Avoid overriding the user provided priority for offloaded tc rules
Just a leftover which was wrongly left there, remove it while spawning
a message to suggest firmware upgrade.

Fixes: bf07aa730a ('net/mlx5e: Support offloading tc priorities and chains for eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Or Gerlitz
e88afe759a net/mlx5e: Err if asked to mirror a goto chain tc eswitch rule
Currently we are not supporting this and not err-ing on that either.

For now, just err if asked to do that.

Fixes: bf07aa730a ('net/mlx5e: Support offloading tc priorities and chains for eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Moshe Shemesh
e1c15b62b7 net/mlx5e: RX, Verify MPWQE stride size is in range
Add check of MPWQE stride size is within range supported by HW. In case
calculated MPWQE stride size exceed range, linear SKB can't be used and
we should use non linear MPWQE instead.

Fixes: 619a8f2a42 ("net/mlx5e: Use linear SKB in Striding RQ")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
Gavi Teitz
8956f0014e net/mlx5e: Fix default amount of channels for VF representors
The default amount of channels a representor opens was erroneously
changed from one to the maximum amount of channels, restore to its
intended value.

Fixes: 779d986d60 ("net/mlx5e: Do not ignore netdevice TX/RX queues number")
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-13 01:24:44 -08:00
David S. Miller
95302c394c mlx5e-updates-2018-12-11
From Eli Britstein,
 Patches 1-10 adds remote mirroring support.
 Patches 1-4 refactor encap related code as pre-steps for using per
 destination encapsulation properties.
 Patches 5-7 use extended destination feature for single/multi
 destination scenarios that have a single encap destination.
 Patches 8-10 enable multiple encap destinations for a TC flow.
 
 From, Daniel Jurgens,
 Patch 11, Use CQE padding for Ethernet CQs, PPC showed up to a 24%
 improvement in small packet throughput
 
 From Eyal Davidovich,
 patches 12-14, FW monitor counter support
 FW monitor counters feature came to solve the delayed reporting of
 FW stats in the atomic get_stats64 ndo, since we can't access the
 FW at that stage, this feature will enable immediate FW stats updates
 in the driver via fw events on specific stats updates.
 
 Patch 12, cleanup to avoid querying a FW counter when it is not
 supported
 Patch 13, Monitor counters FW commands support
 Patch 14, Use monitor counters in ethernet netdevice to update FW
 stats reported in the atomic get_stats64 ndo.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcEEYmAAoJEEg/ir3gV/o+XlIIAKIQhvHnCoGMPsZU+HbA1h3O
 10i2hnmQqx2F5Z7tSON9jr9HVzLj6FqQ3pSnTyw8xiMBqOldF6abmfpMPQ/XeWXk
 PDBCzfOUMAlIsGFAJ06tcz3VLkWrEdoQHmi7Jb4IW+LC9CnZBobzrcLaxQ95txYP
 LkCfO8Y70D5tzBrzL1o1RfKsCSFJma5zhXDPF/B3rifLyHOmo9aLzZuDvKwvvSkC
 Y3QM5ulnLqVhxaXA0xzB+iHLiI0nX2H17CBPseLzM+Ip0vZS/nA77rnQ49PUgjiZ
 d1BoXo9MUa3eoQwXNl+qSAfnGjCwqzplXxzjT2bKKXVc2ZOWmkECgRXmKI6JDDU=
 =EuqQ
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-updates-2018-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-12-11

From Eli Britstein,
Patches 1-10 adds remote mirroring support.
Patches 1-4 refactor encap related code as pre-steps for using per
destination encapsulation properties.
Patches 5-7 use extended destination feature for single/multi
destination scenarios that have a single encap destination.
Patches 8-10 enable multiple encap destinations for a TC flow.

From, Daniel Jurgens,
Patch 11, Use CQE padding for Ethernet CQs, PPC showed up to a 24%
improvement in small packet throughput

From Eyal Davidovich,
patches 12-14, FW monitor counter support
FW monitor counters feature came to solve the delayed reporting of
FW stats in the atomic get_stats64 ndo, since we can't access the
FW at that stage, this feature will enable immediate FW stats updates
in the driver via fw events on specific stats updates.

Patch 12, cleanup to avoid querying a FW counter when it is not
supported
Patch 13, Monitor counters FW commands support
Patch 14, Use monitor counters in ethernet netdevice to update FW
stats reported in the atomic get_stats64 ndo.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 22:22:53 -08:00
David S. Miller
3b076cfe86 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Fixes 2018-12-12

This series contains fixes to i40e and ixgbe.

Stefan Assmann fixes an issue created by a previous fix, where
ether_addr_copy() was moved to avoid a race but did not take into
account that it alters the MAC address being handed to
i40e_del_mac_filter().

Michał Mirosław provides 2 fixes for i40e, first resolves issues in the
hardware VLAN offload where VLAN.TCI equal to 0 was being dropped and a
race between disabling VLAN receive feature in hardware and processing
the receive queue, where packets could have their VLAN information
dropped.

Ross Lagerwall fixes a racy condition during a ixgbe VF reset, where
writing the register to issue a reset and sending the reset message via
the mailbox API could result of the mailbox memory getting cleared
during the reset before the message gets successfully sent which results
in a VF driver malfunction.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 21:43:43 -08:00
Biao Huang
43d4b29718 net-next: stmmac: dwmac-mediatek: add module license info
Add MODULE_LICENSE info to fix this:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/stmicro/stmmac/dwmac-mediatek.o

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 21:42:55 -08:00
YueHaibing
c784a28b02 net/mlx5e: Remove set but not used variable 'upriv'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_vport_rep_load':
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1490:21: warning:
 variable 'upriv' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_vport_rep_unload':
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1557:21: warning:
 variable 'upriv' set but not used [-Wunused-but-set-variable]

It not used any more since commit ef381359e3 ("net/mlx5e: Replace egdev with
indirect block notifications"). Also remove unused variable 'uplink_rpriv'
after this change.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 21:38:04 -08:00
YueHaibing
186599f89e net/mlx5: Remove duplicated include from eswitch.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-12 17:15:35 -08:00
Petr Machata
7357eb3d4b mlxsw: spectrum_switchdev: Propagate extack on port VLAN events
After switchdev_handle_port_obj_add() was extended in a preceding patch,
mlxsw_sp_port_obj_add() now takes an extack argument. Propagate it
further by extending a callee chain from mlxsw_sp_port_vlans_add(), via
mlxsw_sp_bridge_port_vlan_add() via mlxsw_sp_port_vlan_bridge_join() via
mlxsw_sp_port_vlan_fid_join() to mlxsw_sp_bridge_ops.fid_get, adding an
extack argument for each of them.

This code path is used when a VLAN is added to a port netdevice if there
already is an unoffloadable VXLAN device with that VLAN mapped.

mlxsw_sp_bridge_8021d_port_join() is updated to obey the new interfaces
changed by the abovementioned code, propagating extack ultimately from
NETDEV_CHANGEUPPER events.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:34:22 -08:00
Petr Machata
0a5a2aee6f mlxsw: spectrum_switchdev: Propagate extack on VXLAN VLAN events
Now that VLAN port object addition notifications carry an extack,
propagate it from mlxsw_sp_switchdev_vxlan_vlans_add() through
mlxsw_sp_switchdev_vxlan_vlan_add() to
mlxsw_sp_bridge_8021q_vxlan_join().

This code path is used when a VLAN is added to a VXLAN netdevice that
cannot be offloaded.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:34:22 -08:00
Petr Machata
6921351359 net: switchdev: Add extack to switchdev_handle_port_obj_add() callback
Drivers use switchdev_handle_port_obj_add() to handle recursive descent
through lower devices. Change this function prototype to take add_cb
that itself takes an extack argument. Decode extack from
switchdev_notifier_port_obj_info and pass it to add_cb.

Update mlxsw and ocelot drivers which use this helper.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:34:22 -08:00
Petr Machata
2fd527b72b net: ndo_bridge_setlink: Add extack
Drivers may not be able to implement a VLAN addition or reconfiguration.
In those cases it's desirable to explain to the user that it was
rejected (and why).

To that end, add extack argument to ndo_bridge_setlink. Adapt all users
to that change.

Following patches will use the new argument in the bridge driver.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:34:21 -08:00
Jonathan Toppins
351cbde969 bnxt: remove printing of hwrm message
bnxt_en 0000:19:00.0 (unregistered net_device) (uninitialized): hwrm
req_type 0x190 seq id 0x6 error 0xffff

The message above is commonly seen when a newer driver is used on
hardware with older firmware. The issue is this message means nothing to
anyone except Broadcom. Remove the message to not confuse users as this
message is really not very informative.

Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:31:32 -08:00
Sudarsana Reddy Kalluru
9061193c4e bnx2x: Send update-svid ramrod with retry/poll flags enabled
Driver sends update-SVID ramrod in the MFW notification path.
If there is a pending ramrod, driver doesn't retry the command
and storm firmware will never be updated with the SVID value.
The patch adds changes to send update-svid ramrod in process context with
retry/poll flags set.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:25:14 -08:00
Sudarsana Reddy Kalluru
07f12622a6 bnx2x: Enable PTP only on the PF that initializes the port
There will be only one PHC clock per port. PTP should be enabled only on
one PF per port. The change enables PTP functionality on the PF that
initializes the port. The change is useful in multi-function modes e.g.,
NPAR where a port can have more than one PF.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:25:14 -08:00
Sudarsana Reddy Kalluru
04f05230c5 bnx2x: Remove configured vlans as part of unload sequence.
Vlans are not getting removed when drivers are unloaded. The recent storm
firmware versions had added safeguards against re-configuring an already
configured vlan. As a result, PF inner reload flows (e.g., mtu change)
might trigger an assertion.
This change is going to remove vlans (same as we do for MACs) when doing
a chip cleanup during unload.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:25:14 -08:00
Sudarsana Reddy Kalluru
bbf666c1af bnx2x: Clear fip MAC when fcoe offload support is disabled
On some customer setups it was observed that shmem contains a non-zero fip
MAC for 57711 which would lead to enabling of SW FCoE.
Add a software workaround to clear the bad fip mac address if no FCoE
connections are supported.

Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 16:25:14 -08:00
Ross Lagerwall
96d1a73161 ixgbe: Fix race when the VF driver does a reset
When the VF driver does a reset, it (at least the Linux one) writes to
the VFCTRL register to issue a reset and then immediately sends a reset
message using the mailbox API. This is racy because when the PF driver
detects that the VFCTRL register reset pin has been asserted, it clears
the mailbox memory. Depending on ordering, the reset message sent by
the VF could be cleared by the PF driver. It then responds to the
cleared message with a NACK which causes the VF driver to malfunction.
Fix this by deferring clearing the mailbox memory until the reset
message is received.

Fixes: 939b701ad6 ("ixgbe: fix driver behaviour after issuing VFLR")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-12 15:51:50 -08:00
Michał Mirosław
800b8f637d i40e: DRY rx_ptype handling code
Move rx_ptype extracting to i40e_process_skb_fields() to avoid
duplicating the code.

Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-12 15:46:02 -08:00
Michał Mirosław
2a508c64ad i40e: fix VLAN.TCI == 0 RX HW offload
This fixes two bugs in hardware VLAN offload:
 1. VLAN.TCI == 0 was being dropped
 2. there was a race between disabling of VLAN RX feature in hardware
    and processing RX queue, where packets processed in this window
    could have their VLAN information dropped

Fix moves the VLAN handling into i40e_process_skb_fields() to save on
duplicated code. i40e_receive_skb() becomes trivial and so is removed.

Signed-off-by: Michał Mirosław <michal.miroslaw@atendesoftware.pl>
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-12 15:40:42 -08:00
Biao Huang
9992f37e34 stmmac: dwmac-mediatek: add support for mt2712
Add Ethernet support for MediaTek SoCs from the mt2712 family

Signed-off-by: Biao Huang <biao.huang@mediatek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-12 15:21:00 -08:00
Stefan Assmann
158daed16e i40e: fix mac filter delete when setting mac address
A previous commit moved the ether_addr_copy() in i40e_set_mac() before
the mac filter del/add to avoid a race. However it wasn't taken into
account that this alters the mac address being handed to
i40e_del_mac_filter().

Also changed i40e_add_mac_filter() to operate on netdev->dev_addr,
hopefully that makes the code easier to read.

Fixes: 458867b2ca ("i40e: don't remove netdev->dev_addr when syncing uc list")

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-12-12 14:54:48 -08:00
Greg Kroah-Hartman
ed0a773bff phy: for 4.21
*) Change phy set_mode ops to take both mode and setmode as arguments
  *) Add phy_configure() and phy_validate() API's mostly used for MIPI D-PHY
  *) Add helpers to get default values of parameters define in MIPI D-PHY spec
  *) Add driver for TI's CPSW Port PHY Interface Mode selection
  *) Add driver for Cadence Sierra PHY used with USB and PCIe
  *) Add driver for Freescale i.MX8MQ USB3 PHY
  *) Fixes QMP PHY bindings to allow the clocks provided by the PHY to be
     pointed at in device tree
  *) Fix for using fully specified regions (in device tree) for configuring
     the second lane in dual lane PHYs in QMP PHY
  *) Add support for Allwinner H6 USB2 PHY in phy-sun4i-usb driver
  *) Update phy-rcar-gen3-usb driver to follow the hardware manual
  *) Add support for fine grained power management in mapphone-mdm6600 driver
 
 Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCgAsFiEEUXMr/TfP2p4suIY5Dlx4XIBNgtkFAlwQj94OHGtpc2hvbkB0
 aS5jb20ACgkQDlx4XIBNgtmYrA/9HK7yfOWfNLJ7DB8bi4yPx5hd657sLohSdDEX
 0Pxx9PRuxjLYlaHQh8XbHEdvnLIbbGgnHbfvkkHjxJx272G+QAWMi7/SwF3a2ZpY
 ev7U1gvWvCTqS/3RsEmxmwiX9esszcOgya2ia5TcyvTIHLla0QXTOUCmOQBejWhJ
 Wfe3siSzxn6O4BITsoRUu+0Ek3bScW/cFdby5B62zBrs912p1qlGx20ihvjywB6I
 uR5q1w5DxCbC9KZLTkQgfPXki69r1EaCYzdcEKnUON59OXceIy/TKDx/ewZBOjtR
 /d+lWybsLMF5MXIy2UKlaKxLrk4AVdsQ6zmfU23uTqCT3rWn7OPEvbzSj3nBXpgi
 SNzBTt9wHT4ZAGOq4fmhgF1Gn1gx4tEcarvAfeyyzv1riVgGbwlxSaUpWWvkya/E
 zDBqROi/9VU+WX+De+E2KTCAOjJpcnVpLx8euA7uSD2m0nF4Kp4YiyAJiH/ZDQba
 ghhR4DiD1xhjO77U+tM7mD+68ZyKsQubqzO+h4rztjZ/QRCWqGtOCg9sSQBxSz86
 llEklLB3156dkLnHRMd0NIuSqYGOkHLWdTBrXNr2HaK7ZWvN6jxhC2puziIlmIZl
 8m94QyY5pmuTJdQsWA/MEoWFglQqLBDxAVuoMH4d/XFx7PBNxiaTGOOEgX7MhfVM
 9di9hak=
 =JLIJ
 -----END PGP SIGNATURE-----

Merge tag 'phy-for-4.21_v1' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy into usb-next

Kishon writes:

phy: for 4.21

 *) Change phy set_mode ops to take both mode and setmode as arguments
 *) Add phy_configure() and phy_validate() API's mostly used for MIPI D-PHY
 *) Add helpers to get default values of parameters define in MIPI D-PHY spec
 *) Add driver for TI's CPSW Port PHY Interface Mode selection
 *) Add driver for Cadence Sierra PHY used with USB and PCIe
 *) Add driver for Freescale i.MX8MQ USB3 PHY
 *) Fixes QMP PHY bindings to allow the clocks provided by the PHY to be
    pointed at in device tree
 *) Fix for using fully specified regions (in device tree) for configuring
    the second lane in dual lane PHYs in QMP PHY
 *) Add support for Allwinner H6 USB2 PHY in phy-sun4i-usb driver
 *) Update phy-rcar-gen3-usb driver to follow the hardware manual
 *) Add support for fine grained power management in mapphone-mdm6600 driver

Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>

* tag 'phy-for-4.21_v1' of git://git.kernel.org/pub/scm/linux/kernel/git/kishon/linux-phy: (30 commits)
  phy: qcom-qmp: Expose provided clocks to DT
  dt-bindings: phy-qcom-qmp: Move #clock-cells to child
  phy: qcom-qmp: Utilize fully-specified DT registers
  dt-bindings: phy-qcom-qmp: Fix register underspecification
  phy: ti: fix semicolon.cocci warnings
  phy: dphy: Add configuration helpers
  phy: Add MIPI D-PHY configuration options
  phy: Add configuration interface
  phy: Add MIPI D-PHY mode
  phy: add driver for Freescale i.MX8MQ USB3 PHY
  dt-bindings: phy: add binding for Freescale i.MX8MQ USB3 PHY
  phy: Use of_node_name_eq for node name comparisons
  net: ethernet: ti: cpsw: add support for port interface mode selection phy
  dt-bindings: net: ti: cpsw: switch to use phy-gmii-sel phy
  phy: ti: introduce phy-gmii-sel driver
  dt-bindings: phy: add cpsw port interface mode selection phy bindings
  phy: mvebu-cp110-comphy: fix spelling in structure name
  phy: mapphone-mdm6600: Improve phy related runtime PM calls
  phy: renesas: rcar-gen3-usb2: follow the hardware manual procedure
  phy: cadence: Add driver for Sierra PHY
  ...
2018-12-12 09:26:04 +01:00
Nir Dotan
cf7221a4f5 mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2
Add implementation of Spectrum-2 multicast routes for both IPv4 and IPv6 by
using ACL module explicitly.
In Spectrum-2, multicast routes are set as ACL rules, so initialization
takes care of creating dedicated ACL groups and binding them to the
appropriate multicast routing protocol IPv4/IPv6, and afterwards routes
configuration translates to setting explicit ACL rules.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
d7263ab35b mlxsw: spectrum_acl: Limit priority value
In Spectrum-2, higher priority value wins and priority valid values are in
the range of {1,cap_kvd_size-1}. mlxsw_sp_acl_tcam_priority_get converts
from lower-bound priorities alike tc flower to Spectrum-2 HW range. Up
until now tc flower did not provide priority 0 or reached the maximal
value, however multicast routing does provide priority 0.

Therefore, Change mlxsw_sp_acl_tcam_priority_get to verify priority is in
the correct range. Make sure priority is never set to zero and never
exceeds the maximal allowed value.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
c20580c21f mlxsw: spectrum_acl: Support rule creation without action creation
Up until now, when ACL rule was created its action was created with it.
It suits well for tc flower where ACL rule always needs an action, however
it does not suit multicast router, where the action is created prior to
setting a route, which in Spectrum-2 is actually an ACL rule.

Add support for rule creation without action creation. Do it by adding
afa_block argument to mlxsw_sp_acl_rule_create, which if NULL then an
action would be created, also add an indication within struct
mlxsw_sp_acl_rule_info that tells if the action should be destroyed when
the rule is destroyed.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
2507a64c17 mlxsw: spectrum_acl: Add replace rule action operation
Multicast routes actions may be updated after creation. An example for that
is an addition of an egress interface to an existing route.
So far, as tc flower API dictated, ACL rules were either created or
deleted. Since multicast routes in Spectrum-2 are written to ACL as any
rule, it is required to allow the update of a rule's action as it may
change.

Add methods and operations to support updating rule's action. This is
supported only for Spectrum-2.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
1a29d29394 mlxsw: spectrum_acl: Add multicast router profile operations
Add specific ACL operations needed for programming multicast routing ACL
groups and routes.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
add4550fca mlxsw: spectrum_acl: Add Spectrum-2 keys
Add virtual router ID fields to Spectrum-2 key blocks set, as the field is
required for multicast routing.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
254cec1464 mlxsw: spectrum: Change stage of ACL initialization
In Spectrum-2, MC routing is implemented using ACL block explicitly, so
router initialization should take place after ACL initialization.
Set the initialization of the ACL block before IP router initizalization
takes place, so multicast router will be able to allocate ACL data
structures and create its required chains.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Nir Dotan
a75e41d37a mlxsw: reg: Add Policy Engine Multicast Router Binding Table Register
In Spectrum-2, multicast routing is implemented explicitly using policy
engine (ACL) block. PEMRBT register is used to bind a dedicated ACL group
to a specific IP protocol.
Add the register to be later used in multicast router implementation.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 23:01:33 -08:00
Heiner Kallweit
ee28b30cbb r8169: fix crash if CONFIG_DEBUG_SHIRQ is enabled
If CONFIG_DEBUG_SHIRQ is enabled __free_irq() intentionally fires
a spurious interrupt. This interrupt causes a crash because
tp->dev->phydev is NULL at that time.

Fixes: 38caff5a44 ("r8169: handle all interrupt events in the hard irq handler")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 22:54:19 -08:00
Xue Chaojing
e1a76515b0 hinic: optmize rx refill buffer mechanism
There is no need to schedule a different tasklet for refill,
This patch remove it.

Suggested-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-11 22:52:56 -08:00
Grygorii Strashko
3ff18849eb net: ethernet: ti: cpsw: add support for port interface mode selection phy
Add support for port interface mode selection phy (phy-gmii-sel):
- try to request interface mode selection phy from Port DT node and fail
silently if not defined and old CONFIG_TI_CPSW_PHY_SEL driver enabled.
- use new phy if requested successfully.

Cc: Kishon Vijay Abraham I <kishon@ti.com>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12 10:01:42 +05:30
Grygorii Strashko
cccc43b853 phy: mvebu-cp110-comphy: convert to use eth phy mode and submode
Convert mvebu-cp110-comphy PHY driver to use recently introduced
PHY_MODE_ETHERNET and phy_set_mode_ext().

Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Maxime Chevallier <maxime.chevallier@bootlin.com>
Cc: Antoine Tenart <antoine.tenart@free-electrons.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12 10:01:35 +05:30
Grygorii Strashko
c8fe6d7f3f phy: ocelot-serdes: convert to use eth phy mode and submode
Convert ocelot-serdes PHY driver to use recently introduced
PHY_MODE_ETHERNET and phy_set_mode_ext().

Cc: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Quentin Schulz <quentin.schulz@bootlin.com>
Tested-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com>
2018-12-12 10:01:35 +05:30
Eyal Davidovich
5c7e8bbb02 net/mlx5e: Use monitor counters for update stats
- Adding new notifier block (struct mlx5_nb) monitor_counters_nb
  for handeling MONITOR_COUNTER new event type.
- Adding work queue element: monitor_counters_work for re-arm and
  update stats.
- We re-queue the update stat work, only when working over firmware
  that doesn't support the monitored counters.

Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Eyal Davidovich
2f8bc4917a net/mlx5e: Monitor counters commands support
new file monitor_stats.c for the new API.
add arm_monitor_counter new command support.
add set_monitor_counter new command support.

The device can monitor specific counters and provide an event to notify
when these counters are changed.
The monitoring is done in best effort manner where the minimum
notification period is 200 ms, however when the device is loaded, the
notification might be delayed.
To configure the required counters to be monitored, the
SET_MONITOR_COUNTER command shall be used with a list of counters to be
monitored.
The device firmware can monitor up to HCA_CAP.max_num_of_monitor_counters.
The configuration is done based on counter type (such as ppcnt, q counter,
etc) and additional param according to the type of counter selected.
Upon monitor counter change, the device will generate
Monitor_Counter_Change event.
The device will not generate new events unless the driver re-arms the
monitoring functionality, using the ARM_MONITOR_COUNTER command.

Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Eyal Davidovich
75370eb0d3 net/mlx5e: Avoid query PPCNT register if not supported by the device
PPCNT is not supported if PCAM access reg is supported and ppcnt bit is clear.

Signed-off-by: Eyal Davidovich <eyald@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Daniel Jurgens
939de57d30 net/mlx5e: Use CQE padding for Ethernet CQs
Writing 64B CQEs to 128B cache lines results in a RMW operation. Padding
the CQEs to 128B if possible improves performance on 128B cache line
systems like PPC.

Testing on PPC showed up to a 24% improvement in small packet throughput
vs the default behavior, depending on the workload and system topology.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Eli Britstein
8c4dc42bf6 net/mlx5e: Support multiple encapsulations for a TC flow
Currently a flow is associated with a single encap structure. The FW
extended destination features enables the driver to associate a flow
with multiple encap instances.

Change the encap id field from a flow scope to a per destination value
in the flow attributes struct. Use the encaps array to associate a flow
table entry with multiple encap entries.

Update the neigh logic to offload only if all encapsulations used in a
flow are connected, and un-offload upon the first one disconnected.

Note that the driver can now support up to two encap destinations.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Eli Britstein
79baaec719 net/mlx5e: Allow association of a flow to multiple encaps
Currently a flow can be associated with a single encap entry. The
extended destination feature enables the driver to configure multiple
encap entries per flow.

Change the encap flow association field to array as a pre-step towards
supporting multiple encap destinations. Use only the first array
element, with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Eli Britstein
98b66cb1c9 net/mlx5e: Change parse attr struct to accommodate multiple tunnel infos
Currently the driver can support only a single TC tunnel_set action.
Change the tunnel info fields to arrays, as a pre-step to support
multiple encapsulations for a single flow, with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:20 -08:00
Eli Britstein
1cc26d74bb net/mlx5e: Support header rewrite actions with remote port mirroring
A rule with the following actions is split to a two level FDB:
1. Forward to local mirror vport
2. Header rewrite
3. Forward to local vport
In the first level flow table, forward the packet to the local port and
forward the packet to the second level flow table for header rewrite and
local port forwarding. This configuration fails when mirroring to a
remote encapsulated destination because currently an FTE cannot support
encap and table destinations.

Use the extended destination capabilities to configure the first level
flow table with a multi-destination FTE to the uplink and second level
table and the second level flow table for the header rewrite and local
port forwarding.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Eli Britstein
38c9d2697b net/mlx5e: Replace the split logic with extended destination
Currently the FTE encap flag applies to all destinations.
To support mirroring encapsulated traffic to a local port the driver
split the two destinations to two flow table entries:
Table#0: - FWD to the local vport
         - Goto table#1
Table#1: - Encap and FWD to wire
The firmware extended destination capabilities enable the driver to set
an encapsulation flag per destination.

Remove the split logic and use the extended destination mechanism
instead.

Note that split technique is still required for pedit and VLAN push
scenarios.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Eli Britstein
a18e879d4e net/mlx5e: Annul encap action ordering requirement
Currently a FW syndrome is emitted if the driver configures a
multi-destination FTE where the first destination is a tunneled uplink
port and the second destination is a local vPort.

Support this scenario by creating a multi-destination FTE using the
firmware's extended destination capabilities.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Eli Britstein
f493f15534 net/mlx5e: Move flow attr reformat action bit to per dest flags
Flow attr reformat action bit is moved from the global action bits to a
per destination flags field, as a pre-step for adding additional flags
to support encapsulation properties per destination, with no
functionality change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Eli Britstein
df65a573ea net/mlx5e: Refactor eswitch flow attr for destination specific properties
Currently the eswitch flow attr structure stores each destination
specific property in its own specific array.
Group them in an array of destination structures as a pre-step towards
adding additional destination specific field properties.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Eli Britstein
e85e02bad2 net/mlx5: E-Switch, Rename esw attr mirror count field
The mirror count esw attributes field is used to determine if splitting
the rule to two FTEs is required while programming e-switch mirroring.
Rename it to split count, making it clearer with no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Eli Britstein
1228e912c9 net/mlx5: Consider encapsulation properties when comparing destinations
Currently the driver identifies identical vport destinations by
comparing the vport ID. The FW extended destination feature enables the
driver to forward the packet to the same vport with multiple
encapsulation properties.

Change the vport destination comparison logic to compare
the encapsulation properties in addition to the vport ID.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-11 14:52:19 -08:00
Jason Gunthorpe
28ab1bb0e8 Linux 4.20-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlwNpb0eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGwGwH/00UHnXfxww3ixxz
 zwTVDzptA6SPm6s84yJOWatM5fXhPiAltZaHSYF9lzRzNU71NCq7Frhq3fQUIXKM
 OxqDn9nfSTWcjWTk2q5n2keyRV/KIn67YX7UgqFc1bO/mqtVjEgNWaMyblhI+e9E
 giu1ZXayHr43jK1cDOmGExZubXUq7Vsc9TOlrd+d2SwIqeEP7TCMrPhnHDwCNvX2
 UU5dtANpVzGtHaBcr37wJj+L8kODCc0f+PQ3g2ar5jTHst5SLlHp2u0AMRnUmgdi
 VkGx+mu/uk8mtwUqMIMqhplklVoqK6LTeLqsY5Xt32SKruw9UqyJGdphLjW2QP/g
 MkmA1lI=
 =7kaD
 -----END PGP SIGNATURE-----

Merge tag 'v4.20-rc6' into rdma.git for-next

For dependencies in following patches.
2018-12-11 14:24:57 -07:00
David S. Miller
addb067983 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-12-11

The following pull-request contains BPF updates for your *net-next* tree.

It has three minor merge conflicts, resolutions:

1) tools/testing/selftests/bpf/test_verifier.c

 Take first chunk with alignment_prevented_execution.

2) net/core/filter.c

  [...]
  case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
  case bpf_ctx_range(struct __sk_buff, wire_len):
        return false;
  [...]

3) include/uapi/linux/bpf.h

  Take the second chunk for the two cases each.

The main changes are:

1) Add support for BPF line info via BTF and extend libbpf as well
   as bpftool's program dump to annotate output with BPF C code to
   facilitate debugging and introspection, from Martin.

2) Add support for BPF_ALU | BPF_ARSH | BPF_{K,X} in interpreter
   and all JIT backends, from Jiong.

3) Improve BPF test coverage on archs with no efficient unaligned
   access by adding an "any alignment" flag to the BPF program load
   to forcefully disable verifier alignment checks, from David.

4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for
   proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz.

5) Extend tc BPF programs to use a new __sk_buff field called wire_len
   for more accurate accounting of packets going to wire, from Petar.

6) Improve bpftool to allow dumping the trace pipe from it and add
   several improvements in bash completion and map/prog dump,
   from Quentin.

7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for
   kernel addresses and add a dedicated BPF JIT backend allocator,
   from Ard.

8) Add a BPF helper function for IR remotes to report mouse movements,
   from Sean.

9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info
   member naming consistent with existing conventions, from Yonghong
   and Song.

10) Misc cleanups and improvements in allowing to pass interface name
    via cmdline for xdp1 BPF example, from Matteo.

11) Fix a potential segfault in BPF sample loader's kprobes handling,
    from Daniel T.

12) Fix SPDX license in libbpf's README.rst, from Andrey.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 18:00:43 -08:00
Pieter Jansen van Vuuren
290974d434 nfp: flower: ensure TCP flags can be placed in IPv6 frame
Previously we did not ensure tcp flags have a place to be stored
when using IPv6. We correct this by including IPv6 key layer when
we match tcp flags and the IPv6 key layer has not been included
already.

Fixes: 07e1671cfc ("nfp: flower: refactor shared ip header in match offload")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 17:45:41 -08:00
Thomas Falcon
1d1bbc37f8 ibmvnic: Fix non-atomic memory allocation in IRQ context
ibmvnic_reset allocated new reset work item objects in a non-atomic
context. This can be called from a tasklet, generating the output below.
Allocate work items with the GFP_ATOMIC flag instead.

BUG: sleeping function called from invalid context at mm/slab.h:421
in_atomic(): 1, irqs_disabled(): 1, pid: 93, name: kworker/0:2
INFO: lockdep is turned off.
irq event stamp: 66049
hardirqs last  enabled at (66048): [<c000000000122468>] tasklet_action_common.isra.12+0x78/0x1c0
hardirqs last disabled at (66049): [<c000000000befce8>] _raw_spin_lock_irqsave+0x48/0xf0
softirqs last  enabled at (66044): [<c000000000a8ac78>] dev_deactivate_queue.constprop.28+0xc8/0x160
softirqs last disabled at (66045): [<c0000000000306e0>] call_do_softirq+0x14/0x24
CPU: 0 PID: 93 Comm: kworker/0:2 Kdump: loaded Not tainted 4.20.0-rc6-00001-g1b50a8f03706 #7
Workqueue: events linkwatch_event
Call Trace:
[c0000003fffe7ae0] [c000000000bc83e4] dump_stack+0xe8/0x164 (unreliable)
[c0000003fffe7b30] [c00000000015ba0c] ___might_sleep+0x2dc/0x320
[c0000003fffe7bb0] [c000000000391514] kmem_cache_alloc_trace+0x3e4/0x440
[c0000003fffe7c30] [d000000005b2309c] ibmvnic_reset+0x16c/0x360 [ibmvnic]
[c0000003fffe7cc0] [d000000005b29834] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic]
[c0000003fffe7e00] [c0000000001224c8] tasklet_action_common.isra.12+0xd8/0x1c0
[c0000003fffe7e60] [c000000000bf1238] __do_softirq+0x1a8/0x64c
[c0000003fffe7f90] [c0000000000306e0] call_do_softirq+0x14/0x24
[c0000003f3967980] [c00000000001ba50] do_softirq_own_stack+0x60/0xb0
[c0000003f39679c0] [c0000000001218a8] do_softirq+0xa8/0x100
[c0000003f39679f0] [c000000000121a74] __local_bh_enable_ip+0x174/0x180
[c0000003f3967a60] [c000000000bf003c] _raw_spin_unlock_bh+0x5c/0x80
[c0000003f3967a90] [c000000000a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160
[c0000003f3967ad0] [c000000000a8c8b0] dev_deactivate_many+0xd0/0x520
[c0000003f3967b70] [c000000000a8cd40] dev_deactivate+0x40/0x60
[c0000003f3967ba0] [c000000000a5e0c4] linkwatch_do_dev+0x74/0xd0
[c0000003f3967bd0] [c000000000a5e694] __linkwatch_run_queue+0x1a4/0x1f0
[c0000003f3967c30] [c000000000a5e728] linkwatch_event+0x48/0x60
[c0000003f3967c50] [c0000000001444e8] process_one_work+0x238/0x710
[c0000003f3967d20] [c000000000144a48] worker_thread+0x88/0x4e0
[c0000003f3967db0] [c00000000014e3a8] kthread+0x178/0x1c0
[c0000003f3967e20] [c00000000000bfd0] ret_from_kernel_thread+0x5c/0x6c

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 17:34:25 -08:00
Thomas Falcon
6c5c748908 ibmvnic: Convert reset work item mutex to spin lock
ibmvnic_reset can create and schedule a reset work item from
an IRQ context, so do not use a mutex, which can sleep. Convert
the reset work item mutex to a spin lock. Locking debugger generated
the trace output below.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
in_atomic(): 1, irqs_disabled(): 1, pid: 120, name: kworker/8:1
4 locks held by kworker/8:1/120:
 #0: 0000000017c05720 ((wq_completion)"events"){+.+.}, at: process_one_work+0x188/0x710
 #1: 00000000ace90706 ((linkwatch_work).work){+.+.}, at: process_one_work+0x188/0x710
 #2: 000000007632871f (rtnl_mutex){+.+.}, at: rtnl_lock+0x30/0x50
 #3: 00000000fc36813a (&(&crq->lock)->rlock){..-.}, at: ibmvnic_tasklet+0x88/0x2010 [ibmvnic]
irq event stamp: 26293
hardirqs last  enabled at (26292): [<c000000000122468>] tasklet_action_common.isra.12+0x78/0x1c0
hardirqs last disabled at (26293): [<c000000000befce8>] _raw_spin_lock_irqsave+0x48/0xf0
softirqs last  enabled at (26288): [<c000000000a8ac78>] dev_deactivate_queue.constprop.28+0xc8/0x160
softirqs last disabled at (26289): [<c0000000000306e0>] call_do_softirq+0x14/0x24
CPU: 8 PID: 120 Comm: kworker/8:1 Kdump: loaded Not tainted 4.20.0-rc6 #6
Workqueue: events linkwatch_event
Call Trace:
[c0000003fffa7a50] [c000000000bc83e4] dump_stack+0xe8/0x164 (unreliable)
[c0000003fffa7aa0] [c00000000015ba0c] ___might_sleep+0x2dc/0x320
[c0000003fffa7b20] [c000000000be960c] __mutex_lock+0x8c/0xb40
[c0000003fffa7c30] [d000000006202ac8] ibmvnic_reset+0x78/0x330 [ibmvnic]
[c0000003fffa7cc0] [d0000000062097f4] ibmvnic_tasklet+0x1054/0x2010 [ibmvnic]
[c0000003fffa7e00] [c0000000001224c8] tasklet_action_common.isra.12+0xd8/0x1c0
[c0000003fffa7e60] [c000000000bf1238] __do_softirq+0x1a8/0x64c
[c0000003fffa7f90] [c0000000000306e0] call_do_softirq+0x14/0x24
[c0000003f3f87980] [c00000000001ba50] do_softirq_own_stack+0x60/0xb0
[c0000003f3f879c0] [c0000000001218a8] do_softirq+0xa8/0x100
[c0000003f3f879f0] [c000000000121a74] __local_bh_enable_ip+0x174/0x180
[c0000003f3f87a60] [c000000000bf003c] _raw_spin_unlock_bh+0x5c/0x80
[c0000003f3f87a90] [c000000000a8ac78] dev_deactivate_queue.constprop.28+0xc8/0x160
[c0000003f3f87ad0] [c000000000a8c8b0] dev_deactivate_many+0xd0/0x520
[c0000003f3f87b70] [c000000000a8cd40] dev_deactivate+0x40/0x60
[c0000003f3f87ba0] [c000000000a5e0c4] linkwatch_do_dev+0x74/0xd0
[c0000003f3f87bd0] [c000000000a5e694] __linkwatch_run_queue+0x1a4/0x1f0
[c0000003f3f87c30] [c000000000a5e728] linkwatch_event+0x48/0x60
[c0000003f3f87c50] [c0000000001444e8] process_one_work+0x238/0x710
[c0000003f3f87d20] [c000000000144a48] worker_thread+0x88/0x4e0
[c0000003f3f87db0] [c00000000014e3a8] kthread+0x178/0x1c0
[c0000003f3f87e20] [c00000000000bfd0] ret_from_kernel_thread+0x5c/0x6c

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-10 17:34:25 -08:00
Oz Shlomo
df2ef3bff1 net/mlx5e: Add GRE protocol offloading
Add HW offloading support for TC flower filters configured on
gretap/ip6gretap net devices.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:53:04 -08:00
Oz Shlomo
0621e6fc5e net: Add netif_is_gretap()/netif_is_ip6gretap()
Changed the is_gretap_dev and is_ip6gretap_dev logic from structure
comparison to string comparison of the rtnl_link_ops kind field.

This approach aligns with the current identification methods and function
names of vxlan and geneve network devices.

Convert mlxsw to use these helpers and use them in downstream mlx5 patch.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:53:04 -08:00
Oz Shlomo
101f4de9dd net/mlx5e: Move TC tunnel offloading code to separate source file
Move tunnel offloading related code to a separate source file for better
code maintainability.

Code refactoring with no functional change.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:53:04 -08:00
Oz Shlomo
54c177ca9c net/mlx5e: Branch according to classified tunnel type
Currently the tunnel offloading encap/decap methods assumes that VXLAN
is the sole tunneling protocol. Lay the infrastructure for supporting
multiple tunneling protocols by branching according to the tunnel
net device kind.

Encap filters tunnel type is determined according to the egress/mirred
net device. Decap filters classify the tunnel type according to the
filter's ingress net device kind.

Distinguish between the tunnel type as defined by the SW model and
the FW reformat type that specifies the HW operation being made.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:53:04 -08:00
Oz Shlomo
4d70564d1c net/mlx5e: Refactor VXLAN tunnel decap offloading code
Separates the vxlan header match handling from the matching on the
general fields of ipv4/6 tunnels, thus allowing the common IP tunnel
match code to branch in down stream patch, to multiple IP tunnels.

This patch doesn't add any functionality.

Signed-off-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-10 15:53:04 -08:00