Commit Graph

23437 Commits

Author SHA1 Message Date
Subash Abhinov Kasiviswanathan
9deb441c11 net: ipv6: Generate random IID for addresses on RAWIP devices
RAWIP devices such as rmnet do not have a hardware address and
instead require the kernel to generate a random IID for the
IPv6 addresses.

Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 10:16:25 -04:00
David S. Miller
7d840a6065 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-06-04 22:23:35 -04:00
David S. Miller
d67b66b45a Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-06-04

This series contains a smorgasbord of updates to documentation, e1000e,
igb, ixgbe, ixgbevf and i40e.

Benjamin Poirier fixes a potential kernel crash due to NULL pointer
dereference in e1000e.

Jeff updates the kernel documentation for e100 and e1000 to correct
default values and URLs which were incorrect in the documentation.  Also
took the time to update these to the new reStructured text format for
kernel documentation.

Joanna Yurdal fixes a missing PTP transmit timestamp by ensuring that
TSICR gets cleared when ICR is cleared.

Sergey updates igb to reset all the transmit queues at one time so that
we only have to wait once for all the queues to be reset.

Alex fixes ixgbevf so that malicious driver detection (MDD) can co-exist
with XDP.

Emil and Tony extend the RTNL lock to ensure we get the most up-to-date
values for the bits and avoid a possible race condition when going down.

YueHaibing from Huawei introduces a helper function in ixgbe for
operation reads to simplify the code a bit more.

Daniel Borkmann adds support for XDP meta data when using build SKB
for i40e.

Shannon Nelson provides twp fixes for the IPSec code in ixgbe, first is
to make sure we do not try to offload the decryption of any incoming
packet that is destined for the management engine.  The other fix is to
resolve a cast problem introduced by a sparse cleanup patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:35:35 -04:00
Xi Wang
f0b964e5e4 net: hns: Fix the process of adding broadcast addresses to tcam
If the multicast mask value in device tree is configured not all
0xff, the broadcast mac will be lost from tcam table after the
execution of command 'ifconfig up'. The address is appended by
hns_ae_start, but will be clear later by hns_nic_set_rx_mode
called in dev_open process.

This patch fixed it by not use the multicast mask when add a
broadcast address.

Fixes: b5996f11ea ("net: add Hisilicon Network Subsystem basic ethernet support")
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:32:56 -04:00
YueHaibing
ff2e351e19 qed: use dma_zalloc_coherent instead of allocator/memset
Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:28:43 -04:00
Yuval Bason
39dbc646fd qed: Add srq core support for RoCE and iWARP
This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:09:54 -04:00
Varsha Rao
b8aac410b7 net: ethernet: bnx2: Replace NULL comparison
This patch fixes the checkpatch issue of NULL comparison. Replace x == NULL
with !x, by using the following coccinelle script:

@disable is_null@
expression e;
@@
-e==NULL
+!e

Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:07:27 -04:00
Varsha Rao
6dc5aa2123 net: ethernet: bnx2: Remove extra parentheses
The following coccinelle script removes extra parentheses to fix the
clang warning of extraneous parentheses.

@disable paren@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:07:27 -04:00
YueHaibing
13ce3bc9c1 net: gemini: fix spelling mistake: "it" -> "is"
Trivial fix to spelling mistake in gemini dev_warn message

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 17:05:39 -04:00
YueHaibing
40434a670f net: chelsio: Use zeroing memory allocator instead of allocator/memset
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 16:07:30 -04:00
Sergei Shtylyov
1100149a23 sh_eth: use DIV_ROUND_UP() in sh_eth_soft_swap()
When initializing 'maxp' in sh_eth_soft_swap(), the buffer length needs
to be rounded  up -- that's just asking for DIV_ROUND_UP()!

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:25 -04:00
Sergei Shtylyov
bb2fa4e847 sh_eth: uninline sh_eth_soft_swap()
sh_eth_tsu_soft_swap() is called twice by the driver, remove *inline* and
move  that function  from the header to the driver itself to let gcc decide
whether to expand it inline or not...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:25 -04:00
Sergei Shtylyov
232b6743e4 sh_eth: make sh_eth_soft_swap() work on ARM
Browsing  thru the driver disassembly, I noticed that ARM gcc generated
no  code  whatsoever for sh_eth_soft_swap() while building a little-endian
kernel -- apparently __LITTLE_ENDIAN__ was not being #define'd, however
it got implicitly #define'd when building with the SH gcc (I could only
find the explicit #define __LITTLE_ENDIAN that was #include'd when building
a little-endian kernel).  Luckily, the Ether controller  only doing big-
endian DMA is encountered on the early SH771x SoCs only and all ARM SoCs
implement EDMR.DE and thus set 'sh_eth_cpu_data::hw_swap'. But anyway, we
need to fix the #ifdef inside sh_eth_soft_swap() to something that would
work on all architectures...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 15:23:24 -04:00
Shannon Nelson
9a75fa5c16 ixgbe: fix broken ipsec Rx with proper cast on spi
Fix up a cast problem introduced by a sparse cleanup patch.  This fixes
a problem where the encrypted packets were not recognized on Rx and
subsequently dropped.

Fixes: 9cfbfa701b ("ixgbe: cleanup sparse warnings")
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:31:22 -07:00
Shannon Nelson
2a8a15526d ixgbe: check ipsec ip addr against mgmt filters
Make sure we don't try to offload the decryption of an incoming
packet that should get delivered to the management engine.  This
is a corner case that will likely be very seldom seen, but could
really confuse someone if they were to hit it.

Suggested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:29:32 -07:00
Petr Machata
1fc68bb7c3 mlxsw: spectrum_span: Suppress VLAN on BRIDGE_VLAN_INFO_UNTAGGED
When offloading mirroring to gretap or ip6gretap netdevices, an 802.1q
bridge is one of the soft devices permissible in the underlay when
resolving the packet path. After the packet path is resolved to a
particular bridge egress device, flags on packet VLAN determine whether
the egressed packet should be tagged.

The current logic however only ever sets the VLAN tag, never suppresses
it. Thus if there's a VLAN netdevice above the bridge that determines
the packet VLAN, that VLAN is never unset, and mirroring is configured
with VLAN tagging.

Fix by setting the packet VLAN on both branches: set to zero (for unset)
when BRIDGE_VLAN_INFO_UNTAGGED, copy the resolved VLAN (e.g. from bridge
PVID) otherwise.

Fixes: 946a11e740 ("mlxsw: spectrum_span: Allow bridge for gretap mirror")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:27:57 -04:00
Petr Machata
f07ff01406 mlxsw: spectrum_switchdev: Postpone respin on object deletion
VLAN deletion notifications are emitted before the relevant change is
projected to bridge configuration. Thus, like with VLAN addition,
schedule SPAN respin for later.

Fixes: c520bc6986 ("mlxsw: Respin SPAN on switchdev events")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:27:56 -04:00
Tony Nguyen
88adce4ea8 ixgbe: fix possible race in reset subtask
Similar to ixgbevf, the same possibility for race exists. Extend the RTNL
lock in ixgbe_reset_subtask() to protect the state bits; this is to make
sure that we get the most up-to-date values for the bits and avoid a
possible race when going down.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:25:15 -07:00
Daniel Borkmann
cc5b114dcf bpf, i40e: add meta data support
Add support for XDP meta data when using build skb variant of
the i40e driver. Implementation is analogous to the existing
ixgbe and ixgbevf support for meta data from 366a88fe2f ("bpf,
ixgbe: add meta data support") and be8333322e ("ixgbevf: Add
support for meta data"). With the build skb variant we get
192 bytes of extra headroom which can be used for encaps or
meta data.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:23:58 -07:00
YueHaibing
e9c721837d ixgbe: introduce a helper to simplify code
ixgbe_dbg_reg_ops_read and ixgbe_dbg_netdev_ops_read copy-pasting
the same code except for ixgbe_dbg_netdev_ops_buf/ixgbe_dbg_reg_ops_buf,
so introduce a helper ixgbe_dbg_common_ops_read to remove redundant code.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:21:13 -07:00
Emil Tantilov
7d6446db1b ixgbevf: fix possible race in the reset subtask
Extend the RTNL lock in ixgbevf_reset_subtask() to protect the state bits
check in addition to the call to ixgbevf_reinit_locked().

This is to make sure that we get the most up-to-date values for the bits
and avoid a possible race when going down.

Suggested-by: Zhiping du <zhipingdu@tencent.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:19:32 -07:00
Alexander Duyck
4be87727d4 ixgbevf: Fix coexistence of malicious driver detection with XDP
In the case of the VF driver it is supposed to provide a context descriptor
that allows us to provide information about the header offsets inside of
the frame. However in the case of XDP we don't really have any of that
information since the data is minimally processed. As a result we were
seeing malicious driver detection (MDD) events being triggered when the PF
had that functionality enabled.

To address this I have added a bit of new code that will "prime" the XDP
ring by providing one context descriptor that assumes the minimal setup of
an Ethernet frame which is an L2 header length of 14. With just that we can
provide enough information to make the hardware happy so that we don't
trigger MDD events.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:17:55 -07:00
Sergey Nemov
06140c793d igb: Wait 10ms just once after TX queues reset
Move 10ms sleep out of function resetting TX queue.
Reset all the TX queues in one turn and
wait for all of them just once.

Use usleep_range() instead of mdelay() in order not to
affect transmission on other interfaces.

Signed-off-by: Sergey Nemov <sergey.nemov@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:11:14 -07:00
Joanna Yurdal
1ec2297c5c igb: Clear TSICR interrupts together with ICR
Issuing "ip link set up/down" can block TSICR interrupts, what results in
missing PTP Tx timestamp and no PPS pulse generation.

Problem happens when the link is set up with the TSICR interrupts pending.
ICR is cleared before enabling interrupts, while TSICR is not. When all TSICR
interrupts are pending at this moment, time_sync interrupt will never
be generated. TSICR should be cleared as well.

In order to reproduce the issue:
1. Setup linux with IEEE 1588 grandmaster and PPS output enabled
2. Continue setting link up/down with random intervals between commands
3. Wait until PPS is not generated ( only one pulse is generated and PPS
dies), and ptp4l complains constantly about Tx timeout.

Signed-off-by: Joanna Yurdal <jyu@trackman.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 09:55:36 -07:00
Benjamin Poirier
fff200caf6 e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes
There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These can be traced back to the fact that e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a null deref in timecounter_read().

Commit 83129b37ef ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) but also link up
and link down.

Moreover, the phc (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.

Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 09:45:53 -07:00
kbuild test robot
fe083b3f06 net: mvpp2: mvpp2_percpu_read_relaxed() can be static
Fixes: db9d7d36ee ("net: mvpp2: Split the PPv2 driver to a dedicated directory")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 11:36:29 -04:00
Colin Ian King
76a45194e5 net: aquantia: make function aq_fw2x_get_mac_permanent static
The function aq_fw2x_get_mac_permanent is local to the source and does
not need to be in global scope, so make it static.

Cleans up sparse warning:
warning: symbol 'aq_fw2x_get_mac_permanent' was not declared. Should it
be static?

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 11:30:50 -04:00
Varsha Rao
cfed0a2c98 net: ethernet: mlx4: Remove unnecessary parentheses
This patch fixes the clang warning of extraneous parentheses, with the
following coccinelle script.

@@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 10:14:39 -04:00
Jose Abreu
9a8a02c9d4 net: stmmac: Add Flexible PPS support
This adds support for Flexible PPS output (which is equivalent
to per_out output of PTP subsystem).

Tested using an oscilloscope and the following commands:

1) Start PTP4L:
	# ptp4l -A -4 -H -m -i eth0 &
2) Set Flexible PPS frequency:
	# echo <idx> <ts> <tns> <ps> <pns> > /sys/class/ptp/ptpX/period

Where, ts/tns is start time and ps/pns is period time, and ptpX is ptp
of eth0.

Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Vitor Soares <soares@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 10:13:16 -04:00
Sudarsana Reddy Kalluru
b5fabb0800 qed: Fix use of incorrect shmem address.
Incorrect shared memory address is used while deriving the values
for tc and pri_type. Use shmem address corresponding to 'oem_cfg_func'
where the management firmare saves tc/pri_type values.

Fixes: cac6f691 ("qed: Add support for Unified Fabric Port")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 10:10:46 -04:00
Sudarsana Reddy Kalluru
5e9f20359a qed: Fix shared memory inconsistency between driver and the MFW.
The structure shared between driver and management firmware (MFW)
differ in sizes. The additional field defined by the MFW is not
relevant to the current driver. Add a dummy field to the structure.

Fixes: cac6f691 ("qed: Add support for Unified Fabric Port")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 10:10:46 -04:00
Eric Dumazet
885892fb37 mlx4_core: restore optimal ICM memory allocation
Commit 1383cb8103 ("mlx4_core: allocate ICM memory in page size chunks")
brought two regressions caught in our regression suite.

The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
or 6.25 % which is unacceptable since ICM can be pretty large.

This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
(instead of prior 256KB)

Note that mlx4_alloc_icm() is already able to try high order allocations
and fallback to low-order allocations under high memory pressure.

Most of these allocations happen right after boot time, when we get
plenty of non fragmented memory, there is really no point being so
pessimistic and break huge pages into order-0 ones just for fun.

We only have to tweak gfp_mask a bit, to help falling back faster,
without risking OOM killings.

Second regression is an KASAN fault, that will need further investigations.

Fixes: 1383cb8103 ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Cc: Qing Huang <qing.huang@oracle.com>
Cc: Daniel Jurgens <danielj@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 11:01:44 -04:00
YueHaibing
deba328ce9 net: axienet: remove stale comment of axienet_open
axienet_open no longer return -ENODEV when PHY cannot be connected to
since commit d7cc3163e0 ("net: axienet: Support phy-less mode of operation")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 10:59:32 -04:00
YueHaibing
a6ee84be2c net: netcp: ethss: remove unnecessary pointer set to NULL
If statement has make sure the 'slave->phy' is NULL

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 10:39:48 -04:00
Wei Yongjun
8cb77149e8 net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static
Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
 symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 10:36:31 -04:00
David S. Miller
9c54aeb03a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 09:31:58 -04:00
Tariq Toukan
f65a59ffbc net/mlx5e: TX, Separate cachelines of xmit and completion stats
Avoid false sharing of cachelines by separating the cachelines of
TX stats that are dertied in xmit flow and in completion flow.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
5ffd81943d net/mlx5e: RX, Always prefer Linear SKB configuration
Prefer the linear SKB configuration of Legacy RQ over the
non-linear one of Striding RQ.

This implies that ConnectX-4 LX now uses legacy RQ by default,
as it does not support the linear configuration of Striding RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
069d11465a net/mlx5e: RX, Enhance legacy Receive Queue memory scheme
Enhance the memory scheme of the legacy RQ, such that
only order-0 pages are used.

Whenever possible, prefer using a linear SKB, and build it
wrapping the WQE buffer.

Otherwise (for example, jumbo frames on x86), use non-linear SKB,
with as many frags as needed. In this case, multiple WQE
scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.

This implied to remove support of HW LRO in legacy RQ, as it would
require large number of page allocations and scatter entries per WQE
on archs with PAGE_SIZE = 4KB, yielding bad performance.

In earlier patches, we guaranteed that all completions are in-order,
and that we use a cyclic WQ.
This creates an oppurtunity for a performance optimization:
The mapping between a "struct mlx5e_dma_info", and the
WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
across different cycles of a WQ. This allows initializing
the mapping in the time of RQ creation, and not handle it
in datapath.

A struct mlx5e_dma_info that is shared between different WQEs
is allocated by the first WQE, and freed by the last one.
This implies an important requirement: WQEs that share the same
struct mlx5e_dma_info must be posted within the same NAPI.
Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
point to the new struct mlx5e_dma_info, not the one that was posted
(and the HW wrote to).
This bulking requirement is actually good also for performance reasons,
hence we extend the bulk beyong the minimal requirement above.

With this memory scheme, the RQs memory footprint is reduce by a
factor of 2 on x86, and by a factor of 32 on PowerPC.
Same factors apply for the number of pages in a GRO session.

Performance tests:
ConnectX-4, single core, single RX ring, default MTU.

x86:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet rate (early drop in TC): no degradation
TCP streams: ~5% improvement

PowerPC:
CPU: POWER8 (raw), altivec supported

Packet rate (early drop in TC): 20% gain
TCP streams: 25% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
99cbfa93a6 net/mlx5e: RX, Use cyclic WQ in legacy RQ
Now that LRO is not supported for Legacy RQ, there is no source of
out-of-order completions in the WQ, and we can use a cyclic one.
This has multiple advantages:
- reduces the WQE size (smaller PCI transactions).
- lower overhead in datapath (no handling of 'next' pointers).
- no reserved WQE for the WQ head (was need in linked-list).
- allows using a constant map between frag and dma_info struct, in downstream patch.

Performance tests:
ConnectX-4, single core, single RX ring.
Major gain in packet rate of single ring XDP drop.
Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps).

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
422d4c401e net/mlx5e: RX, Split WQ objects for different RQ types
Replace the common RQ WQ object with two separate ones for the
different RQ types.
This is in preparation for switching to using a cyclic WQ type
in Legacy RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
6c3a823e1e net/mlx5e: RX, Remove HW LRO support in legacy RQ
Current LRO implementation in Legacy RQ uses high-order pages.
In downstream patches of this series we complete the transition
to using only order-0 pages in RX datapath (which was already done
in Striding RQ).

Unlike the more advanced Striding RQ, Legacy RQ does not make reuse
of any non-consumed buffers of non-full LRO sessions, and combining
it with order-0 pages has many performance drawbacks.

Hence, here we totally remove LRO support in Legacy RQ.
This guarantees having no out-of-order completions, which allows using
a cyclic work queue (instead of a linked-list) in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
386471f16b net/mlx5e: RX, Dedicate a function for copying SKB header
Get the logic of copying the packet header into the SKB linear part
into a generic function. Function does copy length alignment
and dma buffer sync.

It is currently called only within the MPWQE flow.
In a downstream patch, it will be called within the legacy RQ flow
as well.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
fa698366b7 net/mlx5e: RX, Generalise function of SKB frag addition
Rename it and pass truesize as an extra argument, as it will be used also
in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
75aa889fb9 net/mlx5e: RX, Generalise name of non-linear SKB head size
Make name more generic by dropping MPWRQ from it, as it will be
used also in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
5e7d77a9c5 net/mlx5e: TX, Obsolete maintaining local copies of skb->len/data
Instead of maintaining a local copy of skb->len/data and updating
it upon every copy to the WQE inline part, just calculate it once
when needed, using the ihs.

This obsoletes the function mlx5e_tx_skb_pull_inline.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Ilan Tayari
98db16bab5 net/mlx5: FPGA, Handle QP error event
Add handlers for this event to perform graceful teardown of the device.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Adi Nissim
250a42b6a7 net/mlx5e: Support configurable MTU for vport representors
The representor MTU was hard coded to 1500 bytes.
Allow setting arbitrary MTU values up to the max supported by the FW.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Maor Gottlieb
93edcb3a75 net/mlx5e: Increase aRFS flow tables size
Increase the aRFS flow table size to 64k so it could contain up to 64k
different streams.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Eran Ben Elisha
6c63efe4cf net/mlx5e: Remove redundant active_channels indication
Now, when all channels stats are saved regardless of the channel's state
{open, closed}, we can safely remove this indication and the stats spin
lock which protects it.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00