Commit Graph

212 Commits

Author SHA1 Message Date
David S. Miller
ee58b57100 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, except the packet scheduler
conflicts which deal with the addition of the free list parameter
to qdisc_enqueue().

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30 05:03:36 -04:00
Rana Shahout
af7d518526 net/mlx4_en: Add DCB PFC support through CEE netlink commands
This patch adds support for reading and updating priority flow
control (PFC) attributes in the driver via netlink.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-23 15:18:50 -04:00
Eran Ben Elisha
9d76931180 net/mlx4_en: Avoid unregister_netdev at shutdown flow
This allows a clean shutdown, even if some netdev clients do not
release their reference from this netdev. It is enough to release
the HW resources only as the kernel is shutting down.

Fixes: 2ba5fbd62b ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22 16:38:11 -04:00
Kamal Heib
93c098af09 net/mlx4_en: Fix the return value of a failure in VLAN VID add/kill
Modify mlx4_en_vlan_rx_[add/kill]_vid to return error value in case of
failure.

Fixes: 8e586137e6 ('net: make vlan ndo_vlan_rx_[add/kill]_vid return error value')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-22 16:38:11 -04:00
Alexander Duyck
a831274a13 mlx4_en: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port
This change replaces the network device operations for adding or removing a
VXLAN port with operations that are more generically defined to be used for
any UDP offload port but provide a type.  As such by just adding a line to
verify that the offload type is VXLAN we can maintain the same
functionality.

In addition I updated the socket address family check so that, rather than
excluding IPv6, we abort if the type is not IPv4.  This makes much more
sense as we should only be supporting IPv4 outer addresses on this
hardware.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17 20:23:31 -07:00
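The tightened check described in a831274a13 can be pictured with a small user-space sketch. The struct, enum, and function names below are hypothetical stand-ins rather than the driver's code; only AF_INET, AF_INET6, AF_UNIX, and sa_family_t come from the standard socket header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the offload type passed along with a UDP port. */
enum model_tunnel_type { MODEL_TUNNEL_VXLAN, MODEL_TUNNEL_OTHER };

struct model_udp_port {
	sa_family_t sa_family;		/* outer address family */
	enum model_tunnel_type type;	/* which UDP offload is requested */
};

/* Old-style check: only IPv6 is excluded, so any other family slips through. */
static bool port_accepted_old(const struct model_udp_port *p)
{
	assert(p != NULL);
	return p->sa_family != AF_INET6;
}

/* New-style check: the type must be VXLAN and the outer family must be IPv4. */
static bool port_accepted_new(const struct model_udp_port *p)
{
	assert(p != NULL);
	return p->type == MODEL_TUNNEL_VXLAN && p->sa_family == AF_INET;
}
```

Under these assumptions the two checks differ for any family that is neither IPv4 nor IPv6, and the new one additionally filters on the offload type.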
Alexander Duyck
a547224dce mlx4e: Do not attempt to offload VXLAN ports that are unrecognized
The mlx4e driver does not support more than one port for VXLAN offload.  As
such expecting the hardware to offload other ports is invalid since it
appears the parsing logic is used to perform Tx checksum and segmentation
offloads.  Use the vxlan_port number to determine in which cases we can
apply the offload and in which cases we can not.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16 14:24:59 -07:00
Eric Dumazet
7d71e994cd net/mlx4_en: mlx4_en_netpoll() should schedule TX, not RX
I am not sure mlx4_en_netpoll() is doing anything useful right now.

mlx4 has different NAPI structures for RX and TX, and netpoll only wants
to drain TX queues.

Let's schedule NAPI polls on TX, not RX.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-09 22:24:16 -07:00
Eric Dumazet
f73a6f439f net/mlx4_en: get rid of private net_device_stats
We simply can use the standard net_device stats.

We do not need to clear fields that are already 0.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
9ed17db17f net/mlx4_en: get rid of ret_stats
mlx4 uses a private struct net_device_stats in a vain attempt
to avoid races.

This is buggy because multiple cpus could call mlx4_en_get_stats()
at the same time, so ret_stats can not guarantee stable results.

To fix this, we need to switch to ndo_get_stats64() as this
method provides per-thread storage.

This allows us to reduce mlx4_en_priv bloat.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
45acbac609 net/mlx4_en: clear some TX ring stats in mlx4_en_clear_stats()
mlx4_en_clear_stats() clears almost everything, but a few TX ring
fields are missing:
- queue_stopped, wake_queue, tso_packets, xmit_more

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:50 -07:00
Eric Dumazet
63a664b7e9 net/mlx4_en: fix tx_dropped bug
1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this variable
is overwritten in mlx4_en_DUMP_ETH_STATS().

2) This increment was not SMP safe, as a port might have many TX queues.

Add a per TX ring tx_dropped to fix these issues.

This is u32 as mlx4_en_DUMP_ETH_STATS() will add a 32bit field.

So let's avoid bugs with SNMP agents having to cope with partial
overwraps. (One of these agents being bond_fold_stats())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-25 22:15:49 -07:00
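The per-ring counter idea in 63a664b7e9 can be modelled in user space as follows. This is a minimal sketch: the struct and function names are hypothetical, and the real driver structures and locking context differ. It only shows why each ring owns its own u32 counter and how the counters fold into a 32-bit device-reported value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TX ring; the real driver structure differs. */
struct tx_ring_model {
	uint32_t tx_dropped;	/* written only by the context that owns this ring */
};

/* Drop accounting touches only the ring's own counter, never a shared one. */
static void ring_note_drop(struct tx_ring_model *ring)
{
	assert(ring != NULL);
	ring->tx_dropped++;
}

/*
 * Fold the per-ring counters into a 32-bit device-reported drop count.
 * Everything stays u32, so the total wraps at 2^32 in one consistent way.
 */
static uint32_t fold_tx_dropped(const struct tx_ring_model *rings,
				size_t nr_rings, uint32_t hw_dropped)
{
	uint32_t total = hw_dropped;

	assert(rings != NULL || nr_rings == 0);
	for (size_t i = 0; i < nr_rings; i++)
		total += rings[i].tx_dropped;
	return total;
}
```

Keeping the sum in u32 mirrors the commit's remark that the value is added to a 32-bit field, so a reader of the statistic sees a clean 32-bit wrap rather than a partial one.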
Haggai Abramovsky
73898db043 net/mlx4: Avoid wrong virtual mappings
The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory.  On some
architectures, like arm64, undefined behavior results if this memory is
also accessed via virtual mappings that are not coherent.  Because of
their undefined nature, operations like virt_to_page() return garbage
when passed virtual addresses obtained from dma_alloc_coherent().  Any
subsequent mappings via vmap() of the garbage page values are unusable
and result in bad things like bus errors (synchronous aborts in ARM64
speak).

The mlx4 driver contains code that does the equivalent of
vmap(virt_to_page(dma_alloc_coherent)), which results in an oops when the
device is opened.

Prevent the Ethernet driver from running this problematic code by forcing
it to allocate contiguous memory. The InfiniBand driver first tries to
allocate contiguous memory and, on failure, falls back to fragmented
memory.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reported-by: David Daney <david.daney@cavium.com>
Tested-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-05 23:23:05 -04:00
Alexander Duyck
09067122db net/mlx4_en: Add support for inner IPv6 checksum offloads and TSO
From what I can tell the ConnectX-3 will support an inner IPv6 checksum and
segmentation offload, however it cannot support outer IPv6 headers.  This
assumption is based on the fact that I could see the checksum being
offloaded for inner header on IPv4 tunnels, but not on IPv6 tunnels.

For this reason I am adding the feature to the hw_enc_features and adding
an extra check to the features_check call that will disable GSO and
checksum offload in the case that the encapsulated frame has an outer IP
version that is not 4.  The check in mlx4_en_features_check could be
removed if at some point in the future a fix is found that allows the
hardware to offload segmentation/checksum on tunnels with an outer IPv6
header.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 13:32:27 -04:00
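The extra features check described in 09067122db amounts to masking out offload bits when the outer header is not IPv4. The sketch below illustrates that idea only; the flag names, the packet struct, and the function are hypothetical and do not reproduce the kernel's feature flags or the driver's mlx4_en_features_check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, defined locally for the sketch. */
#define MODEL_F_CSUM	(1u << 0)	/* checksum offload */
#define MODEL_F_GSO	(1u << 1)	/* segmentation offload */
#define MODEL_F_SG	(1u << 2)	/* an unrelated feature that must survive */

/* Hypothetical summary of the fields the check looks at. */
struct model_pkt {
	bool encapsulated;
	unsigned int outer_ip_version;
};

/* Clear checksum and GSO for encapsulated frames whose outer IP is not v4. */
static uint32_t features_check_model(const struct model_pkt *pkt,
				     uint32_t features)
{
	assert(pkt != NULL);
	if (pkt->encapsulated && pkt->outer_ip_version != 4)
		features &= ~(uint32_t)(MODEL_F_CSUM | MODEL_F_GSO);
	return features;
}
```

Frames that are not encapsulated, or whose outer header is IPv4, keep their feature mask unchanged in this model.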
Alexander Duyck
3c9346b240 net/mlx4_en: Add support for UDP tunnel segmentation with outer checksum offload
This patch assumes that the mlx4 hardware will ignore existing IPv4/v6
header fields for length and checksum as well as the length and checksum
fields for outer UDP headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 13:32:27 -04:00
Hannes Frederic Sowa
0c5c3252c4 mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock
mlx4_en_start_port requires rtnl_lock to be held.

Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-21 15:35:43 -04:00
David S. Miller
810813c47a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 12:34:12 -05:00
John Fastabend
5eb4dce3b3 net: relax setup_tc ndo op handle restriction
I added this check in setup_tc to multiple drivers,

 if (handle != TC_H_ROOT || tc->type != TC_SETUP_MQPRIO)

Unfortunately restricting to TC_H_ROOT like this breaks the old
instantiation of mqprio to set up a hardware qdisc. This patch
relaxes the test to only check the type to make it equivalent
to the check before I broke it. With this the old instantiation
continues to work.

A good smoke test is to set up mqprio with,

# tc qdisc add dev eth4 root mqprio num_tc 8 \
  map 0 1 2 3 4 5 6 7 \
  queues 0@0 1@1 2@2 3@3 4@4 5@5 6@6 7@7

Fixes: e4c6734eaa ("net: rework ndo tc op to consume additional qdisc handle paramete")
Reported-by: Singh Krishneil <krishneil.k.singh@intel.com>
Reported-by: Jake Keller <jacob.e.keller@intel.com>
CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03 16:25:15 -05:00
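The difference between the check quoted in 5eb4dce3b3 and its relaxed form can be shown with a user-space sketch. The MODEL_ constants and the struct are stand-ins declared locally so the example builds outside the kernel; they are assumptions for illustration, and the non-root handle used with them is an arbitrary value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the kernel's TC_H_ROOT and TC_SETUP_MQPRIO. */
#define MODEL_TC_H_ROOT 0xFFFFFFFFU
enum model_tc_type { MODEL_TC_SETUP_MQPRIO, MODEL_TC_SETUP_OTHER };

struct model_tc_to_netdev {
	enum model_tc_type type;
};

/* Check as quoted in the commit message: any non-root handle is rejected. */
static int setup_tc_strict(uint32_t handle, const struct model_tc_to_netdev *tc)
{
	assert(tc != NULL);
	if (handle != MODEL_TC_H_ROOT || tc->type != MODEL_TC_SETUP_MQPRIO)
		return -EINVAL;
	return 0;
}

/* Relaxed check: only the offload type is tested, the handle is ignored. */
static int setup_tc_relaxed(uint32_t handle, const struct model_tc_to_netdev *tc)
{
	assert(tc != NULL);
	(void)handle;
	if (tc->type != MODEL_TC_SETUP_MQPRIO)
		return -EINVAL;
	return 0;
}
```

With an mqprio request on a non-root handle, the strict form returns -EINVAL while the relaxed form succeeds, which is the behaviour change the commit message describes.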
Jack Morgenstein
6e5224224f net/mlx4_core: Allow resetting VF admin mac to zero
The VF administrative mac addresses (stored in the PF driver) are
initialized to zero when the PF driver starts up.

These addresses may be modified in the PF driver through ndo calls
initiated by iproute2 or libvirt.

While we allow the PF/host to change the VF admin mac address from zero
to a valid unicast mac, we do not allow restoring the VF admin mac to
zero. We currently only allow changing this mac to a different unicast mac.

This leads to problems when libvirt scripts are used to deal with
VF mac addresses, and libvirt attempts to revoke the mac so this
host will not use it anymore.

Fix this by allowing resetting a VF administrative MAC back to zero.

Fixes: 8f7ba3ca12 ('net/mlx4: Add set VF mac address support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Moshe Levi <moshele@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-02 14:42:46 -05:00
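The acceptance rule that 6e5224224f describes (a VF admin MAC may be a valid unicast address or all zeros) can be sketched as below. The helper names are hypothetical and written for user space; they are not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ADMIN_MAC_LEN 6

/* True when every byte of the address is zero. */
static bool admin_mac_is_zero(const uint8_t *mac)
{
	for (size_t i = 0; i < ADMIN_MAC_LEN; i++)
		if (mac[i] != 0)
			return false;
	return true;
}

/* The low bit of the first byte marks multicast (broadcast included). */
static bool admin_mac_is_multicast(const uint8_t *mac)
{
	return (mac[0] & 0x01) != 0;
}

/*
 * Accept either the all-zero address, meaning "no admin MAC assigned",
 * or any unicast address. Multicast and broadcast stay rejected.
 */
static bool vf_admin_mac_acceptable(const uint8_t *mac)
{
	assert(mac != NULL);
	return admin_mac_is_zero(mac) || !admin_mac_is_multicast(mac);
}
```

A stricter rule that demands a non-zero unicast address would refuse the reset to zero, which is the situation the commit message reports for libvirt.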
Jiri Pirko
09d4d087cd mlx4: Implement devlink interface
Implement newly introduced devlink interface. Add devlink port instances
for every port and set the port types accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
v2->v3:
-add dev param to devlink_register (api change)
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-01 16:07:29 -05:00
David S. Miller
b633353115 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/bcm7xxx.c
	drivers/net/phy/marvell.c
	drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23 00:09:14 -05:00
Eugenia Emantayev
925ab1aa93 net/mlx4_en: Avoid changing dev->features directly in run-time
It's forbidden to manually change dev->features in run-time. Currently, this is
done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when
VXLAN tunnel is set. However, since the stack actually does features intersection
with hw_enc_features, we can safely revert to advertizing features early when
registering the netdevice.

Fixes: f4a1edd561 ('net/mlx4_en: Advertize encapsulation offloads [...]')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 10:29:27 -05:00
John Fastabend
16e5cc6471 net: rework setup_tc ndo op to consume general tc operand
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.

This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
John Fastabend
e4c6734eaa net: rework ndo tc op to consume additional qdisc handle parameter
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.

This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.

CC: Murali Karicheri <m-karicheri2@ti.com>
CC: Shradha Shah <sshah@solarflare.com>
CC: Or Gerlitz <ogerlitz@mellanox.com>
CC: Ariel Elior <ariel.elior@qlogic.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 09:47:35 -05:00
David S. Miller
c07f30ad68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
2015-12-31 18:20:10 -05:00
Eugenia Emantayev
90683061dd net/mlx4_en: Fix HW timestamp init issue upon system startup
mlx4_en_init_timestamp was called before creation of netdev and port
init, thus used uninitialized values.  Specifically - NIC frequency was
incorrect causing wrong calculations and later wrong HW timestamps.

Fixes: 1ec4864b10 ('net/mlx4_en: Fixed crash when port type is changed')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 14:48:04 -05:00
Eugenia Emantayev
fc9f5ea9b4 net/mlx4_en: Remove dependency between timestamping capability and service_task
Service task is responsible for other tasks in addition to timestamping
overflow check. Launch it even if timestamping is not supported by device.

Fixes: 07841f9d94 ('net/mlx4_en: Schedule napi when RX buffers allocation fails')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-18 14:48:03 -05:00
Eric Dumazet
868fdb0606 mlx4: remove mlx4_en_low_latency_recv()
Busy polling can now be handled in generic NAPI poll infrastructure.
This removes complexity and fast path overhead :

mlx4 used two spin_lock()/spin_unlock() pairs per napi->poll() call
in mlx4_en_cq_lock_napi()/mlx4_en_cq_unlock_napi()

Tested:

Without busy polling :

lpaa23:~# echo 0 >/proc/sys/net/core/busy_read
lpaa24:~# echo 0 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    47330.78

With busy polling :

lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
lpaa23:~# ./netperf -H lpaa24 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    97643.55

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:40 -05:00
Jack Morgenstein
2b3ddf27f4 net/mlx4_core: Replace VF zero mac with random mac in mlx4_core
By design, when no default MAC addresses are set in the Hypervisor for VFs,
the VFs are passed zero-macs. When such a MAC is received by the VF, it
generates a random MAC address and registers that MAC address
with the Hypervisor.

This random mac generation is currently done in the mlx4_en module.
There is a problem, though, if the mlx4_ib module is loaded by a VF before
the mlx4_en module. In this case, for RoCE, mlx4_ib will see the un-replaced
zero-mac and register that zero-mac as part of QP1 initialization.

Having a zero-mac in the port's MAC table creates problems for a
Baseboard Management Controller. The BMC occasionally sends packets with a
zero-mac destination MAC. If there is a zero-mac present in the port's
MAC table, the FW will send such BMC packets to the host driver rather than
to the wire, and BMC will stop working.

To address this problem, we move the replacement of zero-mac addresses
with random-mac addresses to procedure mlx4_slave_cap(), which is part of the
driver startup for VFs, and is before activation of mlx4_ib and mlx4_en.
As a result, zero-mac addresses will never be registered in the port MAC table
by the driver.

In addition, when mlx4_en does initialize the net device, it needs to set
the NET_ADDR_RANDOM flag in the netdev structure if the address was
randomly generated. This is done so that udev on the VM does not create
a new device name after each VF probe (VM boot and such). To accomplish this,
we add a per-port flag in mlx4_dev which gets set whenever mlx4_core replaces
a zero-mac with a randomly-generated mac. This flag is examined when mlx4_en
initializes the net-device.

Fix was suggested by Matan Barak <matanb@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-14 19:14:44 -07:00
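The zero-mac replacement and the per-port "was randomly generated" flag from 2b3ddf27f4 can be sketched in user space like this. The function names are hypothetical, and rand() stands in for a proper random source purely to keep the example short. The two bit operations follow the usual convention for random Ethernet addresses (unicast, locally administered).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VF_MAC_LEN 6

/* True when every byte of the address is zero. */
static bool vf_mac_is_zero(const uint8_t *mac)
{
	for (size_t i = 0; i < VF_MAC_LEN; i++)
		if (mac[i] != 0)
			return false;
	return true;
}

/*
 * If mac is all zeros, overwrite it with a random unicast, locally
 * administered address and set *was_random so that a later stage can
 * mark the net device accordingly. Non-zero addresses are left alone.
 */
static void replace_zero_mac(uint8_t *mac, bool *was_random)
{
	assert(mac != NULL && was_random != NULL);
	*was_random = false;
	if (!vf_mac_is_zero(mac))
		return;
	for (size_t i = 0; i < VF_MAC_LEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= 0xfe;	/* clear the multicast bit */
	mac[0] |= 0x02;	/* set the locally administered bit */
	*was_random = true;
}
```

Because the locally administered bit is always set, the result can never be all zeros again, so a zero-mac never reaches later consumers in this model.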
Ido Shamay
ba4b87aedd net/mlx4_en: Add steering rules after RSS creation
Changed the receive control flow in a way that steering
rules are added only when the RSS object is already in RTR/RTS mode.
Some optimization features, which are enabled by the device firmware,
require this condition in order to be effective.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-09 07:27:50 -07:00
Hadar Hen Zion
e38af4faf0 net/mlx4_en: Add support for hardware accelerated 802.1ad vlan
To enable device support in accelerated 802.1ad vlan, the port
capability "packet has vlan enable" (phv_en) should be set.
Firmware won't work properly, in case phv_en is not set.

The user can enable "phv_en" port capability with the new ethtool
private flag phv-bit. The phv-bit private flag default value is OFF,
users who are interested in 802.1ad hardware acceleration should turn ON
the phv-bit private flag:
$ ethtool --set-priv-flags eth1 phv-bit on

Once the private flag is set, the device is ready for 802.1ad vlan
acceleration.

The user should also change the interface device features and turn on
"tx-vlan-stag-hw-insert" which is off by default:
$ ethtool -K eth1  tx-vlan-stag-hw-insert on

"phv-bit" private flag setting is available only for Physical
Functions(PF), the Virtual Function (VF) will be able to use the feature
by setting "tx-vlan-stag-hw-insert" ethtool device feature only if the
feature was enabled by the Hypervisor.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 15:00:37 -07:00
Eran Ben Elisha
0eb08514fd net/mlx4_en: Release TX QP when destroying TX ring
TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code
used the deprecated base_tx_qpn field. Move TX QP release to
mlx4_en_destroy_tx_ring and remove the base_tx_qpn field.

Fixes: ddae0349fd ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:26 -07:00
Eran Ben Elisha
62a890557f net/mlx4_en: Support ndo_get_vf_stats
Implement the ndo to gather VF statistics through the PF.

All counters related to this VF are stored in a per slave
list, run over the slave's list and collect all statistics.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:03 -07:00
Eran Ben Elisha
b42de4d012 net/mlx4_en: Show PF own statistics via ethtool
Allow the user to observe the PF's own statistics using ethtool with pf_
prefixed counter names.

Those counters are the PF statistics out of the overall port statistics.
Every PF QP is attached to a counter and the summary of those counters
is the PF statistics.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:02 -07:00
Eran Ben Elisha
6de5f7f6a1 net/mlx4_core: Allocate default counter per port
Default counter per port will be allocated at the mlx4 core driver load.

Every QP opened by the Ethernet driver will be attached to the port's default
counter.  This is an infrastructure step to collect VF statistics from the PF.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:02 -07:00
David S. Miller
dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
Matan Barak
c66fa19c40 net/mlx4: Add EQ pool
Previously, mlx4_en allocated EQs and used them exclusively.
This affected RoCE performance, as event-sensitive applications
were limited to using only the legacy EQs.

Change that by introducing an EQ pool. This pool is managed
by mlx4_core. EQs are assigned to ports (when there is a limited
number of EQs, multiple ports could be assigned to the same EQs).

An exception to this rule is the ASYNC EQ which handles various events.

Legacy EQs are completely removed as all EQs could be shared.

When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
EQ serving on a specific port. The core driver calculates which
EQ should be assigned to that request.

Because IRQs are shared between IB and Ethernet modules, their
names only include the PCI device BDF address.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:35:34 -07:00
Rusty Russell
f36963c9d3 cpumask_set_cpu_local_first => cpumask_local_spread, lament
da91309e0a (cpumask: Utility function to set n'th cpu...) created a
genuinely weird function.  I never saw it before, it went through DaveM.
(He only does this to make us other maintainers feel better about our own
mistakes.)

cpumask_set_cpu_local_first's purpose is to say "I need to spread things
across N online cpus, choose the ones on this numa node first"; you call
it in a loop.

It can fail.  One of the two callers ignores this, the other aborts and
fails the device open.

It can fail in two ways: allocating the off-stack cpumask, or through a
convoluted codepath which AFAICT can only occur if cpu_online_mask
changes.  Which shouldn't happen, because if cpu_online_mask can change
while you call this, it could return a now-offline cpu anyway.

It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
drawn to my attention by Geert, who said this causes a warning on Sparc.
It sets a single bit in a cpumask instead of returning a cpu number,
because that's what the callers want.

It could be made more efficient by passing the previous cpu rather than
an index, but that would be more invasive to the callers.

Fixes: da91309e0a
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
Tested-by: Amir Vadai <amirv@mellanox.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
2015-05-28 11:05:20 +09:30
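The "spread across N cpus, local node first" behaviour described in f36963c9d3 can be modelled with a plain array that maps each online cpu to its NUMA node. This is a user-space sketch under that assumption, with a hypothetical function name; it is not the kernel implementation and ignores cpumasks and hotplug entirely.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the i-th cpu to use, preferring cpus whose node equals `node`
 * and then continuing with the remaining cpus. `cpu_node[c]` is the
 * node of cpu c, `nr_cpus` the number of online cpus. A node of -1
 * means no preference. The index wraps around nr_cpus, so the caller
 * can simply pass 0, 1, 2, ... in a loop.
 */
static int local_spread_model(unsigned int i, int node,
			      const int *cpu_node, unsigned int nr_cpus)
{
	unsigned int cpu;

	assert(cpu_node != NULL && nr_cpus > 0);
	i %= nr_cpus;

	if (node == -1)
		return (int)i;

	/* First pass: cpus on the requested node. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_node[cpu] != node)
			continue;
		if (i-- == 0)
			return (int)cpu;
	}

	/* Second pass: every cpu that is not on the requested node. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_node[cpu] == node)
			continue;
		if (i-- == 0)
			return (int)cpu;
	}

	return -1;	/* not reached: each cpu is visited by exactly one pass */
}
```

Returning a cpu number instead of setting a bit in a caller-supplied mask means the model has no allocation and no failure path, which is the simplification the commit message argues for.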
Ido Shamay
07841f9d94 net/mlx4_en: Schedule napi when RX buffers allocation fails
When the system is out of memory, refilling of RX buffers fails while
the driver continues to pass the received packets to the kernel stack.
At some point, when all RX buffers are depleted, the driver may fall
asleep and not recover when memory for new RX buffers is once again
available. This is because the hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by scheduling the napi poll function from the
stats_task delayed workqueue, as long as the allocations fail.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 16:47:50 -04:00
Benjamin Poirier
f94813f3c1 mlx4_en: Use correct loop cursor in error path.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Fixes: 9e311e7 ("net/mlx4_en: Use affinity hint")
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 16:25:14 -04:00
Muhammad Mahajna
78500b8c03 net/mlx4_en: Add RX-ALL support
Enabled when the device supports KEEP FCS and IGNORE FCS.

When the flag is set, pass all received frames up the stack,
even ones with invalid FCS, controlled by ethtool.

Signed-off-by: Muhammad Mahajna <muhammadm@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:04 -04:00
Muhammad Mahajna
f0df35037a net/mlx4_en: Add RX-FCS support
Enabled when device supports KEEP FCS. When the flag is set, Ethernet FCS
is appended to the end of the frame, controlled by ethtool.

Signed-off-by: Muhammad Mahajna <muhammadm@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:04 -04:00
Ido Shamay
3742cc6551 net/mlx4: Warn users of depracated QoS Firmware
A new capability bit was introduced in the past to differentiate devices
using the QoS ETS feature. The old bit has been deprecated since then.
If the driver sees a device that sets only the old capability, it prints
a warning suggesting that the user upgrade the FW.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
cda373f484 net/mlx4_en: Enable TX rate limit per VF
Support granular QoS per VF, by implementing the ndo_set_vf_rate.

Enforce a rate limit per VF when called, and enabled only for VFs in
VST mode with user priority supported by the device.

We don't enforce VFs to be in VST mode at the moment of configuration,
but rather save the given rate limit and enforce it when the VF is
moved to VST with user priority which is supported (currently 0).

VST<->VGT or VST qos value state changes are disallowed when a rate
limit is configured. Minimum BW share is not supported yet.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
241a08c3a7 net/mlx4_en: Change loopback only upon feature change
Currently any change of netdev features results in a call to
mlx4_en_update_loopback_state(). Those calls are unnecessary,
and should be called only upon loopback feature change.

Also moved some of the logic into mlx4_en_update_loopback_state().

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:24:51 -04:00
David S. Miller
9f0d34bc34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	drivers/net/usb/sr9800.c
	drivers/net/usb/usbnet.c
	include/linux/usb/usbnet.h
	net/ipv4/tcp_ipv4.c
	net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes.  In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:16:53 -04:00
Matan Barak
0b131561a7 net/mlx4_en: Add Flow control statistics display via ethtool
Flow control per priority and Global pause counters are now visible via
ethtool.  The counters show statistics regarding pauses in the device.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:51 -04:00
Eran Ben Elisha
3da8a36cc5 net/mlx4_en: Protect access to the statistics bitmap
This will allow parallel access to the statistics bitmap.
A pre-step for adding PFC counters, where the statistics bitmap
can be dynamically changed when modifying the PFC setting.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Eran Ben Elisha
6fcd27354b net/mlx4_en: Support general selective view of ethtool statistics
The driver uses a bitmask to indicate which statistics should be
displayed to the user in ethtool. The bitmask is u64, therefore we are
limited to a selective view of up to 64 statistics. Extend the bitmap
in order to show more than 64 statistics.

In addition, add packet statistics to the ethtool display for PF.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
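The move from a single u64 mask to a wider bitmap, as described in 6fcd27354b, can be illustrated with a small word-array bitmap. The size of 100 statistics and all names below are assumptions made for the sketch; the driver's own bitmap type and helpers are not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical number of statistics, chosen only to exceed 64. */
#define MODEL_NUM_STATS 100U
#define MODEL_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_BITMAP_WORDS \
	((MODEL_NUM_STATS + MODEL_BITS_PER_WORD - 1) / MODEL_BITS_PER_WORD)

/* One bit per statistic, spread over as many words as needed. */
struct model_stats_bitmap {
	unsigned long words[MODEL_BITMAP_WORDS];
};

/* Mark statistic `idx` as visible. */
static void stat_show(struct model_stats_bitmap *bm, unsigned int idx)
{
	assert(bm != NULL && idx < MODEL_NUM_STATS);
	bm->words[idx / MODEL_BITS_PER_WORD] |=
		1UL << (idx % MODEL_BITS_PER_WORD);
}

/* Report whether statistic `idx` is visible. */
static bool stat_is_shown(const struct model_stats_bitmap *bm, unsigned int idx)
{
	assert(bm != NULL && idx < MODEL_NUM_STATS);
	return (bm->words[idx / MODEL_BITS_PER_WORD] >>
		(idx % MODEL_BITS_PER_WORD)) & 1UL;
}
```

Indexing by word and bit keeps statistic 70 distinct from statistic 6, which a single 64-bit mask cannot do.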
Eran Ben Elisha
ffa88f37ff net/mlx4_en: Move statistics bitmap setting to the Ethernet driver
The statistics bitmap belongs to the Ethernet driver, move it there.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Toshiaki Makita
8cb65d0008 net: Move check for multiple vlans to drivers
To allow drivers to handle the features check for multiple tags,
move the check to ndo_features_check().
As no drivers currently handle multiple tagged TSO, introduce
dflt_features_check() and call it if the driver does not have
ndo_features_check().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 13:33:22 -07:00