Commit Graph

1262 Commits

Author SHA1 Message Date
Hadar Hen Zion
e802f8e4c5 net/mlx4: Prepare VLAN macros for 802.1ad Hardware accelerated support
To add Hardware accelerated support in 802.1ad vlan, replace
Current VLAN macros to CVLAN.
Replace:
MLX4_WQE_CTRL_INS_VLAN
MLX4_CQE_VLAN_PRESENT_MASK
With:
MLX4_WQE_CTRL_INS_CVLAN
MLX4_CQE_CVLAN_PRESENT_MASK

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 15:00:37 -07:00
Hadar Hen Zion
7c509a48ff net/mlx4_en: Prepare ethtool private flags to support more flags
Currently we support only one ethtool private flag. Prepare
mlx4_en_set_priv_flags function to support more than one private flag.
Will be used in the next patch to support hardware accelerated 802.1ad
vlan.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 15:00:36 -07:00
Hadar Hen Zion
77fc29c4bb net/mlx4_core: Preparations for 802.1ad VLAN support
mlx4_core preparation to support hardware accelerated 802.1ad VLAN
device.

To allow 802.1ad accelerated device, "packet has vlan" (phv)
Firmware capability should be available. Firmware without the
phv capability won't behave properly and can't support 802.1ad device
acceleration.

The driver checks the Firmware capability and sets the phv bit
accordingly in SET_PORT command.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 15:00:36 -07:00
Achiad Shochat
a741749f21 net/mlx5e: Input IPSEC.SPI into the RX RSS hash function
In addition to the source/destination IP which are already hashed.
Only for unicast traffic for now.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
Achiad Shochat
5a6f8aef16 net/mlx5e: Cosmetics: use BIT() instead of "1 <<", and others
No logical change in this commit.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
Achiad Shochat
88a85f99e5 net/mlx5e: TX latency optimization to save DMA reads
A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads obviously increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains a whole packet, thus saves the DMA read/s
of the regular WQE gather entry/s. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory, thus
enables saving the DMA read that fetches the WQE.

The BF WQE I/O write must be in cache line granularity, thus uses
the CPU write combining mechanism.
A BF WQE I/O write acts also as a TX doorbell for notifying the
device of new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, thus this address is being mapped twice - once by ioremap()
and once by io_mapping_map_wc().

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
  written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce here a heuristic algorithm that
strives to avoid using these two mechanisms in case the TX queue is
being back-pressured by the device, and limit their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory) while a BF WQE may not be inlined (may contain
gather entries).

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
Achiad Shochat
58d522912a net/mlx5e: Support TX packet copy into WQE
AKA inline WQE.
A TX latency optimization to save data gather DMA reads.
Controlled by ETHTOOL_TX_COPYBREAK.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
Saeed Mahameed
311c7c71c9 net/mlx5e: Allocate DMA coherent memory on reader NUMA node
By affinity hints and XPS, each mlx5e channel is assigned a CPU
core.

Channel DMA coherent memory that is written by the NIC and read
by SW (e.g CQ buffer) is allocated on the NUMA node of the CPU
core assigned for the channel.

Channel DMA coherent memory that is written by SW and read by the
NIC (e.g SQ/RQ buffer) is allocated on the NUMA node of the NIC.

Doorbell record (written by SW and read by the NIC) is an
exception since it is accessed by SW more frequently.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:17 -07:00
Saeed Mahameed
2be6967cdb net/mlx5e: Support ETH_RSS_HASH_XOR
The ConnectX-4 HW implements inverted XOR8.
To make it act as XOR we re-order the HW RSS indirection table.

Set XOR to be the default RSS hash function and add ethtool API to
control it.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 00:29:16 -07:00
Ido Shamay
62e4c9b4fd net/mlx4_en: Remove BUG_ON assert when checking if ring is full
In mlx4_en_is_ring_empty we check if ring surpassed its size.
Since the prod and cons indicators are u32, there might be a state where
prod wrapped around and cons, making this assert false, although no
actual bug exists (other code segment can cope with this state).

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26 16:29:26 -07:00
Jack Morgenstein
9f5b031770 net/mlx4_core: Relieve cpu load average on the port sending flow
When a port is not attached, the FW requires a longer than usual time to
execute the SENSE_PORT command. In the command flow, the
wait_for_completion_timeout call used in mlx4_cmd_wait puts the kernel
thread into the uninterruptible state during this time. This, in turn,
due to the computation method, causes the CPU load average to increase.

Fix this by using wait_for_completion_interruptible_timeout() for the
SENSE_PORT command, which puts the thread in the interruptible state.
In this state, the thread does not contribute to the CPU load average.

Treat the interrupted case as if the SENSE_PORT command returned
port_type = NONE.

Fix suggested by Gideon Naim <gideonn@mellanox.com> and
Bart Van Assche <bart.vanassche@sandisk.com>.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26 16:29:25 -07:00
Jack Morgenstein
1c1bf34951 net/mlx4_core: Fix wrong index in propagating port change event to VFs
The port-change event processing in procedure mlx4_eq_int() uses "slave"
as the vf_oper array index. Since the value of "slave" is the PF function
index, the result is that the PF link state is used for deciding to
propagate the event for all the VFs. The VF link state should be used,
so the VF function index should be used here.

Fixes: 948e306d7d ('net/mlx4: Add VF link state support')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26 16:29:25 -07:00
Or Gerlitz
178d23e3cd net/mlx4_core: Use sink counter for the VF default as fallback
Some old PF drivers don't let VFs allocate counters, in that case, use
the sink counter so the VF can load and operate properly.

Fixes: 6de5f7f6a1 ('net/mlx4_core: Allocate default counter per port')
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-26 16:29:25 -07:00
Carol Soto
0beb44b065 net/mlx4_core: Add extra check for total vfs for SRIOV
Add extra check for total vfs for SRIOV to check if that value is
bigger than total vfs in pci SRIOV capabalities. Fix a check and
print of the number of maximum vfs that hw can handle. Fix a check
and print of the number of maximum vfs per port that driver can handle.

Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 15:19:54 -07:00
Eric Dumazet
0a6d424569 mlx4: TCP/UDP packets have L4 hash
Mellanox driver has the knowledge if rxhash is a L4 hash,
if it receives a non fragmented TCP or UDP frame and
NETIF_F_RXCSUM is enabled on netdev.

ip_summed value is CHECKSUM_UNNECESSARY in this case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Ido Shamay <idos@mellanox.com>
Acked-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:45:22 -07:00
Or Gerlitz
7254acffee mlx4: Disable HA for SRIOV PF RoCE devices
When in HA mode, the driver exposes an IB (RoCE) device instance with only
one port. Under SRIOV, the existing implementation doesn't go well with
the PF RoCE driver's role of Special QPs Para-Virtualization, etc.

As such, disable HA for the mlx4 PF RoCE device in SRIOV mode.

Fixes: a575009030 ('IB/mlx4: Add port aggregation support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:29 -07:00
Ido Shamay
79a258526c net/mlx4_en: Fix wrong csum complete report when rxvlan offload is disabled
The check_csum() function relied on hwtstamp_rx_filter to know if rxvlan
offload is disabled. This is wrong since rxvlan offload can be switched
on/off regardless of hwtstamp_rx_filter.

Also moved check_csum to query CQE information to identify VLAN packets
and removed the check of IP packets, since it has been validated before.

Fixes: f8c6455bb0 ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:28 -07:00
Ido Shamay
488a9b48e3 net/mlx4_en: Wake TX queues only when there's enough room
Indication of a single completed packet, marked by txbbs_skipped
being bigger then zero, in not enough in order to wake up a
stopped TX queue. The completed packet may contain a single TXBB,
while next packet to be sent (after the wake up) may have multiple
TXBBs (LSO/TSO packets for example), causing overflow in queue followed
by WQE corruption and TX queue timeout.
Instead, wake the stopped queue only when there's enough room for the
worst case (maximum sized WQE) packet that we should need to handle after
the queue is opened again.

Also created an helper routine - mlx4_en_is_tx_ring_full, which checks
if the current TX ring is full or not. It provides better code readability
and removes code duplication.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:27 -07:00
Eran Ben Elisha
0eb08514fd net/mlx4_en: Release TX QP when destroying TX ring
TX ring QP wasn't released at mlx4_en_destroy_tx_ring. Instead, the code
used the deprecated base_tx_qpn field. Move TX QP release to
mlx4_en_destroy_tx_ring and remove the base_tx_qpn field.

Fixes: ddae0349fd ('net/mlx4: Change QP allocation scheme')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-25 02:06:26 -07:00
Linus Torvalds
e0456717e4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add TX fast path in mac80211, from Johannes Berg.

 2) Add TSO/GRO support to ibmveth, from Thomas Falcon

 3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

 4) Lots of new rhashtable tests, from Thomas Graf.

 5) Run ingress qdisc lockless, from Alexei Starovoitov.

 6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting.  From Eric Dumazet.

 7) Add mode parameter to pktgen, for testing receive.  From Alexei
    Starovoitov.

 8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

 9) Move page frag allocator under mm/, also from Alexander.

10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

12) Extern flow dissector to be programmable and use it in new "Flower"
    classifier.  From Jiri Pirko.

13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics.  From Willem de Bruijn.

14) Add netdev driver for GENEVE tunnels, from John W Linville.

15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

18) Add tail call support to BPF, from Alexei Starovoitov.

19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion.  From Eric Dumazet.

22) Add Cavium ThunderX driver, from Sunil Goutham.

23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

25) Double TCP Small Queues default to 256K to accomodate situations
    like the XEN driver and wireless aggregation.  From Wei Liu.

26) Add more entropy inputs to flow dissector, from Tom Herbert.

27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
  bridge: vlan: flush the dynamically learned entries on port vlan delete
  bridge: multicast: add a comment to br_port_state_selection about blocking state
  net: inet_diag: export IPV6_V6ONLY sockopt
  stmmac: troubleshoot unexpected bits in des0 & des1
  net: ipv4 sysctl option to ignore routes when nexthop link is down
  net: track link-status of ipv4 nexthops
  net: switchdev: ignore unsupported bridge flags
  net: Cavium: Fix MAC address setting in shutdown state
  drivers: net: xgene: fix for ACPI support without ACPI
  ip: report the original address of ICMP messages
  net/mlx5e: Prefetch skb data on RX
  net/mlx5e: Pop cq outside mlx5e_get_cqe
  net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
  net/mlx5e: Remove extra spaces
  net/mlx5e: Avoid TX CQE generation if more xmit packets expected
  net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
  net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
  net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
  net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
  net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
  ...
2015-06-24 16:49:49 -07:00
David S. Miller
3a07bd6fea Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/main.c
	net/packet/af_packet.c

Both conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 02:58:51 -07:00
Saeed Mahameed
99611ba127 net/mlx5e: Prefetch skb data on RX
Prefetch the 1st cache line used by the buffer pointed by
the skb linear data.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:41 -07:00
Achiad Shochat
a1f5a1a87a net/mlx5e: Pop cq outside mlx5e_get_cqe
Separate between mlx5e_get_cqe() and mlx5_cqwq_pop(), this helps for
better code readability and better CQ buffer management.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:41 -07:00
Achiad Shochat
e33910548a net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
Use container_of() instead.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:39 -07:00
Achiad Shochat
8ca56ce39d net/mlx5e: Remove extra spaces
Coding Style fix, remove extra spaces.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:39 -07:00
Achiad Shochat
059ba072eb net/mlx5e: Avoid TX CQE generation if more xmit packets expected
In order to save PCI BW consumed by TX CQEs and to reduce the amount of
CPU cache misses caused by TX CQE reading, we request TX CQE generation
only when skb->xmit_more=0.

As a consequence of the above, a single TX CQE may now indicate the
transmission completion of multiple TX SKBs.

This also handles a problem introduced in commit b1b8105ebf41 "net/mlx5e:
Support NETIF_F_SG" where we didn't ask for NOP completions while the
driver didn't have the proper code to handle this case.

Fixes: b1b8105ebf41 ('net/mlx5e: Support NETIF_F_SG')
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:37 -07:00
Achiad Shochat
9fc5930625 net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
NOP completion SKBs are always NULL.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:37 -07:00
Achiad Shochat
ef583d037d net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
It is already assigned at mlx5e_build_rq_param()

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:36 -07:00
Saeed Mahameed
fb6c6f2529 net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
Instead of counting number of gso fragments, we can use
skb_shinfo(skb)->gso_segs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:35 -07:00
Saeed Mahameed
03289b88e3 net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
To save per-packet calculations, we use the following static mappings:
1) priv {channel, tc} to netdev txq (used @mlx5e_selec_queue())
2) netdev txq to priv sq (used @mlx5e_xmit())

Thanks to these static mappings, no more need for a separate implementation
of ndo_start_xmit when multiple TCs are configured.
We believe the performance improvement of such separation would be negligible, if any.
The previous way of dynamically calculating the above mappings required
allocating more TX queues than actually used (@alloc_etherdev_mqs()),
which is now no longer needed.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:34 -07:00
Eran Ben Elisha
f1a3badb0b net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
Under SRIOV, the port rx/tx bytes/packets statistics should by read
from the HW instead of using the PF netdevice SW accounting. This is
needed in order to get the full port statistics and not just the PF
own ones

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:33 -07:00
Eran Ben Elisha
9a2abf5a80 net/mlx4_en: Fix off-by-four in ethtool
NUM_ALL_STATS was not updated with the new four entries, instead
NUM_FLOW_STATS was updated, fix it. that caused off-by-four for all
counters below pf_*_*.

Fixes: b42de4d012 ('net/mlx4_en: Show PF own statistics via ethtool')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-24 00:42:32 -07:00
Linus Torvalds
f9d1b5a31a Changes for 4.2
- A large cleanup of how device capabilities are checked for various
   features
 - Additional cleanups in the MAD processing
 - Update to the srp driver
 - Creation and use of centralized log message helpers
 - Add const to a number of args to calls and clean up call chain
 - Add support for extended cq create verb
 - Add support for timestamps on cq completion
 - Add support for processing OPA MAD packets
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVeyzqAAoJELgmozMOVy/di3wP/jml4F9crvQn7UBJjGm/rgcI
 wzZ2GZTqxQE8dn+W6gQsdKOzy0Ibxx5UYGp9ruInuxAcVh9t1PcylanasaiGMEtY
 mrGRFjipJ9jYa+yDQDTQi8EFMClZuMSvtRLKjzYITudCXQck37V+F5YlP6VphjX7
 JeiM4a+4rD0ukk5PKGvUw51sP1eawKtEdUvnqcOEI2tJgQmzJBP4mXrhVtS/0wSc
 Pi8TRN5QKi3Drom/tK9QQ/ncoYngi4BKLfszCeU373HJq6qXqsxBYvs3jX6MPzfv
 Aooj272JxBgCYxkmEfECezDpmi3PbWDJjXj/xCLjfhjISDtHHHVLGVMODZpwUEsL
 2wBgwlzdajVopSbSLvsjQNtQw25s7sDWpu+TFKbS0u+W2d0ZOyipM1Xeje+OtDHQ
 clhwvDhgSfeI/bJ1YdtNLbvINrwsfZD213zD+WH21A/9weAVr3hEfTuSaNFiTiRn
 5yywP36TM0wH90KhiWoLrztcHvoE5p7kGuqzv04MRjrMMNHEJK2/IhWvT97Ewngu
 vWrZl7QRzXYcGspCOp2aJW9Wr2rhGRrv28TF+thpNrIJOB2JM4q4koCKZCcI0s2D
 E6pY2YQSzvrA/ZSfcWIg4yhugcycIJkOf7ur2N/U43cwGXtaCzPWVnKMApmdnVOO
 ZEMwD3OZ1OGcCHLhRL8Y
 =yISf
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:

 - a large cleanup of how device capabilities are checked for various
   features

 - additional cleanups in the MAD processing

 - update to the srp driver

 - creation and use of centralized log message helpers

 - add const to a number of args to calls and clean up call chain

 - add support for extended cq create verb

 - add support for timestamps on cq completion

 - add support for processing OPA MAD packets

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (92 commits)
  IB/mad: Add final OPA MAD processing
  IB/mad: Add partial Intel OPA MAD support
  IB/mad: Add partial Intel OPA MAD support
  IB/core: Add OPA MAD core capability flag
  IB/mad: Add support for additional MAD info to/from drivers
  IB/mad: Convert allocations from kmem_cache to kzalloc
  IB/core: Add ability for drivers to report an alternate MAD size.
  IB/mad: Support alternate Base Versions when creating MADs
  IB/mad: Create a generic helper for DR forwarding checks
  IB/mad: Create a generic helper for DR SMP Recv processing
  IB/mad: Create a generic helper for DR SMP Send processing
  IB/mad: Split IB SMI handling from MAD Recv handler
  IB/mad cleanup: Generalize processing of MAD data
  IB/mad cleanup: Clean up function params -- find_mad_agent
  IB/mlx4: Add support for CQ time-stamping
  IB/mlx4: Add mmap call to map the hardware clock
  IB/core: Pass hardware specific data in query_device
  IB/core: Add timestamp_mask and hca_core_clock to query_device
  IB/core: Extend ib_uverbs_create_cq
  IB/core: Add CQ creation time-stamping flag
  ...
2015-06-23 15:53:26 -07:00
Paul Gortmaker
138b15ed87 drivers/net: remove all references to obsolete Ethernet-HOWTO
This howto made sense in the 1990s when users had to manually configure
ISA cards with jumpers or vendor utilities, but with the implementation
of PCI it became increasingly less and less relevant, to the point where
it has been well over a decade since I last updated it.  And there is
no value in anyone else taking over updating it either.

However the references to it continue to spread as boiler plate text
from one Kconfig file into the next.  We are not doing end users any
favours by pointing them at this old document, so lets kill it with
fire, once and for all, to hopefully stop any further spread.

No code is changed in this commit, just Kconfig help text.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 06:50:35 -07:00
Eran Ben Elisha
62a890557f net/mlx4_en: Support ndo_get_vf_stats
Implement the ndo to gather VF statistics through the PF.

All counters related to this VF are stored in a per slave
list, run over the slave's list and collect all statistics.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:03 -07:00
Eran Ben Elisha
b42de4d012 net/mlx4_en: Show PF own statistics via ethtool
Allow the user to observe the PF own statistics using ethtool with pf_
prefixed counter names.

Those counters are the PF statistics out of the overall port statistics.
Every PF QP is attached to a counter and the summary of those counters
is the PF statistics.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:02 -07:00
Eran Ben Elisha
9616982f3f net/mlx4_core: Add helper to query counters
This is an infrastructure step for querying VF and PF counters.

This code was in the IB driver, move it to the mlx4 core driver
so it will be accessible for more use cases.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:02 -07:00
Eran Ben Elisha
6de5f7f6a1 net/mlx4_core: Allocate default counter per port
Default counter per port will be allocated at the mlx4 core driver load.

Every QP opened by the Ethernet driver will be attached to the port's default
counter.  This is an infrastructure step to collect VF statistics from the PF.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:02 -07:00
Eran Ben Elisha
68230242cd net/mlx4_core: Add port attribute when tracking counters
Counter will get its port attribute within the resource tracker when
the first QP attached to it is modified to RTR. If a QP is counter-less,
an attempt to create a new counter with assigned port will be made.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:02 -07:00
Eran Ben Elisha
9de92c60be net/mlx4_core: Adjust counter grant policy in the resource tracker
Each physical function has a guarantee of two counters per port, one
for a default counter and one for the IB driver.

Each virtual function has a guarantee of one counter per port.
All other counters are free and can be obtained on demand.

This is a preparation step for supporting a get_vf_stats ndo call,
so we can promise a counter for every VF in order to collect their
statistics from the PF context.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:01 -07:00
Eran Ben Elisha
2632d18d3a net/mlx4_core: Remove counters table allocation from VF flow
Since virtual functions get their counters indices allocation from the PF,
allocate counters indices bitmap only in case the function isn't virtual.

Also, check that the device has counters to allocate before creating the
indices bitmap table.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:01 -07:00
Eran Ben Elisha
47d8417f59 net/mlx4_core: Add sink counter
Reserve the last valid counter index for "sink" counter, when a
new counter cannot be allocated, the driver will use this counter.

In order to avoid allocating this counter on any other flow, fix the
indices bitmap allocation range, and reserve the sink counter index.

Add macro for the sink counter index and replace all appearences of the
index with the macro.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:01 -07:00
Eran Ben Elisha
b72ca7e96a net/mlx4_core: Reset counters data when freed
Add resetting the counter data to the free counter flow, so the counter's
data won't be accessible anymore if querying the counter. Also, on next
counter allocation (to another VM for example), it will be fresh and clear.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:01 -07:00
Eran Ben Elisha
efa6bc91cb net/mlx4_core: Check before cleaning counters bitmap
If counters are not supported by the device. The indices bitmap table is not
allocated during initialization. Add the symmetrical check before cleaning
the counters bitmap table or freeing a counter.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 17:23:01 -07:00
Or Gerlitz
ac0a72a3e6 net/mlx4_core: Disable Granular QoS per VF under IB/Eth VPI configuration
Due to firmware bug, under VPI configuration when port1 = IB and
port2 = Eth, Granular QoS per VF isn't working properly. More over,
the whole QP0/QP1 Para-Virtualization in the mlx4 IB driver is
broken on that config.

Hence, we must disable Granular QoS per VF under that configuration
till a fix is introduced. Once that happens, a new device capability
will be used to mark the feature support on that specific configuration.

Reported-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:42:57 -07:00
Matan Barak
52033cfb5a IB/mlx4: Add mmap call to map the hardware clock
In order to read the HCA's cycle counter efficiently in
user space, we need to map the HCA's register.
This is done through mmap call.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-12 14:49:10 -04:00
Achiad Shochat
3191e05fea net/mlx5e: Add transport domain to the ethernet TIRs/TISs
Allocate and use transport domain by the Ethernet driver code.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:26 -07:00
Achiad Shochat
56508b5013 net/mlx5_core: Add transport domain alloc/dealloc support
Each transport object, namely TIR and TIS, must have a transport domain
number (TDN) identifier.

The driver wrongly assumed that it is OK to use TDN=0 without explicit
TDN allocation from the device.

The TDN will also be used for isolating different processes once user
mode Ethernet will be supported.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:26 -07:00
Saeed Mahameed
12be4b2190 net/mlx5e: Support NETIF_F_SG
When NETIF_F_SG is set, each send WQE may have a different size since
each skb can have different number of fragments as of LSO header etc.

This implies that a given WQE may wrap around the send queue, i.e begin
at its end and continue at its start. While it is legal by the device spec,
we preferred a solution that avoids it - when building of current WQE is
done, if the next WQE may wrap around the send queue, fill the send queue
with NOPs WQEs till its end, so that the next WQE will begin at send queue
start.

NOP WQE for itself cannot wrap around the send queue since it is of
minimal size - 64 bytes, and all send WQEs are a multiple of that size.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:25 -07:00
Gal Pressman
796a27ec2d net/mlx5e: Enforce max flow-tables level >= 3
The Ethernet driver requires at least 3 flow table levels to
operate, enforce that.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:25 -07:00
Saeed Mahameed
cd58c714ac net/mlx5e: Disable client vlan TX acceleration
We need to resolve a HW configuration issue for enabling HW CVLAN
insertion. Meanwhile, no need to implement the VLAN insertion in
the driver, rather use the generic kernel VLAN insertion method.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:25 -07:00
Saeed Mahameed
fc11fbf9a7 net/mlx5e: Add HW cacheline start padding
Enable HW cacheline start padding and align RX WQE size to cacheline
while considering HW start padding. Also, fix dma_unmap call to use
the correct SKB data buffer size.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:25 -07:00
Saeed Mahameed
facc9699f0 net/mlx5e: Fix HW MTU settings
Previously we configured HW MTU to be netdev->mtu, actually we
need to configure netdev->mtu + (ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN).

Also, query MTU can not fail, hence make the relevant helper a
void functionm, add mlx5e_set_dev_port_mtu, helper function to
handle MTU setting.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:55:25 -07:00
Dan Carpenter
7ec0bb227a net/mlx5_core: fix an error code
We return success if mlx5e_alloc_sq_db() fails but we should return an
error code.

Fixes: f62b8bb8f2 ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:44:41 -07:00
Fabian Frederick
e55898a7ed net/mlx4_core: use swap() in mlx4_make_profile()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:19:41 -07:00
Fabian Frederick
2df1cafa67 net/mlx4: use swap() in mlx4_init_qp_table()
Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-11 15:19:41 -07:00
Majd Dibbiny
7cf7fa529d net/mlx5_core: Fix static checker warnings around system guid query flow
Fix static checker warnings in the flow of system guid query.

Fixes: 707c4602cd ('net/mlx5_core: Add new query HCA vport commands')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-07 20:11:17 -07:00
Haggai Abramonvsky
4aa17b2879 mlx5: Enable mutual support for IB and Ethernet
Ethernet functionality is only available when working in ISSI > 0 mode.

Previously, the IB driver wasn't ready to work on that mode, and hence
building both the IB driver and the Ethernet functionality in the core
driver were disallowed by Kconfigs.

Now, once we have all the pre-steps in place, we can remove this limitation.

The last steps in the IB driver for getting that setup to work are:
create dummy SRQ for the driver's use (until now we could use XRC_SRQ
as SRQ and XRC_SRQ, after moving to ISSI > 0, we separate XRC SRQs from
basic SRQs) and adapt the create QP function to be compatible with ISSI > 0.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:02 -07:00
Majd Dibbiny
a124d13ef5 net/mlx5_core: Add more query port helpers
Add the following helpers:

1. mlx5_query_port_proto_oper -- queries the port speed port mask
2. mlx5_query_port_link_width_oper - queries the port link with bitmask
3. mlx5_query_port_vl_hw_cap - queries the Virtual Lanes supported on this port

These helpers will be used from the IB driver when working in ISSI > 0 mode.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:02 -07:00
Majd Dibbiny
a05bdefa40 net/mlx5_core: Use port number when querying port ptys
Until now, mlx5_query_port_ptys always queried port number one.

Added new argument in the function's prototype so we can also query
the second port. This will be needed  when thr helper will be invoked
from the IB driver on non FPP (Function-Per-Port) devices.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Majd Dibbiny
e760152d08 net/mlx5_core: Use port number in the query port mtu helpers
Extend the function prototypes for max and operational mtu to take the
local port number. In the Ethernet driver is this hard coded to one,
since ConnectX4 Ethernet devices are always function-per-port.
The IB driver also serves older devices (ConnectIB) which isn't such,
and hence the part can vary.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Majd Dibbiny
211e6c80e5 net/mlx5_core: Get vendor-id using the query adapter command
Add two wrapper functions to the query adapter command:

1. mlx5_query_board_id -- replaces the old mlx5_cmd_query_adapter.

2. mlx5_core_query_vendor_id -- retrieves the vendor_id from the
   query_adapter command.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Majd Dibbiny
707c4602cd net/mlx5_core: Add new query HCA vport commands
Added the implementation for the following commands:

1. QUERY_HCA_VPORT_GID
2. QUERY_HCA_VPORT_PKEY
3. QUERY_HCA_VPORT_CONTEXT

They will be needed when we move to work with ISSI > 0 in the IB driver too.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Majd Dibbiny
d18a9470f8 net/mlx5_core: Make the vport helpers available for the IB driver too
Move the vport header file to be under include/linux/mlx5, such that
the mlx5 IB can use it as well.

Also add nic_ prefix to the vport NIC commands to differeniate between
HCA vport commands and NIC vport commands.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Haggai Abramonvsky
e74a1db033 net/mlx5_core: Check the return bitmask when querying ISSI
The determination of the supported ISSI versions should be conditioned
on the returned mask, and not only on the return status of the query
ISSI command, fix that.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Haggai Abramonvsky
01949d0109 net/mlx5_core: Enable XRCs and SRQs when using ISSI > 0
When working in ISSI > 0 mode, the model exposed by the device for
XRCs and SRQs is different. XRCs use XRC SRQs and plain SRQs are based
on RPM (Receive Memory Pool).

Add helper functions to create, modify, query, and arm XRC SRQs and RMPs.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:01 -07:00
Haggai Abramonvsky
7db22ffb5b net/mlx5_core: Apply proper name convention to helpers
Some core helper functions were named with mlx5_ only prefix, fix that to
mlx5_core_ so we're aligned with the overall scheme used for core services.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:00 -07:00
Amir Vadai
5e24851ec5 net/mlx5_en: Add missing check for memory allocation failure
The patch afb736e933: "net/mlx5: Ethernet resource handling files"
from May 28, 2015, leads to the following static checker warning:

drivers/net/ethernet/mellanox/mlx5/core/en_flow_table.c:726 mlx5e_create_main_flow_table()
error: potential null dereference 'g'.  (kcalloc returns null)

Fixes: afb736e933 ("net/mlx5: Ethernet resource handling files")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 16:41:00 -07:00
Carol Soto
613d8c188f net/mlx4_core: fix typo in mlx4_set_vf_mac
fix typo in mlx4_set_vf_mac

Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:12:58 -07:00
Carol Soto
ed3d2276ef net/mlx4_core: need to call close fw if alloc icm is called twice
If mlx4_enable_sriov is called by adapter without this
feature MLX4_DEV_CAP_FLAG2_SYS_EQS then during this path the function alloc
icm is called twice without freeing the structures from the first time.

Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:12:58 -07:00
Carol L Soto
5114a04e6c net/mlx4_core: double free of dev_vfs
If user loads mlx4_core with num_vfs greater than
supported then variable dev->dev_vfs is freed 2 times after unloading the
driver.

Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 20:12:58 -07:00
Or Gerlitz
db9777e376 net/mlx4_core: Fix build failure introduced by the EQ pool changes
When CONFIG_RFS_ACCEL or SMP aren't set, we fail to build, fix it.

Also, avoid build warning as of unused function on that setup.

Fixes: c66fa19c40 ('net/mlx4: Add EQ pool')
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-03 19:34:51 -07:00
Ira Weiny
a97e2d86a9 IB/core cleanup: Add const on args - device->process_mad
The process_mad device function declares some parameters as "in".  Make those
parameters const and adjust the call tree under process_mad in the various
drivers accordingly.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-06-02 09:33:13 -04:00
David S. Miller
dda922c831 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/amd-xgbe-phy.c
	drivers/net/wireless/iwlwifi/Kconfig
	include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-01 22:51:30 -07:00
Matan Barak
6d90aa5cf1 net/mlx4_core: Make sure there are no pending async events when freeing CQ
When freeing a CQ, we need to make sure there are no
asynchronous events (on the ASYNC EQ) that could
relate to this CQ before freeing it.

This is done by introducing synchronize_irq.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:35:34 -07:00
Ido Shamay
de1618034a net/mlx4_core: Move affinity hints to mlx4_core ownership
Now that EQs management is in the sole responsibility of mlx4_core,
the IRQ affinity hints configuration should be in its hands as well.
request_irq is called only once by the first consumer (maybe mlx4_ib),
so mlx4_en passes the affinity mask too late. We also need to request
vectors according to the cores we want to run on.

mlx4_core distribution of IRQs to cores is straight forward,
EQ(i)->IRQ will set affinity hint to core i.
Consumers need to request EQ vectors, according to their cores
considerations (NUMA).

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:35:34 -07:00
Matan Barak
c66fa19c40 net/mlx4: Add EQ pool
Previously, mlx4_en allocated EQs and used them exclusively.
This affected RoCE performance, as applications which are
events sensitive were limited to use only the legacy EQs.

Change that by introducing an EQ pool. This pool is managed
by mlx4_core. EQs are assigned to ports (when there are limited
number of EQs, multiple ports could be assigned to the same EQs).

An exception to this rule is the ASYNC EQ which handles various events.

Legacy EQs are completely removed as all EQs could be shared.

When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
EQ serving on a specific port. The core driver calculates which
EQ should be assigned to that request.

Because IRQs are shared between IB and Ethernet modules, their
names only include the PCI device BDF address.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:35:34 -07:00
Matan Barak
48564135cb net/mlx4_core: Demote simple multicast and broadcast flow steering rules
In SRIOV, when simple (i.e - Ethernet L2 only) flow steering rules are
created, always create them at MLX4_DOMAIN_NIC priority (instead of
the real priority the function created them at). This is done in order
to let multiple functions add broadcast/multicast rules without
affecting other functions, which is necessary for DPDK in SRIOV.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 23:35:34 -07:00
Amir Vadai
f62b8bb8f2 net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality
This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4
Single/Dual-Port Adapter supporting 100Gb/s with VPI.  The driver
extends the existing mlx5 driver with Ethernet functionality.

This patch contains the driver entry points but does not include
transmit and receive (see the previous patch in the series) routines.

It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the
Ethernet functionality. Currently, Kconfig is programmed to make
Ethernet and Infiniband functionality mutally exclusive.
Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of
selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND
being selected.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:24:51 -07:00
Amir Vadai
afb736e933 net/mlx5: Ethernet resource handling files
This patch contains the resource handling files:
- flow_table.c: This file contains the code to handle the low level API
		to configure hardware flow table. It is separated from
		the flow_table_en.c, because it will be used in the
		future by Raw Ethernet QP in mlx5_ib too.
- en_flow_table.[ch]: Ethernet flow steering handling. The flow table
		object contain a mapping between flow specs and TIRs.
		This mechanism will be used also to configure e-switch
		in the future, when SR-IOV support will be added.
- transobj.[ch] - Low level functions to create/modify/destroy the
                  transport objects: RQ/SQ/TIR/TIS
- vport.[ch] - Handle attributes of a virtual port (vPort) in the
  embedded switch. Currently this switch is a passthrough, until SR-IOV
  support will be added.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:24:39 -07:00
Amir Vadai
e586b3b0ba net/mlx5: Ethernet Datapath files
en_[rt]x.c contains the data path related code specific to tx or rx.
en_txrx.c contains data path code which is common for both the rx and
tx, this is mainly napi related code.

Below are the objects that are being used by the hardware and the driver
in the data path:

Channel - one channel per IRQ. Every channel object contains:
  RQ  - describes the rx queue
  TIR - One TIR (Transport Interface Receive) object per flow type. TIR
        contains attributes for a type of rx flow (e.g IPv4, IPv6 etc).
        A flow is defined in the Flow Table.
        Currently TIR describes the RSS hash parameters if exists and LRO
        attributes.
  SQ  - describes the a tx queue. There is one SQ (Send Queue) per
        TC (traffic class).
  TIS - There is one TIS (Transport Interface Send) per TC.  It
        describes the TC and may later be extended to describe more
	transport properties.

Both RQ and SQ inherit from the object WQ (work queue). This common code
to describe the layout of CQE's WQE's in memory is in the files wq.[cj]

For every channel there is one NAPI context that is used for RX and
for TX.

Driver is using netdev_alloc_skb() to allocate skb's.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:24:20 -07:00
Saeed Mahameed
e725440e75 net/mlx5_core: Set/Query port MTU commands
Introduce set/Query low level functions to access MTU in hardware. To be
used by the netdev.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:24:08 -07:00
Rana Shahout
90b3e38d04 net/mlx5_core: Modify CQ moderation parameters
Introduce mlx5_core_modify_cq_moderation() to be used by the netdev, to
set hardware coalescing.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:59 -07:00
Rana Shahout
4c916a7980 net/mlx5_core: Implement get/set port status
Implemet get/set port status low level functions to be exposed by the
netdev.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:46 -07:00
Saeed Mahameed
adb0c9545b net/mlx5_core: Implement access functions of ptys register fields
Those registers will be used by the ethtool to set/get settings.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:31 -07:00
Saeed Mahameed
938fe83c8d net/mlx5_core: New device capabilities handling
- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
  generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
  cap).
- Modify IB driver to use new macros for checking caps.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:22 -07:00
Saeed Mahameed
e281682bf2 net/mlx5_core: HW data structs/types definitions cleanup
mlx5_ifc.h was heavily modified here since it is now generated by a
script from the device specification (PRM rev 0.25). This specification
is backward compatible to existing hardware.

Some structures/fields were added here in order to enable the Ethernet
functionality of the driver.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:23:11 -07:00
Saeed Mahameed
db058a186f net/mlx5_core: Set irq affinity hints
Preparation for upcoming ethernet driver.
- Move msix array from eq_table struct to priv since its not related to
  eq_table
- Intorduce irq_info struct to hold all irq information
- Move name from mlx5_eq to irq_info struct since it is irq property.
- Set IRQ affinity hints

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:22:48 -07:00
Amir Vadai
64ffaa2159 net/mlx5_core,mlx5_ib: Do not use vmap() on coherent memory
As David Daney pointed in mlx4_core driver [1], mlx5_core is also
misusing the DMA-API.

This patch is removing the code that vmap() memory allocated by
dma_alloc_coherent().

After this patch, users of this drivers might fail allocating resources
on memory fragmeneted systems.  This will be fixed later on.

[1] - https://patchwork.ozlabs.org/patch/458531/

CC: David Daney <david.daney@cavium.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30 18:22:37 -07:00
Linus Torvalds
6e49ba1bb1 ** NOW WITH TESTING! **
Two fixes which got lost in my recent distraction.  One is a weird
 cpumask function which needed to be rewritten, the other is a module
 bug which is cc:stable.
 
 Thanks,
 Rusty.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVaBENAAoJENkgDmzRrbjxxL4QAJMFwo21VN8rwIsEJ2P/Yh4u
 YXxJtnbrSPZtyad8J4G6FGOOfM7ImkkADhGJE8MN05goIFmeORWduAiozBtZBfo3
 OVpeo0HIGTEMXq/QCxSQsDhP9MSeWV592vjhlqQJ2KhU9Gpstc/Ub9ArVWuY3FD3
 CFN6ciw+5DIhoc6jMI2P9XX7jpR4VOBu320j+3lQ1QZ1aEZIaPefWH+VYuIZXirq
 E6N4yKgTahKb1Clr0DS6EB2Z5g+upNzFf4WBHaChP5EklwatZkHAOvzfSLWcbShI
 ochGV5LBPcn7ruqOD5mR4LGkxfQSYPCKCKihmenD/EVoO/dshKOQREfsqRXNsh5X
 xk4yx/VCy68ubIjx7FIDL18qDvJrX82+Z2bYZbENvKrVinaQ7MWB+CokK0fNW0ai
 ZMP5s32vSUZMMIIE7+fS4n3BLUxOpLZC8S0wIac19jNKzCHVTuhnUolCHk11zQLk
 IIDHEJwzvWtPjKOyUyd7HG0bYeczwf8DZgHg+xom9BNbHbK3Jk5d1Sibjgf8eGg+
 O36XR8FYYvqHwqqrPKSSaWoLj578/IWyHZg/V4tQ2HWi189BVHk6Iw2knftsvvPw
 pBu2AdbRSLLD+X/pwrdmm+xgytjUIr1X/Qnwj/eE5MvB/vaVVwV0OjapU/Z6S+dL
 JrZGvbWcviyjpvGD+vG1
 =wuP+
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull fixes for cpumask and modules from Rusty Russell:
 "** NOW WITH TESTING! **

  Two fixes which got lost in my recent distraction.  One is a weird
  cpumask function which needed to be rewritten, the other is a module
  bug which is cc:stable"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
  cpumask_set_cpu_local_first => cpumask_local_spread, lament
  module: Call module notifier on failure after complete_formation()
2015-05-29 11:24:28 -07:00
Rusty Russell
f36963c9d3 cpumask_set_cpu_local_first => cpumask_local_spread, lament
da91309e0a (cpumask: Utility function to set n'th cpu...) created a
genuinely weird function.  I never saw it before, it went through DaveM.
(He only does this to make us other maintainers feel better about our own
mistakes.)

cpumask_set_cpu_local_first's purpose is say "I need to spread things
across N online cpus, choose the ones on this numa node first"; you call
it in a loop.

It can fail.  One of the two callers ignores this, the other aborts and
fails the device open.

It can fail in two ways: allocating the off-stack cpumask, or through a
convoluted codepath which AFAICT can only occur if cpu_online_mask
changes.  Which shouldn't happen, because if cpu_online_mask can change
while you call this, it could return a now-offline cpu anyway.

It contains a nonsensical test "!cpumask_of_node(numa_node)".  This was
drawn to my attention by Geert, who said this causes a warning on Sparc.
It sets a single bit in a cpumask instead of returning a cpu number,
because that's what the callers want.

It could be made more efficient by passing the previous cpu rather than
an index, but that would be more invasive to the callers.

Fixes: da91309e0a
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased)
Tested-by: Amir Vadai <amirv@mellanox.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
2015-05-28 11:05:20 +09:30
Benjamin Poirier
f4ecf29fd7 mlx4_core: Fix fallback from MSI-X to INTx
The test in mlx4_load_one() to remove MLX4_FLAG_MSI_X expects mlx4_NOP() to
fail with -EBUSY. It is also necessary to avoid the reset since the device
is not fully reinitialized before calling mlx4_start_hca() a second time.

Note that this will also affect mlx4_test_interrupts(), the only other user
of MLX4_CMD_NOP.

Fixes: f5aef5a ("net/mlx4_core: Activate reset flow upon fatal command cases")
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27 13:08:03 -04:00
Or Gerlitz
be9b9eca25 net/mlx4_core: Enable single ported IB VFs
Remove the limitation that disallows configuring single ported VFs
in the presence of IB ports, after addressing the issues that
prevented that to work.

SMI (QP0) requests/responses are still not supported for single
ported IB VFs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-24 23:05:10 -04:00
Or Gerlitz
e5dfbf9a79 net/mlx4_core: Adjust the schedule queue port in reset-to-init too
It's legal for drivers to provide the QP port through the
QPC schedule-queue field on the reset-to-init QP state change.

Add adjusting of the schedule queue port in the SRIOV wrapper
for that operation too.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-24 23:05:10 -04:00
Or Gerlitz
f40e99e927 net/mlx4_core: Adjust the schedule queue port for single ported IB VFs
Some VF drivers flow set the schedule queue in the QP context but
without setting none of OPTPAR_SCHED_QUEUE or OPTPAR_PRIMARY_ADDR_PATH.

To allow for such non-modified drivers to function as single ported
IB VFs, we must adjust the schedule queue port whenever being set,
e.g as currently done for single ported Eth VFs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-24 23:05:09 -04:00
Or Gerlitz
74d4943fbb net/mlx4_core: Modify port values when generting EQEs for VFs
As part of enabling single ported VFs over IB ports we need to handle
some of the flows for generting EQ events for VFs which don't come
into play under Eth ports.

This mainly includes port management events derived from changes of the
phyiscal port (lid change, client re-register, down/up, etc), VF pkey table
changes and VF guid changes initiated by the IB driver.

(1) make sure that events are generated only for VFs sitting on
    the relevant physical port (under the ALL_SLAVES flow).

(2) before generating the event, convert from physical (one or two)
    to VF port (always equals one).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-24 23:05:09 -04:00
Or Gerlitz
7c35ef4525 net/mlx4_core: Enhance the MAD_IFC wrapper to convert VF port to physical
Single port VFs always provide port = 1 (even if the actual physical
port used is port 2). As such, we need to convert the port provided
by the VF to the physical port before calling into the firmware.

It turns out that the Linux mlx4 VF RoCE driver maintains a copy of
the GID table and hence this change became critical only for single
ported IB VFs, but it could be needed for other RoCE VF drivers too.

Fixes: 449fc48866 ('net/mlx4: Adapt code for N-Port VF')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-24 23:05:09 -04:00
Bjorn Helgaas
c1c52db16e net/mlx4: Avoid 'may be used uninitialized' warnings
With a cross-compiler based on gcc-4.9, I see warnings like the following:

  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_SW2HW_CQ_wrapper':
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3048:10: error: 'cq' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    cq->mtt = mtt;

I think the warning is spurious because we only use cq when
cq_res_start_move_to() returns zero, and it always initializes *cq in that
case.  The srq case is similar.  But maybe gcc isn't smart enough to figure
that out.

Initialize cq and srq explicitly to avoid the warnings.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14 22:28:48 -04:00
Yishai Hadas
2d3c739739 net/mlx4_core: Work properly with EQ numbers > 256 in SRIOV
The Firmware uses dynamic EQs allocation based on number of VFs and
max EQs that can be allocated. As a result, VF can have EQ numbers
that are larger than 256.

According to the firmware spec, the max value is limited to be 1024
(10 bits), adapt the relevant code accordingly. This bug was impossible
to hit prior to commit 7ae0e400cd ("net/mlx4_core: Flexible (asymmetric)
allocation of EQs and MSI-X vectors for PF/VFs") which actually enables
large number of EQs for VFs.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05 19:39:12 -04:00
Eran Ben Elisha
cae0633872 net/mlx4_en: Fix off-by-one in counters manipulation
This caused the en_stats_adder helper to accumulate a field which is
not related to the counter, fix that.

Fixes: a3333b35da ('net/mlx4_en: Moderate ethtool callback to show [..]')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-05 19:39:12 -04:00
Ido Shamay
07841f9d94 net/mlx4_en: Schedule napi when RX buffers allocation fails
When system is out of memory, refilling of RX buffers fails while
the driver continue to pass the received packets to the kernel stack.
At some point, when all RX buffers deplete, driver may fall into a
sleep, and not recover when memory for new RX buffers is once again
availible. This is because hardware does not have valid descriptors,
so no interrupt will be generated for the driver to return to work
in napi context. Fix it by schedule the napi poll function from
stats_task delayed workqueue, as long as the allocations fail.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 16:47:50 -04:00
David Ahern
17d5ceb6e4 net/mlx4_core: Fix unaligned accesses
Addresses the following kernel logs seen during boot:

Kernel unaligned access at TPC[100ee150] mlx4_QUERY_HCA+0x80/0x248 [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]
Kernel unaligned access at TPC[100f071c] mlx4_QUERY_ADAPTER+0x100/0x12c [mlx4_core]

Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 16:26:30 -04:00
Benjamin Poirier
f94813f3c1 mlx4_en: Use correct loop cursor in error path.
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Fixes: 9e311e7 ("net/mlx4_en: Use affinity hint")
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-30 16:25:14 -04:00
Benjamin Poirier
42eab005a5 mlx4: Fix tx ring affinity_mask creation
By default, the number of tx queues is limited by the number of online cpus
in mlx4_en_get_profile(). However, this limit no longer holds after the
ethtool .set_channels method has been called. In that situation, the driver
may access invalid bits of certain cpumask variables when queue_index >=
nr_cpu_ids.

Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Acked-by: Ido Shamay <idos@mellanox.com>
Fixes: d03a68f ("net/mlx4_en: Configure the XPS queue mapping on driver load")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-29 15:16:57 -04:00
Linus Torvalds
2decb2682f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) mlx4 doesn't check fully for supported valid RSS hash function, fix
    from Amir Vadai

 2) Off by one in ibmveth_change_mtu(), from David Gibson

 3) Prevent altera chip from reporting false error interrupts in some
    circumstances, from Chee Nouk Phoon

 4) Get rid of that stupid endless loop trying to allocate a FIN packet
    in TCP, and in the process kill deadlocks.  From Eric Dumazet

 5) Fix get_rps_cpus() crash due to wrong invalid-cpu value, also from
    Eric Dumazet

 6) Fix two bugs in async rhashtable resizing, from Thomas Graf

 7) Fix topology server listener socket namespace bug in TIPC, from Ying
    Xue

 8) Add some missing HAS_DMA kconfig dependencies, from Geert
    Uytterhoeven

 9) bgmac driver intends to force re-polling but does so by returning
    the wrong value from it's ->poll() handler.  Fix from Rafał Miłecki

10) When the creater of an rhashtable configures a max size for it,
    don't bark in the logs and drop insertions when that is exceeded.
    Fix from Johannes Berg

11) Recover from out of order packets in ppp mppe properly, from Sylvain
    Rochet

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  bnx2x: really disable TPA if 'disable_tpa' option is set
  net:treewide: Fix typo in drivers/net
  net/mlx4_en: Prevent setting invalid RSS hash function
  mdio-mux-gpio: use new gpiod_get_array and gpiod_put_array functions
  netfilter; Add some missing default cases to switch statements in nft_reject.
  ppp: mppe: discard late packet in stateless mode
  ppp: mppe: sanity error path rework
  net/bonding: Make DRV macros private
  net: rfs: fix crash in get_rps_cpus()
  altera tse: add support for fixed-links.
  pxa168: fix double deallocation of managed resources
  net: fix crash in build_skb()
  net: eth: altera: Resolve false errors from MSGDMA to TSE
  ehea: Fix memory hook reference counting crashes
  net/tg3: Release IRQs on permanent error
  net: mdio-gpio: support access that may sleep
  inet: fix possible panic in reqsk_queue_unlink()
  rhashtable: don't attempt to grow when at max_size
  bgmac: fix requests for extra polling calls from NAPI
  tcp: avoid looping in tcp_send_fin()
  ...
2015-04-27 14:05:19 -07:00
Amir Vadai
b37069090b net/mlx4_en: Prevent setting invalid RSS hash function
mlx4_en_check_rxfh_func() was checking for hardware support before
setting a known RSS hash function, but didn't do any check before
setting unknown RSS hash function. Need to make it fail on such values.
In this occasion, moved the actual setting of the new value from the
check function into mlx4_en_set_rxfh().

Fixes: 947cbb0 ("net/mlx4_en: Support for configurable RSS hash function")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-27 13:36:48 -04:00
Linus Torvalds
7c034dfd58 InfiniBand/RDMA updates for 4.1:
- IPoIB fixes from Doug Ledford and Erez Shitrit
  - iSER updates from Sagi Grimberg
  - mlx4 GUID handling changes from Yishai Hadas
  - other misc fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJVN9SzAAoJEENa44ZhAt0hWq4QAJRFrwoe9ubTextSHeTU0FkY
 CydiQtGWrhyAHTX/KtdB1Uv9FzGHc6gqkAOXImouacYTM9ffypMF6Oj4xIYIMQtz
 MvNlNm07KOtQYlubiaZWcP5BjdLfMZjQxb03/9smygLTBjm80dAEt5X1znx7YrqI
 ZfE+ibPdvRqVEvFZKfT2U0kGU6oEVKrbJEiUCoJPwwcghDZQl18YmGOxt5qdI2uO
 V+71ozwozT8utSIl7S2YTJZBdkJ7tLrqrX2D/D2jUAmh1rqHIDrsXXiZ44UJj82i
 oXuwqmHXfq1LfuC9kxCX5JJpGeLE7E3OoxM1zIev31710zPA0v57rNKKweCi2Tj6
 Z36B0SIRV4ipWr/sBhVDr1Ffc/uap3DOIEU9Z+t8rwhELCEVuxmNaNb0K1e5nPiy
 YOQYp/ctC0NslM4mqQJLhGMVl6H8PjodbM1whnYZLsF1+8clNvdtLYzy/cA5fGbO
 tngUGXu0YZGdwvfuQhi5FB45XLaErJaPcMH0QRI5G0JgtjvbzXiMlqWtekTUBi7W
 DJNQlVRI4S1RYRBYkq709ymXiWwTeh3rhH+ZJpM+aY8b0NR/lx+dNyesNG+7GBJH
 y5UOOUck0w+JbQzZo264I6a5e8pXq3kMi3BH8pF4Jbo5WvxSF6uriXb6Q1JzfH20
 Jn0J6W9ghCSfrhMI1zgQ
 =v1jB
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA updates from Roland Dreier:

 - IPoIB fixes from Doug Ledford and Erez Shitrit

 - iSER updates from Sagi Grimberg

 - mlx4 GUID handling changes from Yishai Hadas

 - other misc fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (51 commits)
  mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures
  IB/iser: Rewrite bounce buffer code path
  IB/iser: Bump version to 1.6
  IB/iser: Remove code duplication for a single DMA entry
  IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr
  IB/iser: Modify struct iser_mem_reg members
  IB/iser: Make fastreg pool cache friendly
  IB/iser: Move PI context alloc/free to routines
  IB/iser: Move fastreg descriptor pool get/put to helper functions
  IB/iser: Merge build page-vec into register page-vec
  IB/iser: Get rid of struct iser_rdma_regd
  IB/iser: Remove redundant assignments in iser_reg_page_vec
  IB/iser: Move memory reg/dereg routines to iser_memory.c
  IB/iser: Don't pass ib_device to fall_to_bounce_buff routine
  IB/iser: Remove a redundant struct iser_data_buf
  IB/iser: Remove redundant cmd_data_len calculation
  IB/iser: Fix wrong calculation of protection buffer length
  IB/iser: Handle fastreg/local_inv completion errors
  IB/iser: Fix unload during ep_poll wrong dereference
  ib_srpt: convert printk's to pr_* functions
  ...
2015-04-22 11:50:05 -07:00
Eran Ben Elisha
fab9adfb71 net/mlx4_core: Fix reading HCA max message size in mlx4_QUERY_DEV_CAP
Currently we parse max_msg_sz from the wrong offset in QUERY_DEV_CAP,
fix to use the right offset.

Fixes: 0b131561a7 ('net/mlx4_en: Add Flow control statistics [..]')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-21 17:36:08 -04:00
Doug Ledford
c1c2fef6cf Merge branches 'cve-fixup', 'ipoib', 'iser', 'misc-4.1', 'or-mlx4' and 'srp' into for-4.1 2015-04-15 16:24:49 -04:00
Honggang LI
59d2d18cc4 mlx5: wrong page mask if CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for 32Bit architectures
If CONFIG_ARCH_DMA_ADDR_T_64BIT enabled for x86 systems and physical
memory is more than 4GB, dma_map_page may return a valid memory
address which greater than 0xffffffff. As a result, the mlx5 device page
allocator RB tree will be initialized with valid addresses greater than
0xfffffff.

However, (addr & PAGE_MASK) set the high four bytes to zeros. So, it's
impossible for the function, free_4k, to release the pages whose
addresses greater than 4GB. Memory leaks. And mlx5_ib module can't
release the pages when user try to remove the module, as a result,
system hang.

[root@rdma05 root]# dmesg  | grep addr | head
addr             = 3fe384000
addr & PAGE_MASK =  fe384000
[root@rdma05 root]# rmmod mlx5_ib   <---- hang on

---------------------- cosnole log -----------------
mlx5_ib 0000:04:00.0: irq 138 for MSI/MSI-X
  alloc irq_desc for 139 on node -1
  alloc kstat_irqs on node -1
mlx5_ib 0000:04:00.0: irq 139 for MSI/MSI-X
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
0000:04:00.0:free_4k:221:(pid 1519): page not found
---------------------- cosnole log -----------------

Fixes: bf0bf77f65 ('mlx5: Support communicating arbitrary host page size to firmware')
Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 16:20:51 -04:00
Yishai Hadas
e9a7ff3a71 net/mlx4_core: Return the admin alias GUID upon host view request
Return the admin alias GUID value upon a GET request via HOST. We do this so
that the GUID value requested by the admin is returned even if the SM has not
yet approved this GUID (e.g. the SM is down).

Note that this does not create a problem, since the virtual port will remain
down until the SM does ACK the requested GUID value.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 15:51:50 -04:00
Yishai Hadas
a0667a83af net/mlx4_core: Raise slave shutdown event upon FLR
There might be cases that PF doesn't get a "reset" command upon slave down
(e.g. virsh destroy). In these cases, however, an FLR event is issued.

Therefore, when the PF receives an FLR event for a slave, it should also
generate a shutdown event on the PF for that slave, to let the PF upper
layers (mlx4_ib, eth) perform any required cleanup/actions associated
with slave shutdown.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 15:51:50 -04:00
Yishai Hadas
fb517a4f03 net/mlx4_core: Set initial admin GUIDs for VFs
To have out of the box experience, the PF generates random GUIDs who
serve as the initial admin values.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 15:51:50 -04:00
Yishai Hadas
773af94e4e net/mlx4_core: Manage alias GUID per VF
Manages alias GUIDs per VF per port in the core layer.

This is a pre-step for managing alias GUIDs in a mode that the admin
GUID is returned via ib_query_gid() regardless of whether the SM
has approved it or not.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-04-15 15:51:50 -04:00
Alexander Duyck
12b3375f39 mlx4/mlx5: Use dma_wmb/rmb where appropriate
This patch should help to improve the performance of the mlx4 and mlx5 on a
number of architectures.  For example, on x86 the dma_wmb/rmb equates out
to a barrer() call as the architecture is already strong ordered, and on
PowerPC the call works out to a lwsync which is significantly less expensive
than the sync call that was being used for wmb.

I placed the new barriers between any spots that seemed to be trying to
order memory/memory reads or writes, if there are any spots that involved
MMIO I left the existing wmb in place as the new barriers cannot order
transactions between coherent and non-coherent memories.

v2: Reduced the replacments to just the spots where I could clearly
    identify the usage pattern.

Cc: Amir Vadai <amirv@mellanox.com>
Cc: Ido Shamay <idos@mellanox.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-09 14:25:25 -04:00
David S. Miller
c85d6975ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/mellanox/mlx4/cmd.c
	net/core/fib_rules.c
	net/ipv4/fib_frontend.c

The fib_rules.c and fib_frontend.c conflicts were locking adjustments
in 'net' overlapping addition and removal of code in 'net-next'.

The mlx4 conflict was a bug fix in 'net' happening in the same
place a constant was being replaced with a more suitable macro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 22:34:15 -04:00
Jack Morgenstein
fde913e254 net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
Commit 1daa4303b4 ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.

Fixes: 1daa4303b4 ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-06 17:32:27 -04:00
Saeed Mahameed
64613d9499 net/mlx5_core: Extend struct mlx5_interface to support multiple protocols
Preparation for ethernet driver.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:43 -04:00
Saeed Mahameed
233d05d28a net/mlx5_core: Move completion eqs from mlx5_ib to mlx5_core
Preparation for ethernet driver.
These functions will be used in drivers other than mlx5_ib.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:42 -04:00
Achiad Shochat
4ae6c18c59 net/mlx5_core: Update module info macros for ConnectX4 Support
Declare the support of new ConnectX4 HCA in module info
Pump up the module version

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:42 -04:00
Saeed Mahameed
302bdf68fc net/mlx5_core: Fix Mellanox copyright note
Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:42 -04:00
Achiad Shochat
4cbdd27c9c net/mlx5_core: Fix a bug in alloc_token
In alloc_token(), the token '1' would be allocated twice consecutively.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:42 -04:00
Ira Gusinsky
21db507439 net/mlx5_core: Avoid usage command work entry after writing command doorbell
Avoid usage of command work entry in cmd_work_handler since it can be released
by mlx5_cmd_invoke before the work handler returns to running.

Signed-off-by: Ira Gusinsky <irenag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Eli Cohen
05e4ecd1dc net/mlx5_core: Avoid copying outbox in aysnc command completion
Avoid copying to the output buffer in cmd_exec since this is done after the
command is completed. Failure to do this may cause cases where the callback
handler is called before the copy done by cmd_exec which then overwrites it.

Reported-by: Tamer Hleihel <tamerh@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Eli Cohen
64599cca51 net/mlx5_core: Use coherent memory for command interface page
Use coherent memory for the commands descriptor page. Take measures to make
sure the page is aligned to MLX5_ADAPTER_PAGE_SIZE as required by the hardware.

Reported-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Achiad Shochat
60722c2ba0 net/mlx5_core: Use the right inbox struct in destroy mkey command
struct mlx5_query_mkey_mbox_in rather than mlx5_destroy_mkey_mbox_in

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Saeed Mahameed
b812b5441e net/mlx5_core: Clear doorbell record inside mlx5_db_alloc()
Do it in one place instead of every where the function is invoked

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Eli Cohen
9ef9baa2ac net/mlx5_core: Avoid setting DC requestor/responder resources
PRM does not support setting these values so avoid setting them.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:41 -04:00
Eli Cohen
ad1891062a net/mlx5_core: Allocate firmware pages from device's NUMA node
Allocate firmware pages from the NUMA node which is close to the device.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:33:40 -04:00
Muhammad Mahajna
78500b8c03 net/mlx4_en: Add RX-ALL support
Enabled when the device supports KEEP FCS and IGNORE FCS.

When the flag is set, pass all received frames up the stack,
even ones with invalid FCS, controlled by ethtool.

Signed-off-by: Muhammad Mahajna <muhammadm@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:04 -04:00
Muhammad Mahajna
f0df35037a net/mlx4_en: Add RX-FCS support
Enabled when device supports KEEP FCS. When the flag is set, Ethernet FCS
is appended to the end of the frame, controlled by ethtool.

Signed-off-by: Muhammad Mahajna <muhammadm@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:04 -04:00
Ido Shamay
51af33cfed net/mlx4_en: Add interface identify support
Add support for the interface ethtool identify feature.

Make the physical port LED to blink with green and yellow colors.

The device handles the LED blink by itself (synchrous use of
set_phys_id), by returning 0 to ETHTOOL_ID_ACTIVE command.

Signed-off-by: Eyal Grossman <eyalgr@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
a130b59057 net/mlx4: Add SET_PORT opcode modifiers enumeration
The calls to SET_PORT used hard-code numbers, when supplying command's
opcode modifiers, fix that to use well defined constants.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
38438f7c7e net/mlx4: Set enhanced QoS support by default when ETS supported
If HCA supports ETS QoS feature, set enhanced QoS bit in init_hca as default.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
3742cc6551 net/mlx4: Warn users of depracated QoS Firmware
A new capability bit was introduced in the past to to differ devices
using the QoS ETS feature. The old was deprecated since then.
If driver sees device which set only the old capabilty, it will print
warning to user suggesting to upgrade the FW.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
cda373f484 net/mlx4_en: Enable TX rate limit per VF
Support granular QoS per VF, by implementing the ndo_set_vf_rate.

Enforce a rate limit per VF when called, and enabled only for VFs in
VST mode with user priority supported by the device.

We don't enforce VFs to be in VST mode at the moment of configuration,
but rather save the given rate limit and enforce it when the VF is
moved to VST with user priority which is supported (currently 0).

VST<->VGT or VST qos value state changes are disallowed when a rate
limit is configured. Minimum BW share is not supported yet.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
08068cd568 net/mlx4: Added qos_vport QP configuration in VST mode
Granular QoS per VF feature introduce a new QP field, qos_vport.

PF administrator can connect VF QPs to a certain QoS Vport, to
inherit its proporties. Connecting QPs to the default QoS Vport
(defined as 0) is always allowed, even when there are no allocated VPPs.
At this point, only the default vport is connected to QPs.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:03 -04:00
Ido Shamay
666672d480 net/mlx4: Allocate VPPs for each port on PF init
Initialization of granular Qos per VF mechanism.

Query the port availible VPPs and allocates those on all supported
priorities in an equal share. Allocation is done only in SRIOV mode,
when the feature is supported by the device and port type is Ethernet.

Allocation currently is done only on the default priority 0.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:02 -04:00
Ido Shamay
d019fcb224 net/mlx4: Query device for QoS per VF support
Checks in QUERY_DEV_CAP if the granular QoS per VF feature is
supported by the device. Disabled for guests.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:02 -04:00
Ido Shamay
1c29146d38 net/mlx4: Add mlx4_SET_VPORT_QOS implementation
Add the SET_VPORT_QOS device command, which is ntended for virtual
granular QoS configuration per VF in SRIOV mode. The SET_VPORT_QOS
command sets and queries QoS parameters of a VPort. Each priority
allowed for a VPort is assigned with a share of the BW, and a BW
limitation. QoS parameters can be modified at any time, but must be
initialized before any QP is associated with the VPort.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:02 -04:00
Ido Shamay
7e95bb99a8 net/mlx4: Add mlx4_ALLOCATE_VPP implementation
Implements device ALLOCATE_VPP command, to be used for granular QoS
configuration of VFs by the PF device. Defines and queries the amount
of VPPs assigned to each port, and the amount of VPPs assigned to each
priority of each port. Once the total VPPs are split between the priorities
of a port, they may be assigned with a share of the BW or a rate limit.

Split into two functions (get/set) whoch are supplied with
mlx4_alloc_vpp_context and physical port number.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:02 -04:00
Ido Shamay
12a889c057 net/mlx4: New file for QoS related firmware commands
Create two new files fw_qos.h and fw_qos.c in mlx4_core module.

It gathers all relevant QoS firmware related commands etc, thus improving
encapsulation of the mlx4_core module. For now it contains the QoS existing
commands: mlx4_SET_PORT_SCHEDULER and mlx4_SET_PORT_PRIO2TC.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:02 -04:00
Ido Shamay
4abccb6157 net/mlx4: Aesthetic code changes in multi_func_init
Previous vf_oper and vf_admin code created very long lines, making it hard
to read the code. Added relevant in-struct pointers to reduce code
complexity and avoid code lines spread over 80 lines. Same logic is preserved.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:25:01 -04:00
Ido Shamay
fccea6436a net/mlx4: Make mlx4_is_eth visible inline funcion
Currently implemented as static function in resource_tracker.c --
this change will allow other files in mlx4_core to use it as well.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:24:51 -04:00
Ido Shamay
241a08c3a7 net/mlx4_en: Change loopback only upon feature change
Currently any change of netdev features results in a call to
mlx4_en_update_loopback_state(). Those calls are unnecessary,
and should be called only upon loopback feature change.

Also moved some of the logic into mlx4_en_update_loopback_state().

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:24:51 -04:00
Ido Shamay
802f42a8d9 net/mlx4: Add RSS support for fragmented IP datagrams
Enable RSS support for fragmented IP packets, when device supports it.
Until now, fragmented IP packets were directed only to the default_qpn.
Since IP fragments (datagram) have no upper protocols (L3 IP packets),
hash is performed on 3-tuple - dst MAC, source IP and dest IP. The HW
makes sure that this holds for the 1st fragment too, so all fragments
go to the same QP.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:24:50 -04:00
David S. Miller
9f0d34bc34 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	drivers/net/usb/sr9800.c
	drivers/net/usb/usbnet.c
	include/linux/usb/usbnet.h
	net/ipv4/tcp_ipv4.c
	net/ipv6/tcp_ipv6.c

The TCP conflicts were overlapping changes.  In 'net' we added a
READ_ONCE() to the socket cached RX route read, whilst in 'net-next'
Eric Dumazet touched the surrounding code dealing with how mini
sockets are handled.

With USB, it's a case of the same bug fix first going into net-next
and then I cherry picked it back into net.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 16:16:53 -04:00
Richard Cochran
f75419c81e ptp: mlx4: use helpers for converting ns to timespec.
This patch changes the driver to use ns_to_timespec64() instead of
open coding the same logic.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 17:19:19 -04:00
Eran Ben Elisha
a3333b35da net/mlx4_en: Moderate ethtool callback to show more statistics
More packet statistics are now calculated and visible to the user via
ethtool:

- RX packet errors statistics.
- TX/RX drops are now calculated.
- TX multicast and broadcast statistics.
- RX/TX per priority bytes statistics.
- RX/TX no vlan packets and bytes statistics.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:51 -04:00
Matan Barak
0b131561a7 net/mlx4_en: Add Flow control statistics display via ethtool
Flow control per priority and Global pause counters are now visible via
ethtool.  The counters shows statistics regarding pauses in the device.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:51 -04:00
Eran Ben Elisha
3da8a36cc5 net/mlx4_en: Protect access to the statistics bitmap
This will allow parallel access to the statistics bitmap.
A pre-step for adding PFC counters, where the statistics bitmap
can be dynamically changed when modifying the PFC setting.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Eran Ben Elisha
6fcd27354b net/mlx4_en: Support general selective view of ethtool statistics
The driver uses a bitmask to indicate which statistics should be
displayed to the user in ethtool. The bitmask is u64, therefore we are
limited for a selective view of up to 64 statistics. Extend the bitmap
in order to show more than 64 statistics.

In addition, add packet statistics to the ethtool display for PF.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Eran Ben Elisha
ffa88f37ff net/mlx4_en: Move statistics bitmap setting to the Ethernet driver
The statistics bitmap belongs to the Ethernet driver, move it there.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Eran Ben Elisha
b4b6e842fc net/mlx4_en: Create new header file for all statistics info
Add mlx4_stats.h file and move there all statistics structs and marcos.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Eran Ben Elisha
66f24a7ebf net/mlx4_en: Fix port counters statistics bitmask
Two counters (rx_chksum_complete and tx_chksum_offload) are not displayed
under SRIOV for the PF via ethtool because their bit mask is off, fix that.

Fixes: f8c6455bb ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Fixes: 9fab426de ('mlx4: add a new xmit_more counter')

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 16:36:50 -04:00
Richard Cochran
e394b80545 ptp: mlx4: convert to the 64 bit get/set time methods.
This driver's clock is implemented using a timecounter, and so with
this patch the driver is ready for the year 2038.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 12:01:18 -04:00
Toshiaki Makita
8cb65d0008 net: Move check for multiple vlans to drivers
To allow drivers to handle the features check for multiple tags,
move the check to ndo_features_check().
As no drivers currently handle multiple tagged TSO, introduce
dflt_features_check() and call it if the driver does not have
ndo_features_check().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-29 13:33:22 -07:00
Jack Morgenstein
bffb023ad2 net/mlx4_core: Fix GEN_EQE accessing uninitialixed mutex
We occasionally see in procedure mlx4_GEN_EQE that the driver tries
to grab an uninitialized mutex.

This can occur in only one of two ways:
1. We are trying to generate an async event on an uninitialized slave.
2. We are trying to generate an async event on an illegal slave number
   ( < 0 or > persist->num_vfs) or an inactive slave.

To deal with #1: move the mutex initialization from specific slave init
sequence in procedure mlx_master_do_cmd to mlx4_multi_func_init() (so that
the mutex is always initialized for all slaves).

To deal with #2: check in procedure mlx4_GEN_EQE that the slave number
provided is in the proper range and that the slave is active.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:22:52 -04:00
Ido Shamay
e5eda89d97 net/mlx4_en: Call register_netdevice in the proper location
Netdevice registration should be performed a the end of the driver
initialization flow. If we don't do that, after calling register_netdevice,
device callbacks may be issued by higher layers of the stack before
final configuration of the device is done.

For example (VXLAN configuration race), mlx4_SET_PORT_VXLAN was issued
after the register_netdev command. System network scripts may configure
the interface (UP) right after the registration, which also attach
unicast VXLAN steering rule, before mlx4_SET_PORT_VXLAN was called,
causing the firmware to fail the rule attachment.

Fixes: 837052d0cc ("net/mlx4_en: Add netdev support for TCP/IP offloads of vxlan tunneling")
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-24 15:22:52 -04:00
David S. Miller
0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Wu Fengguang
de1cf8a7d7 net/mlx4_en: mlx4_en_set_tx_maxrate() can be static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 23:21:41 -04:00
Eran Ben Elisha
39de961a4a net/mlx4_en: Set statistics bitmap at port init
Port statistics bitmap will now be initialized at port init.  Even before
starting the port, statistics are visible to the user and must be properly masked.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 15:17:11 -04:00
Eran Ben Elisha
a16f356570 net/mlx4_en: Fix off-by-one in ethtool statistics display
NUM_PORT_STATS was 9 instead of 10, which caused off-by-one bug when
displaying the statistics starting from tx_chksum_offload in ethtool.

Fixes: f8c6455bb0 ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 15:17:11 -04:00
Or Gerlitz
c10e4fc6c4 net/mlx4_en: Add tx queue maxrate support
Add ndo_set_tx_maxrate support.

To support per tx queue maxrate limit, we use the update-qp firmware
command to do run-time rate setting for the qp that serves this tx ring.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 14:55:19 -04:00
Or Gerlitz
fc31e2560a net/mlx4_core: Add basic support for QP max-rate limiting
Add the low-level device commands and definitions used for QP max-rate limiting.

This is done through the following elements:

  - read rate-limit device caps in QUERY_DEV_CAP: number of different
    rates and the min/max rates in Kbs/Mbs/Gbs units

  - enhance the QP context struct to contain rate limit units and value

  - allow to do run time rate-limit setting to QPs through the
    update-qp firmware command

  - QP rate-limiting is disallowed for VFs

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-18 14:55:19 -04:00
Julia Lawall
6b9f53bc10 net/mlx5_core: don't export static symbol
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
type T;
identifier f;
@@

static T f (...) { ... }

@@
identifier r.f;
declarer name EXPORT_SYMBOL;
@@

-EXPORT_SYMBOL(f);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-12 00:03:34 -04:00
Joe Perches
dbedd44e98 ethernet: codespell comment spelling fixes
To test a checkpatch spelling patch, I ran codespell against
drivers/net/ethernet/.

$ git ls-files drivers/net/ethernet/ | \
  while read file ; do \
    codespell -w $file; \
  done

I removed a false positive in e1000_hw.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-08 22:54:22 -04:00
Shani Michaeli
708b869bf5 net/mlx4_en: Add QCN parameters and statistics handling
Implement the IEEE DCB handlers for set/get QCN parameters and
statistics reading per TC.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 21:50:02 -05:00
Shani Michaeli
d237baa1cb net/mlx4_core: Add basic elements for QCN
Add device capability, firmware command opcode and etc prior elements
needed for QCN suppprt. Disable SRIOV VF view/access for QCN is disabled.

While here, remove a redundant offset definition into the
QUERY_DEV_CAP mailbox.

Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-06 21:50:02 -05:00
David S. Miller
71a83a6db6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/rocker/rocker.c

The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 21:16:48 -05:00
Joe Perches
c7bf716940 ethernet: Use eth_<foo>_addr instead of memset
Use the built-in function instead of memset.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-03 17:01:36 -05:00
Ido Shamay
1037ebbbd2 net/mlx4_en: Disbale GRO for incoming loopback/selftest packets
Packets which are sent from the selftest (ethtool) flow,
should not be passed to GRO stack but rather dropped by
the driver after validation. To achieve that, we disable
GRO for the duration of the selftest.

Fixes: dd65beac48 ("net/mlx4_en: Extend usage of napi_gro_frags")
Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 15:27:19 -05:00
Or Gerlitz
f5956fafb0 net/mlx4_core: Fix wrong mask and error flow for the update-qp command
The bit mask for currently supported driver features (MLX4_UPDATE_QP_SUPPORTED_ATTRS)
of the update-qp command was defined twice (using enum value and pre-processor
define directive) and wrong.

The return value of the call to mlx4_update_qp() from within the SRIOV
resource-tracker was wrongly voided down.

Fix both issues.

issue: none
Fixes: 09e05c3f78 ('net/mlx4: Set vlan stripping policy by the right command')
Fixes: ce8d9e0d67 ('net/mlx4_core: Add UPDATE_QP SRIOV wrapper support')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-02 15:27:19 -05:00
Eli Cohen
de61390cb3 net/mlx5_core: Fix configuration of log_uar_page_sz
The current code failed to configure the page size for architectures with page
size different than 4K - PPC for example.

Signed-off-by: Carol L Soto <clsoto@us.ibm.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-11 19:42:23 -08:00
Markus Elfring
1d966d03a3 net: Mellanox: Delete unnecessary checks before the function call "vunmap"
The vunmap() function performs also input parameter validation.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-09 14:10:05 -08:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
Ido Shamay
cfb53f36a5 net/mlx4_en: Notify TX Vlan offload change
Notify users when TX vlan offload feature changed with ethtool.
Relevant command - ethtool -K <eth> txvlan on/off.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:46 -08:00
Ido Shamay
e8e7f018f1 net/mlx4_en: Adjust RX frag strides to frag sizes
This patch improves memory utilization and therefore the packets rate
for special MTU's. Instead of setting the frag_stride to the maximal
hard coded frag_size, use the actual frag_size that is set according to
the MTU, when setting the stride of the last frag.
So, for example, for MTU 1600, where the frag_size of the 2nd frag is
86, the frag_size is set to 128 instead of 4096. See below:

Before:
 frag:0 - size:1536 prefix:0 stride:1536
 frag:1 - size:86 prefix:1536 stride:4096

 frag 0 allocator: - size:32768 frags:21
 frag 1 allocator: - size:32768 frags:8

After:
 frag:0 - size:1536 prefix:0 stride:1536
 frag:1 - size:86 prefix:1536 stride:128

 frag 0 allocator: - size:32768 frags:21
 frag 1 allocator: - size:32768 frags:256

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:45 -08:00
Ido Shamay
b110d2ce49 net/mlx4_en: Print page allocator information
After Initialization of page_alloc, print actual allocated page
size and number of frags it contains. prints is done only when drv
message level is set on the interface.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:45 -08:00
Or Gerlitz
1c755cc5be net/mlx5_core: Move to use hex PCI device IDs
Align the IDs in the code with the modinfo, lspci -n, etc tools outputs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:45 -08:00
Or Gerlitz
0fab541ac2 net/mlx4_core: Fix misleading debug print on CQE stride support
We do support cache line sizes of 32 and 64 bytes without activating the
CQE stride feature. Fix a misleading print saying that these cache line
sizes aren't supported.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:45 -08:00
Maor Gottlieb
6af0a52f65 net/mlx4: mlx4_config_dev_retrieval() - Initialize struct config_dev before using
Add Initialization to struct config_dev before filling and using it.
Fix to warning:

warning: config_dev.rx_checksum_val may be used uninitialized in this function

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:45 -08:00
Maor Gottlieb
b332068ce4 net/mlx4_core: Fix mpt_entry initialization in mlx4_mr_rereg_mem_write()
a) Previously, mlx4_mr_rereg_write filled the MPT's start
   and length with the old MPT's values.
   Fixing the initialization to take the new start and length.

b) In addition access flags in mpt_status were initialized instead of
   status due to bad boolean operation. Fixing the operation.

c) Initialization of pd_slave caused a protection error.
   Fix - removing this initialization.

d) In resource_tracker.c: Fixing vf encoding to be one-based.

Fixes: e630664c ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:17:45 -08:00
Moni Shoua
5da0354726 net/mlx4_en: Port aggregation configuration
Capture NETDEV events generated by the bonding driver and based on that
make decisions of how to configure port aggregation in the mlx4 core driver.

This includes setting the V2P port table and re-creating the interested
interfaces in bonded/non-bonded mode.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:25 -08:00
Moni Shoua
53f33ae295 net/mlx4_core: Port aggregation upper layer interface
Supply interface functions to bond and unbond ports of a mlx4 internal
interfaces. Example for such an interface is the one registered by the
mlx4 IB driver under RoCE.

There are

1. Functions to go in/out to/from bonded mode
2. Function to remap virtual ports to physical ports

The bond_mutex prevents simultaneous access to data that keep status of
the device in bonded mode.

The upper mlx4 interface marks to the mlx4 core module that they
want to be subject for such bonding by setting the MLX4_INTFF_BONDING
flag. Interface which goes to/from bonded mode is re-created.

The mlx4 Ethernet driver does not set this flag when registering the
interface, the IB driver does.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:24 -08:00
Moni Shoua
59e14e3250 net/mlx4_core: Port aggregation low level interface
Implement the hardware interface required for port aggregation.

1. Disable RX port check on receive - don't perform a validity check
that matches to QP's port and the port where the packet is received.

2. Virtual to physical port remap - configure virtual to physical port
mapping. Port remap capability for virtual functions.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 16:14:24 -08:00
Jack Morgenstein
5a2e87b168 net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs
Commit de966c5928 (net/mlx4_core: Support more than 64 VFs) was meant to
allow up to 126 VFs.  However, due to leaving MLX4_MFUNC_MAX too low, using
more than 80 VFs resulted in memory corruptions (and Oopses) when more than
80 VFs were requested. In addition, the number of slaves was left too high.

This commit fixes these issues.

Fixes: de966c5928 ("net/mlx4_core: Support more than 64 VFs")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-02 19:38:04 -08:00
Majd Dibbiny
6d6e996c20 net/mlx4_core: Update the HCA core clock frequency after INIT_PORT
The firmware might change the hca core clock frequency after the driver
issues the INIT_PORT command. Therefore we need to query the new
value again and save in to the cached dev caps.

Fixes: ddd8a6c1 ('net/mlx4_core: Read HCA frequency and map internal clock')
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:58 -08:00
Or Gerlitz
cb2147a925 net/mlx4_core: Fix device capabilities dumping
We are dumping device capabilities which are supported both by the
firmware and the driver. Align the array that holds the capability
strings with this practice.

Reported-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:58 -08:00
Matan Barak
19ab574f62 net/mlx4: Fix memory corruption in mlx4_MAD_IFC_wrapper
Fix a memory corruption at mlx4_MAD_IFC_wrapper.

A table of size dev->caps.pkey_table_len[port]*sizeof(*table)
was allocated, but get_full_pkey_table() assumes that the number
of entries in the table is a multiplication of 32 (which isn't always
correct).

Fixes: 0a9a018 ('mlx4: MAD_IFC paravirtualization')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:58 -08:00
Saeed Mahameed
5a228c03d8 net/mlx4_en: Use ethtool cmd->autoneg as a hint for ethtool set settings
Use cmd->autoneg as a user hint to decide what to set in ethtool set settings callback.
When cmd->autoneg == AUTONEG_ENABLE set according to ethtool->advertise otherwise,
set according to ethtool->speed.

Usage:
	- ethtool -s eth<x> speed 56000 autoneg off
	- ethtool -s eth<x> advertise 0x800000 autoneg on

While we're here:
	- Move proto_admin masking outcome check to be adjacent to the operation.
	- Move en_warn("port reset..") print to "port reset" block.

Fixes: 312df74 ("net/mlx4_en: mlx4_en_set_settings() always fails when autoneg is set")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:58 -08:00
Jack Morgenstein
ef0223e693 net/mlx4_core: Remove duplicate code line from procedure mlx4_bf_alloc
mlx4_bf_alloc had an unnecessary/duplicate code line. Did no harm,
but not good practice.

Reported by the Mellanox Beijing team.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein
dc7d500451 net/mlx4_core: Fix struct mlx4_vhcr_cmd to make implicit padding explicit
Struct mlx4_vhcr was implicitly padded by the gcc compiler on 64-bit
architectures.

This commit makes that padding explicit, to prevent issues with
changing compilers and with incompatibilities between 32-bit architecture
implicit padding and 64-bit architecture implicit padding.

This structure is used in virtualization for communication between
the Host and its Guests. The explicit padding allows 64-bit Hosts
(old and new) to continue to interoperate with 64-bit Guests (old and new).

However, without this fix, 64-bit Hosts could not interoperate with 32-bit
Guests (since these did not insert the padding dword). With this fix,
32-bit Guests will be able to interoperate with 64-bit Hosts (since
the structure offsets will be identical on both).

Reported-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein
30a5da5b33 net/mlx4_core: Fix HW2SW_EQ to conform to the firmware spec
The driver incorrectly assigned an out-mailbox to this command,
and used an opcode modifier = 0, which is a reserved value (it
should use opcode modifier = 1).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein
5a03108689 net/mlx4_core: Adjust command timeouts to conform to the firmware spec
The firmware spec states that the timeout for all commands should be 60 seconds.

In the past, the spec indicated that there were several classes of timeout
(short, medium, and long).  The driver has these different timeout classes.
We leave the class differentiation in the driver as-is (to protect against any
future spec changes), but set the timeout for all classes to be 60 seconds.

In addition, we fix a few commands which had hard-coded numeric timeouts specified.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein
772103e6b1 net/mlx4_core: Fix mem leak in SRIOV mlx4_init_one error flow
Structs allocated for the resource tracker must be freed in
the error flow.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein
f0ce061508 net/mlx4_core: Add reserved lkey for VFs to QUERY_FUNC_CAP
The reserved lKey is different for each VF.
A base lkey value is returned in QUERY_DEV_CAP at offset 0x98.

The reserved L_key value for a VF is:
    VF_lkey = base_lkey + (VF_number << 8).

This VF L_key value should be returned in QUERY_FUNC_CAP
(opcode-modifier = 0) at offset 0x48.

To indicate that the lkey value at offset 0x48 is valid, the Hypervisor
sets a flag bit in dword 0x0, offset 27 in the QUERY_FUNC_CAP wrapper
function.

When the VF calls QUERY_FUNC_CAP, it should check if this flag bit is set.
If it is set, the VF should take the reserved lkey value at offset 0x48.
If the bit is not set, the VF should not use a reserved lkey
(i.e., should set its reserved lkey value to 0).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
Jack Morgenstein
be6a6b43b5 net/mlx4_core: Add bad-cable event support
If the firmware can detect a bad cable, allow it to generate an
event, and print the problem in the log.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 17:12:57 -08:00
David S. Miller
95f873f2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-27 16:59:56 -08:00
Yishai Hadas
0cd9302734 net/mlx4_core: Reset flow activation upon SRIOV fatal command cases
When SRIOV commands are executed over the comm-channel and get
a fatal error (e.g. timeout, closing command failure) the VF enters
into error state and reset flow is activated.

To be able to recognize whether the failure was on a closing command, the
operational code for the given VHCR command is used. Once the device entered
into an error state we prevent redundant error messages from being printed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:15 -08:00
Yishai Hadas
55ad359225 net/mlx4_core: Enable device recovery flow with SRIOV
In SRIOV, both the PF and the VF may attempt device recovery whenever they
assume that the device is not functioning.  When the PF driver resets the
device, the VF should detect this and attempt to reinitialize itself.

The VF must be able to reset itself under all circumstances, even
if the PF is not responsive.

The VF shall reset itself in the following cases:

1. Commands are not processed within reasonable time over the communication channel.
This is done considering device state and the correct return code based on
the command as was done in the native mode, done in the next patch.

2. The VF driver receives an internal error event reported by the PF on the
communication channel. This occurs when the PF driver resets the device or
when VF is out of sync with the PF.

Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
PF is not responsive.

As PF and VF may run their reset flow simulantanisly, there are several cases
that are handled:
- Prevent freeing VF resources upon FLR, when PF is in its unloading stage.
- Prevent PF getting VF commands before it has finished initializing its resources.
- Upon VF startup, check that comm-channel is online before sending
  commands to the PF and getting timed-out.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas
2ba5fbd62b net/mlx4_core: Handle AER flow properly
Fix AER callbacks to work properly, it includes:
- Refractoring AER to be aligned with Reset flow support.
- Sync with concurrent catas flow.

In addition, fix the shutdown PCI callback to sync with
concurrent catas flow.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas
c69453e294 net/mlx4_core: Manage interface state for Reset flow cases
We need to manage interface state to sync between reset flow and some other
relative cases such as remove_one. This has to be done to prevent certain
races. For example in case software stack is down as a result of unload call,
the remove_one should skip the unload phase.

Implement the remove_one case, handling AER and other cases comes next.

The interface can be up/down, upon remove_one, the state will include an extra
bit indicating that the device is cleaned-up, forcing other tasks to finish
before the final cleanup.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas
f5aef5aa35 net/mlx4_core: Activate reset flow upon fatal command cases
We activate reset flow upon command fatal errors, when the device enters an
erroneous state, and must be reset.

The cases below are assumed to be fatal: FW command timed-out, an error from FW
on closing commands, pci is offline when posting/pending a command.

In those cases we place the device into an error state: chip is reset, pending
commands are awakened and completed immediately. Subsequent commands will
return immediately.

The return code in the above cases will depend on the command. Commands which
free and close resources will return success (because the chip was reset, so
callers may safely free their kernel resources). Other commands will return -EIO.

Since the device's state was marked as error, the catas poller will
detect this and restart the device's software stack (as is done when a FW
internal error is directly detected). The device state is protected by a
persistent mutex lives on its mlx4_dev, as such no need any more for the
hcr_mutex which is removed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas
f6bc11e426 net/mlx4_core: Enhance the catas flow to support device reset
This includes:

- resetting the chip when a fatal error is detected (the current code
  does not do this).

- exposing the ability to enter error state from outside the catas code
  by calling its functionality. (E.g. FW Command timeout, AER error).

- managing a persistent device state. This is needed to sync between
  reset flow cases.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas
ad9a0bf08f net/mlx4_core: Refactor the catas flow to work per device
Using a WQ per device instead of a single global WQ, this allows
independent reset handling per device even when SRIOV is used.

This comes as a pre-patch for supporting chip reset
for both native and SRIOV.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:14 -08:00
Yishai Hadas
dd0eefe3ab net/mlx4_core: Set device configuration data to be persistent across reset
When an HCA enters an internal error state, this is detected by the driver.
The driver then should reset the HCA and restart the software stack.

Keep ports information and some SRIOV configuration in a persistent area
to have it valid across reset.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:13 -08:00
Yishai Hadas
872bf2fb69 net/mlx4_core: Maintain a persistent memory for mlx4 device
Maintain a persistent memory that should survive reset flow/PCI error.
This comes as a preparation for coming series to support above flows.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-25 14:43:13 -08:00
Or Gerlitz
5eff6dadb9 net/mlx4: Don't disable vxlan offloads under DMFS-A0 optimized steering
Except for VXLAN steering rules, all offloads should work as they were
under plain DMFS mode. Fix that by enabling all the offloads under
DMFS-A0 mode, except for VXLAN steering rules.

Fixes: d57febe1a4 "net/mlx4: Add A0 hybrid steering"
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 19:35:30 -05:00
Jiri Pirko
df8a39defa net: rename vlan_tx_* helpers since "tx" is misleading there
The same macros are used for rx as well. So rename it.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:51:08 -05:00
Arnd Bergmann
065bd8c28b mlx5: avoid build warnings on 32-bit
The mlx5 driver passes a string pointer in through a 'u64' variable,
which on 32-bit machines causes a build warning:

drivers/net/ethernet/mellanox/mlx5/core/debugfs.c: In function 'qp_read_field':
drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

The code is in fact safe, so we can shut up the warning by adding
extra type casts.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 17:08:20 -05:00
David S. Miller
44d84d7272 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-01-06 22:29:20 -05:00
Richard Cochran
d9f393734a mlx4: include clocksource.h again
This driver uses the function, clocksource_khz2mult, and so it really must
include clocksource.h.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 16:47:36 -05:00
Jack Morgenstein
d0d012509f net/mlx4_core: Fix error flow in mlx4_init_hca()
We shouldn't call UNMAP_FA here, this is done in mlx4_load_one.

If mlx4_query_func fails, we need to invoke CLOSE_HCA for both
native and master.

Fixes: a0eacca948 ('net/mlx4_core: Refactor mlx4_load_one')
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 15:41:29 -05:00
Maor Gottlieb
a51e0df4c1 net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
Previously, mlx4_mt_rereg_write filled the MPT's entity_size with the
old MTT's page shift, which could result in using an incorrect offset.
Fix the initialization to be after we calculate the new MTT offset.

In addition, assign mtt order to -1 after calling mlx4_mtt_cleanup. This
is necessary in order to mark the MTT as invalid and avoid freeing it later.

Fixes: e630664 ('mlx4_core: Add helper functions to support MR re-registration')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-02 15:41:28 -05:00
Richard Cochran
2eebdde652 timecounter: keep track of accumulated fractional nanoseconds
The current timecounter implementation will drop a variable amount
of resolution, depending on the magnitude of the time delta. In
other words, reading the clock too often or too close to a time
stamp conversion will introduce errors into the time values. This
patch fixes the issue by introducing a fractional nanosecond field
that accumulates the low order bits.

Reported-by: Janusz Użycki <j.uzycki@elproma.com.pl>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Richard Cochran
ce51ff0937 net: mlx4: convert to timecounter adjtime.
This patch changes the driver to use the new and improved method
for adjusting the offset of a timecounter.

Compile tested only.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-30 18:29:27 -05:00
Jesse Gross
5f35227ea3 net: Generalize ndo_gso_check to ndo_features_check
GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.

This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.

By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.

CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by:  Tom Herbert <therbert@google.com>
Fixes: 04ffcb255f ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-26 17:20:56 -05:00
Amir Vadai
492f5add4b net/mlx4_en: Doorbell is byteswapped in Little Endian archs
iowrite32() will byteswap it's argument on big endian archs.
iowrite32be() will byteswap on little endian archs.
Since we don't want to do this unnecessary byteswap on the fast path,
doorbell is stored in the NIC's native endianness. Using the right
iowrite() according to the arch endianness.

CC: Wei Yang <weiyang@linux.vnet.ibm.com>
CC: David Laight <david.laight@aculab.com>
Fixes: 6a4e812 ("net/mlx4_en: Avoid calling bswap in tx fast path")
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-22 16:33:10 -05:00
Linus Torvalds
4c929feed7 Main batch of InfiniBand/RDMA changes for 3.19:
- On-demand paging support in core midlayer and mlx5 driver.  This lets
    userspace create non-pinned memory regions and have the adapter HW
    trigger page faults.
  - iSER and IPoIB updates and fixes.
  - Low-level HW driver updates for cxgb4, mlx4 and ocrdma.
  - Other miscellaneous fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUk2pBAAoJEENa44ZhAt0hL18P/jdCGbWVXOJh25KvjzmKIfUV
 T3Bdixz5h/Xj2iU6ShsXLSZa8vkPXsiO5v3MIQcR5MuPn88vrxemTy/OmBjefJeL
 qKGnWfy9O8KxMqhYZAokTTIyl5ygtSITbJsCE5W0KHgRBgBtexbrHeFBcWsT3AZ5
 piGyRP4XWc2LtfjrFWdUUjRELz9m74L93uILy0P8lS58k3M8YIOvkjqVmGj5Ya3U
 /hadgk1HYWfxjw+z3v0keaP1IoqHpJferH+UyjCj8UsIB9swXabE8ap/SFrQPIpe
 p+Zwyi25292mavqEfm/neUmvn34xLF8c00XB6UKxr42Q9yd1mDxnO+ZxWpxW5klQ
 tKEZeySDbB/WplGrumCeNXPonFvdBpGOTguP3z5o0bcgj1UJ+yVk8KjNBlwSWhQw
 Mkh/Rb6gSJzeidB3pnQV3TKVkvcFr+Li6DgbG6a77f0W7ggQC2UaeTwEPY5FlMtK
 n2jQddhnXYsQXeOEpDcISbpAnCIx+qjDIRv7jYTajw0hg8A669ytcI/gi4b9qJeU
 l3epZDszbCkRwPACzOXCRfeZRiz1H6/USI+Vn/yIQZBlHEd7TcK6ph+KDO/btX+D
 PWKrirIgzorJsIsDyD4WBXHfJnNS1Imfoxl5s7/8kkwrIkwY+lGpU0zM1bu7cS8W
 c32iGI9+dgHSPTZt3RdL
 =MCO9
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.19:

   - On-demand paging support in core midlayer and mlx5 driver.  This
     lets userspace create non-pinned memory regions and have the
     adapter HW trigger page faults.
   - iSER and IPoIB updates and fixes.
   - Low-level HW driver updates for cxgb4, mlx4 and ocrdma.
   - Other miscellaneous fixes"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (56 commits)
  IB/mlx5: Implement on demand paging by adding support for MMU notifiers
  IB/mlx5: Add support for RDMA read/write responder page faults
  IB/mlx5: Handle page faults
  IB/mlx5: Page faults handling infrastructure
  IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
  IB/mlx5: Changes in memory region creation to support on-demand paging
  IB/mlx5: Implement the ODP capability query verb
  mlx5_core: Add support for page faults events and low level handling
  mlx5_core: Re-add MLX5_DEV_CAP_FLAG_ON_DMND_PG flag
  IB/srp: Allow newline separator for connection string
  IB/core: Implement support for MMU notifiers regarding on demand paging regions
  IB/core: Add support for on demand paging regions
  IB/core: Add flags for on demand paging support
  IB/core: Add support for extended query device caps
  IB/mlx5: Add function to read WQE from user-space
  IB/core: Add umem function to read data from user-space
  IB/core: Replace ib_umem's offset field with a full address
  IB/mlx5: Enhance UMR support to allow partial page table update
  IB/mlx5: Remove per-MR pas and dma pointers
  RDMA/ocrdma: Always resolve destination mac from GRH for UD QPs
  ...
2014-12-18 20:10:44 -08:00
Ido Shamay
c3f2511fea net/mlx4: Cache line CQE/EQE stride fixes
This commit contains 2 fixes for the 128B CQE/EQE stride feaure.
Wei found that mlx4_QUERY_HCA function marked the wrong capability
in flags (64B CQE/EQE), when CQE/EQE stride feature was enabled.
Also added small fix in initial CQE ownership bit assignment, when CQE
is size is not default 32B.

Fixes: 77507aa24 (net/mlx4: Enable CQE/EQE stride support)
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-16 15:23:53 -05:00
Roland Dreier
a7cfef21e3 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'mlx4', 'ocrdma', 'odp' and 'srp' into for-next 2014-12-15 18:19:20 -08:00
Haggai Eran
e420f0c0f3 mlx5_core: Add support for page faults events and low level handling
* Add a handler function pointer in the mlx5_core_qp struct for page
  fault events. Handle page fault events by calling the handler
  function, if not NULL.
* Add on-demand paging capability query command.
* Export command for resuming QPs after page faults.
* Add various constants related to paging support.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:18:59 -08:00
Yuval Shaia
0b9976577c mlx4_core: Check for DPDP violation only when DPDP is not supported
Move check for DPDP out of the loop to make the code more readable.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-12-15 18:12:29 -08:00
Or Gerlitz
c78e25edbf net/mlx4_core: Avoid double dumping of the PF device capabilities
To support asymmetric EQ allocations, we should query the device
capabilities prior to enabling SRIOV. As a side effect of adding that,
we are dumping the PF device capabilities twice. Avoid that by moving
the printing into a helper function which is called once.

Fixes: 7ae0e400cd ('net/mlx4_core: Flexible (asymmetric) allocation of
		     EQs and MSI-X vectors for PF/VFs')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-15 11:34:54 -05:00
Matan Barak
da315679e8 net/mlx4_core: Fixed memory leak and incorrect refcount in mlx4_load_one
The current mlx4_load_one has a memory leak as it always allocates
dev_cap, but frees it only on error.

In addition, even if VFs exist when mlx4_load_one is called,
we still need to notify probed VFs that we're loading (by
incrementing pf_loading).

Fixes: a0eacca948 ('net/mlx4_core: Refactor mlx4_load_one')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-15 11:34:53 -05:00
Matan Barak
7d077cd34e net/mlx4: Add support for A0 steering
Add the required firmware commands for A0 steering and a way to enable
that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
to configure and query the device.

The different A0 DMFS (steering) modes are:

Static - optimized performance, but flow steering rules are
limited. This mode should be choosed explicitly by the user
in order to be used.

Dynamic - this mode should be explicitly choosed by the user.
In this mode, the FW works in optimized steering mode as long as
it can and afterwards automatically drops to classic (full) DMFS.

Disable - this mode should be explicitly choosed by the user.
The user instructs the system not to use optimized steering, even if
the FW supports Dynamic A0 DMFS (and thus will be able to use optimized
steering in Default A0 DMFS mode).

Default - this mode is implicitly choosed. In this mode, if the FW
supports Dynamic A0 DMFS, it'll work in this mode. Otherwise, it'll
work at Disable A0 DMFS mode.

Under SRIOV configuration, when the A0 steering mode is enabled,
older guest VF drivers who aren't using the RX QP allocation flag
(MLX4_RESERVE_A0_QP) will get a QP from the general range and
fail when attempting to register a steering rule. To avoid that,
the PF context behaviour is changed once on A0 static mode, to
require support for the allocation flag in VF drivers too.

In order to enable A0 steering, we use log_num_mgm_entry_size param.
If the value of the parameter is not positive, we treat the absolute
value of log_num_mgm_entry_size as a bit field. Setting bit 2 of this
bit field enables static A0 steering.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:36 -05:00
Matan Barak
431df8c7e9 net/mlx4: Refactor QUERY_PORT
Currently QUERY_PORT is done as a part of QUERY_DEV_CAP firmware command.

Since we would like to use it without querying all device capabilities,
extract this part to be a function of its own.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
579d059bd2 net/mlx4_core: Add explicit error message when rule doesn't meet configuration
When a given flow steering rule is invalid in respect to the current
steering configuration, print the correct error message to the system log.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
d57febe1a4 net/mlx4: Add A0 hybrid steering
A0 hybrid steering is a form of high performance flow steering.
By using this mode, mlx4 cards use a fast limited table based steering,
in order to enable fast steering of unicast packets to a QP.

In order to implement A0 hybrid steering we allocate resources
from different zones:
(1) General range
(2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.

When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
we try hard to allocate the QP from range (2). Otherwise, we try hard not
to allocate from this  range. However, when the system is pushed to its
limits and one needs every resource, the allocator uses every region it can.

Meaning, when we run out of raw-eth qps, the allocator allocates from the
general range (and the special-A0 area is no longer active). If we run out
of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
is also exhausted, the allocator will allocate from the general range
(and the A0 region is no longer active).

Note that if a raw-eth qp is allocated from the general range, it attempts
to allocate the range such that bits 6 and 7 (blueflame bits) in the
QP number are not set.

When the feature is used in SRIOV, the VF has to notify the PF what
kind of QP attributes it needs. In order to do that, along with the
"Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
to the combination of these bits, the PF tries to allocate a suitable QP.

In order to maintain backward compatibility (with older PFs), the PF
notifies which QP attributes it supports via QUERY_FUNC_CAP command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
7a89399ffa net/mlx4: Add mlx4_bitmap zone allocator
The zone allocator is a mechanism which manages a few mlx4_bitmaps.

When allocating a resource, the user indicates the desired zone of
which this resource will be allocated from. If possible, the resource
will be allocated from this zone. Otherwise, the resource will be
allocated from a less-than, equal-to, higher-than priority zone,
according to the desired zone's properties with that respective
allocation order.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Dotan Barak
ab256e5ad0 net/mlx4: Add a check if there are too many reserved QPs
The number of reserved QPs is affected both from the firmware and
from the driver's requirements. This patch adds a check that
validates that this number is indeed feasable.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Eugenia Emantayev
ddae0349fd net/mlx4: Change QP allocation scheme
When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 65 will have bits 6/7 set.

This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
   and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0]  - number of QPs
b. param1[31-24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
3dca0f42c7 net/mlx4_core: Use tasklet for user-space CQ completion events
Previously, we've fired all our completion callbacks straight from our ISR.

Some of those callbacks were lightweight (for example, mlx4_en's and
IPoIB napi callbacks), but some of them did more work (for example,
the user-space RDMA stack uverbs' completion handler). Besides that,
doing more than the minimal work in ISR is generally considered wrong,
it could even lead to a hard lockup of the system. Since when a lot
of completion events are generated by the hardware, the loop over those
events could be so long, that we'll get into a hard lockup by the system
watchdog.

In order to avoid that, add a new way of invoking completion events
callbacks. In the interrupt itself, we add the CQs which receive completion
event to a per-EQ list and schedule a tasklet. In the tasklet context
we loop over all the CQs in the list and invoke the user callback.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Or Gerlitz
383677da43 net/mlx4_core: Mask out host side virtualization features for guests
When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
need not be told that host side virtualization features such as VST, FSM
(MAC anti-spoofing) and running > 80 VFs are supported by the device.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Or Gerlitz
c58942f252 net/mlx4_en: Set csum level for encapsulated packets
This was dropped by mistake for the napi_gro_frags flow, fix that.

Fixes: dd65beac48 ('net/mlx4_en: Extend usage of napi_gro_frags')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Eyal Perry
947cbb0ac2 net/mlx4_en: Support for configurable RSS hash function
The ConnectX HW is capable of using one of the following hash functions:
Toeplitz and an XOR hash function. This patch extends the implementation
of the mlx4_en driver set/get_rxfh callbacks to support getting and
setting the RSS hash function used by the device.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 21:07:10 -05:00
Eyal Perry
892311f66f ethtool: Support for configurable RSS hash function
This patch extends the set/get_rxfh ethtool-options for getting or
setting the RSS hash function.

It modifies drivers implementation of set/get_rxfh accordingly.

This change also delegates the responsibility of checking whether a
modification to a certain RX flow hash parameter is supported to the
driver implementation of set_rxfh.

User-kernel API is done through the new hfunc bitmask field in the
ethtool_rxfh struct. A bit set in the hfunc field is corresponding to an
index in the new string-set ETH_SS_RSS_HASH_FUNCS.

Got approval from most of the relevant driver maintainers that their
driver is using Toeplitz, and for the few that didn't answered, also
assumed it is Toeplitz.

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Cc: Prashant Sreedharan <prashant@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
Cc: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Matthew Vick <matthew.vick@intel.com>
Cc: John Ronciak <john.ronciak@intel.com>
Cc: Mitch Williams <mitch.a.williams@intel.com>
Cc: Amir Vadai <amirv@mellanox.com>
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Shradha Shah <sshah@solarflare.com>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Cc: "VMware, Inc." <pv-drivers@vmware.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 21:07:10 -05:00
Eli Cohen
28c167fa8f net/mlx5_core: Add more supported devices
Add ConnectX-4LX to the list of supported devices as well as their virtual
functions.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:55 -05:00
Majd Dibbiny
6b60d5e221 net/mlx5_core: Clear outbox of dealloc uar
The outbox should be cleared before executing the command.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:55 -05:00
Eli Cohen
ab62924ec2 net/mlx5_core: Print resource number on QP/SRQ async events
Useful for debugging purposes.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:54 -05:00
Eli Cohen
2d446d18aa net/mlx5_core: Fix command queue size enforcement
Command queue descriptor page size is 4KB and not the page size used by the
kernel.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:54 -05:00
Eli Cohen
3a9e161a59 net/mlx5_core: Fix min vectors value in mlx5_enable_msix
mlx5 requires at least one interrupt vector for completions so fix the minvec
argument to pci_enable_msix_range() accordingly.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:54 -05:00
Eli Cohen
f66f049fb7 net/mlx5_core: Request the mlx5 IB module on driver load
Call request module on mlx5_ib so it will be available for applications
requiring it, such as installers that require boot over IB.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-08 20:45:54 -05:00
Jiri Pirko
02637fce3e net: rename netdev_phys_port_id to more generic name
So this can be reused for identification of other "items" as well.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Thomas Graf <tgraf@suug.ch>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-02 20:01:19 -08:00
David S. Miller
60b7379dc5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-29 20:47:48 -08:00
Jack Morgenstein
2d5c57d7fb net/mlx4_core: Limit count field to 24 bits in qp_alloc_res
Some VF drivers use the upper byte of "param1" (the qp count field)
in mlx4_qp_reserve_range() to pass flags which are used to optimize
the range allocation.

Under the current code, if any of these flags are set, the 32-bit
count field yields a count greater than 2^24, which is out of range,
and this VF fails.

As these flags represent a "best-effort" allocation hint anyway, they may
safely be ignored. Therefore, the PF driver may simply mask out the bits.

Fixes: c82e9aa0a8 "mlx4_core: resource tracking for HCA resources used by guests"
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-26 12:04:49 -05:00
Eric Dumazet
bd635c354d mlx4: fix mlx4_en_set_rxfh()
mlx4_en_set_rxfh() can crash if no RSS indir table is provided.

While we are at it, allow RSS key to be changed with ethtool -X

Tested:

myhost:~# cat /proc/sys/net/core/netdev_rss_key
b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e:c5:5d:4d:8c:22:67:30🆎8a:6e:c3:6a

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
b6:89:91:f3:b2:c3:c2:90:11:e8:ce:45:e8:a9:9d:1c:f2:f6:d4:53:61:8b:26:3a:b3:9a:57:97:c3:b6:79:4d:2e:d9:66:5c:72:ed:b6:8e

myhost:~# ethtool -X eth0 hkey \
03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f

myhost:~# ethtool -x eth0
RX flow hash indirection table for eth0 with 8 RX ring(s):
    0:      0     1     2     3     4     5     6     7
RSS hash key:
03:0e:e2:43:fa:82:0e:73:14:2d:c0:68:21:9e:82:99:b9:84:d0:22:e2:b3:64:9f:4a:af:00:fa:cc:05:b4:4a:17:05:14:73:76:58:bd:2f

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: b9d1ab7eb4 ("mlx4: use netdev_rss_key_fill() helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-23 13:49:12 -05:00
David S. Miller
1459143386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ieee802154/fakehard.c

A bug fix went into 'net' for ieee802154/fakehard.c, which is removed
in 'net-next'.

Add build fix into the merge from Stephen Rothwell in openvswitch, the
logging macros take a new initial 'log' argument, a new call was added
in 'net' so when we merge that in here we have to explicitly add the
new 'log' arg to it else the build fails.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 22:28:24 -05:00
Saeed Mahameed
312df74c71 net/mlx4_en: mlx4_en_set_settings() always fails when autoneg is set
Fix ethtool set settings to not check AUTONEG_ENABLE

mlx4_en_set_settings should not check if cmd->autoneg == AUTONEG_ENABLE,
cmd->autoneg can be enabled by default and this check will fail other settings requests.
mlx4_en driver doesn't support changing autoneg value, but shouldn't fail the request
in case cmd->autoneg was set.

Fixes: d48b3ab ("net/mlx4_en: Use PTYS register to set ethtool settings (Speed)")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 15:06:30 -05:00
Al Viro
914efb02be mlx4: don't duplicate kvfree()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:58:18 -05:00
Al Viro
479163f460 mlx5: don't duplicate kvfree()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-21 14:58:18 -05:00
Or Gerlitz
9737c6ab7a net/mlx4_en: Add VXLAN ndo calls to the PF net device ops too
This is currently missing, which results in a crash when one attempts
to set VXLAN tunnel over the mlx4_en when acting as PF.

	[ 2408.785472] BUG: unable to handle kernel NULL pointer dereference at (null)
	[...]
	[ 2408.994104] Call Trace:
	[ 2408.996584]  [<ffffffffa021f7f5>] ? vxlan_get_rx_port+0xd6/0x103 [vxlan]
	[ 2409.003316]  [<ffffffffa021f71f>] ? vxlan_lowerdev_event+0xf2/0xf2 [vxlan]
	[ 2409.010225]  [<ffffffffa0630358>] mlx4_en_start_port+0x862/0x96a [mlx4_en]
	[ 2409.017132]  [<ffffffffa063070f>] mlx4_en_open+0x17f/0x1b8 [mlx4_en]

While here, make sure to invoke vxlan_get_rx_port() only when VXLAN
offloads are actually enabled and not when they are only supported.

Reported-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-19 15:11:09 -05:00
Eric Dumazet
b9d1ab7eb4 mlx4: use netdev_rss_key_fill() helper
Use of well known RSS key increases attack surface.
Switch to a random one, using generic helper so that all
ports share a common key.

Also provide ethtool -x support to fetch RSS key

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-16 15:59:13 -05:00
Joe Stringer
956bdab2e4 net/mlx4_en: Implement ndo_gso_check()
Use vxlan_gso_check() to advertise offload support for this NIC.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 17:12:48 -05:00
David S. Miller
076ce44825 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/chelsio/cxgb4vf/sge.c
	drivers/net/ethernet/intel/ixgbe/ixgbe_phy.c

sge.c was overlapping two changes, one to use the new
__dev_alloc_page() in net-next, and one to use s->fl_pg_order in net.

ixgbe_phy.c was a set of overlapping whitespace changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-14 01:01:12 -05:00
Matan Barak
de966c5928 net/mlx4_core: Support more than 64 VFs
We now allow up to 126 VFs. Note though that certain firmware
versions only allow up to 80 VFs. Moreover, old HCAs only support 64 VFs.
In these cases, we limit the maximum number of VFs to 64.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:22 -05:00
Matan Barak
7ae0e400cd net/mlx4_core: Flexible (asymmetric) allocation of EQs and MSI-X vectors for PF/VFs
Previously, the driver queried the firmware in order to get the number
of supported EQs. Under SRIOV, since this was done before the driver
notified the firmware how many VFs it actually needs, the firmware had
to take into account a worst case scenario and always allocated four EQs
per VF, where one was used for events while the others were used for completions.

Now, when the firmware supports the asymmetric allocation scheme, denoted
by exposing num_sys_eqs > 0 (--> MLX4_DEV_CAP_FLAG2_SYS_EQS), we use the
QUERY_FUNC command to query the firmware before enabling SRIOV. Thus we
can get more EQs and MSI-X vectors per function.

Moreover, when running in the new firmware/driver mode, the limitation
that the number of EQs should be a power of two is lifted.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:21 -05:00
Matan Barak
e8c4265bea net/mlx4_core: Add QUERY_FUNC firmware command
QUERY_FUNC firmware command could be used in order to query the
number of EQs, reserved EQs, etc for a specific function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:19 -05:00
Matan Barak
a0eacca948 net/mlx4_core: Refactor mlx4_load_one
Refactor mlx4_load_one, as a preparation step for a new and
more complicated load function. The goal is to support both
newer firmware that required init_hca to be done before
enable_sriov and legacy firmwares that requires things to
be done the other way around.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:18 -05:00
Matan Barak
ffc39f6d6f net/mlx4_core: Refactor mlx4_cmd_init and mlx4_cmd_cleanup
Refactoring mlx4_cmd_init and mlx4_cmd_cleanup such that partial init
and cleanup are possible. After this refactoring, calling mlx4_cmd_init
several times is safe.

This is necessary in the VF init flow when mlx4_init_hca returns -EACCESS,
we need to issue cleanup and re-attempt to call it with the slave flag.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:17 -05:00
Matan Barak
225c6c8c6b net/mlx4_core: Use correct variable type for mlx4_slave_cap
We've used an incorrect type for the loop counter and the
mlx4_QUERY_FUNC_CAP function. The current input modifier
is either a port or a boolean.
Since the number of ports is always a positive value < 255,
we should use u8 instead of an integer with casting.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:16 -05:00
Matan Barak
7c68dd435b net/mlx4_core: Fix wrong reading of reserved_eqs
We mistakenly read the reserved_eqs field as a standard
numeric value rather than a log2 value.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13 15:16:15 -05:00
Or Gerlitz
f4a1edd561 net/mlx4_en: Advertize encapsulation offloads features only when VXLAN tunnel is set
Currenly we only support Large-Send and TX checksum offloads for
encapsulated traffic of type VXLAN. We must make sure to advertize
these offloads up to the stack only when VXLAN tunnel is set.

Failing to do so, would mislead the the networking stack to assume
that the driver can offload the internal TX checksum for GRE packets
and other buggy schemes.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:24:45 -05:00
Shani Michaeli
f8c6455bb0 net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE
When processing received traffic, pass CHECKSUM_COMPLETE status to the
stack, with calculated checksum for non TCP/UDP packets (such
as GRE or ICMP).

Although the stack expects checksum which doesn't include the pseudo
header, the HW adds it. To address that, we are subtracting the pseudo
header checksum from the checksum value provided by the HW.

In the IPv6 case, we also compute/add the IP header checksum which
is not added by the HW for such packets.

Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:20:02 -05:00
Shani Michaeli
dd65beac48 net/mlx4_en: Extend usage of napi_gro_frags
We can call napi_gro_frags for all the received traffic regardless
of the checksum status. Specifically, received packets whose status
is CHECKSUM_NONE (and soon to be added CHECKSUM_COMPLETE)
are eligible for napi_gro_frags as well.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-11 13:20:02 -05:00
Eric Dumazet
2e1af7d74f mlx4: restore conditional call to napi_complete_done()
After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() in the case NAPI poll consumed all its
budget.

This added extra interrupt pressure, this patch restores proper
behavior.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 1a28817282 ("mlx4: use napi_complete_done()")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 21:09:03 -05:00
Eric Dumazet
1a28817282 mlx4: use napi_complete_done()
To enable gro_flush_timeout, a driver has to use napi_complete_done()
instead of napi_complete().

Tested:
 Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)

Without this feature, we send back about 305,000 ACK per second.

GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)

Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)

Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.

Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811269.80 305732.30 1199462.57  19705.72      0.00
0.00      0.50

B:~# echo 2000 >/sys/class/net/eth0/gro_flush_timeout

B:~# sar -n DEV 1 10 | grep eth0 | tail -1
Average:         eth0 811577.30  19230.80 1199916.51   1239.80      0.00
0.00      0.50

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-10 12:05:59 -05:00
David S. Miller
4e84b496fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
Eli Cohen
364d1798ef net/mlx5_core: Fix race on driver load
When events arrive at driver load, the event handler gets called even before
the spinlock and list are initialized. Fix this by moving the initialization
before EQs creation.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 16:40:36 -05:00
Eli Cohen
a158906dd7 net/mlx5_core: Fix race in create EQ
After the EQ is created, it can possibly generate interrupts and the interrupt
handler is referencing eq->dev. It is therefore required to set eq->dev before
calling request_irq() so if an event is generated before request_irq() returns,
we will have a valid eq->dev field.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 16:40:35 -05:00
Matan Barak
d475c95b4b net/mlx4_core: Add retrieval of CONFIG_DEV parameters
Add code to issue CONFIG_DEV "get" firmware command.

This command is used in order to obtain certain parameters used for
supporting various RX checksumming options and vxlan UDP port.

The GET operation is allowed for VFs too.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Shani Michaeli <shanim@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:14 -05:00
Ido Shamay
1ab25f86c4 net/mlx4_en: Add __GFP_COLD gfp flags in alloc_pages
Needed in order to get cache cold pages (L3 flushed) for HW scatter.

Otherwise memory may flush those entries when the packet comes from
PCI, causing back pressure resulting in BW decrease.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:13 -05:00
Ido Shamay
5f6e980080 net/mlx4_en: Remove RX buffers alignment to IP_ALIGN
When IP_ALIGN has a non zero value, hardware will write to a non aligned
address. The only reader from this address is when copying the header
from the first frag into the linear buffer (further access to the IP
address will be from the linear buffer, in which the headers are
aligned). Since the penalty of non align access by the hardware is
greater than the software memcpy, changing the frag_align to always be 0.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:13 -05:00
Amir Vadai
0a98455666 net/mlx4_core: Protect port type setting by mutex
We need to protect set_port_type() for concurrency, as the sysfs code could
call it from mutliple contexts in parallel.

The port_mutex is not enough because we need to protect from concurrent
modification of 'info' and stopping of the port sensing work.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:13 -05:00
Saeed Mahameed
6e80669998 net/mlx4_core: Prevent VF from changing port configuration
Added wrapper to the ACCESS_REG command for handling guest HW
registers access, preventing write operations, but do allow reads.

This will prevent SRIOV guests to change port PTYS configuration,
such as speed/advertised link modes.

Fixes: adbc7ac5c1 ('net/mlx4_core: Introduce ACCESS_REG CMD [...]')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 12:28:13 -05:00
David S. Miller
55b42b5ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/marvell.c

Simple overlapping changes in drivers/net/phy/marvell.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-01 14:53:27 -04:00
Or Gerlitz
571e1b2c7a mlx4: Avoid leaking steering rules on flow creation error flow
If mlx4_ib_create_flow() attempts to create > 1 rules with the
firmware, and one of these registrations fail, we leaked the
already created flow rules.

One example of the leak is when the registration of the VXLAN ghost
steering rule fails, we didn't unregister the original rule requested
by the user, introduced in commit d2fce8a906 "mlx4: Set
user-space raw Ethernet QPs to properly handle VXLAN traffic".

While here, add dump of the VXLAN portion of steering rules
so it can actually be seen when flow creation fails.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:48:58 -04:00
Or Gerlitz
a4f2dacbf2 net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN
For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum.

The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.

Fixes: 837052d0cc ('net/mlx4_en: Add netdev support for TCP/IP
		     offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 19:48:58 -04:00
Eric Dumazet
477b35b44f mlx4: use napi_schedule_irqoff()
mlx4_en_rx_irq() and mlx4_en_tx_irq() run from hard interrupt context.

They can use napi_schedule_irqoff() instead of napi_schedule()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-By: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 16:50:47 -04:00
Amir Vadai
d5ec899adb net/mlx4_en: Report actual number of rings in indirection table
Hardware requires the number of rings in indirection table to be a power
of 2. When setting number of channels to a non power of 2 number,
indirection table is using only the closest power of 2 rings.
Report this number in 'ethtool -x' and not the total number of rx rings.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:01 -04:00
Eugenia Emantayev
207af6c507 net/mlx4_en: Move spinlocks and work initalizations to beginning of init_netdev
Upon failures, destroy_netdev is called, and spinlocks/works must be
initialized before calling it. Otherwise kernel panic may occur.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:01 -04:00
Ido Shamay
f4a3675158 net/mlx4_en: Call napi_synchronize on stop_port
This is instead of calling the actual implementation of
napi_synchronize, for better encapsulation.

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:01 -04:00
Jack Morgenstein
c2a3d4b4ca net/mlx4_en: Cleanups suggested by clang static checker
clang flagged the following. All are actually cosmetic cleanups, not really bugs:

drivers/net/ethernet/mellanox/mlx4/en_main.c:233:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;
                ^     ~~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_main.c:293:3: warning: Value stored to 'err' is never read
                err = -ENOMEM;

drivers/net/ethernet/mellanox/mlx4/en_netdev.c:648:16: warning: Assigned value is garbage or undefined
        entry->reg_id = reg_id;
                      ^ ~~~~~~
drivers/net/ethernet/mellanox/mlx4/en_netdev.c:659:2: warning: Function call argument is an uninitialized value
        mlx4_en_uc_steer_release(priv, priv->dev->dev_addr, *qpn, reg_id);
(NOTE: reg_id is only used in the device-managed flow steering path, in which is it always initialized.
 This is not a bug. Cleanup here is therefore cosmetic only).

drivers/net/ethernet/mellanox/mlx4/en_rx.c:122:3: warning: Value stored to 'frag_info' is never read
                frag_info = &priv->frag_info[i];
                ^           ~~~~~~~~~~~~~~~~~~~

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:01 -04:00
Saeed Mahameed
537f6f951e net/mlx4_en: Add ethtool support for [rx|tx]vlan offload set to OFF/ON
Move mlx4_en_reset_config to en_netdev.c as it now serves more general purpose.
Add support for turning OFF/ON the rx/tx vlan offlad.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:01 -04:00
Saeed Mahameed
7787fa661b net/mlx4_en: Add support for setting rxvlan offload OFF/ON
Rename mlx4_en_timestamp_config to mlx4_en_reset_config and extend it to support
choosing RX vlan offload configuration.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:01 -04:00
Saeed Mahameed
d48b3ab4c0 net/mlx4_en: Use PTYS register to set ethtool settings (Speed)
Added Support to set speed or advertised link modes via ethtool:
ethtool -s <ifname> [speed <speed>] [advertise <link modes>]

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed
2c76267943 net/mlx4_en: Use PTYS register to query ethtool settings
- If dev cap MLX4_DEV_CAP_FLAG2_ETH_PROT_CTRL is ON, query PTYS register to fill ethtool settings.
else use default values.
- Use autoneg port cap and dev backplane autoneg cap to reprort autoneg interface capbilities.
- Fix typo in mlx4_en_port_state struct field (transciver to transceiver).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed
dcf972a334 ethtool, net/mlx4_en: Add 100M, 20G, 56G speeds ethtool reporting support
Added 100M, 20G and 56G ethtool speed reporting support.
Update mlx4_en_test_speed self test with the new speeds.

Defined new link speeds in include/uapi/linux/ethtool.h:
+#define SPEED_20000	20000
+#define SPEED_40000	40000
+#define SPEED_56000	56000

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed
a53e3e8c1d net/mlx4_core: Add ethernet backplane autoneg device capability
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed
adbc7ac5c1 net/mlx4_core: Introduce ACCESS_REG CMD and eth_prot_ctrl dev cap
Adding ACCESS REG mlx4 command and use it to implement Query method for
PTYS (Port Type and Speed Register).
Query and store eth_prot_ctrl dev cap.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed
7202da8b7f ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support
Added support for get_module_info/get_module_eeprom ethtool support for cable info reading.

Added new cable types enum in include/uapi/linux/ethtool.h for ethtool use.
+#define ETH_MODULE_SFF_8636            0x3
+#define ETH_MODULE_SFF_8636_LEN        256
+#define ETH_MODULE_SFF_8436            0x4
+#define ETH_MODULE_SFF_8436_LEN        256

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:18:00 -04:00
Saeed Mahameed
32a173c7f9 net/mlx4_core: Introduce mlx4_get_module_info for cable module info reading
Added new MAD_IFC command to read cable module info with attribute id (0xFF60).
Update include/linux/mlx4/device.h with function declaration (mlx4_get_module_info)
and the needed defines/enums for future use.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-28 17:17:59 -04:00
Eli Cohen
bf1bac5b78 net/mlx4_core: Call synchronize_irq() before freeing EQ buffer
After moving the EQ ownership to software effectively destroying it, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In the case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26 22:46:04 -04:00
Eli Cohen
96e4be06cb net/mlx5_core: Call synchronize_irq() before freeing EQ buffer
After destroying the EQ, the object responsible for generating interrupts, call
synchronize_irq() to ensure that any handler routines running on other CPU
cores finish execution. Only then free the EQ buffer. This patch solves a very
rare case when we get panic on driver unload.
The same thing is done when we destroy a CQ which is one of the sources
generating interrupts. In the case of CQ we want to avoid completion handlers
on a CQ that was destroyed. In the case we do the same to avoid receiving
asynchronous events after the EQ has been destroyed and its buffers freed.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-26 22:46:03 -04:00
Eric Dumazet
98226208c8 mlx4: fix race accessing page->_count
This is illegal to use atomic_set(&page->_count, ...) even if we 'own'
the page. Other entities in the kernel need to use get_page_unless_zero()
to get a reference to the page before testing page properties, so we could
loose a refcount increment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-10 15:37:28 -04:00
Linus Torvalds
35a9ad8af0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Most notable changes in here:

   1) By far the biggest accomplishment, thanks to a large range of
      contributors, is the addition of multi-send for transmit.  This is
      the result of discussions back in Chicago, and the hard work of
      several individuals.

      Now, when the ->ndo_start_xmit() method of a driver sees
      skb->xmit_more as true, it can choose to defer the doorbell
      telling the driver to start processing the new TX queue entires.

      skb->xmit_more means that the generic networking is guaranteed to
      call the driver immediately with another SKB to send.

      There is logic added to the qdisc layer to dequeue multiple
      packets at a time, and the handling mis-predicted offloads in
      software is now done with no locks held.

      Finally, pktgen is extended to have a "burst" parameter that can
      be used to test a multi-send implementation.

      Several drivers have xmit_more support: i40e, igb, ixgbe, mlx4,
      virtio_net

      Adding support is almost trivial, so export more drivers to
      support this optimization soon.

      I want to thank, in no particular or implied order, Jesper
      Dangaard Brouer, Eric Dumazet, Alexander Duyck, Tom Herbert, Jamal
      Hadi Salim, John Fastabend, Florian Westphal, Daniel Borkmann,
      David Tat, Hannes Frederic Sowa, and Rusty Russell.

   2) PTP and timestamping support in bnx2x, from Michal Kalderon.

   3) Allow adjusting the rx_copybreak threshold for a driver via
      ethtool, and add rx_copybreak support to enic driver.  From
      Govindarajulu Varadarajan.

   4) Significant enhancements to the generic PHY layer and the bcm7xxx
      driver in particular (EEE support, auto power down, etc.) from
      Florian Fainelli.

   5) Allow raw buffers to be used for flow dissection, allowing drivers
      to determine the optimal "linear pull" size for devices that DMA
      into pools of pages.  The objective is to get exactly the
      necessary amount of headers into the linear SKB area pre-pulled,
      but no more.  The new interface drivers use is eth_get_headlen().
      From WANG Cong, with driver conversions (several had their own
      by-hand duplicated implementations) by Alexander Duyck and Eric
      Dumazet.

   6) Support checksumming more smoothly and efficiently for
      encapsulations, and add "foo over UDP" facility.  From Tom
      Herbert.

   7) Add Broadcom SF2 switch driver to DSA layer, from Florian
      Fainelli.

   8) eBPF now can load programs via a system call and has an extensive
      testsuite.  Alexei Starovoitov and Daniel Borkmann.

   9) Major overhaul of the packet scheduler to use RCU in several major
      areas such as the classifiers and rate estimators.  From John
      Fastabend.

  10) Add driver for Intel FM10000 Ethernet Switch, from Alexander
      Duyck.

  11) Rearrange TCP_SKB_CB() to reduce cache line misses, from Eric
      Dumazet.

  12) Add Datacenter TCP congestion control algorithm support, From
      Florian Westphal.

  13) Reorganize sk_buff so that __copy_skb_header() is significantly
      faster.  From Eric Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1558 commits)
  netlabel: directly return netlbl_unlabel_genl_init()
  net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
  net: description of dma_cookie cause make xmldocs warning
  cxgb4: clean up a type issue
  cxgb4: potential shift wrapping bug
  i40e: skb->xmit_more support
  net: fs_enet: Add NAPI TX
  net: fs_enet: Remove non NAPI RX
  r8169:add support for RTL8168EP
  net_sched: copy exts->type in tcf_exts_change()
  wimax: convert printk to pr_foo()
  af_unix: remove 0 assignment on static
  ipv6: Do not warn for informational ICMP messages, regardless of type.
  Update Intel Ethernet Driver maintainers list
  bridge: Save frag_max_size between PRE_ROUTING and POST_ROUTING
  tipc: fix bug in multicast congestion handling
  net: better IFF_XMIT_DST_RELEASE support
  net/mlx4_en: remove NETDEV_TX_BUSY
  3c59x: fix bad split of cpu_to_le32(pci_map_single())
  net: bcmgenet: fix Tx ring priority programming
  ...
2014-10-08 21:40:54 -04:00
Eric Dumazet
535114539b net: add netdev_txq_bql_{enqueue, complete}_prefetchw() helpers
Add two helpers so that drivers do not have to care of BQL being
available or not.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jim Davis <jim.epost@gmail.com>
Fixes: 29d40c9032 ("net/mlx4_en: Use prefetch in tx path")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-08 16:08:04 -04:00
Linus Torvalds
28596c9722 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull "trivial tree" updates from Jiri Kosina:
 "Usual pile from trivial tree everyone is so eagerly waiting for"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Remove MN10300_PROC_MN2WS0038
  mei: fix comments
  treewide: Fix typos in Kconfig
  kprobes: update jprobe_example.c for do_fork() change
  Documentation: change "&" to "and" in Documentation/applying-patches.txt
  Documentation: remove obsolete pcmcia-cs from Changes
  Documentation: update links in Changes
  Documentation: Docbook: Fix generated DocBook/kernel-api.xml
  score: Remove GENERIC_HAS_IOMAP
  gpio: fix 'CONFIG_GPIO_IRQCHIP' comments
  tty: doc: Fix grammar in serial/tty
  dma-debug: modify check_for_stack output
  treewide: fix errors in printk
  genirq: fix reference in devm_request_threaded_irq comment
  treewide: fix synchronize_rcu() in comments
  checkstack.pl: port to AArch64
  doc: queue-sysfs: minor fixes
  init/do_mounts: better syntax description
  MIPS: fix comment spelling
  powerpc/simpleboot: fix comment
  ...
2014-10-07 21:16:26 -04:00
Eric Dumazet
fe971b95c2 net/mlx4_en: remove NETDEV_TX_BUSY
Drivers should avoid NETDEV_TX_BUSY as much as possible.

They should stop the tx queue before qdisc even tries to push another
packet, to avoid requeues.

For a driver supporting skb->xmit_more, this is likely to be a prereq
anyway, otherwise we could have a tx deadlock : We need to force a
doorbell if TX ring is full.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:20:39 -04:00