Commit Graph

648063 Commits

Author SHA1 Message Date
Andrew Lunn
5952758101 dsa: mv88e6xxx: Optimise atu_get
Lookup in the ATU can be performed starting from a given MAC
address. This is faster than starting with the first possible MAC
address and iterating all entries.

Entries are returned in numeric order. So if the MAC address returned
is bigger than what we are searching for, we know it is not in the
ATU.

Using the benchmark provided by Volodymyr Bendiuga
<volodymyr.bendiuga@gmail.com>,

https://www.spinics.net/lists/netdev/msg411550.html

on an Marvell Armada 370 RD, the test to add a number of static fdb
entries went from 1.616531 seconds to 0.312052 seconds.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 16:34:34 -05:00
Arjun V
8eb9f2f9e4 cxgb4: Support compressed error vector for T6
t6fw-1.15.15.0 enabled compressed error vector in cpl_rx_pkt for T6.
Updating driver to take care of these changes.

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Arjun V <arjun@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 14:01:53 -05:00
David S. Miller
cee3548db1 Merge branch 'sh_eth-intrs-cleanup'
Sergei Shtylyov says:

====================
sh_eth: E-MAC interrupt handler cleanups

   Here's a set of 3 patches against DaveM's 'net-next.git' repo. I'm cleaning
up the E-MAC interrupt handling with the main goal of factoring out the E-MAC
interrupt handler into a separate function.

[1/3] sh_eth: handle only enabled E-MAC interrupts
[2/3] sh_eth: no need for *else* after *goto*
[3/3] sh_eth: factor out sh_eth_emac_interrupt()
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:47:55 -05:00
Sergei Shtylyov
9b39f05ce8 sh_eth: factor out sh_eth_emac_interrupt()
The E-MAC interrupt (EESR.ECI) is not always caused  by an error condition,
so  it really shouldn't be handled by sh_eth_error(). Factor out the E-MAC
interrupt handler, sh_eth_emac_interrupt(),  removing the ECI bit from the
EESR's values throughout the driver...

Update Cogent Embedded's copyright and clean up the whitespace in Renesas'
copyright, while at it...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:47:55 -05:00
Sergei Shtylyov
1940f24076 sh_eth: no need for *else* after *goto*
Well, checkpatch.pl complains about *else* after *return* and *break* but
not after *goto*... and it probably should have complained about the code
in sh_eth_error().  Win couple LoCs by removing that *else*. :-)

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:47:54 -05:00
Sergei Shtylyov
4063469971 sh_eth: handle only enabled E-MAC interrupts
The driver should only handle the enabled E-MAC interrupts, like it does
for the E-DMAC interrupts since commit 3893b27345 ("sh_eth: workaround
for spurious ECI interrupt"),  so mask ECSR with  ECSIPR when reading it
in sh_eth_error().

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:47:54 -05:00
Mahesh Bandewar
009146d117 ipvlan: assign unique dev-id for each slave device.
IPvlan setup uses one mac-address (of master). The IPv6 link-local
addresses are derived using the mac-address on the link. Lack of
dev-ids makes these link-local addresses same for all slaves including
that of master device. dev-ids are necessary to add differentiation
when L2 address is shared.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:30:42 -05:00
Vivien Didelot
a896eee334 net: dsa: remove out label in dsa_switch_setup_one
The "out" label in dsa_switch_setup_one() is useless, thus remove it.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:29:27 -05:00
Prasad Kanneganti
9feb16ae0b liquidio: remove PTP support in 23XX adapters
liquidio driver incorrectly indicates that PTP is supported in 23XX
adapters; this patch fixes that.  PTP is supported in 66XX and 68XX
adapters, and the driver correctly indicates that.

Signed-off-by: Prasad Kanneganti <prasad.kanneganti@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:28:27 -05:00
David S. Miller
ac4340fc3c net: Assert at build time the assumptions we make about the CMSG header.
It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr).

Otherwise there are missing adjustments in the various calculations
that parse and build these things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:24:19 -05:00
yuan linyu
1ff8cebf49 scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))
sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned.
remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and
CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code consistent.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:04:37 -05:00
Yotam Gigi
ec2507d2a3 net/sched: cls_matchall: Fix error path
Fix several error paths in matchall:
 - Release reference to actions in case the hardware fails offloading
   (relevant to skip_sw only)
 - Fix error path in case tcf_exts initialization/validation fail

Fixes: bf3994d2ed ("net/sched: introduce Match-all classifier")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 18:58:27 -05:00
David S. Miller
6f63db82d6 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
10GbE Intel Wired LAN Driver Updates 2017-01-03

This series contains updates to ixgbe and ixgbevf only.

Emil fixes ixgbe to use the NVM settings for FEC, so do not override the
settings.  Fixed the indirection table for x550, where newer devices can
support up to 64 RSS queues.  Extends the rtnl_lock() to protect the call
to netif_device_detach() and ixgbe_clear_interrupt_scheme() to avoid
against a double free WARN and/or a BUG in free_msi_irqs().  Fixed AER
error handling by making sure that the driver frees the IRQs in
ixgbe_io_error_detected() when responding to a PCIe AER error, and to
restore them when the interface recovers.

Tony updates the driver to report the driver version to the firmware using
the host interface command for x550 devices.  Fixed the PHY reset check
for x550em_ext_t PHY type.  Fixed bounds checking for x540 devices to
ensure the index is valid for the LED function.  Fixed the BaseT adapters
which support 100Mb capability and were not reporting the capability.

Ken Cox adds a missing check for the trusted bit before trying to set the
MACVLAN MAC address.

Yusuke Suzuki fixes an issue with 82599 and x540 devices where receive
timestamps were not working becase the bitwise operation for RX_HWSTAMP
falg was incorrect.

Don ensures that x553 KR/KX devices correctly advertise link speeds.
Adds the mailbox message to allow for VF promiscuous mode support.

Mark fixes two issues with EEPROM access, where the semaphore was not
being held until the entire response was read and the acquiring/releasing
of the semaphore is slow.  Cleaned up firmware version method and
functions which are no longer used.  Added new interfaces for firmware
commands to access some new PHYs.

v2: fixed tab indentation in patch 12 and mis-spelled words in patch 15
    based on feedback from Sergei Shtylyov and Rami Rosen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 16:23:01 -05:00
Don Skidmore
07eea570ac ixgbe: Add PF support for VF promiscuous mode
This patch extends the xcast mailbox message to include support for
unicast promiscuous mode.  To allow a VF to enter this mode the PF
must be in promiscuous mode.

A later patch will add the support needed in the VF driver (ixgbevf)

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:40 -08:00
Don Skidmore
41e544cdad ixgbevf: Add support for VF promiscuous mode
This patch extends the mailbox message to allow for VF promiscuous
mode support.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:40 -08:00
Mark Rustad
b3eb4e1860 ixgbe: Implement support for firmware-controlled PHYs
Implement support for devices that have firmware-controlled PHYs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:39 -08:00
Mark Rustad
12c78ef098 ixgbe: Implement firmware interface to access some PHYs
Implement new interface for firmware commands to access some PHYs.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:39 -08:00
Mark Rustad
d2a10ae72e ixgbe: Remove unused firmware version functions and method
The firmware version method and functions are not used anywhere, so
remove them all.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:38 -08:00
Mark Rustad
3efa9ed260 ixgbe: Fix issues with EEPROM access
There are two problems with EEPROM access. One is that it needs to
hold the semaphore until the entire response is read or else the
response can be corrupted by other firmware accesses. The second
problem is that acquiring and releasing the semaphore is slow, so
it should be taken and released once when multiple EEPROM accesses
will be done.

Both of these issues can be solved by adding a new function,
ixgbe_hic_unlocked, to issue firmware commands that will assume
that the caller has acquired the needed semaphore.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:38 -08:00
Don Skidmore
54f6d4c424 ixgbe: Configure advertised speeds correctly for KR/KX backplane
This patch ensures that the advertised link speeds are configured
for X553 KR/KX backplane.  Without this patch the link remains at
1G when resuming from low power after being downshifted by LPLU.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:38 -08:00
Emil Tantilov
26403b7fde ixgbevf: restore hw_addr on resume or error
Restore adapter->hw.hw_addr after handling an error, or a resume
operation to make sure we can access the registers.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:38 -08:00
Yusuke Suzuki
aeb4c73100 ixgbe: Fix incorrect bitwise operations of PTP Rx timestamp flags
Rx timestamp does not work on 82599 and X540 because bitwise operation
of RX_HWTSTAMP flags is incorrect and ixgbe_ptp_rx_hwtstamp() is never
called. This patch fixes it to enable Rx timestamp on 82599 and X540.

Without this fix:
ptp4l[278.730]: selected /dev/ptp8 as PTP clock
ptp4l[278.733]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.733]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[278.834]: port 1: received SYNC without timestamp
ptp4l[278.835]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[279.834]: port 1: received SYNC without timestamp
ptp4l[280.834]: port 1: received SYNC without timestamp
ptp4l[281.834]: port 1: received SYNC without timestamp
ptp4l[282.834]: port 1: received SYNC without timestamp
ptp4l[282.835]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[282.835]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[283.834]: port 1: received SYNC without timestamp

With this fix:
ptp4l[239.154]: selected /dev/ptp8 as PTP clock
ptp4l[239.157]: port 1: INITIALIZING to LISTENING on INITIALIZE
ptp4l[239.157]: port 0: INITIALIZING to LISTENING on INITIALIZE
ptp4l[240.989]: port 1: new foreign master 1c3947.fffe.60f9cc-1
ptp4l[244.989]: selected best master clock 1c3947.fffe.60f9cc
ptp4l[244.989]: port 1: LISTENING to UNCALIBRATED on RS_SLAVE
ptp4l[246.977]: master offset -899583339542096 s0 freq      +0 path delay     16222
ptp4l[247.977]: master offset -899583339617265 s1 freq  -75169 path delay     16177
ptp4l[248.977]: master offset       -130 s2 freq  -75299 path delay     16177
ptp4l[248.977]: port 1: UNCALIBRATED to SLAVE on MASTER_CLOCK_SELECTED
ptp4l[249.977]: master offset         -9 s2 freq  -75217 path delay     16177
ptp4l[250.977]: master offset         88 s2 freq  -75123 path delay     16132

Fixes: a9763f3cb5 ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Yusuke Suzuki <yus-suzuki@uf.jp.nec.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:37 -08:00
Emil Tantilov
b19cf6eea9 ixgbevf: fix AER error handling
Make sure that we free the IRQs in ixgbevf_io_error_detected() when
responding to an PCIe AER error and also restore them when the
interface recovers from it.

Previously it was possible to trigger BUG_ON() check in free_msix_irqs()
in the case where we call ixgbevf_remove() after a failed recovery from
AER error because the interrupts were not freed.

Also moved the down and free functions into ixgbevf_close_suspend()
same as with ixgbe.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:37 -08:00
Emil Tantilov
126db13fa0 ixgbe: fix AER error handling
Make sure that we free the IRQs in ixgbe_io_error_detected() when
responding to an PCIe AER error and also restore them when the
interface recovers from it.

Previously it was possible to trigger BUG_ON() check in free_msix_irqs()
in the case where we call ixgbe_remove() after a failed recovery from
AER error because the interrupts were not freed.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:37 -08:00
Ken Cox
a9d2d53a78 ixgbe: test for trust in macvlan adjustments for VF
There are two methods for setting mac addresses in a Macvlan, that
differentiate themselves in the function macvlan_set_mac_Address.
If the macvlan mode is passthru, then we use the dev_set_mac_address
method, otherwise we use the dev_uc api via macvlan_sync_addresses.
The latter method (which would stem from using any non-passthru mode,
like bridge, or vepa), calls down into the driver in a path that terminates
in ixgbevf_set_uc_addr_vf, which sends a IXGBE_VF_SET_MACVLAN message,
which causes the pf to spawn the noted error message.  This occurs because
it appears that the guest is trying to delete the mac address of the macvlan
before adding another.

The other path in macvlan_set_mac_address uses dev_set_mac_address, which
calls into ixgbevf_set_mac which uses the IXGBE_VF_SET_MAC_ADDR to the
pf to set the macvlan mac address.

The discrepancy here is in the handlers.  The handler function for
IXGBE_VF_SET_MAC_ADDR (ixgbe_set_vf_mac_addr) has a check for
the vfinfo[].trusted bit to allow the operation if the vf is trusted.
In comparison, the IXGBE_VF_SET_MACVLAN message handler
(ixgbe_set_vf_macvlan_msg) has no such check of the trusted bit.

Signed-off-by: Ken Cox <jkc@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:36 -08:00
Emil Tantilov
2dad7b2775 ixgbevf: handle race between close and suspend on shutdown
When an interface is part of a namespace it is possible that
ixgbevf_close() may be called while ixgbevf_suspend() is running
which ends up in a double free WARN and/or a BUG in free_msi_irqs()

To handle this situation we extend the rtnl_lock() to protect the
call to netif_device_detach() and check for !netif_device_present()
to avoid entering close while in suspend.

Also added rtnl locks to ixgbevf_queue_reset_subtask().

CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:36 -08:00
Emil Tantilov
f7f37e7ff2 ixgbe: handle close/suspend race with netif_device_detach/present
When an interface is part of a namespace it is possible that
ixgbe_close() may be called while __ixgbe_shutdown() is running
which ends up in a double free WARN and/or a BUG in free_msi_irqs().

To handle this situation we extend the rtnl_lock() to protect the
call to netif_device_detach() and ixgbe_clear_interrupt_scheme()
in __ixgbe_shutdown() and check for netif_device_present()
to avoid clearing the interrupts second time in ixgbe_close();

Also extend the rtnl lock in ixgbe_resume() to netif_device_attach().

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:36 -08:00
Tony Nguyen
f215266470 ixgbe: Fix reporting of 100Mb capability
BaseT adapters that are capable of supporting 100Mb are not reporting this
capability.  This patch corrects the reporting so that 100Mb is shown as
supported on those adapters.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:35 -08:00
Tony Nguyen
3f0d646b72 ixgbe: Reduce I2C retry count on X550 devices
A retry count of 10 is likely to run into problems on X550 devices that
have to detect and reset unresponsive CS4227 devices. So, reduce the I2C
retry count to 3 for X550 and above. This should avoid any possible
regressions in existing devices.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:35 -08:00
Tony Nguyen
910c9c0f59 ixgbe: Add bounds check for x540 LED functions
This is an extension of commit 003287e0f0 ("ixgbevf: Correct parameter
sent to LED function"); add bounds checking to x540 functions to ensure the
index is valid.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:35 -08:00
Emil Tantilov
2bf1a87b90 ixgbe: add mask for 64 RSS queues
The indirection table was reported incorrectly for X550 and newer
where we can support up to 64 RSS queues.

Reported-by Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:34 -08:00
Tony Nguyen
5c092749e3 ixgbe: Fix check for ixgbe_phy_x550em_ext_t reset
The generic PHY reset check we had previously is not sufficient for the
ixgbe_phy_x550em_ext_t PHY type.  Check 1.CC02.0 instead - same as
ixgbe_init_ext_t_x550().

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:34 -08:00
Tony Nguyen
cb8e051446 ixgbe: Report driver version to firmware for x550 devices
Some x550 devices require the driver version reported to its firmware; this
patch sends the driver version string to the firmware through the host
interface command for x550 devices.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:34 -08:00
Emil Tantilov
1fe954b209 ixgbe: do not disable FEC from the driver
FEC is configured by the NVM and the driver should not be
overriding it.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-01-03 13:03:33 -08:00
David S. Miller
a88eb6becf Merge branch 'tipc-link-starvation'
Jon Maloy says:

====================
tipc: improve interaction socket-link

We fix a very real starvation problem that may occur when a link
encounters send buffer congestion. At the same time we make the
interaction between the socket and link layer simpler and more
consistent.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:06 -05:00
Jon Paul Maloy
365ad353c2 tipc: reduce risk of user starvation during link congestion
The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and let him re-try later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is attempted sent via a
congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:05 -05:00
Jon Paul Maloy
4d8642d896 tipc: modify struct tipc_plist to be more versatile
During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.

We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come to use in the next commits.

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:05 -05:00
Jon Paul Maloy
8c44e1af16 tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.

Instead of making yet another duplicates of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:05 -05:00
David S. Miller
aa276dd7b3 Merge branch 'TPACKET_V3-TX_RING-support'
Sowmini Varadhan says:

====================
TPACKET_V3 TX_RING support

This patch series allows an application to use a single PF_PACKET
descriptor and leverage the best implementations of TX_RING
and RX_RING that exist today.

Patch 1 adds the kernel/Documentation changes for TX_RING
support and patch2 adds the associated test case in selftests.

Changes since v2: additional sanity checks for setsockopt
input for TX_RING/TPACKET_V3. Refactored psock_tpacket.c
test code to avoid code duplication from V2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:00:28 -05:00
Sowmini Varadhan
fe878cad38 tools: test case for TPACKET_V3/TX_RING support
Add a test case and sample code for (TPACKET_V3, PACKET_TX_RING)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:00:27 -05:00
Sowmini Varadhan
7f953ab2ba af_packet: TX_RING support for TPACKET_V3
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.

This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:00:27 -05:00
Edward Cree
e7072f6669 sfc-falcon: declare module version (same as ethtool drvinfo version)
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 10:54:40 -05:00
Edward Cree
14077e9ed3 sfc: declare module version (same as ethtool drvinfo version)
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 10:54:15 -05:00
Nikolay Aleksandrov
1708ebc963 ipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entries
While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 10:04:31 -05:00
Volodymyr Bendiuga
07b76ba9f8 dsa:mv88e6xxx: allow address 0x1 in smi_init
Some devices, such as the mv88e6097 do have ADDR[0] external and so it
is possible to configure the device to use SMI address 0x1. Remove the
restriction, as there are boards using this address.

Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 10:01:58 -05:00
Philippe Reynes
28fa4f308e net: freescale: dpaa: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 09:49:35 -05:00
David S. Miller
303b0d4be1 Merge branch 'for_4.11/net-next/rds_v3' of git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux
Santosh Shilimkar says:

====================
net: RDS updates

v2->v3:
- Re-based against latest net-next head.
- Dropped a user visible change after discussing with David Miller.
  It needs some more work to fully support old/new tools matrix.
- Addressed Dave's comment about bool usage in patch
  "RDS: IB: track and log active side..."

v1->v2:
Re-aligned indentation in patch 'RDS: mark few internal functions..."

Series consist of:
 - RDMA transport fixes for map failure, listen sequence, handler panic and
   composite message notification.
 - Couple of sparse fixes.
 - Message logging improvements for bind failure, use once mr semantics
   and connection remote address, active end point.
 - Performance improvement for RDMA transport by reducing the post send
   pressure on the queue and spreading the CQ vectors.
 - Useful statistics for socket send/recv usage and receive cache usage.
 - Additional RDS CMSG used by application to track the RDS message
   stages for certain type of traffic to find out latency spots.
   Can be enabled/disabled per socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 09:45:44 -05:00
Santosh Shilimkar
3289025aed RDS: add receive message trace used by application
Socket option to tap receive path latency in various stages
in nano seconds. It can be enabled on selective sockets using
using SO_RDS_MSG_RXPATH_LATENCY socket option. RDS will return
the data to application with RDS_CMSG_RXPATH_LATENCY in defined
format. Scope is left to add more trace points for future
without need of change in the interface.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:59 -08:00
Avinash Repaka
f9fb69adb6 RDS: make message size limit compliant with spec
RDS support max message size as 1M but the code doesn't check this
in all cases. Patch fixes it for RDMA & non-RDMA and RDS MR size
and its enforced irrespective of underlying transport.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:57 -08:00
Venkat Venkatsubra
192a798f52 RDS: add stat for socket recv memory usage
Tracks the receive side memory added to scokets and removed from sockets.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:56 -08:00