Commit Graph

785507 Commits

Author SHA1 Message Date
Eric Dumazet
864e5c0907 tcp: optimize tcp internal pacing
When TCP implements its own pacing (when no fq packet scheduler is used),
it is arming high resolution timer after a packet is sent.

But in many cases (like TCP_RR kind of workloads), this high resolution
timer expires before the application attempts to write the following
packet. This overhead also happens when the flow is ACK clocked and
cwnd limited instead of being limited by the pacing rate.

This leads to extra overhead (high number of IRQ)

Now tcp_wstamp_ns is reserved for the pacing timer only
(after commit "tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh"),
we can setup the timer only when a packet is about to be sent,
and if tcp_wstamp_ns is in the future.

This leads to a ~10% performance increase in TCP_RR workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:56:42 -07:00
Eric Dumazet
7baf33bdac net_sched: sch_fq: no longer use skb_is_tcp_pure_ack()
With the new EDT model, sch_fq no longer has to special
case TCP pure acks, since their skb->tstamp will allow them
being sent without pacing delay.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:56:42 -07:00
Eric Dumazet
a7a2563064 tcp: mitigate scheduling jitter in EDT pacing model
In commit fefa569a9d ("net_sched: sch_fq: account for schedule/timers
drifts") we added a mitigation for scheduling jitter in fq packet scheduler.

This patch does the same in TCP stack, now it is using EDT model.

Note that this mitigation is valid for both external (fq packet scheduler)
or internal TCP pacing.

This uses the same strategy than the above commit, allowing
a time credit of half the packet currently sent.

Consider following case :

An skb is sent, after an idle period of 300 usec.
The air-time (skb->len/pacing_rate) is 500 usec
Instead of setting the pacing timer to now+500 usec,
it will use now+min(500/2, 300) -> now+250usec

This is like having a token bucket with a depth of half
an skb.

Tested:

tc qdisc replace dev eth0 root pfifo_fast

Before
netperf -P0 -H remote -- -q 1000000000 # 8000Mbit
540000 262144 262144    10.00    7710.43

After :
netperf -P0 -H remote -- -q 1000000000 # 8000 Mbit
540000 262144 262144    10.00    7999.75   # Much closer to 8000Mbit target

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:56:42 -07:00
Eric Dumazet
76a9ebe811 net: extend sk_pacing_rate to unsigned long
sk_pacing_rate has beed introduced as a u32 field in 2013,
effectively limiting per flow pacing to 34Gbit.

We believe it is time to allow TCP to pace high speed flows
on 64bit hosts, as we now can reach 100Gbit on one TCP flow.

This patch adds no cost for 32bit kernels.

The tcpi_pacing_rate and tcpi_max_pacing_rate were already
exported as 64bit, so iproute2/ss command require no changes.

Unfortunately the SO_MAX_PACING_RATE socket option will stay
32bit and we will need to add a new option to let applications
control high pacing rates.

State      Recv-Q Send-Q Local Address:Port             Peer Address:Port
ESTAB      0      1787144  10.246.9.76:49992             10.246.9.77:36741
                 timer:(on,003ms,0) ino:91863 sk:2 <->
 skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0)
 ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448
 rcvmss:536 advmss:1448
 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177
 segs_in:3916318 data_segs_out:177279175
 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2)
 send 28045.5Mbps lastrcv:73333
 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps
 busy:73333ms unacked:135 retrans:0/157 rcv_space:14480
 notsent:2085120 minrtt:0.013

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:56:42 -07:00
Eric Dumazet
5f6188a800 tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh
In EDT design, I made the mistake of using tcp_wstamp_ns
to store the last tcp_clock_ns() sample and to store the
pacing virtual timer.

This causes major regressions at high speed flows.

Introduce tcp_clock_cache to store last tcp_clock_ns().
This is needed because some arches have slow high-resolution
kernel time service.

tcp_wstamp_ns is only updated when a packet is sent.

Note that we can remove tcp_mstamp in the future since
tcp_mstamp is essentially tcp_clock_cache/1000, so the
apparent socket size increase is temporary.

Fixes: 9799ccb0e9 ("tcp: add tcp_wstamp_ns socket field")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:56:41 -07:00
Li RongQing
1a3aea2534 net: bridge: fix a possible memory leak in __vlan_add
After per-port vlan stats, vlan stats should be released
when fail to add vlan

Fixes: 9163a0fc1f ("net: bridge: add support for per-port vlan stats")
CC: bridge@lists.linux-foundation.org
cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:53:52 -07:00
David Howells
bc0e7cf433 rxrpc: Add /proc/net/rxrpc/peers to display peer list
Add /proc/net/rxrpc/peers to display the list of peers currently active.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:52:58 -07:00
Wei Yongjun
d275444cc3 fore200e: fix missing unlock on error in bsq_audit()
Add the missing unlock before return from function bsq_audit()
in the error handling case.

Fixes: 1d9d8be917 ("fore200e: check for dma mapping failures")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:48:35 -07:00
David S. Miller
65f2247d61 Merge branch 'bnxt_en-Add-support-for-new-57500-chips'
Michael Chan says:

====================
bnxt_en: Add support for new 57500 chips.

This patch-set is larger than normal because I wanted a complete series
to add basic support for the new 57500 chips.  The new chips have the
following main differences compared to legacy chips:

1. Requires the PF driver to allocate DMA context memory as a backing
store.
2. New NQ (notification queue) for interrupt events.
3. One or more CP rings can be associated with an NQ.
4. 64-bit doorbells.

Most other structures and firmware APIs are compatible with legacy
devices with some exceptions.  For example, ring groups are no longer
used and RSS table format has changed.

The patch-set includes the usual firmware spec. update, some refactoring
and restructuring, and adding the new code to add basic support for the
new class of devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:33 -07:00
Michael Chan
1ab968d2f1 bnxt_en: Add PCI ID for BCM57508 device.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:33 -07:00
Michael Chan
0fcec9854a bnxt_en: Add new NAPI poll function for 57500 chips.
Add a new poll function that polls for NQ events.  If the NQ event is
a CQ notification, we locate the CP ring from the cq_handle and call
__bnxt_poll_work() to handle RX/TX events on the CP ring.

Add a new has_more_work field in struct bnxt_cp_ring_info to indicate
budget has been reached.  __bnxt_poll_cqs_done() is called to update or
ARM the CP rings if budget has not been reached or not.  If budget
has been reached, the next bnxt_poll_p5() call will continue to poll
from the CQ rings directly.  Otherwise, the NQ will be ARMed for the
next IRQ.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:33 -07:00
Michael Chan
3675b92fa7 bnxt_en: Refactor bnxt_poll_work().
Separate the CP ring polling logic in bnxt_poll_work() into 2 separate
functions __bnxt_poll_work() and __bnxt_poll_work_done().  Since the logic
is separated, we need to add tx_pkts and events fields to struct bnxt_napi
to keep track of the events to handle between the 2 functions.  We also
add had_work_done field to struct bnxt_cp_ring_info to indicate whether
some work was performed on the CP ring.

This is needed to better support the 57500 chips.  We need to poll up to
2 separate CP rings before we update or ARM the CP rings on the 57500 chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:33 -07:00
Michael Chan
58590c8d90 bnxt_en: Add coalescing setup for 57500 chips.
On legacy chips, the CP ring may be shared between RX and TX and so only
setup the RX coalescing parameters in such a case.  On 57500 chips, we
always have a dedicated CP ring for TX so we can always set up the
TX coalescing parameters in bnxt_hwrm_set_coal().

Also, the min_timer coalescing parameter applies to the NQ on the new
chips and a separate firmware call needs to be made to set it up.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:33 -07:00
Michael Chan
e44758b78a bnxt_en: Use bnxt_cp_ring_info struct pointer as parameter for RX path.
In the RX code path, we current use the bnxt_napi struct pointer to
identify the associated RX/CP rings.  Change it to use the struct
bnxt_cp_ring_info pointer instead since there are now up to 2
CP rings per MSIX.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:33 -07:00
Michael Chan
7b3af4f75b bnxt_en: Add RSS support for 57500 chips.
RSS context allocation and RSS indirection table setup are very different
on the new chip.  Refactor bnxt_setup_vnic() to call 2 different functions
to set up RSS for the vnic based on chip type.  On the new chip, the
number of RSS contexts and the indirection table size depends on the
number of RX rings.  Each indirection table entry is also different
on the new chip since ring groups are no longer used.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
44c6f72a4c bnxt_en: Increase RSS context array count and skip ring groups on 57500 chips.
On the new 57500 chips, we need to allocate one RSS context for every
64 RX rings.  In previous chips, only one RSS context per vnic is
required regardless of the number of RX rings.  So increase the max
RSS context array count to 8.

Hardware ring groups are not used on the new chips.  Note that the
software ring group structure is still maintained in the driver to
keep track of the rings associated with the vnic.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
3e08b1841b bnxt_en: Allocate/Free CP rings for 57500 series chips.
On the new 57500 chips, we allocate/free one CP ring for each RX ring or
TX ring separately.  Using separate CP rings for RX/TX is an improvement
as TX events will no longer be stuck behind RX events.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
23aefdd761 bnxt_en: Modify bnxt_ring_alloc_send_msg() to support 57500 chips.
Firmware ring allocation semantics are slightly different for most
ring types on 57500 chips.  Allocation/deallocation for NQ rings are
also added for the new chips.

A CP ring handle is also added so that from the NQ interrupt event,
we can locate the CP ring.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
2c61d2117e bnxt_en: Add helper functions to get firmware CP ring ID.
On the new 57500 chips, getting the associated CP ring ID associated with
an RX ring or TX ring is different than before.  On the legacy chips,
we find the associated ring group and look up the CP ring ID.  On the
57500 chips, each RX ring and TX ring has a dedicated CP ring even if
they share the MSIX.  Use these helper functions at appropriate places
to get the CP ring ID.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
50e3ab7836 bnxt_en: Allocate completion ring structures for 57500 series chips.
On 57500 chips, the original bnxt_cp_ring_info struct now refers to the
NQ.  bp->cp_nr_rings refer to the number of NQs on 57500 chips.  There
are now 2 pointers for the CP rings associated with RX and TX rings.
Modify bnxt_alloc_cp_rings() and bnxt_free_cp_rings() accordingly.

With multiple CP rings per NAPI, we need to add a pointer in
bnxt_cp_ring_info struct to point back to the bnxt_napi struct.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
41e8d79837 bnxt_en: Modify the ring reservation functions for 57500 series chips.
The ring reservation functions have to be modified for P5 chips in the
following ways:

- bnxt_cp_ring_info structs map to internal NQs as well as CP rings.
- Ring groups are not used.
- 1 CP ring must be available for each RX or TX ring.
- number of RSS contexts to reserve is multiples of 64 RX rings.
- RFS currently not supported.

Also, RX AGG rings are only used for jumbo frames, so we need to
unconditionally call bnxt_reserve_rings() in __bnxt_open_nic()
to see if we need to reserve AGG rings in case MTU has changed.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
9c1fabdf42 bnxt_en: Adjust MSIX and ring groups for 57500 series chips.
Store the maximum MSIX capability in PCIe config. space earlier.  When
we call firmware to query capability, we need to compare the PCIe
MSIX max count with the firmware count and use the smaller one as
the MSIX count for 57500 (P5) chips.

The new chips don't use ring groups.  But previous chips do and
the existing logic limits the available rings based on resource
calculations including ring groups.  Setting the max ring groups to
the max rx rings will work on the new chips without changing the
existing logic.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
697197e5a1 bnxt_en: Re-structure doorbells.
The 57500 series chips have a new 64-bit doorbell format.  Use a new
bnxt_db_info structure to unify the new and the old 32-bit doorbells.
Add a new bnxt_set_db() function to set up the doorbell addreses and
doorbell keys ahead of time.  Modify and introduce new doorbell
helpers to help abstract and unify the old and new doorbells.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
e38287b72e bnxt_en: Add 57500 new chip ID and basic structures.
57500 series is a new chip class (P5) that requires some driver changes
in the next several patches.  This adds basic chip ID, doorbells, and
the notification queue (NQ) structures.  Each MSIX is associated with an
NQ instead of a CP ring in legacy chips.  Each NQ has up to 2 associated
CP rings for RX and TX.  The same bnxt_cp_ring_info struct will be used
for the NQ.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
1b9394e5a2 bnxt_en: Configure context memory on new devices.
Call firmware to configure the DMA addresses of all context memory
pages on new devices requiring context memory.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
98f04cf0f1 bnxt_en: Check context memory requirements from firmware.
New device requires host context memory as a backing store.  Call
firmware to check for context memory requirements and store the
parameters.  Allocate host pages accordingly.

We also need to move the call bnxt_hwrm_queue_qportcfg() earlier
so that all the supported hardware queues and the IDs are known
before checking and allocating context memory.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:32 -07:00
Michael Chan
66cca20abc bnxt_en: Add new flags to setup new page table PTE bits on newer devices.
Newer chips require the PTU_PTE_VALID bit to be set for every page
table entry for context memory and rings.  Additional bits are also
required for page table entries for all rings.  Add a flags field to
bnxt_ring_mem_info struct to specify these additional bits to be used
when setting up the pages tables as needed.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:31 -07:00
Michael Chan
6fe1988685 bnxt_en: Refactor bnxt_ring_struct.
Move the DMA page table and vmem fields in bnxt_ring_struct to a new
bnxt_ring_mem_info struct.  This will allow context memory management
for a new device to re-use some of the existing infrastructure.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:31 -07:00
Michael Chan
74706afa71 bnxt_en: Update interrupt coalescing logic.
New firmware spec. allows interrupt coalescing parameters, such as
maximums, timer units, supported features to be queried.  Update
the driver to make use of the new call to query these parameters
and provide the legacy defaults if the call is not available.

Replace the hard-coded values with these parameters.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:31 -07:00
Michael Chan
1dfddc41ae bnxt_en: Add maximum extended request length fw message support.
Support the max_ext_req_len field from the HWRM_VER_GET_RESPONSE.
If this field is valid and greater than the mailbox size, use the
short command format to send firmware messages greater than the
mailbox size.  Newer devices use this method to send larger messages
to the firmware.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:31 -07:00
Michael Chan
36e53349b6 bnxt_en: Add additional extended port statistics.
Latest firmware spec. has some additional rx extended port stats and new
tx extended port stats added.  We now need to check the size of the
returned rx and tx extended stats and determine how many counters are
valid.  New counters added include CoS byte and packet counts for rx
and tx.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:31 -07:00
Michael Chan
31d357c069 bnxt_en: Update firmware interface spec. to 1.10.0.3.
Among the new changes are trusted VF support, 200Gbps support, and new
API to dump ring information on the new chips.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:44:31 -07:00
David S. Miller
9e983c5898 Merge branch 'selftests-pmtu-Add-test-choice-and-captures'
Stefano Brivio says:

====================
selftests: pmtu: Add test choice and captures

This series adds a couple of features useful for debugging: 1/2
allows selecting single tests and 2/2 adds optional traffic
captures.

Semantics for current invocation of test script are preserved.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:37:28 -07:00
Stefano Brivio
bb059fb204 selftests: pmtu: Add optional traffic captures for single tests
If --trace is passed as an option and tcpdump is available,
capture traffic for all relevant interfaces to per-test pcap
files named <test>_<interface>.pcap.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:37:28 -07:00
Stefano Brivio
55bbc8ff49 selftests: pmtu: Allow selection of single tests
As number of tests is growing, it's quite convenient to allow
single tests to be run.

Display usage when the script is run with any invalid argument,
keep existing semantics when no arguments are passed so that
automated runs won't break.

Instead of just looping on the list of requested tests, if any,
check first that they exist, and go through them in a nested
loop to keep the existing way to display test descriptions.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:37:28 -07:00
Heiner Kallweit
2527e4037f r8169: remove unneeded call to netif_stop_queue in rtl8169_net_suspend
netif_device_detach() stops all tx queues already, so we don't need
this call.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:35:03 -07:00
Heiner Kallweit
34bc009543 r8169: simplify rtl8169_set_magic_reg
Simplify this function, no functional change intended.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:34:34 -07:00
Arnd Bergmann
44eb385bc5 octeontx2-af: remove unused cgx_fwi_link_change
The newly added driver causes a warning about a function that is
not used anywhere:

drivers/net/ethernet/marvell/octeontx2/af/cgx.c:320:12: error: 'cgx_fwi_link_change' defined but not used [-Werror=unused-function]

Remove it for now, until a user gets added. If we want to use this
function from another module, we also need a declaration in a header
file, which is currently missing, so it would have to change anyway.

Fixes: 1463f382f5 ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:31:54 -07:00
Ryan C Goodfellow
5948185b97 nfp: devlink port split support for 1x100G CXP NIC
This commit makes it possible to use devlink to split the 100G CXP
Netronome into two 40G interfaces. Currently when you ask for 2
interfaces, the math in src/nfp_devlink.c:nfp_devlink_port_split
calculates that you want 5 lanes per port because for some reason
eth_port.port_lanes=10 (shouldn't this be 12 for CXP?). What we really
want when asking for 2 breakout interfaces is 4 lanes per port. This
commit makes that happen by calculating based on 8 lanes if 10 are
present.

Signed-off-by: Ryan C Goodfellow <rgoodfel@isi.edu>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Greg Weeks <greg.weeks@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:29:55 -07:00
David S. Miller
ca0f32d5d9 Merge branch 'dpaa2-eth-code-cleanup'
Ioana Ciornei says:

====================
dpaa2-eth: code cleanup

There are no functional changes in this patch set, only some cleanup
changes such as: unused parameters, uninitialized variables and
unnecessary Kconfig dependencies.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:20 -07:00
Ioana Radulescu
b948c8c6a7 dpaa2-eth: remove unused FD field
According to the hardware ArchDef, the PTV1 field in FD[CTRL]
is ignored by WRIOP, so setting it for Tx FDs is pointless.

Remove all references to it from the code.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:19 -07:00
Ioana Ciornei
b00c898c00 dpaa2-eth: mark unused parameter in dpaa2_eth_tx_conf
The ch parameter is never used in the dpaa2_eth_tx_conf function but
since its prototype must match the type defined in the consume field of
struct dpaa2_eth_fq, just mark it as __always_unused.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:19 -07:00
Ioana Ciornei
fdb6ca9e46 dpaa2-eth: remove unused priv parameter
The priv parameter is never used in the build_linear_skb and
drain_channel function. Remove it from the function definitions.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:19 -07:00
Ioana Ciornei
85b7a342ba dpaa2-eth: fix uninitialized variable warnings
All 3 cases of possible uninitialized variables are false
positives since they are used only as output parameters.
Nonetheless, fix the warnings.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:19 -07:00
Ioana Ciornei
3233c1514f dpaa2-eth: make dpaa2_eth_set_dist_key static
The dpaa2_eth_set_dist_key function is only used in a single file.
Make it static.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:19 -07:00
Ioana Radulescu
b12cef51b5 dpaa2-eth: Fix Kconfig dependencies
Both ARCH_LAYERSCAPE and COMPILE_TEST dependencies are already implied
through the FSL_MC_BUS dep, so there's no need to state it explicitly.

Also, the fsl-mc bus depends on COMPILE_TEST only for some
architectures (arm, arm64, ppc, x86), so it's not correct to
claim build support unconditionally.

Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:23:19 -07:00
Ivan Khoronzhuk
5b3a5a14f8 net: ethernet: ti: cpsw: use for mcast entries only host port
In dual-emac mode the cpsw driver sends directed packets, that means
that packets go to the directed port, but an ALE lookup is performed
to determine untagged egress only. It means that on tx side no need
to add port bit for ALE mcast entry mask, and basically ALE entry
for port identification is needed only on rx side.

So, add only host port in dual_emac mode as used directed
transmission, and no need in one more port. For single port boards
and switch mode all ports used, as usual, so no changes for them.
Also it simplifies farther changes.

In other words, mcast entries for dual-emac should behave exactly
like unicast. It also can help avoid leaking packets between ports
with same vlan on h/w level if ports could became members of same vid.

So now, for instance, if mcast address 33:33:00:00:00:01 is added then
entries in ALE table:

vid = 1, addr = 33:33:00:00:00:01, port_mask = 0x1
vid = 2, addr = 33:33:00:00:00:01, port_mask = 0x1

Instead of:
vid = 1, addr = 33:33:00:00:00:01, port_mask = 0x3
vid = 2, addr = 33:33:00:00:00:01, port_mask = 0x5

With the same considerations, set only host port for unregistered
mcast for dual-emac mode in case of IFF_ALLMULTI is set, exactly like
it's done in cpsw_ale_set_allmulti().

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:22:12 -07:00
David S. Miller
ba722f9b6f Merge branch 'net-ethernet-ti-cpsw-fix-mcast-packet-lost'
Ivan Khoronzhuk says:

====================
net: ethernet: ti: cpsw fix mcast packet lost

The patchset omits redundant refresh of mcast address table and
prevents mcast packet lost.

Based on net-next/master
tested on am572x evm
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:21:28 -07:00
Ivan Khoronzhuk
5da1948969 net: ethernet: ti: cpsw: fix lost of mcast packets while rx_mode update
Whenever kernel or user decides to call rx mode update, it clears
every multicast entry from forwarding table and in some time adds
it again. This time can be enough to drop incoming multicast packets.

That's why clear only staled multicast entries and update or add new
one afterwards.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:21:28 -07:00
Ivan Khoronzhuk
58bdeac8b0 net: ethernet: ti: cpsw_ale: use const for API having pointer on mac address
It allows to use function under callbacks with same const qualifier of
mac address for farther changes.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:21:27 -07:00