Commit Graph

855619 Commits

Author SHA1 Message Date
David S. Miller
f7571cde6b Merge branch 'net-dsa-mv88e6xxx-avoid-some-redundant-VTU-operations'
Vivien Didelot says:

====================
net: dsa: mv88e6xxx: avoid some redundant VTU operations

The mv88e6xxx driver currently uses a mv88e6xxx_vtu_get wrapper to get a
single entry and uses a boolean to eventually initialize a fresh one.

However the fresh entry is only needed in one place and mv88e6xxx_vtu_getnext
is simple enough to call it directly. Doing so makes the code easier to read,
especially for the return code expected by switchdev to honor software VLANs.

In addition to not loading the VTU again when an entry is already correctly
programmed, this also allows to avoid programming the broadcast entries
again when updating a port's membership, from e.g. tagged to untagged.

This patch series removes the mv88e6xxx_vtu_get wrapper in favor of direct
calls to mv88e6xxx_vtu_getnext, and also renames the _mv88e6xxx_port_vlan_add
and _mv88e6xxx_port_vlan_del helpers using an old underscore prefix convention.

In case the port's membership is already correctly programmed in hardware,
the following debug message may be printed:

    [  745.989884] mv88e6085 2188000.ethernet-1:00: p4: already a member of VLAN 42
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 16:43:09 -04:00
Vivien Didelot
b1ac6fb440 net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_add
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read and
_mv88e6xxx_port_vlan_add is the only function requiring the preparation
of a new VLAN entry.

To simplify things up, remove the mv88e6xxx_vtu_get wrapper and
explicit the VLAN lookup in _mv88e6xxx_port_vlan_add. This rework
also avoids programming the broadcast entries again when changing a
port's membership, e.g. from tagged to untagged.

At the same time, rename the helper using an old underscore convention.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 16:43:09 -04:00
Vivien Didelot
5210989283 net: dsa: mv88e6xxx: call vtu_getnext directly in vlan_del
Wrapping mv88e6xxx_vtu_getnext makes the code less easy to read.
Explicit the call to mv88e6xxx_vtu_getnext in _mv88e6xxx_port_vlan_del
and the return value expected by switchdev in case of software VLANs.

At the same time, rename the helper using an old underscore convention.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 16:43:09 -04:00
Vivien Didelot
5ef8d249f8 net: dsa: mv88e6xxx: call vtu_getnext directly in db load/purge
mv88e6xxx_vtu_getnext is simple enough to call it directly in the
mv88e6xxx_port_db_load_purge function and explicit the return code
expected by switchdev for software VLANs when an hardware VLAN does
not exist.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 16:43:09 -04:00
Vivien Didelot
425d2d37ab net: dsa: mv88e6xxx: explicit entry passed to vtu_getnext
mv88e6xxx_vtu_getnext interprets two members from the input
mv88e6xxx_vtu_entry structure: the (excluded) vid member to start
the iteration from, and the valid argument specifying whether the VID
must be written or not (only required once at the start of a loop).

Explicit the assignation of these two fields right before calling
mv88e6xxx_vtu_getnext, as it is done in the mv88e6xxx_vtu_get wrapper.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 16:43:08 -04:00
Vivien Didelot
7095a4c497 net: dsa: mv88e6xxx: lock mutex in vlan_prepare
Lock the mutex in the mv88e6xxx_port_vlan_prepare function
called by the DSA stack, instead of doing it in the internal
mv88e6xxx_port_check_hw_vlan helper.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 16:43:08 -04:00
David S. Miller
a8e600e218 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-07-31

This series contains updates to ice driver only.

Paul adds support for reporting what the link partner is advertising for
flow control settings.

Jake fixes the hardware statistics register which is prone to rollover
since the statistic registers are either 32 or 40 bits wide, depending
on which register is being read.  So use a 64 bit software statistic to
store off the hardware statistics to track past when it rolls over.
Fixes an issue with the locking of the control queue, where locks were
being destroyed at run time.

Tony fixes an issue that was created when interrupt tracking was
refactored and the call to ice_vsi_setup_vector_base() was removed from
the PF VSI instead of the VF VSI.  Adds a check before trying to
configure a port to ensure that media is attached.

Brett fixes an issue in the receive queue configuration where prefena
(Prefetch Enable) was being set to 0 which caused the hardware to only
fetch descriptors when there are none free in the cache for a received
packet.  Updates the driver to only bump the receive tail once per
napi_poll call, instead of the current model of bumping the tail up to 4
times per napi_poll call.  Adds statistics for receive drops at the port
level to ethtool/netlink.  Cleans up duplicate code in the allocation of
receive buffer code.

Akeem updates the driver to ensure that VFs stay disabled until the
setup or reset is completed.  Modifies the driver to use the allocated
number of transmit queues per VSI to set up the scheduling tree versus
using the total number of available transmit queues.  Also fix the
driver to update the total number of configured queues, after a
successful VF request to change its number of queues before updating the
corresponding VSI for that VF.  Cleaned up unnecessary flags that are no
longer needed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:41:43 -04:00
David S. Miller
9b59e39f09 Merge branch 'net-hns3-some-code-optimizations-bugfixes-features'
Huazhong Tan says:

====================
net: hns3: some code optimizations & bugfixes & features

This patch-set includes code optimizations, bugfixes and features for
the HNS3 ethernet controller driver.

[patch 01/12] adds support for reporting link change event.

[patch 02/12] adds handler for NCSI error.

[patch 03/12] fixes bug related to debugfs.

[patch 04/12] adds a code optimization for setting ring parameters.

[patch 05/12 - 09/12] adds some cleanups.

[patch 10/12 - 12/12] adds some patches related to reset issue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Huazhong Tan
012fcb52f6 net: hns3: activate reset timer when calling reset_event
When calling hclge_reset_event() within HCLGE_RESET_INTERVAL,
it returns directly now. If no one call it again, then the
error which needs a reset to fix it can not be fixed.

So this patch activates the reset timer for this case, and
adds checking in the end of the reset procedure to make this
error fixed earlier.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Huazhong Tan
72e2fb0799 net: hns3: clear reset interrupt status in hclge_irq_handle()
Currently, the reset interrupt is cleared in the reset task, which
is too late. Since, when the hardware finish the previous reset,
it can begin to do a new global/IMP reset, if this new coming reset
type is same as the previous one, the driver will clear them together,
then driver can not get that there is another reset, but the hardware
still wait for the driver to deal with the second one.

So this patch clears PF's reset interrupt status in the
hclge_irq_handle(), the hardware waits for handshaking from
driver before doing reset, so the driver and hardware deal with reset
one by one.

BTW, when VF doing global/IMP reset, it reads PF's reset interrupt
register to get that whether PF driver's re-initialization is done,
since VF's re-initialization should be done after PF's. So we add
a new command and a register bit to do that. When VF receive reset
interrupt, it sets up this bit, and PF finishes re-initialization
send command to clear this bit, then VF do re-initialization.

Fixes: 4ed340ab8f ("net: hns3: Add reset process in hclge_main")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Huazhong Tan
6b428b4fbf net: hns3: fix some reset handshake issue
Currently, the driver sets handshake status to tell the hardware
that the driver have downed the netdev and it can continue with
reset process. The driver will clear the handshake status when
re-initializing the CMDQ, and does not recover this status
when reset fail, which may cause the hardware to wait for
the handshake status to be set and not being able to continue
with reset process.

So this patch delays clearing handshake status just before UP,
and recovers this status when reset fail.

BTW, this patch adds a new function hclge(vf)_reset_handshake() to
deal with the reset handshake issue, and renames
HCLGE(VF)_NIC_CMQ_ENABLE to HCLGE(VF)_NIC_SW_RST_RDY which
represents this register bit more accurately.

Fixes: ada13ee3db ("net: hns3: add handshake with hardware while doing reset")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Guojia Liao
6e6e768063 net: hns3: rename a member in struct hclge_mac_ethertype_idx_rd_cmd
The member 'mac_add' defined in hclge_mac_ethertype_idx_rd_cmd
means MAC address, so 'mac_addr' is a better name for it.

Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Weihang Li
dbae56a33f net: hns3: simplify hclge_cmd_query_error()
The 4th and 5th parameter of hclge_cmd_query_error is useless, so this
patch removes them.

Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Yunsheng Lin
b6872fd361 net: hns3: minior error handling change for hclge_tm_schd_info_init
When hclge_tm_schd_info_update calls hclge_tm_schd_info_init to
initialize the schedule info, hdev->tm_info.num_pg and
hdev->tx_sch_mode is not changed, which makes the checking in
hclge_tm_schd_info_init unnecessary.

So this patch moves the hdev->tm_info.num_pg and hdev->tx_sch_mode
checking into hclge_tm_schd_init and changes the return type of
hclge_tm_schd_info_init from int to void.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Yunsheng Lin
a4ee7624c0 net: hns3: minor cleanup in hns3_clean_rx_ring
The unused_count variable is used to indicate how many
RX BD need attaching new buffer in hns3_clean_rx_ring,
and the clean_count variable has the similar meaning.

This patch removes the clean_count variable and use
unused_count to uniformly indicate the RX BD that need
attaching new buffer.

This patch also clean up some coding style related to
variable assignment in hns3_clean_rx_ring.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Jian Shen
6e4139f691 net: hns3: remove unnecessary variable in hclge_get_mac_vlan_cmd_status()
The local variable return_status in hclge_get_mac_val_cmd_status()
is useless. So this patch returns the error code directly, instead of
using this variable. Also, replace some '%d' with '%u' in
hclge_get_mac_val_cmd_status().

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:13 -04:00
Jian Shen
a723fb8efe net: hns3: refine for set ring parameters
Previously, when changing the ring parameters, we free the old
ring resources firstly, and then setup the new ring resources.
In some case of an memory allocation fail, there will be no
resources to use. This patch refines it by setup new ring
resources and free the old ring resources in order.

Also reduce the max ring BD number to 32760 according to UM.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:12 -04:00
Yufeng Mo
3f0f325309 net: hns3: do not query unsupported commands in debugfs
Some commands are not supported on DCB-unsupported ports.
This patch distinguishes these commands and does not query
unsupported commands in debugfs.

This patch also fix an error in the dump "qos buf cfg"
command in debugfs.

Fixes: 2849d4e7a1 ("net: hns3: Add "tc config" info query function")
Fixes: 7d9d7f8864 ("net: hns3: Add "qos buffer" config info query function")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:12 -04:00
Huazhong Tan
b18bf305c4 net: hns3: add handler for NCSI error mailbox
When NCSI has HW error, the IMP will report this error to the driver
by sending a mailbox. After received this message, the driver should
assert a global reset to fix this kind of HW error.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:12 -04:00
Jian Shen
ed8fb4b262 net: hns3: add link change event report
Previously, PF updates link status per second. For some scenario,
it requires link down event being reported more quickly.
To solve it, firmware pushes the link change event to PF with
CMDQ message, and driver updates the link status directly.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:32:12 -04:00
YueHaibing
0ae9fce32c net: phy: xgene: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
9d26cfa5b0 bcm63xx_enet: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
c792c0081d net: qcom/emac: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
566495de16 net: mediatek: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
4237678846 net: dsa: bcm_sf2: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
291f4b6de4 net: dsa: b53: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
6551c8c807 net: dsa: lantiq: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
YueHaibing
3230a55b36 mvpp2: use devm_platform_ioremap_resource() to simplify code
Use devm_platform_ioremap_resource() to simplify the code a bit.
This is detected by coccinelle.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-01 13:10:34 -04:00
Nikolay Aleksandrov
3247b27204 net: bridge: mcast: add delete due to fast-leave mdb flag
In user-space there's no way to distinguish why an mdb entry was deleted
and that is a problem for daemons which would like to keep the mdb in
sync with remote ends (e.g. mlag) but would also like to converge faster.
In almost all cases we'd like to age-out the remote entry for performance
and convergence reasons except when fast-leave is enabled. In that case we
want explicit immediate remote delete, thus add mdb flag which is set only
when the entry is being deleted due to fast-leave.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 19:13:40 -04:00
Lucas Bates
0eba31ef5c tc-testing: Clarify the use of tdc's -d option
The -d command line argument to tdc requires the name of a physical device
on the system where the tests will be run. If -d has not been used, tdc
will skip tests that require a physical device.

This patch is intended to better document what the -d option does and how
it is used.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 18:54:34 -04:00
David S. Miller
21947f467c mlx5-updates-2019-07-29
This series includes updates to mlx5 driver,
 1) Simplifications, cleanup and warning prints improvements
 
 2) From Vlad Buslov:
 Refactor mlx5 tc flow handling for unlocked execution (Part 1)
 
 Currently, all cls API hardware offloads driver callbacks require caller
 to hold rtnl lock when calling them. Cls API has already been updated to
 update software filters in parallel (on classifiers that support
 unlocked execution), however hardware offloads code still obtains rtnl
 lock before calling driver tc callbacks. This set implements partial
 support for unlocked execution that is leveraged by follow up
 refactorings in specific mlx5 tc subsystems and patch to cls API that
 implements API that allows drivers to register their callbacks as
 rtnl-unlocked.
 
 In mlx5 tc code mlx5e_tc_flow is the main structure that is used to
 represent tc filter. Currently, code the structure itself and its
 handlers in both tc and eswitch layers do not implement any kind of
 synchronizations and assume external global synchronization provided by
 rtnl lock instead. Implement following changes to remove dependency on
 rtnl lock in flow handling code that are intended to be used a
 groundwork for following changes to provide fully rtnl-independent mlx5
 tc:
 
 - Extend struct mlx5e_tc_flow with atomic reference counter and rcu to
   allow concurrent access from multiple tc and neigh update workqueue
   instances without introducing any additional locks specific to the
   structure. Its 'flags' field type is changed to atomic bitmask ops which
   is necessary for tc to interact with other concurrent tc instances or
   concurrent neigh update that need to skip flows that are not fully
   initialized (new INIT_DONE flow flag) and can change the flags
   according to neighbor state (flipping OFFLOADED flag).
 
 - Protect unready flows list by new uplink_priv->unready_flows_lock
   mutex.
 
 - Convert calls to netdev APIs that require rtnl lock in flow handling
   code to their rcu counterparts.
 
 - Modify eswitch code that is called from tc layer and assume implicit
   external synchronization to be concurrency safe: change
   esw->offloads.num_flows type to atomic integer and re-arrange
   esw->state_lock usage to protect additional data.
 
 Some of approaches to synchronizations presented in this patch set are
 quite complicated (lockless concurrent usage of data structures with rcu
 and reference counting, using fine-grained locking when necessary, retry
 mechanisms to handle concurrent insertion of another instance of data
 structure with same key, etc.). This is necessary to allow calling the
 firmware in parallel in most cases, which is the main motivation of this
 change since firmware calls are mach heavier operation than atomic
 operations, multitude of locks and potential multiple retries during
 concurrent accesses to same elements.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0/g+oACgkQSD+KveBX
 +j7vqAf/TXz9uGKkXR1EHfRMcrWIjQH6M3zKkrwo2K3wTwssi3ymCNfZH+VYkUwk
 LMrVnBQxHHPjuhUfmYfXbswVZiGm5Mul0zFNoeWChvcaEbQzsisHXCNNtzQFy9p2
 zQUmmvmvNNFQ0KPuwGuK/0rbUmXM3tXqWs2mtBsRLHxZUvqpGh7hETCFvZ0TnBaO
 EeFrWV6h8rfynBsRVeDuFGk5hDoo/ASEbWVTsocpPm6UPhIwKyH/Tbv4+wKZgZfr
 p/yPKSjwtQw9jPXj/lYxRXBbbZXey+BEptIurmwh3d8MLyrM4VElP6rgaoAb2Wzq
 ov5pnEGR5qsAINkOnImBZ4Mw4HN9WQ==
 =UKN2
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2019-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2019-07-29

This series includes updates to mlx5 driver,
1) Simplifications, cleanup and warning prints improvements

2) From Vlad Buslov:
Refactor mlx5 tc flow handling for unlocked execution (Part 1)

Currently, all cls API hardware offloads driver callbacks require caller
to hold rtnl lock when calling them. Cls API has already been updated to
update software filters in parallel (on classifiers that support
unlocked execution), however hardware offloads code still obtains rtnl
lock before calling driver tc callbacks. This set implements partial
support for unlocked execution that is leveraged by follow up
refactorings in specific mlx5 tc subsystems and patch to cls API that
implements API that allows drivers to register their callbacks as
rtnl-unlocked.

In mlx5 tc code mlx5e_tc_flow is the main structure that is used to
represent tc filter. Currently, code the structure itself and its
handlers in both tc and eswitch layers do not implement any kind of
synchronizations and assume external global synchronization provided by
rtnl lock instead. Implement following changes to remove dependency on
rtnl lock in flow handling code that are intended to be used a
groundwork for following changes to provide fully rtnl-independent mlx5
tc:

- Extend struct mlx5e_tc_flow with atomic reference counter and rcu to
  allow concurrent access from multiple tc and neigh update workqueue
  instances without introducing any additional locks specific to the
  structure. Its 'flags' field type is changed to atomic bitmask ops which
  is necessary for tc to interact with other concurrent tc instances or
  concurrent neigh update that need to skip flows that are not fully
  initialized (new INIT_DONE flow flag) and can change the flags
  according to neighbor state (flipping OFFLOADED flag).

- Protect unready flows list by new uplink_priv->unready_flows_lock
  mutex.

- Convert calls to netdev APIs that require rtnl lock in flow handling
  code to their rcu counterparts.

- Modify eswitch code that is called from tc layer and assume implicit
  external synchronization to be concurrency safe: change
  esw->offloads.num_flows type to atomic integer and re-arrange
  esw->state_lock usage to protect additional data.

Some of approaches to synchronizations presented in this patch set are
quite complicated (lockless concurrent usage of data structures with rcu
and reference counting, using fine-grained locking when necessary, retry
mechanisms to handle concurrent insertion of another instance of data
structure with same key, etc.). This is necessary to allow calling the
firmware in parallel in most cases, which is the main motivation of this
change since firmware calls are mach heavier operation than atomic
operations, multitude of locks and potential multiple retries during
concurrent accesses to same elements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 18:48:01 -04:00
Tony Nguyen
3015b8fcb6 ice: Bump version number
Update driver version to 0.7.5

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:41:09 -07:00
Akeem G Abodunrin
b67f25d76e ice: Remove flag to track VF interrupt status
As a result of refactoring of VF VSIs interrupts code, there is no
need to track its configuration status again with ICE_VF_STATE_CFG_INTR
flag - In fact, it is not being checked anywhere in the code right now, so
this patch removes the dead code as applicable to the flag.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:41:05 -07:00
Brett Creeley
ba880734ba ice: Remove unnecessary flag ICE_FLAG_MSIX_ENA
This flag is not needed and is called every time we re-enable interrupts
in the hotpath so remove it. Also remove ice_vsi_req_irq() because it
was a wrapper function for ice_vsi_req_irq_msix() whose sole purpose was
checking the ICE_FLAG_MSIX_ENA flag.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:41:01 -07:00
Akeem G Abodunrin
9921494463 ice: Don't return error for disabling LAN Tx queue that does exist
Since Tx rings are being managed by FW/NVM, Tx rings might have not been
set up or driver had already wiped them off - In that case, call to
disable LAN Tx queue is being returned as not in existence. This patch
makes sure we don't return unnecessary error for such scenario.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:57 -07:00
Brett Creeley
a1e9968593 ice: Remove duplicate code in ice_alloc_rx_bufs
Currently if the call to ice_alloc_mapped_page() fails we jump to the
no_buf label, possibly call ice_release_rx_desc(), and return true
indicating that there is more work to do. In the success case we just
fall out of the while loop, possibly call ice_alloc_mapped_page(), and
return false saying we exhausted cleaned_count. This flow can be
improved by breaking if ice_alloc_mapped_page() fails and then the flow
outside of the while loop is the same for the failure and success case.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:52 -07:00
Brett Creeley
56923ab664 ice: Add stats for Rx drops at the port level
Currently we are not reporting dropped counts at the port level to
ethtool or netlink. This was found when debugging Rx dropped issues
and the total packets sent did not equal the total packets received
minus the rx_dropped, which was very confusing. To determine dropped
counts at the port level we need to read the PRTRPB_RDPC register.
To fix reporting we will store the dropped counts in the PF's
rx_discards. This will be reported to netlink by storing it in the
PF VSI's rx_missed_errors signaling that the receiver missed the
packet. Also, we will report this to ethtool in the rx_dropped.nic
field.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:46 -07:00
Akeem G Abodunrin
66b29e7a88 ice: Update number of VF queue before setting VSI resources
In case there is a request from a VF to change its number of queues, and
the request was successful, we need to update number of queues
configured on the VF before updating corresponding VSI for that VF,
especially LAN Tx queue tree and TC update, otherwise, we would continued
to use old value of vf->num_vf_qs for allocated Tx/Rx queues...

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:42 -07:00
Akeem G Abodunrin
d5a4635917 ice: Set up Tx scheduling tree based on alloc VSI Tx queues
This patch uses allocated number of Tx queues per VSI to set up its
scheduling tree instead of using total number of available Tx queues.
Only PF VSIs have total number of allocated Tx queues equal to number
of available Tx queues, other VSIs have different number of queues
configured.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:35 -07:00
Brett Creeley
cb7db35641 ice: Only bump Rx tail and release buffers once per napi_poll
Currently we bump the Rx tail and release/give buffers to hardware every
16 descriptors. This causes us to bump Rx tail up to 4 times per
napi_poll call. Also we are always bumping tail on an odd index and this
is a problem because hardware ignores the lower 3 bits in the QRX_TAIL
register. This is making it so hardware sees tail bumps only every 8
descriptors. Instead lets only bump Rx tail once per napi_poll if
the value aligns with hardware's expectations of the lower 3 bits being
cleared. Also only release/give Rx buffers once per napi_poll call.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:30 -07:00
Akeem G Abodunrin
c7aeb4d1b9 ice: Disable VFs until reset is completed
This patch adds code to clear VFs enable status until reset is completed,
and Tx/Rx rings are setup. Without this patch, the code flow request Tx
queues to be disabled after reset, especially PFR - where VF VSI Tx rings
have already been wiped off in the NVM and result to adminq error based on
the call to disable Tx LAN queue in ice_reset_all_vfs function call.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
Tony Nguyen
6d5999467d ice: Do not configure port with no media
The firmware reports an error when trying to configure a port with no
media. Instead of always configuring the port, check for media before
attempting to configure it. In the absence of media, turn off link and
poll for media to become available before re-enabling link.

Move ice_force_phys_link_state() up to avoid forward declaration.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
Jacob Keller
5c91ecfda5 ice: separate out control queue lock creation
The ice_init_all_ctrlq and ice_shutdown_all_ctrlq functions create and
destroy the locks used to protect the send and receive process of each
control queue.

This is problematic, as the driver may use these functions to shutdown
and re-initialize the control queues at run time. For example, it may do
this in response to a device reset.

If the driver failed to recover from a reset, it might leave the control
queues offline. In this case, the locks will no longer be initialized.
A later call to ice_sq_send_cmd will then attempt to acquire a lock that
has been destroyed.

It is incorrect behavior to access a lock that has been destroyed.

Indeed, ice_aq_send_cmd already tries to avoid accessing an offline
control queue, but the check occurs inside the lock.

The root of the problem is that the locks are destroyed at run time.

Modify ice_init_all_ctrlq and ice_shutdown_all_ctrlq such that they no
longer create or destroy the locks.

Introduce new functions, ice_create_all_ctrlq and ice_destroy_all_ctrlq.
Call these functions in ice_init_hw and ice_deinit_hw.

Now, the control queue locks will remain valid for the life of the
driver, and will not be destroyed until the driver unloads.

This also allows removing a duplicate check of the sq.count and
rq.count values when shutting down the controlqs. The ice_shutdown_ctrlq
function already checks this value under the lock. Previously
commit dec64ff10e ("ice: use [sr]q.count when checking if queue is
initialized") needed this check to happen outside the lock, because it
prevented duplicate attempts at destroying the locks.

The driver may now safely use ice_init_all_ctrlq and
ice_shutdown_all_ctrlq while handling reset events, without causing the
locks to be invalid.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
Brett Creeley
c31a5c25bb ice: Always set prefena when configuring an Rx queue
Currently we are always setting prefena to 0. This is causing the
hardware to only fetch descriptors when there are none free in the cache
for a received packet instead of prefetching when it has used the last
descriptor regardless of incoming packets. Fix this by allowing the
hardware to prefetch Rx descriptors.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
Tony Nguyen
17bc6d0721 ice: Move vector base setup to PF VSI
When interrupt tracking was refactored, during rebuild, the call to
ice_vsi_setup_vector_base() was inadvertently removed from the PF VSI
instead of being removed from the VF VSI. During reset, the failure to
properly setup the vector base generates a call trace. Correct this so
that resets/rebuilds properly complete.

Fixes: cbe66bfee6 ("ice: Refactor interrupt tracking")
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
Jacob Keller
36517fd397 ice: track hardware stat registers past rollover
Currently, ice_stat_update32 and ice_stat_update40 will limit the
value of the software statistic to 32 or 40 bits wide, depending on
which register is being read.

This means that if a driver is running for a long time, the displayed
software register values will roll over to zero at 40 bits or 32 bits.

This occurs because the functions directly assign the difference between
the previous value and current value of the hardware statistic.

Instead, add this value to the current software statistic, and then
update the previous value.

In this way, each time ice_stat_update40 or ice_stat_update32 are
called, they will increment the software tracking value by the
difference of the hardware register from its last read. The software
tracking value will correctly count up until it overflows a u64.

The only requirement is that the ice_stat_update functions be called at
least once each time the hardware register overflows.

While we're fixing ice_stat_update40, modify it to use rd64 instead of
two calls to rd32. Additionally, drop the now unnecessary hireg
function parameter.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
Paul Greenwalt
5a056cd7ea ice: add lp_advertising flow control support
Add support for reporting link partner advertising when
ETHTOOL_GLINKSETTINGS defined. Get pause param reports the Tx/Rx
pause configured, and then ethtool issues ETHTOOL_GSET ioctl and
ice_get_settings_link_up reports the negotiated Tx/Rx pause. Negotiated
pause frame report per IEEE 802.3-2005 table 288-3.

$ ethtool --show-pause ens6f0
Pause parameters for ens6f0:
Autonegotiate:  on
RX:             on
TX:             on
RX negotiated:  on
TX negotiated:  on

$ ethtool ens6f0
Settings for ens6f0:
        Supported ports: [ FIBRE ]
        Supported link modes:   25000baseCR/Full
        Supported pause frame use: Symmetric
        Supports auto-negotiation: Yes
        Supported FEC modes: None BaseR RS
        Advertised link modes:  25000baseCR/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: None BaseR RS
        Link partner advertised link modes:  Not reported
        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 25000Mb/s
        Duplex: Full
        Port: Direct Attach Copper
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

When ETHTOOL_GLINKSETTINGS is not defined, get pause param reports the
negotiated Tx/Rx pause.

Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 10:23:04 -07:00
YueHaibing
6a7ce95d75 staging/octeon: Fix build error without CONFIG_NETDEVICES
While do COMPILE_TEST build without CONFIG_NETDEVICES,
we get Kconfig warning:

WARNING: unmet direct dependencies detected for PHYLIB
  Depends on [n]: NETDEVICES [=n]
  Selected by [y]:
  - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC && NETDEVICES [=n] || COMPILE_TEST [=y])

Reported-by: Hulk Robot <hulkci@huawei.com>
Reported-by: Mark Brown <broonie@kernel.org>
Fixes: 171a9bae68 ("staging/octeon: Allow test build on !MIPS")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 09:07:05 -07:00
David S. Miller
ac5fe22636 We have a reasonably large number of changes:
* lots more HE (802.11ax) support, particularly things
    relevant for the the AP side, but also mesh support
  * debugfs cleanups from Greg
  * some more work on extended key ID
  * start using genl parallel_ops, as preparation for
    weaning ourselves off RTNL and getting parallelism
  * various other changes all over
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl1BtsAACgkQB8qZga/f
 l8QTdQ//bD3NLitiIW4rBmC/w9str2TRpUbRN8Hf9xTnRh1smLGQop8jI8xpYdAE
 hXFlWn4glCkTkZrqtT8IDMcERb2YPHI94L7lMR1L6dXH7e1tiBZ7Qum5+rojLUCQ
 /XXZJnVa9AbdzBDtkhzCjkN8dCldMnkId0m6vkGdFre/b70FLDg2GlolqIchbDCz
 GxeKaw/uTLwu9nxEJhspDmiXDQtQd1ZsGNZRrovQ2nUuUfvX9sU54OmaoDHhqrUM
 iSwFZJAhrCV7nidOLs2X1rVs8hRymbrYnGu6NdUlfTQMqaOVgWhqniM1RMnJnnVi
 Ec/1OQg4KIj2rjJIXW/3QXAzAO6TDwV8xNk9al3m+yAkkSXR7tXP01plNUoMU4UW
 YxcaPD341H919GIqa71lieeEMyi7XIKKFfaZvGFLiDzidKKi0JkDuC+1w5UOJrbj
 PM/dzvKqvwOMEa5XahjrUQLsiGMgxBkDo36+F4rfaelzDKPuJBOokDS23vg/lYf5
 2v74WbeJVc45cAhSwp5/0RTsG0xhmFImfrE66VNB1vcJUVLvHTuATfXsBbRahOz/
 SglGBAGqigxJwIeYfgec3lPNfdV+0LwYnPAEjBIjCO8qlGvst4jLroHg7ayApr2E
 JfUbaWtpEb9cgAz7X6tEjv/BAnsRWPzpxb8Nm6JoZttTtA2VakE=
 =eUnh
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2019-07-31' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
We have a reasonably large number of changes:
 * lots more HE (802.11ax) support, particularly things
   relevant for the the AP side, but also mesh support
 * debugfs cleanups from Greg
 * some more work on extended key ID
 * start using genl parallel_ops, as preparation for
   weaning ourselves off RTNL and getting parallelism
 * various other changes all over
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:59:41 -07:00
David S. Miller
164f0de315 Merge branch 'mlxsw-Test-coverage-for-DSCP-leftover-fix'
Petr Machata says:

====================
mlxsw: Test coverage for DSCP leftover fix

This patch set fixes some global scope pollution issues in the DSCP tests
(in patch #1), and then proceeds (in patch #2) to add a new test for
checking whether, after DSCP prioritization rules are removed from a port,
DSCP rewrites consistently to zero, instead of the last removed rule still
staying in effect.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-31 08:47:14 -07:00