Commit Graph

813616 Commits

Author SHA1 Message Date
Daniel Borkmann
143bdc2e27 Merge branch 'bpf-libbpf-af-xdp'
Magnus Karlsson says:

====================
This patch proposes to add AF_XDP support to libbpf. The main reason
for this is to facilitate writing applications that use AF_XDP by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use higher level interfaces of XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets. This interface also loads a simple XDP program
  that routes all traffic on a queue up to the AF_XDP socket.

The sample program has been updated to use this new interface and in
that process it lost roughly 300 lines of code. I cannot detect any
performance degradations due to the use of this library instead of the
previous functions that were inlined in the sample application. But I
did measure this on a slower machine and not the Broadwell that we
normally use.

The rings are now called xsk_ring. When a producer operates on a ring
it is an xsk_ring_prod, and for a consumer it is an xsk_ring_cons. This
way we get some compile-time checking that the rings are used
correctly.
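
The idea of giving producer and consumer rings distinct types can be
sketched as follows. This is a minimal illustration, not the libbpf
definitions: the names ring_state, ring_prod, ring_cons,
ring_prod_reserve() and ring_cons_take() are hypothetical, and the ring
bookkeeping is deliberately simplified.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified ring bookkeeping (not the libbpf layout). */
struct ring_state {
	uint32_t prod; /* total entries produced so far */
	uint32_t cons; /* total entries consumed so far */
	uint32_t size; /* number of slots in the ring */
};

/* Same layout, but two distinct C types. */
struct ring_prod { struct ring_state s; };
struct ring_cons { struct ring_state s; };

/* Reserve up to nb free slots on a producer ring; returns slots reserved. */
static inline uint32_t ring_prod_reserve(struct ring_prod *r, uint32_t nb)
{
	uint32_t used = r->s.prod - r->s.cons;
	uint32_t free_slots;

	assert(used <= r->s.size);
	free_slots = r->s.size - used;
	if (nb > free_slots)
		nb = free_slots;
	r->s.prod += nb;
	return nb;
}

/* Take up to nb filled entries from a consumer ring; returns entries taken. */
static inline uint32_t ring_cons_take(struct ring_cons *r, uint32_t nb)
{
	uint32_t avail = r->s.prod - r->s.cons;

	if (nb > avail)
		nb = avail;
	r->s.cons += nb;
	return nb;
}
```

With this layout, passing a struct ring_cons pointer to
ring_prod_reserve() is diagnosed by the compiler as an incompatible
pointer type, which is the kind of check the typed rings are meant to
provide.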

Comments and contemplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so) but the clean up of this program is left to the
  application. It would be possible to implement this cleanup in the
  library, but it would require state to be kept on netdev level,
  which there is none at the moment, and the synchronization of this
  between processes. All of this adds complexity. But when we get an
  XDP program per queue id, then it becomes trivial to also remove the
  XDP program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported.

* In a future release, I am planning on adding a higher level data
  plane interface too. This will be based around recvmsg and sendmsg
  with the use of struct iovec for batching, without the user having
  to know anything about the underlying four rings of an AF_XDP
  socket. There will be one semantic difference though from the
  standard recvmsg and that is that the kernel will fill in the iovecs
  instead of the application. But the rest should be the same as the
  libc versions so that application writers feel at home.

Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf functions
Patch 3: Documentation update to help first time users

Changes v5 to v6:
  * Fixed prog_fd bug found by Xiaolong Ye. Thanks!
Changes v4 to v5:
  * Added a FAQ to the documentation
  * Removed xsk_umem__get_data and renamed xsk_umem__get_dat_raw to
    xsk_umem__get_data
  * Replaced the netlink code with bpf_get_link_xdp_id()
  * Dynamic allocation of the map sizes. They are now sized after
    the max number of queues on the netdev in question.
Changes v3 to v4:
  * Dropped the pr_*() patch in favor of Yonghong Song's patch set
  * Addressed the review comments of Daniel Borkmann, mainly leaking
    of file descriptors at clean up and making the data plane APIs
    all static inline (with the exception of xsk_umem__get_data that
    uses an internal structure I do not want to expose).
  * Fixed the netlink callback as suggested by Maciej Fijalkowski.
  * Removed an unnecessary include in the sample program as spotted by
    Ilia Fillipov.
Changes v2 to v3:
  * Added automatic loading of a simple XDP program that routes all
    traffic on a queue up to the AF_XDP socket. This program loading
    can be disabled.
  * Updated function names to be consistent with the libbpf naming
    convention
  * Moved all code to xsk.[ch]
  * Removed all the XDP program loading code from the sample since
    this is now done by libbpf
  * The initialization functions now return a handle as suggested by
    Alexei
  * const statements added in the API where applicable.
Changes v1 to v2:
  * Fixed cleanup of library state on error.
  * Moved API to initial version
  * Prefixed all public functions by xsk__ instead of xsk_
  * Added comment about changed default ring sizes, batch size and umem
    size in the sample application commit message
  * The library now only creates an Rx or Tx ring if the respective
    parameter is != NULL
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 23:26:36 +01:00
Magnus Karlsson
0f4a9b7d4e xsk: add FAQ to facilitate for first time users
Added an FAQ section in Documentation/networking/af_xdp.rst to help
first-time users with common problems. As problems are identified,
entries will be added to the FAQ.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 23:21:42 +01:00
Magnus Karlsson
248c7f9c0e samples/bpf: convert xdpsock to use libbpf for AF_XDP access
This commit converts the xdpsock sample application to use the AF_XDP
functions present in libbpf. This cuts down the size of it by nearly
300 lines of code.

The default ring sizes and the batch size have been increased, and the
size of the umem area has been decreased, so that the sample application
provides higher throughput. Note also that the shared umem code
has been removed from the sample as this is not supported by libbpf
at this point in time.

Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 23:21:42 +01:00
Magnus Karlsson
1cad078842 libbpf: add support for using AF_XDP sockets
This commit adds AF_XDP support to libbpf. The main reason for this is
to facilitate writing applications that use AF_XDP by offering
higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use higher level interfaces of XDP
functionality. Hopefully this will facilitate adoption of AF_XDP, make
applications using it simpler and smaller, and finally also make it
possible for applications to benefit from optimizations in the AF_XDP
user space access code. Previously, people just copied and pasted the
code from the sample application into their application, which is not
desirable.

The interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting
  up umems and AF_XDP sockets as well as a simple XDP program.

Tested-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 23:21:42 +01:00
Stanislav Fomichev
740f8a6572 selftests/bpf: make sure signal interrupts BPF_PROG_TEST_RUN
A simple test that I used to reproduce the issue in the previous commit:
run BPF_PROG_TEST_RUN with the maximum number of iterations, where each
program consists of 4096 simple move instructions. Fire an alarm after
0.1 seconds and check that bpf_prog_test_run is interrupted (i.e. the
test doesn't hang).
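
The shape of this check can be approximated in plain userspace C. This
is only a sketch under stated assumptions: nanosleep() stands in for the
long-running bpf_prog_test_run call, and the helper names on_alarm() and
interrupted_by_alarm() are hypothetical and not taken from the selftest.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

/* Empty handler: its only job is to make the blocking call return. */
static void on_alarm(int sig)
{
	(void)sig;
}

/* Returns 1 if the blocking call was cut short by SIGALRM, 0 otherwise. */
static int interrupted_by_alarm(void)
{
	struct sigaction sa;
	struct itimerval timer;
	struct timespec long_sleep = { .tv_sec = 5, .tv_nsec = 0 };
	int ret;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm; /* no SA_RESTART: we want EINTR */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return 0;

	memset(&timer, 0, sizeof(timer));
	timer.it_value.tv_usec = 100000; /* one-shot timer, 0.1 seconds */
	if (setitimer(ITIMER_REAL, &timer, NULL) != 0)
		return 0;

	/* Stand-in for the long-running call under test. */
	ret = nanosleep(&long_sleep, NULL);
	return ret == -1 && errno == EINTR;
}
```

If the call under test ignored signals, the function would only return
after the full sleep and report 0, which corresponds to the hang the
selftest is guarding against.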

Note: reposting this for bpf-next to avoid linux-next conflict. In this
version I test both BPF_PROG_TYPE_SOCKET_FILTER (which uses generic
bpf_test_run implementation) and BPF_PROG_TYPE_FLOW_DISSECTOR (which has
its own loop with preempt handling in bpf_prog_test_run_flow_dissector).

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 22:24:19 +01:00
Stanislav Fomichev
a439184d51 bpf/test_run: fix unkillable BPF_PROG_TEST_RUN for flow dissector
Syzbot found out that running BPF_PROG_TEST_RUN with repeat=0xffffffff
makes the process unkillable. The problem is that when CONFIG_PREEMPT is
enabled, we never see need_resched() return true. This is due to the
fact that preempt_enable() (which we do in bpf_test_run_one on each
iteration) now handles resched if it's needed.

Let's disable preemption for the whole run, not per test. In this case
we can properly see whether resched is needed.
Let's also properly return -EINTR to the userspace in case of a signal
interrupt.

This is a follow up for a recently fixed issue in bpf_test_run, see
commit df1a2cb7c7 ("bpf/test_run: fix unkillable
BPF_PROG_TEST_RUN").

Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 22:21:22 +01:00
Anders Roxell
fd92d6648f bpf: test_bpf: turn off preemption in function __run_once
When running the BPF test suite the following splat occurs:

[  415.930950] test_bpf: #0 TAX jited:0
[  415.931067] BUG: assuming atomic context at lib/test_bpf.c:6674
[  415.946169] in_atomic(): 0, irqs_disabled(): 0, pid: 11556, name: modprobe
[  415.953176] INFO: lockdep is turned off.
[  415.957207] CPU: 1 PID: 11556 Comm: modprobe Tainted: G        W         5.0.0-rc7-next-20190220 #1
[  415.966328] Hardware name: HiKey Development Board (DT)
[  415.971592] Call trace:
[  415.974069]  dump_backtrace+0x0/0x160
[  415.977761]  show_stack+0x24/0x30
[  415.981104]  dump_stack+0xc8/0x114
[  415.984534]  __cant_sleep+0xf0/0x108
[  415.988145]  test_bpf_init+0x5e0/0x1000 [test_bpf]
[  415.992971]  do_one_initcall+0x90/0x428
[  415.996837]  do_init_module+0x60/0x1e4
[  416.000614]  load_module+0x1de0/0x1f50
[  416.004391]  __se_sys_finit_module+0xc8/0xe0
[  416.008691]  __arm64_sys_finit_module+0x24/0x30
[  416.013255]  el0_svc_common+0x78/0x130
[  416.017031]  el0_svc_handler+0x38/0x78
[  416.020806]  el0_svc+0x8/0xc

Rework so that preemption is disabled when we loop over function
'BPF_PROG_RUN(...)'.

Fixes: 568f196756 ("bpf: check that BPF programs run with preemption disabled")
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-25 22:18:07 +01:00
Toke Høiland-Jørgensen
915654fd71 samples/bpf: Fix dummy program unloading for xdp_redirect samples
The xdp_redirect and xdp_redirect_map sample programs both load a dummy
program onto the egress interfaces. However, the unload code checks these
programs against the wrong id number, and thus refuses to unload them. Fix
the comparison to avoid this.

Fixes: 3b7a8ec2de ("samples/bpf: Check the prog id before exiting")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-22 16:21:59 +01:00
Alexei Starovoitov
e80d02dd76 seccomp, bpf: disable preemption before calling into bpf prog
All BPF programs must be called with preemption disabled.

Fixes: 568f196756 ("bpf: check that BPF programs run with preemption disabled")
Reported-by: syzbot+8bf19ee2aa580de7a2a7@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-22 00:14:19 +01:00
Jesper Dangaard Brouer
74e31ca850 bpf: add skb->queue_mapping write access from tc clsact
The skb->queue_mapping already has read access, via __sk_buff->queue_mapping.

This patch allows BPF tc qdisc clsact write access to the queue_mapping via
tc_cls_act_is_valid_access.  Also handle that the value NO_QUEUE_MAPPING
is not allowed.

It is already possible to change this via TC filter action skbedit
tc-skbedit(8).  Due to the lack of TC examples, let's show one:

  # tc qdisc  add  dev ixgbe1 clsact
  # tc filter add  dev ixgbe1 ingress matchall action skbedit queue_mapping 5
  # tc filter list dev ixgbe1 ingress

The most common mistake is that XPS (Transmit Packet Steering) takes
precedence over setting skb->queue_mapping. XPS is configured per DEVICE
through a CPU hex mask in /sys/class/net/DEVICE/queues/tx-*/xps_cpus. To
disable XPS, set mask=00.
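
As a worked example of the CPU hex mask, bit N of the mask corresponds
to CPU N. The following sketch computes and formats such a mask. The
helper names cpu_mask() and format_mask() are hypothetical, and the
sketch is limited to the first 64 CPUs for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Build a CPU bitmask in which bit N is set for every listed CPU N. */
static uint64_t cpu_mask(const unsigned int *cpus, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(cpus[i] < 64); /* sketch only covers CPUs 0..63 */
		mask |= UINT64_C(1) << cpus[i];
	}
	return mask;
}

/* Format the mask as the hex string one would write to xps_cpus. */
static int format_mask(char *buf, size_t len, uint64_t mask)
{
	return snprintf(buf, len, "%02llx", (unsigned long long)mask);
}
```

For CPUs 0 and 2 this yields the string "05", and an empty CPU list
yields "00", the value that disables XPS for that queue.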

The purpose of changing skb->queue_mapping is to influence the selection of
the net_device "txq" (struct netdev_queue), which influences selection of
the qdisc "root_lock" (via txq->qdisc->q.lock) and txq->_xmit_lock. When
using the MQ qdisc the txq->qdisc points to different qdiscs and associated
locks, and HARD_TX_LOCK (txq->_xmit_lock), allowing for CPU scalability.

Due to lack of TC examples, let's show how to attach clsact BPF programs:

 # tc qdisc  add  dev ixgbe2 clsact
 # tc filter add  dev ixgbe2 egress bpf da obj XXX_kern.o sec tc_qmap2cpu
 # tc filter list dev ixgbe2 egress

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-19 21:56:05 +01:00
Peter Zijlstra
568f196756 bpf: check that BPF programs run with preemption disabled
Introduce cant_sleep() macro for annotation of functions that
cannot sleep.

Use it in BPF_PROG_RUN to catch execution of BPF programs in
preemptable context.

Suggested-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-19 21:53:07 +01:00
Alban Crequy
a5d9265e01 bpf: bpftool, fix documentation for attach types
bpftool has support for attach types "stream_verdict" and
"stream_parser" but the documentation was referring to them as
"skb_verdict" and "skb_parse". The inconsistency comes from commit
b7d3826c2e ("bpf: bpftool, add support for attaching programs to
maps").

This patch changes the documentation to match the implementation:
- "bpftool prog help"
- man pages
- bash completion

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-02-19 17:23:18 +01:00
YueHaibing
c9b747dbc2 bnx2x: Remove set but not used variable 'mfw_vn'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: In function 'bnx2x_get_hwinfo':
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:11940:10: warning:
 variable 'mfw_vn' set but not used [-Wunused-but-set-variable]

It has never been used since its introduction.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:47:32 -08:00
David S. Miller
778a57d93e Merge branch 'net-phy-add-helpers-for-handling-C45-10GBT-AN-register-values'
Heiner Kallweit says:

====================
net: phy: add helpers for handling C45 10GBT AN register values

Similar to the existing helpers for the Clause 22 registers add helpers
to deal with converting Clause 45 advertisement registers to / from
link mode bitmaps.

Note that these helpers are defined in linux/mdio.h, unlike the
Clause 22 helpers in linux/mii.h. The reason is that the Clause 45
register constants are defined in uapi/linux/mdio.h, and
uapi/linux/mdio.h includes linux/mii.h before defining the C45 register
constants.

v2:
- Remove few helpers which aren't used by this series. They will
  follow together with the users.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:44:02 -08:00
Heiner Kallweit
96c2be34e6 net: phy: use mii_10gbt_stat_mod_linkmode_lpa_t in genphy_c45_read_lpa
Use mii_10gbt_stat_mod_linkmode_lpa_t() in genphy_c45_read_lpa() to
simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:43:55 -08:00
Heiner Kallweit
9004a14cb6 net: phy: add helper mii_10gbt_stat_mod_linkmode_lpa_t
Similar to the existing helpers for the Clause 22 registers add helper
mii_10gbt_stat_mod_linkmode_lpa_t.
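
The general shape of such a helper is to set or clear each link mode
bit depending on whether the matching register bit is present. The
sketch below is a userspace illustration only: every EX_ and ex_ name is
hypothetical, a plain 32-bit word stands in for the link mode bitmap,
and the register bit values are assumptions made for illustration (the
authoritative values live in uapi/linux/mdio.h).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative link partner status bits (assumed values). */
#define EX_STAT_LP2_5G 0x0020u
#define EX_STAT_LP5G   0x0040u
#define EX_STAT_LP10G  0x0800u

/* Hypothetical link mode bit positions in a 32-bit bitmap. */
enum ex_link_mode {
	EX_MODE_2500BASET,
	EX_MODE_5000BASET,
	EX_MODE_10000BASET,
};

/* Set the bit when 'set' is non-zero, clear it otherwise. */
static void ex_mod_bit(uint32_t *bitmap, unsigned int bit, int set)
{
	assert(bit < 32);
	if (set)
		*bitmap |= UINT32_C(1) << bit;
	else
		*bitmap &= ~(UINT32_C(1) << bit);
}

/* Update only the 2.5G/5G/10G bits; all other bits are left untouched. */
static void ex_10gbt_stat_mod_lpa(uint32_t *lp_advertising, uint16_t stat)
{
	ex_mod_bit(lp_advertising, EX_MODE_2500BASET, stat & EX_STAT_LP2_5G);
	ex_mod_bit(lp_advertising, EX_MODE_5000BASET, stat & EX_STAT_LP5G);
	ex_mod_bit(lp_advertising, EX_MODE_10000BASET, stat & EX_STAT_LP10G);
}
```

Modifying bits in place, rather than rebuilding the whole bitmap, lets
the caller combine this with results from other registers.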

Note that this helper is defined in linux/mdio.h, unlike the
Clause 22 helpers in linux/mii.h. The reason is that the Clause 45
register constants are defined in uapi/linux/mdio.h, and
uapi/linux/mdio.h includes linux/mii.h before defining the C45 register
constants.

v2:
- remove helpers that don't have users in this series

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:43:54 -08:00
YueHaibing
bf9d787ba7 liquidio: using NULL instead of plain integer
Fix the following warnings:

drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c:1453:35: warning: Using plain integer as NULL pointer
drivers/net/ethernet/cavium/liquidio/lio_main.c:2910:23: warning: Using plain integer as NULL pointer

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:40:08 -08:00
Heiner Kallweit
eb160971af r8169: remove unneeded mmiowb barriers
writex() has implicit barriers, that's what makes it different from
writex_relaxed(). Therefore these calls to mmiowb() can be removed.

This patch was recently reverted due to a dependency with another
problematic patch. But because it didn't contribute to the problem
it was rebased and can be resubmitted.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:38:25 -08:00
Rundong Ge
57fd967838 net: dsa: Implement flow_dissect callback for tag_dsa.
RPS does not work for DSA devices since 'skb_get_hash'
will always get an invalid hash for DSA tagged packets.

"[PATCH] tag_mtk: add flow_dissect callback to the ops struct"
introduced the flow_dissect callback to get the right hash for
MTK tagged packets. tag_dsa and tag_edsa also need to implement
the callback.

Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:35:41 -08:00
Wei Yongjun
6e07902f56 net: sched: using kfree_rcu() to simplify the code
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:34:51 -08:00
YueHaibing
780feae7eb mdio_bus: Fix PTR_ERR() usage after initialization to constant
Fix coccinelle warning:

./drivers/net/phy/mdio_bus.c:51:5-12: ERROR: PTR_ERR applied after initialization to constant on line 44
./drivers/net/phy/mdio_bus.c:52:5-12: ERROR: PTR_ERR applied after initialization to constant on line 44

Fix this by using IS_ERR before PTR_ERR.
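
The pattern of checking IS_ERR before calling PTR_ERR can be shown with
userspace stand-ins for the kernel helpers. This is a sketch, not the
mdio_bus.c change: ERR_PTR, PTR_ERR and IS_ERR are re-created here for
illustration, and get_optional_resource() and setup() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_resource;

/* Hypothetical lookup: valid pointer, NULL (absent), or an error pointer. */
static void *get_optional_resource(int mode)
{
	if (mode == 0)
		return &dummy_resource;
	if (mode == 1)
		return NULL;
	return ERR_PTR(-EINVAL);
}

/* Returns 0 on success or a negative errno value. */
static int setup(int mode)
{
	void *res = get_optional_resource(mode);

	/* Only interpret the pointer as an error code once IS_ERR says so. */
	if (IS_ERR(res))
		return (int)PTR_ERR(res);

	/* Here res is either a valid pointer or NULL (resource absent). */
	return 0;
}
```

Calling PTR_ERR on a pointer that has not been tested with IS_ERR would
turn a valid address, or a constant such as NULL, into a meaningless
error code, which is what the coccinelle check warns about.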

Fixes: bafbdd527d ("phylib: Add device reset GPIO support")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 16:34:20 -08:00
Shalom Toledo
31ef5b0eef mlxsw: spectrum: Change IP2ME CPU policer rate and burst size values
The IP2ME packet trap is triggered by packets hitting local routes.
After evaluating current defaults used by the driver it was decided to
reduce the amount of traffic generated by this trap to 1Kpps and
increase the burst size. This is in line with similarly deployed systems.

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 12:10:49 -08:00
Masahiro Yamada
ed95799bd4 net: hamradio: remove unused hweight*() defines
This file does not use hweight*() at all, and the definition is
surrounded by #if 0 ... #endif.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 12:10:02 -08:00
David S. Miller
8bbed40f10 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree:

1) Missing NFTA_RULE_POSITION_ID netlink attribute validation,
   from Phil Sutter.

2) Restrict matching on tunnel metadata to rx/tx path, from wenxu.

3) Avoid indirect calls for IPV6=y, from Florian Westphal.

4) Add two indirections to prepare merger of IPV4 and IPV6 nat
   modules, from Florian Westphal.

5) Broken indentation in ctnetlink, from Colin Ian King.

6) Patches to use struct_size() from netfilter and IPVS,
   from Gustavo A. R. Silva.

7) Display kernel splat only once in case of racing to confirm
   conntrack from bridge plus nfqueue setups, from Chieh-Min Wang.

8) Skip checksum validation for layer 4 protocols that don't need it,
   patch from Alin Nastac.

9) Sparse warning due to symbol that should be static in CLUSTERIP,
   from Wei Yongjun.

10) Add new toggle to disable SDP payload translation when media
    endpoint is reachable through the same interface as the signalling
    peer, from Alin Nastac.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18 11:38:30 -08:00
Wei Yongjun
e511f17b1f net: hns3: make function hclge_set_all_vf_rst() static
Fixes the following sparse warning:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2431:5: warning:
 symbol 'hclge_set_all_vf_rst' was not declared. Should it be static?

Fixes: aa5c4f175b ("net: hns3: add reset handling for VF when doing PF reset")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:55:11 -08:00
YueHaibing
58ecf2688c ptr_ring: remove duplicated include from ptr_ring.h
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:53:01 -08:00
Wei Yongjun
3edaded896 net: sgi: use GFP_ATOMIC under spin lock
The function meth_init_tx_ring() is called from meth_tx_timeout(),
in which spin_lock is held, so we should use GFP_ATOMIC instead.

Fixes: 8d4c28fbc2 ("meth: pass struct device to DMA API functions")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:51:46 -08:00
Ivan Vecera
aaeb1dea51 net: sched: sch_api: set an error msg when qdisc_alloc_handle() fails
This patch sets an error message in extack when the number of qdisc
handles exceeds the maximum. Also the error-code ENOSPC is more
appropriate than ENOMEM in this situation.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reported-by: Li Shuang <shuali@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:39:08 -08:00
Maxime Chevallier
5642563883 net: phy: marvell10g: Don't explicitly set Pause and Asym_Pause
The PHY core expects PHY drivers not to set Pause and Asym_Pause bits,
unless the driver only wants to specify one of them due to HW
limitation. In the case of the Marvell10g driver, we don't need to set
them.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:33:53 -08:00
YueHaibing
a0bc653b1d net: dsa: bcm_sf2: Remove set but not used variables 'v6_spec, v6_m_spec'
Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/dsa/bcm_sf2_cfp.c: In function 'bcm_sf2_cfp_ipv6_rule_set':
drivers/net/dsa/bcm_sf2_cfp.c:606:40: warning:
 variable 'v6_m_spec' set but not used [-Wunused-but-set-variable]
drivers/net/dsa/bcm_sf2_cfp.c:606:30: warning:
 variable 'v6_spec' set but not used [-Wunused-but-set-variable]

They are not used any more after commit e4f7ef54cb ("dsa: bcm_sf2: use flow_rule
infrastructure")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:31:47 -08:00
Pieter Jansen van Vuuren
0496743b20 nfp: flower: fix masks for tcp and ip flags fields
Check mask fields of tcp and ip flags when setting the corresponding mask
flag used in hardware.

Fixes: 8f2566225a ("flow_offload: add flow_rule and flow_match")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:28:50 -08:00
David S. Miller
eaec2efbe4 Merge branch 'devlink-add-the-ability-to-update-device-flash'
Jakub Kicinski says:

====================
devlink: add the ability to update device flash

This series is the second step to allow troubleshooting and recovering
devices in a bad state without the use of netdevs as handles.  We can
already query FW versions over devlink, now we add the ability to update
the FW.  This will allow drivers to implement some form of "limp-mode"
where the device can't really be used for networking and hence has no
netdev, but we can interrogate it over devlink and fix the broken FW.

Small but nice advantage of devlink is that it only holds the devlink
instance lock during flashing, unlike ethtool which holds rtnl_lock().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:27:39 -08:00
Jakub Kicinski
5c5696f3df nfp: devlink: allow flashing the device via devlink
Devlink now allows updating device flash.  Implement this
callback.

Compared to ethtool update we no longer have to release
the networking locks - devlink doesn't take them.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:27:38 -08:00
Jakub Kicinski
4eceba1720 ethtool: add compat for flash update
If the driver does not support the ethtool flash update operation,
call into devlink.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:27:38 -08:00
Jakub Kicinski
76726ccb7f devlink: add flash update command
Add devlink flash update command. Advanced NICs have firmware
stored in flash and often cryptographically secured. Updating
that flash is handled by management firmware. Ethtool has a
flash update command which served us well, however, it has two
shortcomings:
 - it takes rtnl_lock unnecessarily - really flash update has
   nothing to do with networking, so using a networking device
   as a handle is suboptimal, which leads us to the second one:
 - it requires a functioning netdev - in case device enters an
   error state and can't spawn a netdev (e.g. communication
   with the device fails) there is no netdev to use as a handle
   for flashing.

Devlink already has the ability to report the firmware versions,
now with the ability to update the firmware/flash we will be
able to recover devices in bad state.

To enable updates of sub-components of the FW allow passing
component name.  This name should correspond to one of the
versions reported in devlink info.

v1: - replace target id with component name (Jiri).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:27:38 -08:00
David S. Miller
8e31c47424 Merge branch 'net-phy-improve-and-use-phy_resolve_aneg_linkmode'
Heiner Kallweit says:

====================
net: phy: improve and use phy_resolve_aneg_linkmode

Improve phy_resolve_aneg_linkmode and use it in genphy_read_status.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:21:39 -08:00
Heiner Kallweit
5502b218e0 net: phy: use phy_resolve_aneg_linkmode in genphy_read_status
Now that we have phy_resolve_aneg_linkmode() we can make
genphy_read_status() much simpler.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:21:38 -08:00
Heiner Kallweit
a2703de709 net: phy: improve phy_resolve_aneg_linkmode
We have the settings array of modes which is sorted based on aneg
priority. Instead of checking each mode manually let's simply iterate
over the sorted settings.
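
Iterating over a priority-sorted table can be sketched as follows. This
is an illustration under stated assumptions, not the phylib code: the
link_mode enum, the settings table and resolve_linkmode() are
hypothetical, and 32-bit words stand in for the link mode bitmaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical link modes, used as bit positions in a 32-bit bitmap. */
enum link_mode {
	MODE_10_HALF,
	MODE_10_FULL,
	MODE_100_HALF,
	MODE_100_FULL,
	MODE_1000_FULL,
};

struct setting {
	int speed;          /* Mbit/s */
	int full_duplex;    /* 1 = full duplex, 0 = half duplex */
	enum link_mode bit; /* bit position in the bitmaps */
};

/* Sorted from highest to lowest autonegotiation priority. */
static const struct setting settings[] = {
	{ 1000, 1, MODE_1000_FULL },
	{  100, 1, MODE_100_FULL },
	{  100, 0, MODE_100_HALF },
	{   10, 1, MODE_10_FULL },
	{   10, 0, MODE_10_HALF },
};

/* Return the best mode advertised by both sides, or NULL if none. */
static const struct setting *resolve_linkmode(uint32_t advertising,
					      uint32_t lp_advertising)
{
	uint32_t common = advertising & lp_advertising;
	size_t i;

	for (i = 0; i < sizeof(settings) / sizeof(settings[0]); i++) {
		if (common & (UINT32_C(1) << settings[i].bit))
			return &settings[i];
	}
	return NULL;
}
```

Because the table is already ordered by priority, the first match is
the result and no per-mode special casing is needed.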

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 15:21:38 -08:00
Vlad Buslov
8b58d12f4a net: sched: cgroup: verify that filter is not NULL during walk
Check that filter is not NULL before passing it to tcf_walker->fn()
callback in cls_cgroup_walk(). This can happen when cls_cgroup_change()
fails to set the first filter.
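
The guard can be illustrated with a small userspace sketch. It is not
the cls_cgroup code: struct walker, struct classifier,
classifier_walk() and count_filter() are hypothetical simplifications of
the walker callback pattern.

```c
#include <assert.h>
#include <stddef.h>

struct walker {
	int (*fn)(void *filter, struct walker *w); /* per-filter callback */
	int count; /* filters visited so far */
	int stop;  /* set when the callback asks to abort the walk */
};

/* Hypothetical classifier instance holding at most one filter. */
struct classifier {
	void *filter; /* stays NULL if setting up the filter failed */
};

/* Example callback: accepts any non-NULL filter. */
static int count_filter(void *filter, struct walker *w)
{
	(void)w;
	return filter ? 0 : -1;
}

static void classifier_walk(struct classifier *c, struct walker *w)
{
	assert(w && w->fn);

	/* Nothing to report: never hand a NULL filter to the callback. */
	if (!c->filter)
		return;

	if (w->fn(c->filter, w) < 0) {
		w->stop = 1;
		return;
	}
	w->count++;
}
```

Without the NULL check, a callback that dereferences its filter
argument would crash when the walk runs after a failed filter setup.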

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:26:57 -08:00
Vlad Buslov
d66022cd16 net: sched: matchall: verify that filter is not NULL in mall_walk()
Check that filter is not NULL before passing it to tcf_walker->fn()
callback. This can happen when mall_change() failed to offload filter to
hardware.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:26:39 -08:00
Vlad Buslov
3027ff41f6 net: sched: route: don't set arg->stop in route4_walk() when empty
Some classifiers set arg->stop in their implementation of tp->walk() API
when empty. Most classifiers do not adhere to that convention. Do not
set arg->stop in route4_walk() to unify tp->walk() behavior among
classifier implementations.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:24:39 -08:00
Vlad Buslov
31a9984876 net: sched: fw: don't set arg->stop in fw_walk() when empty
Some classifiers set arg->stop in their implementation of tp->walk() API
when empty. Most classifiers do not adhere to that convention. Do not
set arg->stop in fw_walk() to unify tp->walk() behavior among classifier
implementations.

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 13:24:18 -08:00
Jann Horn
1eb00162f8 net: caif: use skb helpers instead of open-coding them
Use existing skb_put_data() and skb_trim() instead of open-coding them,
with the skb_put_data() first so that logically, `skb` still contains the
data to be copied in its data..tail area when skb_put_data() reads it.
This change on its own is a cleanup, and it is also necessary for potential
future integration of skbuffs with things like KASAN.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 11:01:17 -08:00
Vadim Pasternak
6a79507cfe mlxsw: core: Extend thermal module with per QSFP module thermal zones
Add a dedicated thermal zone for each QSFP/SFP module. The current
temperature is obtained from the module's temperature sensor and the
trip points are set based on the warning and critical thresholds
read from the module.

A cooling device (fan) is bound to all the thermal zones. The
thermal zone governor is set to user space in order to avoid
collisions between thermal zones.
For example, one thermal zone might want to increase the speed of
the fan, whereas another one would like to decrease it.

Deferring this decision to user space allows the user to take
the most suitable decision.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:57:49 -08:00
David S. Miller
3c136c542a Merge branch 'neigh-tracepoints'
Roopa Prabhu says:

====================
tracepoints in neighbor subsystem

Roopa Prabhu (2):
  trace: events: add a few neigh tracepoints
  neigh: hook tracepoints in neigh update code
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:33:39 -08:00
Roopa Prabhu
56dd18a49f neigh: hook tracepoints in neigh update code
Hook tracepoints at the end of functions that
update a neigh entry. neigh_update gets an additional
tracepoint to trace the update flags and old and new
neigh states.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:33:39 -08:00
Roopa Prabhu
9c03b282ba trace: events: add a few neigh tracepoints
The goal here is to trace neigh state changes covering all possible
neigh update paths. Plus have a specific trace point in neigh_update
to cover flags sent to neigh_update.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:33:39 -08:00
David S. Miller
9e8ccd8957 Merge branch 'net-phy-add-and-use-genphy_c45_an_config_an'
Heiner Kallweit says:

====================
net: phy: add and use genphy_c45_an_config_an

This series adds genphy_c45_an_config_an() and uses it in the
marvell10g driver. In addition patch 4 aligns the aneg configuration
with what is done in genphy_config_aneg().

v2:
- in patch 2 changed function name to genphy_c45_an_config_aneg
- in patch 3 add a comment regarding 1000BaseT vendor registers

v3:
- rebase patch 3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:27:00 -08:00
Heiner Kallweit
3ce2a027ae net: phy: marvell10g: check for newly set aneg
Even if the advertisement registers content didn't change, we may have
just switched to aneg, and therefore have to trigger an aneg restart.
This matches the behavior of genphy_config_aneg().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:26:52 -08:00
Andrew Lunn
3de97f3c63 net: phy: marvell10g: use genphy_c45_an_config_aneg
Use the new function genphy_c45_an_config_aneg() in mv3310_config_aneg().

v2:
- add a comment regarding 1000BaseT vendor registers
v3:
- rebased

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: patch splitted]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17 10:26:52 -08:00