Commit Graph

873882 Commits

Author SHA1 Message Date
Alexei Starovoitov
e1cb7d2d60 Merge branch 'map-pinning'
Toke Høiland-Jørgensen says:

====================
This series adds support to libbpf for reading 'pinning' settings from BTF-based
map definitions. It introduces a new open option which can set the pinning path;
if no path is set, /sys/fs/bpf is used as the default. Callers can customise the
pinning between open and load by setting the pin path per map, and still get the
automatic reuse feature.

The semantics of the pinning is similar to the iproute2 "PIN_GLOBAL" setting,
and the eventual goal is to move the iproute2 implementation to be based on
libbpf and the functions introduced in this series.

Changelog:

v6:
  - Fix leak of struct bpf_object in selftest
  - Make struct bpf_map arg const in bpf_map__is_pinned() and bpf_map__get_pin_path()

v5:
  - Don't pin maps with pinning set, but with a value of LIBBPF_PIN_NONE
  - Add a few more selftests:
    - Should not pin map with pinning set, but value LIBBPF_PIN_NONE
    - Should fail to load a map with an invalid pinning value
    - Should fail to re-use maps with parameter mismatch
  - Alphabetise libbpf.map
  - Whitespace and typo fixes

v4:
  - Don't check key_type_id and value_type_id when checking for map reuse
    compatibility.
  - Move building of map->pin_path into init_user_btf_map()
  - Get rid of 'pinning' attribute in struct bpf_map
  - Make sure we also create parent directory on auto-pin (new patch 3).
  - Abort the selftest on error instead of attempting to continue.
  - Support unpinning all pinned maps with bpf_object__unpin_maps(obj, NULL)
  - Support pinning at map->pin_path with bpf_object__pin_maps(obj, NULL)
  - Make re-pinning a map at the same path a noop
  - Rename the open option to pin_root_path
  - Add a bunch more self-tests for pin_maps(NULL) and unpin_maps(NULL)
  - Fix a couple of smaller nits

v3:
  - Drop bpf_object__pin_maps_opts() and just use an open option to customise
    the pin path; also don't touch bpf_object__{un,}pin_maps()
  - Integrate pinning and reuse into bpf_object__create_maps() instead of having
    multiple loops though the map structure
  - Make errors in map reuse and pinning fatal to the load procedure
  - Add selftest to exercise pinning feature
  - Rebase series to latest bpf-next

v2:
  - Drop patch that adds mounting of bpffs
  - Only support a single value of the pinning attribute
  - Add patch to fixup error handling in reuse_fd()
  - Implement the full automatic pinning and map reuse logic on load
====================

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-11-02 12:35:12 -07:00
Toke Høiland-Jørgensen
2f4a32cc83 selftests: Add tests for automatic map pinning
This adds a new BPF selftest to exercise the new automatic map pinning
code.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269298209.394725.15420085139296213182.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
57a00f4164 libbpf: Add auto-pinning of maps when loading BPF objects
This adds support to libbpf for setting map pinning information as part of
the BTF map declaration, to get automatic map pinning (and reuse) on load.
The pinning type currently only supports a single PIN_BY_NAME mode, where
each map will be pinned by its name in a path that can be overridden, but
defaults to /sys/fs/bpf.

Since auto-pinning only does something if any maps actually have a
'pinning' BTF attribute set, we default the new option to enabled, on the
assumption that seamless pinning is what most callers want.

When a map has a pin_path set at load time, libbpf will compare the map
pinned at that location (if any), and if the attributes match, will re-use
that map instead of creating a new one. If no existing map is found, the
newly created map will instead be pinned at the location.

Programs wanting to customise the pinning can override the pinning paths
using bpf_map__set_pin_path() before calling bpf_object__load() (including
setting it to NULL to disable pinning of a particular map).

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269298092.394725.3966306029218559681.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
196f8487f5 libbpf: Move directory creation into _pin() functions
The existing pin_*() functions all try to create the parent directory
before pinning. Move this check into the per-object _pin() functions
instead. This ensures consistent behaviour when auto-pinning is
added (which doesn't go through the top-level pin_maps() function), at the
cost of a few more calls to mkdir().

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297985.394725.5882630952992598610.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
4580b25fce libbpf: Store map pin path and status in struct bpf_map
Support storing and setting a pin path in struct bpf_map, which can be used
for automatic pinning. Also store the pin status so we can avoid attempts
to re-pin a map that has already been pinned (or reused from a previous
pinning).

The behaviour of bpf_object__{un,}pin_maps() is changed so that if it is
called with a NULL path argument (which was previously illegal), it will
(un)pin only those maps that have a pin_path set.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297876.394725.14782206533681896279.stgit@toke.dk
2019-11-02 12:35:07 -07:00
Toke Høiland-Jørgensen
d1b4574a4b libbpf: Fix error handling in bpf_map__reuse_fd()
bpf_map__reuse_fd() was calling close() in the error path before returning
an error value based on errno. However, close can change errno, so that can
lead to potentially misleading error messages. Instead, explicitly store
errno in the err variable before each goto.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/157269297769.394725.12634985106772698611.stgit@toke.dk
2019-11-02 12:35:06 -07:00
Linus Torvalds
1204c70d9d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix free/alloc races in batmanadv, from Sven Eckelmann.

 2) Several leaks and other fixes in kTLS support of mlx5 driver, from
    Tariq Toukan.

 3) BPF devmap_hash cost calculation can overflow on 32-bit, from Toke
    Høiland-Jørgensen.

 4) Add an r8152 device ID, from Kazutoshi Noguchi.

 5) Missing include in ipv6's addrconf.c, from Ben Dooks.

 6) Use siphash in flow dissector, from Eric Dumazet. Attackers can
    easily infer the 32-bit secret otherwise etc.

 7) Several netdevice nesting depth fixes from Taehee Yoo.

 8) Fix several KCSAN reported errors, from Eric Dumazet. For example,
    when doing lockless skb_queue_empty() checks, and accessing
    sk_napi_id/sk_incoming_cpu lockless as well.

 9) Fix jumbo packet handling in RXRPC, from David Howells.

10) Bump SOMAXCONN and tcp_max_syn_backlog values, from Eric Dumazet.

11) Fix DMA synchronization in gve driver, from Yangchun Fu.

12) Several bpf offload fixes, from Jakub Kicinski.

13) Fix sk_page_frag() recursion during memory reclaim, from Tejun Heo.

14) Fix ping latency during high traffic rates in hisilicon driver, from
    Jiangfent Xiao.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (146 commits)
  net: fix installing orphaned programs
  net: cls_bpf: fix NULL deref on offload filter removal
  selftests: bpf: Skip write only files in debugfs
  selftests: net: reuseport_dualstack: fix uninitalized parameter
  r8169: fix wrong PHY ID issue with RTL8168dp
  net: dsa: bcm_sf2: Fix IMP setup for port different than 8
  net: phylink: Fix phylink_dbg() macro
  gve: Fixes DMA synchronization.
  inet: stop leaking jiffies on the wire
  ixgbe: Remove duplicate clear_bit() call
  Documentation: networking: device drivers: Remove stray asterisks
  e1000: fix memory leaks
  i40e: Fix receive buffer starvation for AF_XDP
  igb: Fix constant media auto sense switching when no cable is connected
  net: ethernet: arc: add the missed clk_disable_unprepare
  igb: Enable media autosense for the i350.
  igb/igc: Don't warn on fatal read failures when the device is removed
  tcp: increase tcp_max_syn_backlog max value
  net: increase SOMAXCONN to 4096
  netdevsim: Fix use-after-free during device dismantle
  ...
2019-11-01 17:48:11 -07:00
Linus Torvalds
372bf6c1c8 NFS Client Bugfixes for Linux 5.4-rc6
Stable bugfixes:
 - Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
 
 Other fixes:
 - The TCP back channel mustn't disappear while requests are outstanding
 - The RDMA back channel mustn't disappear while requests are outstanding
 - Destroy the back channel when we destroy the host transport
 - Don't allow a cached open with a revoked delegation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl28mZ4ACgkQ18tUv7Cl
 QOvAAxAA5h15NJ8OTOpr/kyWQAHEYWIEWQyTNYT8Gi/TI/YCfauET6kxAKg7ETk+
 Bx6+A7Q5OlKSqFZXf2UMMIHPnvH7Ar06uIof6DLjWxA9N/q57SNW0YszQCUhvWjp
 4Ry7d9A2yNrCizEk/vTmY5zfFIo2S88AvwkQ6gLCEyfIn5REWQyQnS4SS3vL7MuS
 E/iWBsXFV2C9/yON+zzZnC0ewMIPbtWA3/CsVI2zDKLsHKvBYOx6n7TbfTtMtc/p
 mCdgLzdEEZlN0KOCzzMp7ziME8y1qUGX6eo17Jrr0ZBZIM7o5d7sYMA88Apu6dMg
 MQM5iEEAf//kmHLWzQ6yg8XZEZrXFMOiGKYu0kWt4pVlqi78gemEfE6T8dClsB5m
 P7jqn/4ntDlJPWPciImsVEzDOHlzlZBIXwZs1JjufeO/hIB570cs80PjeDQP0id8
 Cx3yR7Hr1ldKzwSwpWrtIaqEPljJJw0jhxQ7YYlvfbA1Ogd4ek4QU4RuvyW11CRr
 d/5Oaa7xvjGs63klu8HpZG4NHFby3PPmZT6t4xUcH75MR14S15YCGtaUvPJ7ORck
 C6tZJAkVtJqSoAjsGuBtTc5qiOdo+hMRghutW4Pd6UpnPYJKwQUtTqp3MbfM4+bA
 S19tGG6qn+7cl7bW4NJpNy8HfbC7BdS+m7maHpQfGIdMsxKVFoI=
 =ttcU
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client bugfixes from Anna Schumaker:
 "This contains two delegation fixes (with the RCU lock leak fix marked
  for stable), and three patches to fix destroying the the sunrpc back
  channel.

  Stable bugfixes:

   - Fix an RCU lock leak in nfs4_refresh_delegation_stateid()

  Other fixes:

   - The TCP back channel mustn't disappear while requests are
     outstanding

   - The RDMA back channel mustn't disappear while requests are
     outstanding

   - Destroy the back channel when we destroy the host transport

   - Don't allow a cached open with a revoked delegation"

* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
  NFSv4: Don't allow a cached open with a revoked delegation
  SUNRPC: Destroy the back channel when we destroy the host transport
  SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
  SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
2019-11-01 17:37:44 -07:00
Linus Torvalds
0821de2896 for-linus-20191101
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl28lRQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjtbEADbrMXdsdPV2CApxSiZIaWO1mR78yy/btxp
 cHsJ+avaPGxhNukSsose2KWm656SriH/OfQspqtvzDpslbu40V41+vSqSknqGRPr
 8jW5efZIAy6dq0FjbtBnmIV6PhC5d4F/nAEQbsnVRn8RSr3OwQcm8/smpSFA8urI
 oHVU8jiyLsQiSbDvjf2KPhPYhWBHO0W5SyGo29HY8pSzQpsMzGkQ6TcHL4EzwPZP
 WPtGglr14v8rMyhNMxUdHZ9eHCMq7uufFPuyJXzesE/qyM+H8p2pwwxyfflHGZil
 w2vxLJRu8d4UIkHEkNbC0bydXJ+eCtRMBZON1ZGdrZwQ58L9AbBPBZmxKb0LkmHb
 4tc/yQm/0kSUUXwFtDoUoIBFjjy36Pl5BsLt4n5fofsl04myhm5CLqZ8oWxyU0vO
 sCinJwk1+eQO/tbQVDfven+MroNlYVPCnXhIe/12/wEba3EJ7Ab4X5p0lJoJ1oY7
 9dQyY6+BaHd4wV9p0domOP5y7dJnXM9k46EF0/5YoNjoqaH5MWPMq355VH2xNjdw
 5HzRcZfvOAlXASrnXuQAAQAdR2b+s/iFZaNKA7bTZxjNPvYE0zySCMeQeNXmfVKe
 CrDuwViWukwIzETDZHYqMWJxOV4nyOeL3jTo7rQp5A5TEWwBiJKQ4aGBif2eqc+L
 Mk41ziQGuQ==
 =+rar
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20191101' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Two small nvme fixes, one is a fabrics connection fix, the other one
   a cleanup made possible by that fix (Anton, via Keith)

 - Fix requeue handling in umb ubd (Anton)

 - Fix spin_lock_irq() nesting in blk-iocost (Dan)

 - Three small io_uring fixes:
     - Install io_uring fd after done with ctx (me)
     - Clear ->result before every poll issue (me)
     - Fix leak of shadow request on error (Pavel)

* tag 'for-linus-20191101' of git://git.kernel.dk/linux-block:
  iocost: don't nest spin_lock_irq in ioc_weight_write()
  io_uring: ensure we clear io_kiocb->result before each issue
  um-ubd: Entrust re-queue to the upper layers
  nvme-multipath: remove unused groups_only mode in ana log
  nvme-multipath: fix possible io hang after ctrl reconnect
  io_uring: don't touch ctx in setup after ring fd install
  io_uring: Fix leaked shadow_req
2019-11-01 17:33:12 -07:00
Linus Torvalds
e5897c7d2e RISC-V updates for v5.4-rc6
One fix for PCIe users:
 
 - Fix legacy PCI I/O port access emulation
 
 One set of cleanups:
 
 - Resolve most of the warnings generated by sparse across arch/riscv.
   No functional changes
 
 And one MAINTAINERS update:
 
 - Update Palmer's E-mail address
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElRDoIDdEz9/svf2Kx4+xDQu9KksFAl28yFoACgkQx4+xDQu9
 KkvNSQ/8DpNzyFysNQZQEwbh3s9VOl2mWpie2Ym/ND2KJHL4zavFURwX5wO7QY7I
 a6H1fYZ3yjw9JOmUsCj/0ia2gzvAlrAPpy8Vd3hO/MPTXvxqPYJ0CC325Spn4aG+
 qYJHJWH/kBKhaaWDVa4wtOok35fx2F9UD352BZjyJfmvjqLKK0H+iflVId8ZrKPl
 hM8q4uO75b9NJUFakXMYAfYSTAzDe2qleNQzfJMlBRNjfxyx6E39O6U0VYNmIYXp
 c+U8OxfFKvJ0aU3JMLRxvt1gZRhsVk3lpScqaRvSRhtbUBcP6ya/hvySIDg3T+wK
 b7k3x0uINMJqxlAu/akw2X3QPFfqrQUxpXP80qZW+duGSEPsOeICGFvzi0gOn80F
 vq+SjVv3pXLxt1psd4Y6Os1kofFki+/VpUmtaQQwnBIHM3U7RomFnfSgCF2H/mPY
 SdYrH5F4RHVzptvG3MBSFqrs+QycPE2NJUdMKz794VQ4qiDqlpV1HKAO1Fbbw7KA
 b0eUd4MatHp2KSRG7YyT3RpCAKSyz7Ar9yu5rGr/I/J40PiPzHtR0d4FK0mQfiZw
 GBxvwUlErswgFrrmD3JMnWaFTEa+kzGKiYDgunjazGzJ8W06i3la0wWVcHXVdbj+
 K6SWla67dTw+N45U3Cs1WvtdgVK9QpQkl14X5HLynqUviw546ng=
 =hkRB
 -----END PGP SIGNATURE-----

Merge tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:
 "One fix for PCIe users:

   - Fix legacy PCI I/O port access emulation

  One set of cleanups:

   - Resolve most of the warnings generated by sparse across arch/riscv.
     No functional changes

  And one MAINTAINERS update:

   - Update Palmer's E-mail address"

* tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  MAINTAINERS: Change to my personal email address
  RISC-V: Add PCIe I/O BAR memory mapping
  riscv: for C functions called only from assembly, mark with __visible
  riscv: fp: add missing __user pointer annotations
  riscv: add missing header file includes
  riscv: mark some code and data as file-static
  riscv: init: merge split string literals in preprocessor directive
  riscv: add prototypes for assembly language functions from head.S
2019-11-01 17:20:53 -07:00
Daniel Borkmann
78db77fab1 Merge branch 'bpf-xskmap-perf-improvements'
Björn Töpel says:

====================
This set consists of three patches from Maciej and myself which are
optimizing the XSKMAP lookups.  In the first patch, the sockets are
moved to be stored at the tail of the struct xsk_map. The second
patch, Maciej implements map_gen_lookup() for XSKMAP. The third patch,
introduced in this revision, moves various XSKMAP functions, to permit
the compiler to do more aggressive inlining.

Based on the XDP program from tools/lib/bpf/xsk.c where
bpf_map_lookup_elem() is explicitly called, this work yields a 5%
improvement for xdpsock's rxdrop scenario. The last patch yields 2%
improvement.

Jonathan's Acked-by: for patch 1 and 2 was carried on. Note that the
overflow checks are done in the bpf_map_area_alloc() and
bpf_map_charge_init() functions, which was fixed in commit
ff1c08e1f7 ("bpf: Change size to u64 for bpf_map_{area_alloc,
charge_init}()").

  [1] https://patchwork.ozlabs.org/patch/1186170/

v1->v2: * Change size/cost to size_t and use {struct, array}_size
          where appropriate. (Jakub)
v2->v3: * Proper commit message for patch 2.
v3->v4: * Change size_t to u64 to handle 32-bit overflows. (Jakub)
        * Introduced patch 3.
v4->v5: * Use BPF_SIZEOF size, instead of BPF_DW, for correct
          pointer-sized loads. (Daniel)
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-11-02 00:38:59 +01:00
Björn Töpel
d817991cc7 xsk: Restructure/inline XSKMAP lookup/redirect/flush
In this commit the XSKMAP entry lookup function used by the XDP
redirect code is moved from the xskmap.c file to the xdp_sock.h
header, so the lookup can be inlined from, e.g., the
bpf_xdp_redirect_map() function.

Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the
xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush()
functions.

Finally, all the XDP socket functions were moved from linux/bpf.h to
net/xdp_sock.h, where most of the XDP sockets functions are anyway.

This yields a ~2% performance boost for the xdpsock "rx_drop"
scenario.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Maciej Fijalkowski
e65650f291 bpf: Implement map_gen_lookup() callback for XSKMAP
Inline the xsk_map_lookup_elem() via implementing the map_gen_lookup()
callback. This results in emitting the bpf instructions in place of
bpf_map_lookup_elem() helper call and better performance of bpf
programs.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-3-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Björn Töpel
64fe8c061d xsk: Store struct xdp_sock as a flexible array member of the XSKMAP
Prior this commit, the array storing XDP socket instances were stored
in a separate allocated array of the XSKMAP. Now, we store the sockets
as a flexible array member in a similar fashion as the arraymap. Doing
so, we do less pointer chasing in the lookup.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-2-bjorn.topel@gmail.com
2019-11-02 00:38:49 +01:00
Linus Torvalds
31408fbe33 Merge branch 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
 "Fix a parisc kernel crash with ftrace functions when compiled without
  frame pointers"

* 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix frame pointer in ftrace_regs_caller()
2019-11-01 15:16:25 -07:00
David S. Miller
aeb1b85c34 Merge branch 'fix-BPF-offload-related-bugs'
Jakub Kicinski says:

====================
fix BPF offload related bugs

test_offload.py catches some recently added bugs.

First of a bug in test_offload.py itself after recent changes
to netdevsim is fixed.

Second patch fixes a bug in cls_bpf, and last one addresses
a problem with the recently added XDP installation optimization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Jakub Kicinski
aefc3e723a net: fix installing orphaned programs
When netdevice with offloaded BPF programs is destroyed
the programs are orphaned and removed from the program
IDA - their IDs get released (the programs may remain
accessible via existing open file descriptors and pinned
files). After IDs are released they are set to 0.

This confuses dev_change_xdp_fd() because it compares
the __dev_xdp_query() result where 0 means no program
with prog->aux->id where 0 means orphaned.

dev_change_xdp_fd() would have incorrectly returned success
even though it had not installed the program.

Since drivers already catch this case via bpf_offload_dev_match()
let them handle this case. The error message drivers produce in
this case ("program loaded for a different device") is in fact
correct as the orphaned program must had to be loaded for a
different device.

Fixes: c14a9f633d ("net: Don't call XDP_SETUP_PROG when nothing is changed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Jakub Kicinski
41aa29a58b net: cls_bpf: fix NULL deref on offload filter removal
Commit 4011921137 ("net: sched: refactor block offloads counter
usage") missed the fact that either new prog or old prog may be
NULL.

Fixes: 4011921137 ("net: sched: refactor block offloads counter usage")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Jakub Kicinski
8101e06941 selftests: bpf: Skip write only files in debugfs
DebugFS for netdevsim now contains some "action trigger" files
which are write only. Don't try to capture the contents of those.

Note that we can't use os.access() because the script requires
root.

Fixes: 4418f862d6 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:16:01 -07:00
Wei Wang
d64479a3e3 selftests: net: reuseport_dualstack: fix uninitalized parameter
This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
occasionally due to the uninitialized length parameter.
Initialize it to fix this, and also use int for "test_family" to comply
with the API standard.

Fixes: d6a61f80b8 ("soreuseport: test mixed v4/v6 sockets")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Craig Gallek <cgallek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:11:02 -07:00
Heiner Kallweit
62bdc8fd1c r8169: fix wrong PHY ID issue with RTL8168dp
As reported in [0] at least one RTL8168dp version has problems
establishing a link. This chip version has an integrated RTL8211b PHY,
however the chip seems to report a wrong PHY ID, resulting in a wrong
PHY driver (for Generic Realtek PHY) being loaded.
Work around this issue by adding a hook to r8168dp_2_mdio_read()
for returning the correct PHY ID.

[0] https://bbs.archlinux.org/viewtopic.php?id=246508

Fixes: 242cd9b586 ("r8169: use phy_resume/phy_suspend")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:09:40 -07:00
Florian Fainelli
5fc0f21246 net: dsa: bcm_sf2: Fix IMP setup for port different than 8
Since it became possible for the DSA core to use a CPU port different
than 8, our bcm_sf2_imp_setup() function was broken because it assumes
that registers are applicable to port 8. In particular, the port's MAC
is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
bits if we are not configured for port 8.

Fixes: 9f91484f6f ("net: dsa: make "label" property optional for dsa2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:08:21 -07:00
Florian Fainelli
9d68db5092 net: phylink: Fix phylink_dbg() macro
The phylink_dbg() macro does not follow dynamic debug or defined(DEBUG)
and as a result, it spams the kernel log since a PR_DEBUG level is
currently used. Fix it to be defined appropriately whether
CONFIG_DYNAMIC_DEBUG or defined(DEBUG) are set.

Fixes: 17091180b1 ("net: phylink: Add phylink_{printk, err, warn, info, dbg} macros")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:06:46 -07:00
Yangchun Fu
9cfeeb576d gve: Fixes DMA synchronization.
Synces the DMA buffer properly in order for CPU and device to see
the most up-to-data data.

Signed-off-by: Yangchun Fu <yangchun@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 15:00:05 -07:00
Eric Dumazet
a904a0693c inet: stop leaking jiffies on the wire
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.

RFC 6864 made clear that no matter how hard we try,
we can not ensure unicity of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.

Linux uses a per socket inet generator (inet_id), initialized
at connection startup with a XOR of 'jiffies' and other
fields that appear clear on the wire.

Thiemo Nagel pointed that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.

Let's switch to a random starting point, this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 14:57:52 -07:00
Alex Vesker
667f264676 net/mlx5: DR, Support IPv4 and IPv6 mixed matcher
Until now SW steering supported matchers that are IPv4 and IPv6.
The limitation was mixed matchers in which the outer header IP version
was different from the inner header IP version.

To support the mixed matcher we create all the possible ste_builder
combinations, once we create a rule we select the correct one to
be used for rule creation.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:18 -07:00
Erez Alfasi
1cdc14e9d1 net/mlx5: LAG, Use affinity type enumerators
Instead of using explicit indexes, simply use affinity
type enumerators to make the code more readable.

Fixes: 544fe7c2e6 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:17 -07:00
Erez Alfasi
84d2dbb0aa net/mlx5: LAG, Use port enumerators
Instead of using explicit array indexes, simply use
ports enumerators to make the code more readable.

Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:17 -07:00
Li RongQing
5a212e0cac net/mlx5: rate limit alloc_ent error messages
when debug a bug, which triggers TX hang, and kernel log is
spammed with the following info message

    [ 1172.044764] mlx5_core 0000:21:00.0: cmd_work_handler:930:(pid 8):
    failed to allocate command entry

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:17 -07:00
Dmytro Linkin
ab9341b549 net/mlx5e: Add ToS (DSCP) header rewrite support
Add support for rewriting of DSCP part of ToS field.
Next commands, for example, can be used to offload rewrite action:

OVS:
 $ ovs-ofctl add-flow ovs-sriov "ip, in_port=REP, \
       actions=mod_nw_tos:68, output:NIC"

iproute2 (used retain mask, as tc command rewrite whole ToS field):
 $ tc filter add dev REP ingress protocol ip prio 1 flower skip_sw \
       ip_proto icmp action pedit munge ip tos set 68 retain 0xfc pipe \
       action mirred egress redirect dev NIC

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:16 -07:00
Dmytro Linkin
88f30bbcba net/mlx5e: Bit sized fields rewrite support
This patch doesn't change any functionality, but is a pre-step for
adding support for rewriting of bit-sized fields, like DSCP and ECN
in IPv4 header, similar fields in IPv6, etc.

Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:16 -07:00
Tariq Toukan
769619ee39 net/mlx5: WQ, Move short getters into header file
Move short Work Queue API getter functions into the WQ
header file.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:16 -07:00
Saeed Mahameed
130c7b46c9 net/mlx5e: TX, Dump WQs wqe descriptors on CQE with error events
Dump the Work Queue's TX WQE descriptor when a completion with
error is received.

Example:
[5.331832] mlx5_core 0000:00:04.0 enp0s4: Error cqe on cqn 0xa, ci 0x1, TXQ-SQ qpn 0xe, opcode 0xd, syndrome 0x2, vendor syndrome 0x0
[5.333127] 00000000: 55 65 02 75 31 fe c2 d2 6b 6c 62 1e f9 e1 d8 5c
[5.333837] 00000010: d3 b2 6c b8 89 e4 84 20 0b f4 3c e0 f3 75 41 ca
[5.334568] 00000020: 46 00 00 00 cd 70 a0 92 18 3a 01 de 00 00 00 00
[5.335313] 00000030: 7d bc 05 89 b2 e9 00 02 1e 00 00 0e 00 00 30 d2
[5.335972] WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x0, len: 64
[5.336710] 00000000: 00 00 00 1e 00 00 0e 04 00 00 00 08 00 00 00 00
[5.337524] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 12 33 33
[5.338151] 00000020: 00 00 00 16 52 54 00 00 00 01 86 dd 60 00 00 00
[5.338740] 00000030: 00 00 00 48 00 00 00 00 00 00 00 00 66 ba 58 14

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:16 -07:00
Parav Pandit
7dee607ed0 net/mlx5: Support lockless FTE read lookups
During connection tracking offloads with high number of connections,
(40K connections per second), flow table group lock contention is
observed.
To improve the performance by reducing lock contention, lockless
FTE read lookup is performed as described below.

Each flow table entry is refcounted.
Flow table entry is removed when refcount drops to zero.
rhash table allows rcu protected lookup.
Each hash table entry insertion and removal is write lock protected.

Hence, it is possible to perform lockless lookup in rhash table using
following scheme.

(a) Guard FTE entry lookup per group using rcu read lock.
(b) Before freeing the FTE entry, wait for all readers to finish
accessing the FTE.

Below example of one reader and write in parallel racing, shows
protection in effect with rcu lock.

lookup_fte_locked()
  rcu_read_lock();
  search_hash_table()
                                  existing_flow_group_write_lock();
                                  tree_put_node(fte)
                                    drop_ref_cnt(fte)
                                    del_sw_fte(fte)
                                    del_hash_table_entry();
                                    call_rcu();
                                  existing_flow_group_write_unlock();
  get_ref_cnt(fte) fails
  rcu_read_unlock();
                                  rcu grace period();
                                    [..]
                                    kmem_cache_free(fte);

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:15 -07:00
Parav Pandit
84c7af6375 net/mlx5: Do not hold group lock while allocating FTE in software
FTE memory allocation using alloc_fte() doesn't have any dependency
on the flow group.
Hence, do not hold flow group lock while performing alloc_fte().
This helps to reduce contention of flow group lock.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:15 -07:00
Vlad Buslov
ae2741e2b6 net/mlx5e: Verify that rule has at least one fwd/drop action
Currently, mlx5 tc layer doesn't verify that rule has at least one forward
or drop action which leads to following firmware syndrome when user tries
to offload such action:

[ 1824.860501] mlx5_core 0000:81:00.0: mlx5_cmd_check:753:(pid 29458): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a)

Add check at the end of parse_tc_fdb_actions() that verifies that resulting
attribute has action fwd or drop flag set.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:15 -07:00
Aya Levin
556b9d16d3 net/mlx5: Clear VF's configuration on disabling SRIOV
When setting number of VFs to 0 (disable SRIOV), clear VF's
configuration.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:14 -07:00
zhong jiang
32680da710 net/mlx5: Remove unneeded variable in mlx5_unload_one
mlx5_unload_one do not need local variable to store different value,
Hence just remove it.

Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:14 -07:00
Igor Leshenko
cc4db579e6 net/mlx5: FPGA, support network cards with standalone FPGA
Not all mlx5 cards with FPGA device use it for network processing.

mlx5_core driver configures network connection to FPGA device
for all mlx5 cards with installed FPGA. If FPGA is not a part of
network path, driver crashes in this case

Check FPGA name in function mlx5_fpga_device_start() and continue
integrate FPGA into packets flow only for dedicated cards.
Currently there are Newton and Edison cards.

Signed-off-by: Igor Leshenko <igorle@mellanox.com>
Reviewed-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:14 -07:00
Hamdan Igbaria
40416d8ede net/mlx5: DR, Replace CRC32 implementation to use kernel lib
Use kernel function to calculate crc32 Instead of dr implementation
since it has the same algorithm "slice by 8".

Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-11-01 14:55:14 -07:00
Roman Mashak
c23fcbbc6a tc-testing: added tests with cookie for conntrack TC action
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 14:53:41 -07:00
David S. Miller
c8c2cd8102 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2019-11-01

This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.

Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine.  So change the read error into a
warn while the device is still present.

Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it.  So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.

I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.

Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.

Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.

Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.

v2: Fixed alignment issue in patch 3 of the series based on community
    feedback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-01 14:50:27 -07:00
Igor Pylypiv
451fe015b2 ixgbe: Remove duplicate clear_bit() call
__IXGBE_RX_BUILD_SKB_ENABLED bit is already cleared.

Signed-off-by: Igor Pylypiv <igor.pylypiv@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-01 13:20:50 -07:00
Jonathan Neuschäfer
17df5ae1b3 Documentation: networking: device drivers: Remove stray asterisks
These asterisks were once references to a line that said:
  "* Other names and brands may be claimed as the property of others."
But now, they serve no purpose; they can only irritate the reader.

Fixes: de3edab427 ("e1000: update README for e1000")
Fixes: a3fb65680f ("e100.txt: Cleanup license info in kernel doc")
Fixes: da8c01c450 ("e1000e.txt: Add e1000e documentation")
Fixes: f12a84a9f6 ("Documentation: fm10k: Add kernel documentation")
Fixes: b55c52b193 ("igb.txt: Add igb documentation")
Fixes: c4e9b56e24 ("igbvf.txt: Add igbvf Documentation")
Fixes: d7064f4c19 ("Documentation/networking/: Update Intel wired LAN driver documentation")
Fixes: c4b8c01112 ("ixgbevf.txt: Update ixgbevf documentation")
Fixes: 1e06edcc2f ("Documentation: i40e: Prepare documentation for RST conversion")
Fixes: 105bf2fe6b ("i40evf: add driver to kernel build system")
Fixes: 1fae869bcf ("Documentation: ice: Prepare documentation for RST conversion")
Fixes: df69ba4321 ("ionic: Add basic framework for IONIC Network device driver")
Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-01 13:20:43 -07:00
Wenwen Wang
8472ba6215 e1000: fix memory leaks
In e1000_set_ringparam(), 'tx_old' and 'rx_old' are not deallocated if
e1000_up() fails, leading to memory leaks. Refactor the code to fix this
issue.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-01 13:20:33 -07:00
Jeff Kirsher
2c19e395e0 i40e: Fix receive buffer starvation for AF_XDP
Magnus's fix to resolve a potential receive buffer starvation for AF_XDP
got applied to both the i40e_xsk_umem_enable/disable() functions, when it
should have only been applied to the "enable".  So clean up the undesired
code in the disable function.

CC: Magnus Karlsson <magnus.karlsson@intel.com>
Fixes: 1f459bdc20 ("i40e: fix potential RX buffer starvation for AF_XDP")
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2019-11-01 13:20:18 -07:00
Manfred Rudigier
8d5cfd7f76 igb: Fix constant media auto sense switching when no cable is connected
At least on the i350 there is an annoying behavior that is maybe also
present on 82580 devices, but was probably not noticed yet as MAS is not
widely used.

If no cable is connected on both fiber/copper ports the media auto sense
code will constantly swap between them as part of the watchdog task and
produce many unnecessary kernel log messages.

The swap code responsible for this behavior (switching to fiber) should
not be executed if the current media type is copper and there is no signal
detected on the fiber port. In this case we can safely wait until the
AUTOSENSE_EN bit is cleared.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicronenergy.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-01 13:20:00 -07:00
Linus Torvalds
0dbe6cb8f7 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Fix two scheduler topology bugs/oversights on Juno r0 2+4 big.LITTLE
  systems"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/topology: Allow sched_asym_cpucapacity to be disabled
  sched/topology: Don't try to build empty sched domains
2019-11-01 11:49:54 -07:00
Linus Torvalds
355f83c1d0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes: an ABI fix for a reserved field, AMD IBS fixes, an Intel
  uncore PMU driver fix and a header typo fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/headers: Fix spelling s/EACCESS/EACCES/, s/privilidge/privilege/
  perf/x86/uncore: Fix event group support
  perf/x86/amd/ibs: Handle erratum #420 only on the affected CPU family (10h)
  perf/x86/amd/ibs: Fix reading of the IBS OpData register and thus precise RIP validity
  perf/core: Start rejecting the syscall with attr.__reserved_2 set
2019-11-01 11:40:47 -07:00
Linus Torvalds
b2a18c25c7 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Various fixes all over the map: prevent boot crashes on HyperV,
  classify UEFI randomness as bootloader randomness, fix EFI boot for
  the Raspberry Pi2, fix efi_test permissions, etc"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN
  x86, efi: Never relocate kernel below lowest acceptable address
  efi: libstub/arm: Account for firmware reserved memory at the base of RAM
  efi/random: Treat EFI_RNG_PROTOCOL output as bootloader randomness
  efi/tpm: Return -EINVAL when determining tpm final events log size fails
  efi: Make CONFIG_EFI_RCI2_TABLE selectable on x86 only
2019-11-01 11:32:50 -07:00