Commit Graph

901971 Commits

Author SHA1 Message Date
Pablo Neira Ayuso
9ea4894ba4 Merge branch 'master' of git://blackhole.kfki.hu/nf
Jozsef Kadlecsik says:

====================
ipset patches for nf

The first one is larger than usual, but the issue could not be solved in a
simpler way. Also, it's a resend of the patch I submitted a few days ago, with
a one-line fix on top of that: the size of the comment extensions was not taken
into account when reporting the full size of the set.

- Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot
  by introducing region locking and using workqueue instead of timer based
  gc of timed out entries in hash types of sets in ipset.
- Fix the forceadd evaluation path - the bug was also uncovered by the syzbot.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26 13:55:15 +01:00
Saeed Mahameed
586ee9e8a3 net/mlx5: sparse: warning: Using plain integer as NULL pointer
Return NULL instead of 0.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2020-02-25 17:06:21 -08:00
Saeed Mahameed
5edc4c7275 net/mlx5: sparse: warning: incorrect type in assignment
drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:191:13:
sparse: warning: incorrect type in assignment (different base types)

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2020-02-25 17:06:19 -08:00
Nathan Chancellor
fa2b491287 net/mlx5: Fix header guard in rsc_dump.h
Clang warns:

 In file included from
 ../drivers/net/ethernet/mellanox/mlx5/core/main.c:73:
 ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:4:9: warning:
 '__MLX5_RSC_DUMP_H' is used as a header guard here, followed by #define
 of a different macro [-Wheader-guard]
 #ifndef __MLX5_RSC_DUMP_H
         ^~~~~~~~~~~~~~~~~
 ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:5:9: note:
 '__MLX5_RSC_DUMP__H' is defined here; did you mean '__MLX5_RSC_DUMP_H'?
 #define __MLX5_RSC_DUMP__H
         ^~~~~~~~~~~~~~~~~~
         __MLX5_RSC_DUMP_H
 1 warning generated.

Make them match to get the intended behavior and remove the warning.

Fixes: 12206b1723 ("net/mlx5: Add support for resource dump")
Link: https://github.com/ClangBuiltLinux/linux/issues/897
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:16 -08:00
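
To illustrate the header guard fix in fa2b491287 above: a guard only works when the macro tested by #ifndef is the same macro defined on the following line. The sketch below is a minimal illustration, not the mlx5 header itself. EXAMPLE_RSC_DUMP_H, struct example_rsc and example_rsc_id() are hypothetical names, and the header body is written out twice to mimic the header being included twice.

```c
/* Sketch: the macro in #ifndef and #define must be identical.
 * All names here are hypothetical and chosen for illustration. */

/* First "inclusion" of the header body. */
#ifndef EXAMPLE_RSC_DUMP_H
#define EXAMPLE_RSC_DUMP_H
struct example_rsc { int id; };
#endif /* EXAMPLE_RSC_DUMP_H */

/* Second "inclusion": skipped, because the guard macro is now defined.
 * If the #define above had used a different name, the struct would be
 * declared a second time here and compilation would fail. */
#ifndef EXAMPLE_RSC_DUMP_H
#define EXAMPLE_RSC_DUMP_H
struct example_rsc { int id; };
#endif /* EXAMPLE_RSC_DUMP_H */

/* Small helper so the struct is actually used. */
static int example_rsc_id(const struct example_rsc *r)
{
	return r->id;
}
```
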
Hans Wippel
fa194707a9 Documentation: fix vxlan typo in mlx5.rst
Fix a vxlan typo in the mlx5 driver documentation.

Signed-off-by: Hans Wippel <ndev@hwipl.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:13 -08:00
Tariq Toukan
e9c1d2539d net/mlx5e: RX, Use indirect calls wrapper for handling compressed completions
We can avoid an indirect call per compressed completion by wrapping the
completion handling call with the appropriate helper.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:10 -08:00
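
The idea behind the indirect call wrappers used in e9c1d2539d above can be sketched in userspace as follows: compare the function pointer against the expected target and, when it matches, call that target directly so the compiler can emit a direct call. This is a simplified illustration under stated assumptions, not the kernel helper. EXAMPLE_INDIRECT_CALL_1 and the example_* functions are hypothetical names.

```c
/* Userspace sketch of an indirect call wrapper. When the pointer equals
 * the expected target f1, the call is made directly; otherwise the code
 * falls back to the indirect call. All names are hypothetical. */
#define EXAMPLE_INDIRECT_CALL_1(f, f1, ...) \
	((f) == (f1) ? f1(__VA_ARGS__) : (f)(__VA_ARGS__))

typedef int (*example_handler_t)(int);

/* The handler expected in the common case. */
static int example_handle_fast(int cqe)
{
	return cqe + 1;
}

/* A less common handler, reached through the indirect fallback. */
static int example_handle_slow(int cqe)
{
	return cqe * 2;
}

static int example_dispatch(example_handler_t handler, int cqe)
{
	return EXAMPLE_INDIRECT_CALL_1(handler, example_handle_fast, cqe);
}
```

The benefit depends on how often the pointer really is the expected target, so a wrapper like this is only worthwhile on hot paths with one dominant callee.
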
Tariq Toukan
2c8f80b3e3 net/mlx5e: RX, Use indirect calls wrapper for posting descriptors
We can avoid an indirect call per NAPI cycle by wrapping the RX descriptors
posting call with the appropriate helper.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:07 -08:00
Maxim Mikityanskiy
6e0504c698 net/mlx5e: Change inline mode correctly when changing trust state
The current steps that are performed when the trust state changes, if
the channels are active:

1. The trust state is changed in hardware.

2. The new inline mode is calculated.

3. If the new inline mode is different, the channels are recreated using
the new inline mode.

This approach has some issues:

1. There is a time gap between changing the trust state in hardware and
starting to send enough inline headers (the latter happens after
recreation of channels). It leads to failed transmissions and error
CQEs.

2. If the new channels fail to open, we'll be left with the old ones,
but the hardware will be configured for the new trust state, so the
interval when we can see TX errors never ends.

This patch fixes the issues above by moving the trust state change into
the preactivate hook that runs during the recreation of the channels
when no channels are active, so it eliminates the gap of partially
applied configuration. If the inline mode doesn't change with the change
of the trust state, the channels won't be recreated, just like before
this patch.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:04 -08:00
Maxim Mikityanskiy
b9ab5d0ecf net/mlx5e: Add context to the preactivate hook
Sometimes the preactivate hook of mlx5e_safe_switch_channels needs more
parameters than just struct mlx5e_priv *. For such cases, a new
parameter (void *context) is added to preactivate hooks.

Some of the existing normal functions are currently used as preactivate
callbacks. To avoid adding an extra unused parameter, they are wrapped
in an automatic way using the MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX
macro.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:02 -08:00
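
The wrapping described in b9ab5d0ecf above might look roughly like the sketch below: a macro generates a function with the new two-argument hook signature that ignores the context and forwards to the existing one-argument function. This is an assumption about the general shape, not a copy of MLX5E_DEFINE_PREACTIVATE_WRAPPER_CTX. The EXAMPLE_ macro, struct example_priv and the example_* functions are hypothetical.

```c
#include <stddef.h>

struct example_priv { int num_channels; };

/* Hook type after the change: takes an extra opaque context pointer. */
typedef int (*example_preactivate_t)(struct example_priv *priv, void *context);

/* Generates fn##_ctx(), which drops the context and forwards to fn(). */
#define EXAMPLE_DEFINE_PREACTIVATE_WRAPPER_CTX(fn) \
	static int fn##_ctx(struct example_priv *priv, void *context) \
	{ \
		(void)context; \
		return fn(priv); \
	}

/* An existing "normal" function that knows nothing about contexts. */
static int example_num_channels_changed(struct example_priv *priv)
{
	return priv->num_channels > 0 ? 0 : -1;
}
EXAMPLE_DEFINE_PREACTIVATE_WRAPPER_CTX(example_num_channels_changed)

/* Runs the hook if one was supplied; no hook counts as success. */
static int example_run_hook(example_preactivate_t hook,
			    struct example_priv *priv, void *context)
{
	return hook ? hook(priv, context) : 0;
}
```
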
Maxim Mikityanskiy
35a78ed4c3 net/mlx5e: Allow mlx5e_switch_priv_channels to fail and recover
Currently mlx5e_switch_priv_channels expects that the preactivate hook
doesn't fail, however, it can fail, because it may set hardware
parameters. This commit addresses this issue and provides a way to
recover from failures of the preactivate hook: the old channels are not
closed until the point where nothing can fail anymore, so in case
preactivate fails, the driver can roll back the old channels and
activate them again.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:59 -08:00
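
The recovery flow described in 35a78ed4c3 above can be sketched as follows: the old channels are deactivated but kept open, the preactivate hook runs while nothing is active, and a hook failure reactivates the old channels. This is a minimal model under stated assumptions, not the driver code. The structs, example_set_trust_state() and example_switch_channels() are hypothetical, and the "hardware" is just an integer field.

```c
#include <stddef.h>

struct example_channels { int id; };

struct example_priv {
	struct example_channels *active; /* channels currently carrying traffic */
	int trust_state;                 /* stands in for a hardware parameter */
};

typedef int (*example_preactivate_t)(struct example_priv *priv, void *context);

/* Hypothetical hook: applies the trust state passed through context. */
static int example_set_trust_state(struct example_priv *priv, void *context)
{
	int new_state = *(int *)context;

	if (new_state < 0)
		return -1; /* simulate a failing hardware command */
	priv->trust_state = new_state;
	return 0;
}

static int example_switch_channels(struct example_priv *priv,
				   struct example_channels *new_chs,
				   example_preactivate_t preactivate,
				   void *context)
{
	struct example_channels *old_chs = priv->active;
	int err;

	priv->active = NULL; /* deactivate the old channels, keep them open */

	if (preactivate) {
		err = preactivate(priv, context);
		if (err) {
			priv->active = old_chs; /* roll back to the old channels */
			return err;
		}
	}

	priv->active = new_chs; /* nothing can fail now; old ones may be closed */
	return 0;
}
```

Because the hook runs while no channels are active, a hardware parameter such as the trust state never changes underneath channels that were built for the previous configuration.
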
Maxim Mikityanskiy
600a3952a2 net/mlx5e: Remove unneeded netif_set_real_num_tx_queues
The number of queues is now updated by mlx5e_update_netdev_queues in a
centralized way, when no channels are active. Remove an extra occurrence
of netif_set_real_num_tx_queues to prepare it for the next commit.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:56 -08:00
Maxim Mikityanskiy
3909a12e79 net/mlx5e: Fix configuration of XPS cpumasks and netdev queues in corner cases
Currently, mlx5e notifies the kernel about the number of queues and sets
the default XPS cpumasks when channels are activated. This
implementation has several corner cases, in which the kernel may not be
updated on time, or XPS cpumasks may be reset when not directly touched
by the user.

This commit fixes these corner cases to match the following expected
behavior:

1. The number of queues always corresponds to the number of channels
configured.

2. XPS cpumasks are set to driver's defaults on netdev attach.

3. XPS cpumasks set by user are not reset, unless the number of channels
changes. If the number of channels changes, they are reset to driver's
defaults. (In the general case, when the number of channels increases or
decreases, it's not possible to guess how to convert the current XPS
cpumasks to work with the new number of channels, so we let the user
reconfigure it if they change the number of channels.)

XPS cpumasks are no longer stored per channel. Only one temporary
cpumask is used. The old stored cpumasks didn't reflect the user's
changes and were not used after applying them.

A scratchpad area is added to struct mlx5e_priv. As cpumask_var_t
requires allocation, and the preactivate hook can't fail, we need to
preallocate the temporary cpumask in advance. It's stored in the
scratchpad.

Fixes: 149e566fef ("net/mlx5e: Expand XPS cpumask to cover all online cpus")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:53 -08:00
Maxim Mikityanskiy
fe867cac9e net/mlx5e: Use preactivate hook to set the indirection table
mlx5e_ethtool_set_channels updates the indirection table before
switching to the new channels. If the switch fails, the indirection
table is new, but the channels are old, which is wrong. Fix it by using
the preactivate hook of mlx5e_safe_switch_channels to update the
indirection table at the stage when nothing can fail anymore.

As the code that updates the indirection table is now encapsulated into
a new function, use that function in the attach flow when the driver has
to reduce the number of channels, and prepare the code for the next
commit.

Fixes: 85082dba0a ("net/mlx5e: Correctly handle RSS indirection table when changing number of channels")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:51 -08:00
Maxim Mikityanskiy
dca147b3dc net/mlx5e: Rename hw_modify to preactivate
mlx5e_safe_switch_channels accepts a callback to be called before
activating new channels. It is intended to configure some hardware
parameters in cases where channels are recreated because some
configuration has changed.

Recently, this callback has started being used to update the driver's
internal MLX5E_STATE_XDP_OPEN flag, and the following patches also
intend to use this callback for software preparations. This patch
renames the hw_modify callback to preactivate, so that the name fits
better.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:47 -08:00
Maxim Mikityanskiy
c2c95271f9 net/mlx5e: Encapsulate updating netdev queues into a function
As a preparation for one of the following commits, create a function to
encapsulate the code that notifies the kernel about the new amount of
RX and TX queues. The code will be called multiple times in the next
commit.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:45 -08:00
Tariq Toukan
02377e6edf net/mlx5e: Add missing LRO cap check
The LRO boolean state in params->lro_en must not be set in case
the NIC is not capable.
Enforce this check and remove the TODO comment.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:42 -08:00
Eran Ben Elisha
4229e0ea2c net/mlx5e: Define one flow for TXQ selection when TCs are configured
We shall always extract channel index out of the txq, regardless
of the relation between txq_ix and num channels. The extraction is
always valid, as if txq is smaller than number of channels,
txq_ix == priv->txq2sq[txq_ix]->ch_ix.

By doing so, we can remove an if clause from the select queue method,
and have one flow for all packets.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:05:39 -08:00
Masami Hiramatsu
2910b5aa6f bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
Since commit d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
default") also changed the CONFIG_BOOTTIME_TRACING to select
CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu,
it introduced wrong dependencies with BLK_DEV_INITRD as below.

WARNING: unmet direct dependencies detected for BOOT_CONFIG
  Depends on [n]: BLK_DEV_INITRD [=n]
  Selected by [y]:
  - BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]

This makes CONFIG_BOOT_CONFIG select CONFIG_BLK_DEV_INITRD to
fix this error, and makes CONFIG_BOOTTIME_TRACING=n by default, so
that both boot-time tracing and boot configuration are off by default
but still appear on the menu list.

Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2

Fixes: d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-02-25 19:07:58 -05:00
Yuya Kusakabe
503d539a6e virtio_net: Add XDP meta data support
Implement support for transferring XDP meta data into skb for
virtio_net driver; before calling into the program, xdp.data_meta points
to xdp.data, where on program return with pass verdict, we call
into skb_metadata_set().

Tested with the script at
https://github.com/higebu/virtio_net-xdp-metadata-test.

Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/bpf/20200225033212.437563-2-yuya.kusakabe@gmail.com
2020-02-25 22:50:55 +01:00
Yuya Kusakabe
f1d4884d68 virtio_net: Keep vnet header zeroed if XDP is loaded for small buffer
We do not want to care about the vnet header in receive_small() if XDP
is loaded, since we can not know whether or not the packet is modified
by XDP.

Fixes: f6b10209b9 ("virtio-net: switch to use build_skb() for small buffer")
Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/bpf/20200225033212.437563-1-yuya.kusakabe@gmail.com
2020-02-25 22:50:55 +01:00
Andrii Nakryiko
9fb156bb82 selftests/bpf: Print backtrace on SIGSEGV in test_progs
Due to various bugs (usually in the tests' clean-up code), test_progs can
crash in the middle of running a test when the host system is misconfigured,
with little to no indication of where and why the crash
happened. For cases where a coredump is not readily available (e.g., inside
a CI), it's very helpful to have the stack trace that led to the crash
printed out. This change adds a signal handler that captures and prints out a
symbolized backtrace:

  $ sudo ./test_progs -t mmap
  test_mmap:PASS:skel_open_and_load 0 nsec
  test_mmap:PASS:bss_mmap 0 nsec
  test_mmap:PASS:data_mmap 0 nsec
  Caught signal #11!
  Stack trace:
  ./test_progs(crash_handler+0x18)[0x42a888]
  /lib64/libpthread.so.0(+0xf5d0)[0x7f2aab5175d0]
  ./test_progs(test_mmap+0x3c0)[0x41f0a0]
  ./test_progs(main+0x160)[0x407d10]
  /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f2aab15d3d5]
  ./test_progs[0x407ebc]
  [1]    1988412 segmentation fault (core dumped)  sudo ./test_progs -t mmap

Unfortunately, glibc's symbolization support is unable to symbolize static
functions, only global ones will be present in stack trace. But it's still a
step forward without adding extra libraries to get a better symbolization.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225000847.3965188-1-andriin@fb.com
2020-02-25 22:43:02 +01:00
David S. Miller
f13e4415d2 Merge branch 'mlxsw-Implement-ACL-dropped-packets-identification'
Jiri Pirko says:

====================
mlxsw: Implement ACL-dropped packets identification

mlxsw hardware allows inserting an ACL-drop action with a user-defined
value that is later passed along with a dropped packet.

To implement this, use the existing TC action cookie and pass it to the
driver. As the cookie format coming down from TC and the mlxsw HW cookie
format is different, do the mapping of these two using idr and rhashtable.

The cookie is passed up from the HW through devlink_trap_report() to
drop_monitor code. A new metadata type is used for that.

Example:
$ tc qdisc add dev enp0s16np1 clsact
$ tc filter add dev enp0s16np1 ingress protocol ip pref 10 flower skip_sw dst_ip 192.168.1.2 action drop cookie 3b45fa38c8
                                                                                                                ^^^^^^^^^^
$ devlink trap set pci/0000:00:10.0 trap acl action trap
$ dropwatch
Initializing null lookup method
dropwatch> set hw true
setting hardware drops monitoring to 1
dropwatch> set alertmode packet
Setting alert mode
Alert mode successfully set
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
drop at: ingress_flow_action_drop (acl_drops)
origin: hardware
input port ifindex: 30
input port name: enp0s16np1
cookie: 3b45fa38c8    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
timestamp: Fri Jan 24 17:10:53 2020 715387671 nsec
protocol: 0x800
length: 98
original length: 98

This way the user may insert multiple drop rules and monitor the dropped
packets with the information of which action caused the drop.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
Jiri Pirko
7a3c3f4440 selftests: netdevsim: Extend devlink trap test to include flow action cookie
Extend existing devlink trap test to include metadata type for flow
action cookie.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
Jiri Pirko
d3cbb907ae netdevsim: add ACL trap reporting cookie as a metadata
Add a new ACL trap which reports the flow action cookie in metadata. Allow
the user to set up the cookie using a debugfs file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
Jiri Pirko
6de9fceeaa mlxsw: spectrum_trap: Lookup and pass cookie down to devlink_trap_report()
Use the cookie index received along with the packet to lookup original
flow_offload cookie binary and pass it down to devlink_trap_report().
Add "fa_cookie" metadata to the ACL trap.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
Jiri Pirko
78a7dcb7c9 mlxsw: pci: Extract cookie index for ACL discard trap packets
In case the received packet comes in due to one of ACL discard traps,
take the user_def_val_orig_pkt_len field from CQE and store it
in skb->cb as ACL cookie index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
Jiri Pirko
6d19d2bdc8 mlxsw: core_acl_flex_actions: Implement flow_offload action cookie offload
Track cookies coming down to driver by flow_offload.
Assign a cookie_index to each unique cookie binary. Use the previously
defined "Trap with userdef" flex action to ask HW to pass cookie_index
alongside the dropped packets.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
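
The mapping described in 6d19d2bdc8 above, where each unique cookie binary gets a small index that the hardware can carry, can be modelled with the sketch below. The merge description says the driver uses idr and rhashtable for this; this sketch swaps in a fixed-size array with a linear search purely for illustration. All names, sizes and the choice to reserve index 0 are assumptions.

```c
#include <string.h>

#define EXAMPLE_MAX_COOKIES    8
#define EXAMPLE_COOKIE_MAX_LEN 16

struct example_cookie_entry {
	unsigned char data[EXAMPLE_COOKIE_MAX_LEN];
	size_t len;
	unsigned int refcount; /* 0 means the slot is free */
};

struct example_cookie_table {
	struct example_cookie_entry slots[EXAMPLE_MAX_COOKIES];
};

/* Returns the index (1-based, 0 is reserved for "no cookie") assigned to
 * this cookie binary, reusing the existing index for a known cookie.
 * Returns -1 if the cookie is too long or the table is full. */
static int example_cookie_get(struct example_cookie_table *t,
			      const void *data, size_t len)
{
	int free_slot = -1;
	int i;

	if (len == 0 || len > EXAMPLE_COOKIE_MAX_LEN)
		return -1;

	for (i = 0; i < EXAMPLE_MAX_COOKIES; i++) {
		struct example_cookie_entry *e = &t->slots[i];

		if (e->refcount == 0) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (e->len == len && memcmp(e->data, data, len) == 0) {
			e->refcount++; /* same binary: share the index */
			return i + 1;
		}
	}
	if (free_slot < 0)
		return -1;

	memcpy(t->slots[free_slot].data, data, len);
	t->slots[free_slot].len = len;
	t->slots[free_slot].refcount = 1;
	return free_slot + 1;
}

/* Reverse lookup, as needed when a trapped packet carries an index. */
static const struct example_cookie_entry *
example_cookie_lookup(const struct example_cookie_table *t, int index)
{
	if (index < 1 || index > EXAMPLE_MAX_COOKIES ||
	    t->slots[index - 1].refcount == 0)
		return NULL;
	return &t->slots[index - 1];
}
```
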
Jiri Pirko
ec12165195 mlxsw: core_acl_flex_actions: Add trap with userdef action
Expose "Trap action with userdef". It is the same as the already
defined "Trap action", with the difference that it asks the policy
engine to pass an arbitrary value (userdef) alongside received packets.
This will later be used to carry the cookie index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
Jiri Pirko
5a2e106c74 devlink: extend devlink_trap_report() to accept cookie and pass
Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
Jiri Pirko
742b8cceaa drop_monitor: extend by passing cookie from driver
If driver passed along the cookie, push it through Netlink.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
Jiri Pirko
85b0589ede devlink: add trap metadata type for cookie
Allow driver to indicate cookie metadata for registered traps.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
Jiri Pirko
2008495d81 flow_offload: pass action cookie through offload structures
Extend struct flow_action_entry in order to hold TC action cookie
specified by user inserting the action.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
Jason A. Donenfeld
a8e41f6033 icmp: allow icmpv6_ndo_send to work with CONFIG_IPV6=n
The icmpv6_send function has long had a static inline implementation
with an empty body for CONFIG_IPV6=n, so that code calling it doesn't
need to be ifdef'd. The new icmpv6_ndo_send function, which is intended
for drivers as a drop-in replacement with an identical function
signature, should follow the same pattern. Without this patch, drivers
that used to work with CONFIG_IPV6=n now result in a linker error.

Cc: Chen Zhou <chenzhou10@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 0b41713b60 ("icmp: introduce helper for nat'd source address in network device context")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:01:39 -08:00
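
The pattern referred to in a8e41f6033 above, a static inline stub with an empty body when a feature is compiled out, can be sketched as follows. This is an illustration with hypothetical names, not the kernel header. EXAMPLE_CONFIG_IPV6 stands in for CONFIG_IPV6, and it is left undefined here so the stub branch is the one that gets compiled.

```c
/* Hypothetical stand-in for a packet buffer. */
struct example_skb { int len; };

#ifdef EXAMPLE_CONFIG_IPV6
/* Feature enabled: the real implementation lives in another file. */
void example_icmpv6_ndo_send(struct example_skb *skb, unsigned char type,
			     unsigned char code, unsigned int info);
#else
/* Feature disabled: same signature, empty body. Callers need no #ifdef
 * and the linker never looks for a missing symbol. */
static inline void example_icmpv6_ndo_send(struct example_skb *skb,
					   unsigned char type,
					   unsigned char code,
					   unsigned int info)
{
	(void)skb;
	(void)type;
	(void)code;
	(void)info;
}
#endif

/* Driver-side caller: identical in both configurations. */
static int example_driver_xmit_error(struct example_skb *skb)
{
	example_icmpv6_ndo_send(skb, 1, 0, 0);
	return skb->len;
}
```
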
Linus Torvalds
c5f8689118 RISC-V Fixes for 5.6-rc4
This tag contains a handful of RISC-V related fixes that I've collected and
 would like to target for 5.6-rc4:
 
 * A fix to set up the PMPs on boot, which allows the kernel to access memory on
   systems that don't set up permissive PMPs before getting to Linux.  This only
   affects machine-mode kernels, which currently means only NOMMU kernels.
 * A fix to avoid enabling supervisor-mode interrupts when running in
   machine-mode, also only for NOMMU kernels.
 * A pair of fixes to our KASAN support to avoid corrupting memory.
 * A gitignore fix.
 
 This boots on QEMU's virt board for me.

Merge tag 'riscv-for-linux-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "This contains a handful of RISC-V related fixes that I've collected
  and would like to target for 5.6-rc4:

   - A fix to set up the PMPs on boot, which allows the kernel to access
     memory on systems that don't set up permissive PMPs before getting
     to Linux. This only affects machine-mode kernels, which currently
     means only NOMMU kernels.

   - A fix to avoid enabling supervisor-mode interrupts when running in
     machine-mode, also only for NOMMU kernels.

   - A pair of fixes to our KASAN support to avoid corrupting memory.

   - A gitignore fix.

  This boots on QEMU's virt board for me"

* tag 'riscv-for-linux-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: adjust the indent
  riscv: allocate a complete page size for each page table
  riscv: Fix gitignore
  RISC-V: Don't enable all interrupts in trap_init()
  riscv: set pmp configuration if kernel is running in M-mode
2020-02-25 10:14:39 -08:00
Linus Torvalds
d67f250e96 Merge branch 'mips-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
 "Here are a few MIPS fixes, and a MAINTAINERS update to hand over MIPS
  maintenance to Thomas Bogendoerfer - this will be my final pull
  request as MIPS maintainer.

  Thanks for your helpful comments, useful corrections & responsiveness
  during the time I've fulfilled the role, and I'm sure I'll pop up
  elsewhere in the tree somewhere down the line"

* 'mips-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MAINTAINERS: Hand MIPS over to Thomas
  MIPS: ingenic: DTS: Fix watchdog nodes
  MIPS: X1000: Fix clock of watchdog node.
  MIPS: vdso: Wrap -mexplicit-relocs in cc-option
  MIPS: VPE: Fix a double free and a memory leak in 'release_vpe()'
  MIPS: cavium_octeon: Fix syncw generation.
  mips: vdso: add build time check that no 'jalr t9' calls left
  MIPS: Disable VDSO time functionality on microMIPS
  mips: vdso: fix 'jalr t9' crash in vdso code
2020-02-25 10:09:41 -08:00
Stefano Brivio
d082055650 selftests: nft_concat_range: Move option for 'list ruleset' before command
Before nftables commit fb9cea50e8b3 ("main: enforce options before
commands"), 'nft list ruleset -a' happened to work, but it's wrong
and won't work anymore. Replace it by 'nft -a list ruleset'.

Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 611973c1e0 ("selftests: netfilter: Introduce tests for sets with range concatenation")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-25 13:01:07 +01:00
Kees Cook
adc10f5b0a docs: Fix empty parallelism argument
When there was no parallelism (no top-level -j arg and a pre-1.7
sphinx-build), the argument passed would be empty ("") instead of just
being missing, which would (understandably) badly confuse sphinx-build.
Fix this by removing the quotes.

Reported-by: Rafael J. Wysocki <rafael@kernel.org>
Fixes: 51e46c7a40 ("docs, parallelism: Rearrange how jobserver reservations are made")
Cc: stable@vger.kernel.org  # v5.5 only
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-25 03:11:04 -07:00
Stephen Kitt
53ace11952 docs: remove MPX from the x86 toc
MPX was removed in commit 45fc24e89b ("x86/mpx: remove MPX from
arch/x86"), this removes the corresponding entry in the x86 toc.

This was suggested by a Sphinx warning.

Signed-off-by: Stephen Kitt <steve@sk2.org>
Fixes: 45fc24e89b ("x86/mpx: remove MPX from arch/x86")
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-02-25 03:10:22 -07:00
Paul Burton
3234f4ed30 MAINTAINERS: Hand MIPS over to Thomas
My time with MIPS the company has reached its end, and so at best I'll
have little time to spend on maintaining arch/mips/.

Ralf last authored a patch over 2 years ago, the last time he committed
one is even further back & activity was sporadic for a while before
that. The reality is that he isn't active.

Having a new maintainer with time to do things properly will be
beneficial all round. Thomas Bogendoerfer has been involved in MIPS
development for a long time & has offered to step up as maintainer, so
add Thomas and remove myself & Ralf from the MIPS entry.

Ralf already has an entry in CREDITS to honor his contributions, so this
just adds one for me.

Signed-off-by: Paul Burton <paulburton@kernel.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@vger.kernel.org
2020-02-24 22:43:18 -08:00
Jakub Sitnicki
e0360423d0 selftests/bpf: Run SYN cookies with reuseport BPF test only for TCP
Currently we run SYN cookies test for all socket types and mark the test as
skipped if socket type is not compatible. This causes confusion because
skipped test might indicate a problem with the testing environment.

Instead, run the test only for the socket type which supports SYN cookies.

Also, switch to using designated initializers when setting up tests, so
that we can tweak only some test parameters, leaving the rest initialized
to default values.

Fixes: eecd618b45 ("selftests/bpf: Mark SYN cookie test skipped for UDP sockets")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200224135327.121542-2-jakub@cloudflare.com
2020-02-24 16:35:16 -08:00
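
The switch to designated initializers mentioned in e0360423d0 above relies on a C99 rule: members not named in the initializer are zero-initialized. A test table can therefore set only the parameters that differ from the defaults. The sketch below is illustrative. The struct, its fields and EXAMPLE_SOCK_STREAM are hypothetical and not taken from the selftest.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_SOCK_STREAM 1 /* hypothetical socket type constant */

struct example_test {
	const char *name;
	int sotype;          /* 0 means "run for every socket type" */
	bool need_syncookie;
	int timeout_ms;      /* 0 means "use the default timeout" */
};

/* Only the fields that matter for each test are spelled out; everything
 * else is implicitly zero, i.e. the default. */
static const struct example_test example_tests[] = {
	{ .name = "lookup" },
	{ .name = "syn cookie", .sotype = EXAMPLE_SOCK_STREAM,
	  .need_syncookie = true },
};

/* Counts the tests that are restricted to a single socket type. */
static size_t example_count_restricted_tests(void)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(example_tests) / sizeof(example_tests[0]); i++)
		if (example_tests[i].sotype != 0)
			n++;
	return n;
}
```
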
Jakub Sitnicki
779e422d11 selftests/bpf: Run reuseport tests only with supported socket types
SOCKMAP and SOCKHASH map types can be used with reuseport BPF programs but
don't yet support storing UDP sockets. Instead of marking UDP tests with
SOCK{MAP,HASH} as skipped, don't run them at all.

A skipped test might signal that the test environment is not suitable for
running the test, while in reality the functionality is not implemented in
the kernel yet.

Before:

  sh# ./test_progs -t select_reuseport
  …
  #40 select_reuseport:OK
  Summary: 1/126 PASSED, 30 SKIPPED, 0 FAILED

After:

  sh# ./test_progs  -t select_reuseport
  …
  #40 select_reuseport:OK
  Summary: 1/98 PASSED, 2 SKIPPED, 0 FAILED

The remaining two skipped tests are SYN cookies tests, which will be
addressed in the subsequent patch.

Fixes: 11318ba8ca ("selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP/SOCKHASH")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200224135327.121542-1-jakub@cloudflare.com
2020-02-24 16:35:16 -08:00
Alexei Starovoitov
80a836c250 Merge branch 'BPF_and_RT'
Thomas Gleixner says:

====================
This is the third version of the BPF/RT patch set which makes both coexist
nicely. The long explanation can be found in the cover letter of the V1
submission:

  https://lore.kernel.org/r/20200214133917.304937432@linutronix.de

V2 is here:

  https://lore.kernel.org/r/20200220204517.863202864@linutronix.de

The following changes vs. V2 have been made:

  - Rebased to bpf-next, adjusted to the lock changes in the hashmap code.

  - Split the preallocation enforcement patch for instrumentation type BPF
    programs into two pieces:

    1) Emit a one-time warning on !RT kernels when any instrumentation type
       BPF program uses run-time allocation. Emit also a corresponding
       warning in the verifier log. But allow the program to run for
       backward compatibility sake. After a grace period this should be
       enforced.

    2) On RT reject such programs because on RT the memory allocator cannot
       be called from truly atomic contexts.

  - Fixed the fallout from V2 as reported by Alexei and 0-day

  - Removed the redundant preempt_disable() from trace_call_bpf()

  - Removed the unused export of trace_call_bpf()
====================
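
The two-part preallocation policy from the list above can be sketched as a
small userspace model. This is only an illustration under stated
assumptions, not the verifier code: check_map_prealloc_model(), its
parameters, the enum values and the warning text are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result codes; the kernel reports errors differently. */
enum check_result { CHECK_ALLOW, CHECK_REJECT };

/* Number of warnings printed so far; the model never exceeds one. */
static int warnings_emitted;

static enum check_result check_map_prealloc_model(bool rt_kernel,
						  bool instrumentation_prog,
						  bool map_preallocated)
{
	/* Only instrumentation programs using run-time allocation matter. */
	if (!instrumentation_prog || map_preallocated)
		return CHECK_ALLOW;

	/* 2) On RT the allocator cannot be called from atomic context. */
	if (rt_kernel)
		return CHECK_REJECT;

	/* 1) On !RT warn once, then allow for backward compatibility. */
	if (warnings_emitted == 0) {
		warnings_emitted++;
		fprintf(stderr, "model: program uses run-time allocation\n");
	}
	return CHECK_ALLOW;
}

/* Usage example walking through the cases of the list. */
static void prealloc_policy_demo(void)
{
	assert(check_map_prealloc_model(true, true, false) == CHECK_REJECT);
	assert(check_map_prealloc_model(false, true, false) == CHECK_ALLOW);
	assert(check_map_prealloc_model(false, true, false) == CHECK_ALLOW);
	assert(warnings_emitted == 1);	/* second call did not warn again */
	assert(check_map_prealloc_model(true, true, true) == CHECK_ALLOW);
	assert(check_map_prealloc_model(true, false, false) == CHECK_ALLOW);
}
```

The model takes the kernel flavour as a run-time parameter only so that
both branches can be exercised in one build.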

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-02-24 16:20:10 -08:00
David Miller
099bfaa731 bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled
On an RT kernel down_read_trylock() cannot be used from NMI context and
up_read_non_owner() is problematic as well.

So in such a configuration, simply elide the annotated stackmap and
just report the raw IPs.

In the longer term, it might be possible to provide an atomic friendly
version of the page cache traversal which will at least provide the info
whether the pages are resident and don't need to be paged in.

[ tglx: Use IS_ENABLED() to avoid the #ifdeffery, fixup the irq work
  callback and add a comment ]
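
The decision described above can be modelled in plain C. This is a hedged
sketch, not the stackmap code: must_report_raw_ips() and its parameters
are hypothetical, rt_kernel stands in for IS_ENABLED(CONFIG_PREEMPT_RT),
and the irq work check on !RT kernels is an assumption about how the
deferred unlock is handled.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the annotated stackmap has to be elided and only
 * the raw IPs can be reported.
 */
static bool must_report_raw_ips(bool rt_kernel, bool irqs_disabled,
				bool irq_work_busy)
{
	/* With interrupts enabled the semaphore can be handled normally. */
	if (!irqs_disabled)
		return false;

	/* RT: never trylock mmap_sem with interrupts disabled. */
	if (rt_kernel)
		return true;

	/* !RT (assumption): the deferred unlock needs a free irq work. */
	return irq_work_busy;
}

/* Usage example for the three interesting cases. */
static void raw_ips_demo(void)
{
	assert(!must_report_raw_ips(true, false, false));
	assert(must_report_raw_ips(true, true, false));
	assert(!must_report_raw_ips(false, true, false));
}
```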

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.708960317@linutronix.de
2020-02-24 16:20:10 -08:00
Thomas Gleixner
66150d0dde bpf, lpm: Make locking RT friendly
The LPM trie map cannot be used in contexts like perf, kprobes and tracing
as this map type dynamically allocates memory.

The memory allocation happens with a raw spinlock held. On a PREEMPT_RT
enabled kernel a raw spinlock is a truly spinning lock which disables
preemption and interrupts.

As RT does not allow memory allocation from such a section for various
reasons, convert the raw spinlock to a regular spinlock.

On an RT enabled kernel these locks are substituted by 'sleeping' spinlocks
which provide the proper protection but keep the code preemptible.

On a non-RT kernel regular spinlocks map to raw spinlocks, i.e. this does
not cause any functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.602129531@linutronix.de
2020-02-24 16:20:10 -08:00
Thomas Gleixner
7f805d17f1 bpf: Prepare hashtab locking for PREEMPT_RT
PREEMPT_RT forbids certain operations like memory allocations (even with
GFP_ATOMIC) from atomic contexts. This is required because even with
GFP_ATOMIC the memory allocator calls into code paths which acquire locks
with long held lock sections. To ensure the deterministic behaviour these
locks are regular spinlocks, which are converted to 'sleepable' spinlocks
on RT. The only true atomic contexts on an RT kernel are the low level
hardware handling, scheduling, low level interrupt handling, NMIs etc. None
of these contexts should ever do memory allocations.

As regular device interrupt handlers and soft interrupts are forced into
thread context, the existing code which does
  spin_lock*(); alloc(GFP_ATOMIC); spin_unlock*();
just works.

In theory the BPF locks could be converted to regular spinlocks as well,
but the bucket locks and percpu_freelist locks can be taken from arbitrary
contexts (perf, kprobes, tracepoints) which are required to be atomic
contexts even on RT. These mechanisms require preallocated maps, so there
is no need to invoke memory allocations within the lock held sections.

BPF maps which need dynamic allocation are only used from (forced) thread
context on RT and can therefore use regular spinlocks, which in turn allows
memory allocations to be invoked from the lock held section.

To achieve this make the hash bucket lock a union of a raw and a regular
spinlock and initialize and lock/unlock either the raw spinlock for
preallocated maps or the regular variant for maps which require memory
allocations.

On a non-RT kernel this distinction is neither possible nor required.
spinlock maps to raw_spinlock and the extra code and conditional is
optimized out by the compiler. No functional change.
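
The union based bucket lock described above can be sketched in userspace
C11. This is a minimal model under stated assumptions, not the hashtab
code: all *_model names are hypothetical, a busy-waiting atomic flag
stands in for the raw spinlock, and a yielding loop stands in for the
'sleepable' spinlock of an RT kernel.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

struct raw_lock_model   { atomic_bool locked; };	/* spins, never yields */
struct sleep_lock_model { atomic_bool locked; };	/* yields while waiting */

struct bucket_model {
	union {
		struct raw_lock_model   raw_lock;	/* preallocated maps */
		struct sleep_lock_model lock;		/* allocating maps */
	};
};

struct htab_model { bool prealloc; };

static void bucket_lock_init_model(const struct htab_model *htab,
				   struct bucket_model *b)
{
	if (htab->prealloc)
		atomic_init(&b->raw_lock.locked, false);
	else
		atomic_init(&b->lock.locked, false);
}

static void bucket_lock_model(const struct htab_model *htab,
			      struct bucket_model *b)
{
	if (htab->prealloc) {
		/* Raw variant: spin until the lock is free. */
		while (atomic_exchange_explicit(&b->raw_lock.locked, true,
						memory_order_acquire))
			;
	} else {
		/* Regular variant: let other threads run while waiting. */
		while (atomic_exchange_explicit(&b->lock.locked, true,
						memory_order_acquire))
			sched_yield();
	}
}

static void bucket_unlock_model(const struct htab_model *htab,
				struct bucket_model *b)
{
	atomic_bool *l = htab->prealloc ? &b->raw_lock.locked
					: &b->lock.locked;

	assert(atomic_load(l));		/* must be held by the caller */
	atomic_store_explicit(l, false, memory_order_release);
}
```

The same htab must be passed to init, lock and unlock so that one union
member is used consistently for the lifetime of the bucket.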

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.509685912@linutronix.de
2020-02-24 16:20:10 -08:00
Thomas Gleixner
d01f9b198c bpf: Factor out hashtab bucket lock operations
As a preparation for making the BPF locking RT friendly, factor out the
hash bucket lock operations into inline functions. This allows the
necessary RT modification to be done in one place instead of sprinkling it
all over the code. No functional change.

The now unused htab argument of the lock/unlock functions will be used in
the next step which adds PREEMPT_RT support.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.420416916@linutronix.de
2020-02-24 16:20:10 -08:00
Thomas Gleixner
b6e5dae15a bpf: Replace open coded recursion prevention in sys_bpf()
The required protection is that the caller cannot be migrated to a
different CPU as these functions end up in places which either take a hash
bucket lock or might trigger a kprobe inside the memory allocator. Both
scenarios can lead to deadlocks. The deadlock prevention works per CPU: a
per CPU variable is incremented, which temporarily blocks the invocation
of BPF programs from perf and kprobes.

Replace the open coded preempt_[dis|en]able and __this_cpu_[inc|dec] pairs
with the new helper functions. These functions are already prepared to make
BPF work on PREEMPT_RT enabled kernels. No functional change for !RT
kernels.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.317843926@linutronix.de
2020-02-24 16:20:10 -08:00
Thomas Gleixner
085fee1a72 bpf: Use recursion prevention helpers in hashtab code
The required protection is that the caller cannot be migrated to a
different CPU as these places either take a hash bucket lock or might
trigger a kprobe inside the memory allocator. Both scenarios can lead to
deadlocks. The deadlock prevention works per CPU: a per CPU variable is
incremented, which temporarily blocks the invocation of BPF programs from
perf and kprobes.

Replace the open coded preempt_disable/enable() and this_cpu_inc/dec()
pairs with the new recursion prevention helpers to prepare BPF to work on
PREEMPT_RT enabled kernels. On a non-RT kernel the migrate disable/enable
in the helpers map to preempt_disable/enable(), i.e. no functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.211208533@linutronix.de
2020-02-24 16:20:10 -08:00
Thomas Gleixner
c518cfa0c5 bpf: Provide recursion prevention helpers
The places which need to prevent the execution of trace type BPF programs,
in order to avoid deadlocks on the hash bucket lock, open code this
protection.

Provide two inline functions, bpf_disable/enable_instrumentation() to
replace these open coded protection constructs.

Use migrate_disable/enable() instead of preempt_disable/enable() right away
so this works on RT enabled kernels. On a !RT kernel migrate_disable /
enable() are mapped to preempt_disable/enable().

These helpers use this_cpu_inc/dec() instead of __this_cpu_inc/dec() on an
RT enabled kernel because migrate disabled regions are preemptible and
preemption might hit in the middle of a RMW operation which can lead to
inconsistent state.
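
The counter based protection behind bpf_disable/enable_instrumentation()
can be illustrated with a userspace model. This is a sketch under stated
assumptions, not the kernel helpers: every *_model name is hypothetical, a
thread-local counter stands in for the per CPU variable, and the migrate
disable/enable calls appear only as comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Thread-local stand-in for the per CPU "program active" counter. */
static _Thread_local int prog_active_model;

/* Counts how often the sample tracing program actually ran. */
static int sample_prog_runs;

static void sample_prog(void)
{
	sample_prog_runs++;
}

static void disable_instrumentation_model(void)
{
	/* kernel: migrate_disable() first, so the CPU cannot change */
	prog_active_model++;
}

static void enable_instrumentation_model(void)
{
	assert(prog_active_model > 0);	/* calls must be paired */
	prog_active_model--;
	/* kernel: migrate_enable() afterwards */
}

/* Runs prog unless instrumentation is blocked; returns true if it ran. */
static bool run_tracing_prog_model(void (*prog)(void))
{
	bool ran = false;

	/* Only the outermost user of the counter may run a program. */
	if (++prog_active_model == 1) {
		prog();
		ran = true;
	}
	prog_active_model--;
	return ran;
}
```

A caller which is about to take a hash bucket lock would wrap that section
in the disable/enable pair, so that a tracing program triggered inside it
is skipped instead of deadlocking on the same lock.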

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145644.103910133@linutronix.de
2020-02-24 16:20:09 -08:00
David Miller
2a916f2f54 bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.
Replace the preemption disable/enable with migrate_disable/enable() to
reflect the actual requirement and to allow PREEMPT_RT to substitute it
with an actual migration disable mechanism which does not disable
preemption.

This includes the code paths that go via __bpf_prog_run_save_cb().
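
The difference between the two primitives can be sketched with a pair of
depth counters. This is a userspace model under stated assumptions, not
the scheduler code: all *_model names and the SKETCH_PREEMPT_RT switch are
hypothetical, and the switch is tested with a plain if in the spirit of
IS_ENABLED().

```c
#include <assert.h>
#include <stdbool.h>

/* Build with -DSKETCH_PREEMPT_RT=1 to model an RT kernel. */
#ifndef SKETCH_PREEMPT_RT
#define SKETCH_PREEMPT_RT 0
#endif

static int preempt_depth_model;	/* > 0: preemption is disabled */
static int migrate_depth_model;	/* > 0: pinned to the CPU, preemptible */

static void preempt_disable_model(void)
{
	preempt_depth_model++;
}

static void preempt_enable_model(void)
{
	assert(preempt_depth_model > 0);
	preempt_depth_model--;
}

static void migrate_disable_model(void)
{
	if (SKETCH_PREEMPT_RT)
		migrate_depth_model++;		/* RT: only pin the task */
	else
		preempt_disable_model();	/* !RT: plain preempt disable */
}

static void migrate_enable_model(void)
{
	if (SKETCH_PREEMPT_RT) {
		assert(migrate_depth_model > 0);
		migrate_depth_model--;
	} else {
		preempt_enable_model();
	}
}

/* The task cannot change CPUs in either kind of section. */
static bool pinned_model(void)
{
	return preempt_depth_model > 0 || migrate_depth_model > 0;
}

/* Only a preempt disabled section makes the task non-preemptible. */
static bool preemptible_model(void)
{
	return preempt_depth_model == 0;
}
```

In both configurations the task stays on its CPU between the two calls;
only in the RT model does it remain preemptible there.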

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200224145643.998293311@linutronix.de
2020-02-24 16:20:09 -08:00