The subdevice concept is not being used in the driver, so
drop the references to it.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now it was possible to pass a packet to a single destination such
as vport or flow table. With the new support if multiple vports or multiple
tables are provided as destinations, fs_dr will create a multiple
destination table action, this action should replace other destination
actions provided to mlx5dr_create_rule.
Each vport destination can be provided with a reformat actions which
will be done before forwarding the packet to the vport.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A multiple destination table action allows HW packet duplication
to multiple destinations, this is useful for multicast or mirroring
traffic for debug. Duplicating is done using a FW flow table with
multiple destinations.
The new action creation function, mlx5dr_action_create_mult_dest_tbl
will allow creating a single table to iterate over several dr actions.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Function prefix was changed to be similar to other action APIs.
In order to support other FW tables the mlx5_flow_table struct was
replaced with table id and type.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We need to have the flow-table flags when creation sw-steering tables,
this parameter exists in the layer between fs_core to sw_steering, this
patch gives it to the creation function.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently SW steering doesn't have the means to access HW iterators to
support multi-destination (FTEs) flow table entries.
In order to support multi-destination FTEs for port-mirroring, SW
steering will create a dedicated multi-destination FW managed flow table
and FTEs via direct FW commands that we introduced in the previous patch.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Implement the FW command to setup a FTE (Flow Table Entry) into the FW
managed flow tables.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of using multiple variables use a simple struct. The
number of passed argument was too high after adding the encap
decap support bits arguments used for multiple destination reformat.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use helper routines to setup and teardown multiple EQs and reuse the
code in setup, cleanup and error unwinding flows.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In below sequence, a EQE entry arrives for a CQ which is on the path of
being destroyed.
cpu-0 cpu-1
------ -----
mlx5_core_destroy_cq() mlx5_eq_comp_int()
mlx5_eq_del_cq() [..]
radix_tree_delete() [..]
[..] mlx5_eq_cq_get() /* Didn't find CQ is
* a valid case.
*/
/* destroy CQ in hw */
mlx5_cmd_exec()
This is still a valid scenario and correct delete CQ sequence, as
mirror of the CQ create sequence.
Hence, suppress the non harmful debug message from warn to debug level.
Keep the debug log message rate limited because user application can
trigger it repeatedly.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the max number of channels is limited to 64, which is half of
the indirection table size to allow some flexibility. But on servers
with more than 64 cores, users may want to utilize more queues.
This patch increases the advertised max number of channels to 128 by
changing the ratio between channels and indirection table slots to 1:1.
At the same time, the driver still enable no more than 64 channels at
loading. Users can change it by ethtool afterwards.
Signed-off-by: Fan Li <fanl@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In some configurations, gcc tries too hard to optimize this code:
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c: In function 'mlx5e_grp_sw_update_stats':
drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:302:1: error: the frame size of 1336 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
As was stated in the bug report, the reason is that gcc runs into a corner
case in the register allocator that is rather hard to fix in a good way.
As there is an easy way to work around it, just add a comment and the
barrier that stops gcc from trying to overoptimize the function.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92657
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The function mlx5_buf_alloc_node is only used by the function in the
local scope. So it is appropriate to limit this function in the local
scope.
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2020-01-06
This series contains updates to igc to add basic support for
timestamping.
Vinicius adds basic support for timestamping and enables ptp4l/phc2sys
to work with i225 devices. Initially, adds the ability to read and
adjust the PHC clock. Patches 2 & 3 enable and retrieve hardware
timestamps. Patch 4 implements the ethtool ioctl that ptp4l uses to
check what timestamping methods are supported. Lastly, added support to
do timestamping using the "Start of Packet" signal from the PHY, which
is now supported in i225 devices.
While i225 does support multiple PTP domains, with multiple timestamping
registers, we currently only support one PTP domain and use only one of
the timestamping registers for implementation purposes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn says:
====================
Unique mv88e6xxx IRQ names
There are a few boards which have multiple mv88e6xxx switches. With
such boards, it can be hard to determine which interrupts belong to
which switches. Make the interrupt names unique by including the
device name in the interrupt name. For the SERDES interrupt, also
include the port number. As a result of these patches ZII devel C
looks like:
50: 0 gpio-vf610 27 Level mv88e6xxx-0.1:00
54: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-0.1:00-g1-atu-prob
56: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-0.1:00-g1-vtu-prob
58: 0 mv88e6xxx-g1 7 Edge mv88e6xxx-0.1:00-g2
61: 0 mv88e6xxx-g2 1 Edge !mdio-mux!mdio@1!switch@0!mdio:01
62: 0 mv88e6xxx-g2 2 Edge !mdio-mux!mdio@1!switch@0!mdio:02
63: 0 mv88e6xxx-g2 3 Edge !mdio-mux!mdio@1!switch@0!mdio:03
64: 0 mv88e6xxx-g2 4 Edge !mdio-mux!mdio@1!switch@0!mdio:04
70: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.1:00-serdes-10
75: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-0.1:00-watchdog
76: 5 gpio-vf610 26 Level mv88e6xxx-0.2:00
80: 0 mv88e6xxx-g1 3 Edge mv88e6xxx-0.2:00-g1-atu-prob
82: 0 mv88e6xxx-g1 5 Edge mv88e6xxx-0.2:00-g1-vtu-prob
84: 4 mv88e6xxx-g1 7 Edge mv88e6xxx-0.2:00-g2
87: 2 mv88e6xxx-g2 1 Edge !mdio-mux!mdio@2!switch@0!mdio:01
88: 0 mv88e6xxx-g2 2 Edge !mdio-mux!mdio@2!switch@0!mdio:02
89: 0 mv88e6xxx-g2 3 Edge !mdio-mux!mdio@2!switch@0!mdio:03
90: 0 mv88e6xxx-g2 4 Edge !mdio-mux!mdio@2!switch@0!mdio:04
95: 3 mv88e6xxx-g2 9 Edge mv88e6xxx-0.2:00-serdes-9
96: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.2:00-serdes-10
101: 0 mv88e6xxx-g2 15 Edge mv88e6xxx-0.2:00-watchdog
Interrupt names like !mdio-mux!mdio@2!switch@0!mdio:01 are created by
phylib for the integrated PHYs. The mv88e6xxx driver does not
determine these names.
====================
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically generate a unique interrupt name for the VTU and ATU,
based on the device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically generate a unique g2 interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically generate a unique watchdog interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically generate a unique SERDES interrupt name, based on the
device name and the port the SERDES is for. For example:
95: 3 mv88e6xxx-g2 9 Edge mv88e6xxx-0.2:00-serdes-9
96: 0 mv88e6xxx-g2 10 Edge mv88e6xxx-0.2:00-serdes-10
The 0.2:00 indicates the switch and -9 indicates port 9.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dynamically generate a unique switch interrupt name, based on the
device name.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl4Twv0ACgkQSD+KveBX
+j6NzAf/Z4FkuBtjvroUcRpR/R2B9d9ipWemqvjHUobLDm/q0HLRvz3YHGFOW/1H
JLHVEh3jze5DkBegDmJWpFk/T2MWT5GZQ62jccUtIvjtIOwE5R6+EiUTZc3IHGZ1
Yzzoo+t0LyML6O+jxf3x+ZKjBHCh0jMvI0R2PFoxXESkPK8dKFDg7u6eLAmshYdX
akN9gzvrrClcsG0kx5llGzWJuNaRFGy/LA1rYM/9IpyFkgPG6yuwljWLk6U3era3
bPCOmL3X6SN4ji55RWvsnvwtBB2LY5ZIsFzF9rbkXBhu/bb3AdYfdDREUvemzfbH
dUnFsNECYnmjIx/52/C7ozW7gTtzdg==
=0Pxs
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2020-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-01-06
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v5.3
('net/mlx5: Move devlink registration before interfaces load')
For -stable v5.4
('net/mlx5e: Fix hairpin RSS table size')
('net/mlx5: DR, Init lists that are used in rule's member')
('net/mlx5e: Always print health reporter message to dmesg')
('net/mlx5: DR, No need for atomic refcount for internal SW steering resources')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- kbuild found missing define of MCOUNT_INSN_SIZE for various build configs
- Initialize variable to zero as gcc thinks it is used undefined
(it really isn't but the code is subtle enough that this doesn't hurt)
- Convert from do_div() to div64_ull() to prevent potential divide by zero
- Unregister a trace point on error path in sched_wakeup tracer
- Use signed offset for archs that can have stext not be first
- A simple indentation fix (whitespace error)
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXhOj6xQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qukzAQCMNfkAbMFA+C1uORMhr/jWhi4eshWN
4jZ2u5X8zGuuXQD+PaQU4n8d0K4uCPF+lFD16DfFxXvCOXHfN3/zXmxGvw8=
=djaW
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:
- kbuild found missing define of MCOUNT_INSN_SIZE for various build
configs
- Initialize variable to zero as gcc thinks it is used undefined (it
really isn't but the code is subtle enough that this doesn't hurt)
- Convert from do_div() to div64_ull() to prevent potential divide by
zero
- Unregister a trace point on error path in sched_wakeup tracer
- Use signed offset for archs that can have stext not be first
- A simple indentation fix (whitespace error)"
* tag 'trace-v5.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix indentation issue
kernel/trace: Fix do not unregister tracepoints when register sched_migrate_task fail
tracing: Change offset type to s32 in preempt/irq tracepoints
ftrace: Avoid potential division by zero in function profiler
tracing: Have stack tracer compile when MCOUNT_INSN_SIZE is not defined
tracing: Define MCOUNT_INSN_SIZE when not defined without direct calls
tracing: Initialize val to zero in parse_entry of inject code
Whenever adding new member of rule object we attach it to 2 lists,
These 2 lists should be initialized first.
Fixes: 41d0707415 ("net/mlx5: DR, Expose steering rule functionality")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Set hairpin table size to the corret size, based on the groups that
would be created in it. Groups are laid out on the table such that a
group occupies a range of entries in the table. This implies that the
group ranges should have correspondence to the table they are laid upon.
The patch cited below made group 1's size to grow hence causing
overflow of group range laid on the table.
Fixes: a795d8db2a ("net/mlx5e: Support RSS for IP-in-IP and IPv6 tunneled packets")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
No need for an atomic refcounter for the STE and hashtables.
These are internal SW steering resources and they are always
under domain mutex.
This also fixes the following refcount error:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 9 PID: 3527 at lib/refcount.c:25 refcount_warn_saturate+0x81/0xe0
Call Trace:
dr_table_init_nic+0x10d/0x110 [mlx5_core]
mlx5dr_table_create+0xb4/0x230 [mlx5_core]
mlx5_cmd_dr_create_flow_table+0x39/0x120 [mlx5_core]
__mlx5_create_flow_table+0x221/0x5f0 [mlx5_core]
esw_create_offloads_fdb_tables+0x180/0x5a0 [mlx5_core]
...
Fixes: 26d688e33f ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Register devlink before interfaces are added.
This will allow interfaces to use devlink while initalizing. For example,
call mlx5_is_roce_enabled.
Fixes: aba25279c1 ("net/mlx5e: Add TX reporter support")
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In case a reporter exists, error message is logged only to the devlink
tracer. The devlink tracer is a visibility utility only, which user can
choose not to monitor.
After cited patch, 3rd party monitoring tools that tracks these error
message will no longer find them in dmesg, causing a regression.
With this patch, error messages are also logged into the dmesg.
Fixes: c50de4af1d ("net/mlx5e: Generalize tx reporter's functionality")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For better accuracy, i225 is able to do timestamping using the Start of
Packet signal from the PHY.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This command allows igc to report what types of timestamping are
supported. ptp4l uses this to detect if the hardware supports
timestamping.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-----BEGIN PGP SIGNATURE-----
iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXhNwFyAcamFya2tvLnNh
a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0q1aAQDXAztHROCVdYp7
8xln/RjlfmU8tntFJuoMATqwfX+GqQD8DAIS4eW6Ac0ZjB45cOKee9ndOV2SlV9/
T4gyyzeV2Qc=
=3VoD
-----END PGP SIGNATURE-----
Merge tag 'tpmdd-next-20200106' of git://git.infradead.org/users/jjs/linux-tpmdd
Pull tpmd fixes from Jarkko Sakkinen:
"There has been a bunch of reports (e.g. [*]) reporting that when
commit 5b359c7c43 ("tpm_tis_core: Turn on the TPM before probing
IRQ's") and subsequent fixes are applied it causes boot freezes on
some machines.
Unfortunately hardware where this causes a failure is not widely
available (only one I'm aware is Lenovo T490), which means we cannot
predict yet how long it will take to properly fix tpm_tis interrupt
probing.
Thus, the least worst short term action is to revert the code to the
state before this commit. In long term we need fix the tpm_tis probing
code to work on machines that Stefan's patches were supposed to fix.
With these patches reverted nothing fatal happens, TPM is fallbacked
to be used in polling mode (which is not in the end too bad because
there are no high throughput workloads for TPM).
[*] https://bugzilla.kernel.org/show_bug.cgi?id=205935"
* tag 'tpmdd-next-20200106' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm: Revert "tpm_tis_core: Turn on the TPM before probing IRQ's"
tpm: Revert "tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts"
tpm: Revert "tpm_tis: reserve chip for duration of tpm_tis_core_init"
This adds support for timestamping packets being transmitted.
Based on the code from i210. The basic differences is that i225 has 4
registers to store the transmit timestamps (i210 has one). Right now,
we only support retrieving from one register, support for using the
other registers will be added later.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This adds support for timestamping received packets.
It is based on the i210, as many features of i225 work the same way.
The main difference from i210 is that i225 has support for choosing
the timer register to use when timestamping packets. Right now, we
only support using timer 0. The other difference is that i225 stores
two timestamps in the receive descriptor, right now, we only retrieve
one.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Igor Russkikh says:
====================
Aquantia/Marvell atlantic bugfixes 2020/01
Here is a set of recently discovered bugfixes,
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Function entries were duplicated accidentally, removing the dups.
Fixes: ea4b4d7fc1 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initial loopback configuration should be called earlier, before
starting traffic on HW blocks. Otherwise depending on race conditions
it could be kept disabled.
Fixes: ea4b4d7fc1 ("net: atlantic: loopback tests via private flags")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Last code/checkpatch cleanup did a copy paste error where code from
firmware 3 API logic was moved to firmware 1 logic.
This resulted in FW1.x users would never see the link state as active.
Fixes: 7b0c342f1f ("net: atlantic: code style cleanup")
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before commit 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
cgroup bpf structures were released with
corresponding cgroup structures. It guaranteed the hierarchical order
of destruction: children were always first. It preserved attached
programs from being released before their propagated copies.
But with cgroup auto-detachment there are no such guarantees anymore:
cgroup bpf is released as soon as the cgroup is offline and there are
no live associated sockets. It means that an attached program can be
detached and released, while its propagated copy is still living
in the cgroup subtree. This will obviously lead to an use-after-free
bug.
To reproduce the issue the following script can be used:
#!/bin/bash
CGROOT=/sys/fs/cgroup
mkdir -p ${CGROOT}/A ${CGROOT}/B ${CGROOT}/A/C
sleep 1
./test_cgrp2_attach ${CGROOT}/A egress &
A_PID=$!
./test_cgrp2_attach ${CGROOT}/B egress &
B_PID=$!
echo $$ > ${CGROOT}/A/C/cgroup.procs
iperf -s &
S_PID=$!
iperf -c localhost -t 100 &
C_PID=$!
sleep 1
echo $$ > ${CGROOT}/B/cgroup.procs
echo ${S_PID} > ${CGROOT}/B/cgroup.procs
echo ${C_PID} > ${CGROOT}/B/cgroup.procs
sleep 1
rmdir ${CGROOT}/A/C
rmdir ${CGROOT}/A
sleep 1
kill -9 ${S_PID} ${C_PID} ${A_PID} ${B_PID}
On the unpatched kernel the following stacktrace can be obtained:
[ 33.619799] BUG: unable to handle page fault for address: ffffbdb4801ab002
[ 33.620677] #PF: supervisor read access in kernel mode
[ 33.621293] #PF: error_code(0x0000) - not-present page
[ 33.622754] Oops: 0000 [#1] SMP NOPTI
[ 33.623202] CPU: 0 PID: 601 Comm: iperf Not tainted 5.5.0-rc2+ #23
[ 33.625545] RIP: 0010:__cgroup_bpf_run_filter_skb+0x29f/0x3d0
[ 33.635809] Call Trace:
[ 33.636118] ? __cgroup_bpf_run_filter_skb+0x2bf/0x3d0
[ 33.636728] ? __switch_to_asm+0x40/0x70
[ 33.637196] ip_finish_output+0x68/0xa0
[ 33.637654] ip_output+0x76/0xf0
[ 33.638046] ? __ip_finish_output+0x1c0/0x1c0
[ 33.638576] __ip_queue_xmit+0x157/0x410
[ 33.639049] __tcp_transmit_skb+0x535/0xaf0
[ 33.639557] tcp_write_xmit+0x378/0x1190
[ 33.640049] ? _copy_from_iter_full+0x8d/0x260
[ 33.640592] tcp_sendmsg_locked+0x2a2/0xdc0
[ 33.641098] ? sock_has_perm+0x10/0xa0
[ 33.641574] tcp_sendmsg+0x28/0x40
[ 33.641985] sock_sendmsg+0x57/0x60
[ 33.642411] sock_write_iter+0x97/0x100
[ 33.642876] new_sync_write+0x1b6/0x1d0
[ 33.643339] vfs_write+0xb6/0x1a0
[ 33.643752] ksys_write+0xa7/0xe0
[ 33.644156] do_syscall_64+0x5b/0x1b0
[ 33.644605] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by grabbing a reference to the bpf structure of each ancestor
on the initialization of the cgroup bpf structure, and dropping the
reference at the end of releasing the cgroup bpf structure.
This will restore the hierarchical order of cgroup bpf releasing,
without adding any operations on hot paths.
Thanks to Josef Bacik for the debugging and the initial analysis of
the problem.
Fixes: 4bfc0bb2c6 ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Michal Kubecek says:
====================
ethtool: allow nesting of begin() and complete() callbacks
The ethtool ioctl interface used to guarantee that ethtool_ops callbacks
were always called in a block between calls to ->begin() and ->complete()
(if these are defined) and that this whole block was executed with RTNL
lock held:
rtnl_lock();
ops->begin();
/* other ethtool_ops calls */
ops->complete();
rtnl_unlock();
This prevented any nesting or crossing of the begin-complete blocks.
However, this is no longer guaranteed even for ioctl interface as at least
ethtool_phys_id() releases RTNL lock while waiting for a timer. With the
introduction of netlink ethtool interface, the begin-complete pairs are
naturally nested e.g. when a request triggers a netlink notification.
Fortunately, only minority of networking drivers implements begin() and
complete() callbacks and most of those that do, fall into three groups:
- wrappers for pm_runtime_get_sync() and pm_runtime_put()
- wrappers for clk_prepare_enable() and clk_disable_unprepare()
- begin() checks netif_running() (fails if false), no complete()
First two have their own refcounting, third is safe w.r.t. nesting of the
blocks.
Only three in-tree networking drivers need an update to deal with nesting
of begin() and complete() calls: via-velocity and epic100 perform resume
and suspend on their own and wil6210 completely serializes the calls using
its own mutex (which would lead to a deadlock if a request request
triggered a netlink notification). The series addresses these problems.
changes between v1 and v2:
- fix inverted condition in epic100 ethtool_begin() (thanks to Andrew
Lunn)
====================
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike most networking drivers using begin() and complete() ethtool_ops
callbacks to resume a device which is down and suspend it again when done,
epic100 does not use standard refcounted infrastructure but sets device
sleep state directly.
With the introduction of netlink ethtool interface, we may have nested
begin-complete blocks so that inner complete() would put the device back to
sleep for the rest of the outer block.
To avoid rewriting an old and not very actively developed driver, just add
a nesting counter and only perform resume and suspend on the outermost
level.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike most networking drivers using begin() and complete() ethtool_ops
callbacks to resume a device which is down and suspend it again when done,
via-velocity does not use standard refcounted infrastructure but sets
device sleep state directly.
With the introduction of netlink ethtool interface, we may have nested
begin-complete blocks so that inner complete() would put the device back to
sleep for the rest of the outer block.
To avoid rewriting an old and not very actively developed driver, just add
a nesting counter and only perform resume and suspend on the outermost
level.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The wil6210 driver locks a mutex in begin() ethtool_ops callback and
unlocks it in complete() so that all ethtool requests are serialized. This
is not going to work correctly with netlink interface; e.g. when ioctl
triggers a netlink notification, netlink code would call begin() again
while the mutex taken by ioctl code is still held by the same task.
Let's get rid of the begin() and complete() callbacks and move the mutex
locking into the remaining ethtool_ops handlers except get_drvinfo which
only copies strings that are not changing so that there is no need for
serialization.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix calling multiple tee_client_close_context in case of shm allocation
fails.
Fixes: 246880958a (“firmware: broadcom: add OP-TEE based BNXT f/w manager”)
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent MD5 tests added duplicate configuration in the default VRF.
This change exposed a bug in existing tests designed to verify no
connection when client and server are not in the same domain. The
server should be running bound to the vrf device with the client run
in the default VRF (the -2 option is meant for validating connection
data). Fix the option for both tests.
While technically this is a bug in previous releases, the tests are
properly failing since the default VRF does not have any routing
configuration so there really is no need to backport to prior releases.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'gtp_encap_disable_sock(sk)' handles the case where sk is NULL, so there
is no need to test it before calling the function.
This saves a few line of code.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Disable checks in hardware pipeline
Amit says:
The hardware pipeline contains some checks that, by default, are
configured to drop packets. Since the software data path does not drop
packets due to these reasons and since we are interested in offloading
the software data path to hardware, then these checks should be disabled
in the hardware pipeline as well.
This patch set changes mlxsw to disable four of these checks and adds
corresponding selftests. The tests pass both when the software data path
is exercised (using veth pair) and when the hardware data path is
exercised (using mlxsw ports in loopback).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>