Eric Dumazet says:
====================
net: introduce u64_stats_t
KCSAN found a data-race in per-cpu u64 stats accounting.
(The stack traces are included in the 8th patch,
"tun: switch to u64_stats_t".)
This patch series first consolidates code in five patches.
The last three patches then resolve the data race.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to fix the data-race found by KCSAN, we
can use the new u64_stats_t type and its accessors instead
of plain u64 fields. This will still generate optimal code
for both 32 and 64 bit platforms.
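The sketch below is a minimal userspace model of the idea, using C11 <stdatomic.h> as a stand-in for the kernel accessors. Each counter is wrapped in an opaque struct so that callers can only reach it through accessor functions. Those accessors use relaxed atomic loads and stores, which on common 64-bit CPUs typically compile to plain loads and stores but keep the compiler from tearing or fusing them. The names `stats64_t`, `stats64_read()`, `stats64_add()` and `NR_CPUS_MODEL` are hypothetical and are not the kernel's u64_stats API. On 32-bit targets the C11 atomics may be more expensive, and the kernel relies on the syncp there instead.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

#define NR_CPUS_MODEL 4 /* hypothetical number of CPUs in this model */

/* Opaque wrapper: callers must go through the accessors below. */
typedef struct {
	_Atomic uint64_t v;
} stats64_t;

/* Per-"CPU" counters, each written by a single owner only. */
struct pcpu_stats_model {
	stats64_t rx_packets;
	stats64_t rx_bytes;
};

static struct pcpu_stats_model pcpu[NR_CPUS_MODEL];

static inline uint64_t stats64_read(stats64_t *s)
{
	/* Relaxed load: no tearing, no ordering cost. */
	return atomic_load_explicit(&s->v, memory_order_relaxed);
}

static inline void stats64_add(stats64_t *s, uint64_t val)
{
	/*
	 * Single writer per counter, so a relaxed load followed by a
	 * relaxed store is enough: concurrent readers see either the
	 * old or the new value, never a torn one.
	 */
	uint64_t old = atomic_load_explicit(&s->v, memory_order_relaxed);

	atomic_store_explicit(&s->v, old + val, memory_order_relaxed);
}

/* Writer side, e.g. the receive path running on @cpu. */
static void model_rx(int cpu, uint64_t len)
{
	stats64_add(&pcpu[cpu].rx_packets, 1);
	stats64_add(&pcpu[cpu].rx_bytes, len);
}

/* Reader side, e.g. a get_stats64-style callback summing all CPUs. */
static void model_get_stats(uint64_t *packets, uint64_t *bytes)
{
	*packets = 0;
	*bytes = 0;
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		*packets += stats64_read(&pcpu[cpu].rx_packets);
		*bytes += stats64_read(&pcpu[cpu].rx_bytes);
	}
}
```

The reader never takes a lock. Each per-CPU value it sums is individually consistent, which matches what the stats reporting path needs.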
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to fix this data-race found by KCSAN [1],
switch to u64_stats_t helpers. They provide all
the needed annotations, without adding extra cost.
[1]
BUG: KCSAN: data-race in tun_get_user / tun_net_get_stats64
read to 0xffffe8ffffd8aca8 of 8 bytes by task 4882 on cpu 0:
tun_net_get_stats64+0x9b/0x230 drivers/net/tun.c:1171
dev_get_stats+0x89/0x1e0 net/core/dev.c:9103
rtnl_fill_stats+0x56/0x370 net/core/rtnetlink.c:1177
rtnl_fill_ifinfo+0xd3b/0x2100 net/core/rtnetlink.c:1667
rtmsg_ifinfo_build_skb+0xb0/0x150 net/core/rtnetlink.c:3472
rtmsg_ifinfo_event.part.0+0x4e/0xb0 net/core/rtnetlink.c:3504
rtmsg_ifinfo_event net/core/rtnetlink.c:3515 [inline]
rtmsg_ifinfo+0x85/0x90 net/core/rtnetlink.c:3513
__dev_notify_flags+0x18b/0x200 net/core/dev.c:7649
dev_change_flags+0xb8/0xe0 net/core/dev.c:7691
dev_ifsioc+0x201/0x6a0 net/core/dev_ioctl.c:237
dev_ioctl+0x149/0x660 net/core/dev_ioctl.c:489
sock_do_ioctl+0xdb/0x230 net/socket.c:1061
sock_ioctl+0x3a3/0x5e0 net/socket.c:1189
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x991/0xc60 fs/ioctl.c:696
write to 0xffffe8ffffd8aca8 of 8 bytes by task 4883 on cpu 1:
tun_get_user+0x1d94/0x2ba0 drivers/net/tun.c:2002
tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
call_write_iter include/linux/fs.h:1895 [inline]
new_sync_write+0x388/0x4a0 fs/read_write.c:483
__vfs_write+0xb1/0xc0 fs/read_write.c:496
__kernel_write+0xb8/0x240 fs/read_write.c:515
write_pipe_buf+0xb6/0xf0 fs/splice.c:794
splice_from_pipe_feed fs/splice.c:500 [inline]
__splice_from_pipe+0x248/0x480 fs/splice.c:624
splice_from_pipe+0xbb/0x100 fs/splice.c:659
default_file_splice_write+0x45/0x90 fs/splice.c:806
do_splice_from fs/splice.c:848 [inline]
direct_splice_actor+0xa0/0xc0 fs/splice.c:1020
splice_direct_to_actor+0x215/0x510 fs/splice.c:975
do_splice_direct+0x161/0x1e0 fs/splice.c:1063
do_sendfile+0x384/0x7f0 fs/read_write.c:1464
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 4883 Comm: syz-executor.1 Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On 64-bit arches, struct u64_stats_sync is empty and provides
no help against load/store tearing.
We would need READ_ONCE()/WRITE_ONCE(), but that would make
the update side slightly more expensive.
local64_t was defined so that we could use regular adds
in a manner which is atomic with respect to IRQs.
However, the u64_stats infrastructure means we do not have to use
local64_t on 32-bit arches, since the syncp provides the needed
protection.
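On 32-bit SMP kernels the syncp is a sequence counter. The writer bumps it to an odd value before updating the 64-bit field and back to an even value afterwards. Readers retry until they see the same even value before and after their read. Below is a minimal single-writer userspace sketch of that pattern, assuming C11 atomics and fences. The names `seq_stats`, `seq_stats_add()` and `seq_stats_read()` are hypothetical; this is not the kernel's u64_stats_update_begin()/u64_stats_fetch_retry() implementation.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

/*
 * Models a 64-bit counter on a 32-bit CPU: the value is stored as two
 * 32-bit halves, so without the sequence counter a reader could see a
 * torn value.
 */
struct seq_stats {
	_Atomic uint32_t seq; /* odd while an update is in progress */
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

/* Writer side: only one writer per counter is assumed. */
static void seq_stats_add(struct seq_stats *s, uint64_t val)
{
	uint32_t seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t cur;

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	/* Order the odd sequence value before the data stores. */
	atomic_thread_fence(memory_order_release);

	cur = ((uint64_t)atomic_load_explicit(&s->hi, memory_order_relaxed) << 32) |
	      atomic_load_explicit(&s->lo, memory_order_relaxed);
	cur += val;
	atomic_store_explicit(&s->lo, (uint32_t)cur, memory_order_relaxed);
	atomic_store_explicit(&s->hi, (uint32_t)(cur >> 32), memory_order_relaxed);

	/* Publish the data stores before the even sequence value. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until a consistent snapshot is observed. */
static uint64_t seq_stats_read(struct seq_stats *s)
{
	uint32_t start, end;
	uint64_t val;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		val = ((uint64_t)atomic_load_explicit(&s->hi, memory_order_relaxed) << 32) |
		      atomic_load_explicit(&s->lo, memory_order_relaxed);
		/* Order the data loads before re-reading the sequence. */
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1) || start != end);

	return val;
}
```

The writer pays only two extra stores and no atomic read-modify-write. That is why no local64_t is needed on 32-bit arches.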
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver can simply use the common infrastructure instead
of duplicating it.
This cleanup will ease u64_stats_t adoption in a single location.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This cleanup will ease u64_stats_t adoption in a single location.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This cleanup will ease u64_stats_t adoption in a single location.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many network drivers need this helper and hand-code the same function.
To ease u64_stats_t adoption, it is time to factor it out.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many network drivers use hand-coded implementations of the same logic;
factor it out so that u64_stats_t adoption only needs to be done once.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: Demote MTU change prints to debug
This patch series demotes the MTU change messages printed by several
drivers, which could spam the kernel console when running a test that
exercises many MTU values. The Intel drivers were also not particularly
consistent in how they printed the same message; now they are.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the MTU can be a frequent operation, and it is already clear
whether or not an MTU change succeeded, so demote these prints to debug prints.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Timur Tabi <timur@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing a network device MTU can be a fairly frequent operation, and
failure to change the MTU is properly reported to user space, both by
an appropriate message and by checking whether the device's MTU
matches the configuration.
Demote the prints to debug prints by using netdev_dbg(), making all
Intel wired LAN drivers consistent, since they used a mixture of PCI
device and network device prints before.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to the shorter helper variant for reading the real-time clock in ns.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh says:
====================
Aquantia Marvell atlantic driver updates 11-2019
Here is a set of new features and updates for the atlantic driver.
Shortlist:
- I am adding ethtool private flags for various loopback test modes.
- Nikita did some power management work, implementing the new PM API.
He also did some checkpatch style cleanup of older driver parts.
- I am also adding UDP GSO offload support and flags for loopback activation.
- We are now part of Marvell, so I am changing the email addresses in the maintainers list.
v2: styling fixes, correct IPv6 handling in UDP GSO
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Aquantia is now part of Marvell, and the standalone aquantia.com domain
will eventually be retired. Thus, change the MAINTAINERS file and some
other references to the @marvell.com domain.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
atlantic hardware supports UDP hardware segmentation offload.
This allows the user to pass one large contiguous data buffer, which
is then automatically split into multiple UDP packets of the
specified size.
Bulk sending of large UDP streams lowers CPU usage and increases
bandwidth.
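For reference, here is a minimal userspace sketch of how an application requests UDP GSO on Linux. The UDP_SEGMENT socket option sets the per-datagram payload size (gso_size). After that, a single send of a large buffer is split into datagrams of that size: by the NIC when it offers the offload, and in software otherwise. The fallback #define mirrors the value the kernel's udpgso selftests use for older libc headers. The helper names `udp_enable_gso()` and `udp_gso_segments()` are hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <stddef.h>
#include <assert.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* from <linux/udp.h>, for older libc headers */
#endif

/* Ask the stack to split each send on @fd into @gso_size-byte datagrams. */
static int udp_enable_gso(int fd, int gso_size)
{
	return setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT,
			  &gso_size, sizeof(gso_size));
}

/* Number of UDP datagrams produced from one send of @len bytes. */
static size_t udp_gso_segments(size_t len, size_t gso_size)
{
	return (len + gso_size - 1) / gso_size; /* the last one may be short */
}
```

Assuming the modified iperf3's `--udp-lso 200` sets the segment size, each 12600-byte write in the UDP GSO run below would be split into 63 datagrams of 200 bytes.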
We took measurements both with the udpgso_bench_tx test tool and with a
modified iperf3 (4 streams, multithreaded, 200-byte packets) over an
AQC<->AQC 10G link. Flow control was disabled to prevent RX-side
impact on the measurements.
No UDP GSO:
iperf3 -c 10.0.1.2 -u -b0 -l 200 -P4 --multithread
UDP GSO:
iperf3 -c 10.0.1.2 -u -b0 -l 12600 --udp-lso 200 -P4 --multithread
Mode          CPU    iperf speed   Line speed   Packets per second
------------------------------------------------------------------
NO UDP GSO    350%   3.07 Gbps     3.8 Gbps     1,919,419
SW UDP GSO    200%   5.55 Gbps     6.4 Gbps     3,286,144
HW UDP GSO     90%   6.80 Gbps     8.4 Gbps     4,273,117
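A packet rate can be converted into bit rates as follows. The payload rate is packets per second × payload bytes × 8, and the on-wire rate adds the per-packet header overhead. The sketch below assumes IPv4 over Ethernet with 46 bytes of overhead per packet (14-byte Ethernet header, 20-byte IPv4 header, 8-byte UDP header, 4-byte FCS), excluding preamble and inter-frame gap. That overhead figure is an assumption; the measurements above do not state it. For the NO UDP GSO row it gives about 3.07 Gbps of payload and 3.78 Gbps on the wire.

```c
#include <stdint.h>
#include <assert.h>

/* Assumed per-packet overhead: Ethernet 14 + IPv4 20 + UDP 8 + FCS 4. */
#define ETH_IPV4_UDP_OVERHEAD 46u

/* UDP payload rate in bits per second. */
static uint64_t payload_bps(uint64_t pps, uint32_t payload_len)
{
	return pps * payload_len * 8u;
}

/* Frame rate on the wire in bits per second (preamble and IFG excluded). */
static uint64_t wire_bps(uint64_t pps, uint32_t payload_len)
{
	return pps * (payload_len + ETH_IPV4_UDP_OVERHEAD) * 8u;
}
```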
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now differentiate between requested and negotiated flow control
modes. Therefore `ethtool -A` now operates on the locally requested
flow control (FC) values, while the regular link settings show the
negotiated FC settings.
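The negotiated mode is normally derived from the locally requested (advertised) pause bits and those of the link partner, following the IEEE 802.3 full-duplex pause resolution rules. The sketch below shows that resolution and is modelled on the logic of the kernel's mii_resolve_flowctrl_fdx() helper. The flag names are hypothetical, and this is not the atlantic driver's or firmware's actual implementation.

```c
#include <assert.h>

/* Hypothetical advertisement bits (cf. ADVERTISE_PAUSE_CAP/_ASYM). */
#define ADV_PAUSE_SYM  0x1u /* symmetric pause */
#define ADV_PAUSE_ASYM 0x2u /* asymmetric pause */

/* Hypothetical resolved flow control directions. */
#define FC_RX 0x1u /* act on received pause frames */
#define FC_TX 0x2u /* send pause frames */

/* Resolve the negotiated FC mode from local and partner advertisements. */
static unsigned int resolve_fc_fdx(unsigned int local, unsigned int partner)
{
	if (local & partner & ADV_PAUSE_SYM)
		return FC_TX | FC_RX;

	if (local & partner & ADV_PAUSE_ASYM) {
		if (local & ADV_PAUSE_SYM)
			return FC_RX;
		if (partner & ADV_PAUSE_SYM)
			return FC_TX;
	}

	return 0;
}
```

This is why the requested settings (`ethtool -A`) and the negotiated settings reported with the link can legitimately differ.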
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We try to follow the name of the chip (atlantic) rather than the
company, so replace some old names.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a pure checkpatch pass over the code with no functional
changes: reverse Christmas tree ordering, spacing, etc.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we add a number of ethtool private flags
to allow enabling various loopbacks in HW.
That is useful for verification and bring-up work.
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device FW has a separate memory area where various
configuration fields are stored that can be used by the
driver.
Here we modify the download/upload infrastructure to
allow access to this area.
Later on, this will be used to configure various behaviours.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
`ethtool -p eth0` will blink the LEDs to help identify the
physical port.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We add ethtool msglevel configuration and change some
printouts to use the netdev_info() family of functions.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now implement the .driver.pm callbacks, which
allow the driver to work correctly in hibernation
use cases, especially in conjunction with the
WOL feature.
Previously, the driver only reacted to the legacy .suspend/.resume
callbacks, which was a limitation in some cases.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wake on PHY allows configuring the device to wake up the host
as soon as the PHY link status changes to active.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here we improve the layout of the FW interface structures
and prepare them for the wake on PHY feature implementation.
Signed-off-by: Nikita Danilov <ndanilov@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Add layer 3 devlink-trap support
This patch set from Amit adds support in mlxsw for layer 3 traps that
can report drops and exceptions via devlink-trap.
In a similar fashion to the existing layer 2 traps, these traps can send
to the CPU packets that were not routed as intended by the underlying
device.
The traps are divided between the two types detailed in the devlink-trap
documentation: drops and exceptions. Unlike drops, packets received via
exception traps are also injected into the kernel's receive path, as they
are required for the correct functioning of the control plane. For
example, packets trapped due to a TTL error must be injected into the
kernel's receive path for traceroute to work properly.
Patch set overview:
Patch #1 adds the layer 3 drop traps to devlink along with their
documentation.
Patch #2 adds support for layer 3 drop traps in mlxsw.
Patches #3-#5 add selftests for layer 3 drop traps.
Patch #6 adds the layer 3 exception traps to devlink along with their
documentation.
Patches #7-#9 gradually add support for layer 3 exception traps in
mlxsw.
Patches #10-#12 add selftests for layer 3 exception traps.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that each supported packet trap exception is triggered under the
right conditions.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an option to check that packets hit the tc filter without providing
the exact number of packets that should hit it.
This is useful when sending many packets in the background and checking
that at least one of them hits the tc filter.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a part common to all the tests: check the devlink status to ensure
that packets were trapped.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the trap IDs used to report layer 3 exceptions.
Trapped packets are first reported to devlink and then injected into the
kernel's receive path. All the packets have 'offload_fwd_mark' set in
order to prevent them from potentially being forwarded by the bridge
again.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw does not differentiate between these two cases of
routes with invalid nexthops:
1. Nexthops whose nexthop device is a mlxsw upper (has a RIF), but whose
neighbour could not be resolved
2. Nexthops whose nexthop device is not a mlxsw upper (e.g., management
interface)
Up until now this did not matter and mlxsw trapped packets for both
cases using the same trap ID. However, packets that should have been
routed in hardware (case 1) but encountered a problem are considered
exceptions and should be reported to the user. The two cases should
therefore be split between two different trap IDs.
Allocate a new adjacency entry during initialization and, upon the
insertion of the first route with an invalid mlxsw nexthop, program this
entry to discard packets. Packets hitting this entry will be reported
using a new trap ID, "DISCARD_ROUTER3".
In the future, the entry could be written during initialization, but
currently firmware requires a valid RIF, which is not available at this
stage.
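The sketch below is a hypothetical model of the resulting classification. It covers only the two invalid-nexthop cases described above and uses the trap names mentioned in this series. The struct and function names are made up for illustration and do not reflect mlxsw's actual data structures.

```c
#include <stdbool.h>
#include <string.h>
#include <assert.h>

/* Hypothetical description of a route's nexthop, for illustration only. */
struct nh_model {
	bool dev_has_rif;    /* nexthop device is an mlxsw upper */
	bool neigh_resolved; /* neighbour entry could be resolved */
};

/* Trap used for packets hitting a route whose nexthop is invalid. */
static const char *invalid_nh_trap(const struct nh_model *nh)
{
	if (nh->dev_has_rif && !nh->neigh_resolved)
		return "DISCARD_ROUTER3"; /* case 1: exception, reported to the user */
	if (!nh->dev_has_rif)
		return "RTR_INGRESS0";    /* case 2: trapped to the kernel for forwarding */
	return NULL;                      /* valid nexthop: routed in hardware */
}
```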
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, packets that cannot be routed in hardware (e.g., the nexthop
device is not an mlxsw upper) are trapped to the kernel for forwarding.
Such packets are trapped using the "RTR_INGRESS0" trap. This trap also traps
packets that hit reject routes (e.g., "unreachable") so that the kernel
will generate the appropriate ICMP error message for them.
A subsequent patch needs to report to devlink only the packets that hit a
reject route, which is impossible as long as "RTR_INGRESS0" is
overloaded like that.
Solve this by using the "RTR_INGRESS1" trap for packets that hit reject
routes.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add generic layer 3 packet exception traps that can report trapped
packets, along with documentation of the traps.
Unlike drop traps, these exception traps also need to inject the packet
into the kernel's receive path. For example, a packet that was trapped due
to an unreachable neighbour needs to be injected into the kernel so that it
will trigger an ARP request or a neighbour solicitation message.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that each supported packet trap is triggered under the right
conditions and that packets are indeed dropped and not forwarded.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a proto parameter to enable the use of devlink_trap_cleanup()
in tests that use the IPv6 protocol.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2_drops_test() is used to check that drop traps are functioning as
intended. Currently it is only used in the layer 2 test, but it is also
useful for the layer 3 test introduced in the subsequent patch.
l2_drops_cleanup() is used to clean up configurations and kill the
mausezahn process.
Export the functions to the common devlink library to allow them to be
reused by future tests.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the trap IDs and trap group used to report layer 3 drops. Register
layer 3 packet traps and associated layer 3 trap group with devlink
during driver initialization.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add packet traps that can report packets that were dropped during layer
3 forwarding.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return -EOPNOTSUPP instead of -EINVAL if the requested ioctl is not
implemented.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
drm-fixes-5.4-2019-11-06:
amdgpu:
- Fix navi14 display issue root cause and revert workaround
- GPU reset scheduler interaction fix
- Fix fan boost on multi-GPU
- Gfx10 and sdma5 fixes for navi
- GFXOFF fix for renoir
- Add navi14 PCI ID
- GPUVM fix for arcturus
radeon:
- Port an SI power fix from amdgpu
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191107032241.1021217-1-alexander.deucher@amd.com
- Do not use TBT type for non Type-C ports.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJdwz2eAAoJEPpiX2QO6xPK2BgH/14HVq7euW204goERrqwRjWF
11HaSYbT2yG12vbwrahNJjSejj7VQbhN+TF9Fe221WG1R3XYig1SF72tpmfanKNG
u10BEXHxxuTVSPos8TCQmrspUHUDCYRyfzbByrL/g7i2oMuO1pIaFsKkFN8weu9h
EzEAc+h5k/PGrB0pN2Ez0mVKYnKB1WYkgUvQaziHKUUHh1okyQgpJkKzPfoiGQRq
CWNsfsy+YZ8XJJp12HucE1S8faphgusX82e9DuhWLizb0WMIJElq8wx/iIaeCNsr
IFFl1sePZMshq4LXmhz15NS6cqiOXt50BRjCCgJD1b4mFsPytbbMedekDhBlPSc=
=csHm
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-fixes-2019-11-06' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
- Fix HPD poll to avoid kworker consuming a lot of cpu cycles.
- Do not use TBT type for non Type-C ports.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191106213958.GA16525@intel.com
- Fix for a state dereference in atomic self-refresh helpers
- One compilation fix for c2p fbdev helpers
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXcPUHwAKCRDj7w1vZxhR
xfQ+AQDqa+ddvrlr9S0lAwv3R6iH8E9/uk1/PaKJEEyPFL6lQgEA7lNWpgWKuSDx
fpY3uqEhV1sNcCmBan968wjySi6BpwY=
=b9Yc
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2019-11-07-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
- Some new documentation for GEM shmem madvise helpers
- Fix for a state dereference in atomic self-refresh helpers
- One compilation fix for c2p fbdev helpers
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191107082215.GA34850@gilmour.lan
tcp_make_synack() already uses tcp_clock_ns(), and can pass
the value to cookie_init_timestamp() to avoid another call
to the ktime_get_ns() helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add tests to verify routes with source address set are deleted when
source address is deleted.
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hendrik reported that routes in the main table using a source address are
not removed when that address is removed. The problem is that
fib_sync_down_addr does not account for devices in the default VRF, which
are associated with the main table. Fix by updating the table id reference.
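The sketch below is a minimal model of the idea. It assumes the device's VRF table lookup returns 0 for devices in the default VRF. That id has to be mapped to the main table (RT_TABLE_MAIN, 254 in <linux/rtnetlink.h>) before it is compared against a route's table. Otherwise main-table routes never match and are left behind. The function names are hypothetical stand-ins, not the kernel's fib_sync_down_addr() code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

#define MODEL_RT_TABLE_MAIN 254u /* same value as RT_TABLE_MAIN */

/*
 * Hypothetical: map the table id reported for a device's VRF to the
 * table its routes actually live in. 0 means "default VRF", whose
 * routes are in the main table.
 */
static uint32_t effective_table(uint32_t vrf_tb_id)
{
	return vrf_tb_id ? vrf_tb_id : MODEL_RT_TABLE_MAIN;
}

/* Should a route in @route_tb_id be flushed when the device's address goes? */
static bool route_in_dev_table(uint32_t route_tb_id, uint32_t vrf_tb_id)
{
	/* Comparing against the raw 0 would never match main-table routes. */
	return route_tb_id == effective_table(vrf_tb_id);
}
```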
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Reported-by: Hendrik Donner <hd@os-cillation.de>
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking at a syzbot KCSAN report [1], I found multiple
issues in this code:
1) fib6_nh->last_probe has an initial value of 0.
While probably okay on 64-bit kernels, this causes an issue
on 32-bit kernels, since time_after(jiffies, 0 + interval)
might be false ~24 days after boot (for HZ=1000).
2) The data race found by KCSAN.
I could use READ_ONCE() and WRITE_ONCE(), but we can also
take the opportunity to avoid piling up too many rt6_probe_deferred()
works by using cmpxchg() instead, so that only one CPU wins the race.
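The userspace sketch below illustrates both points. It models a 32-bit jiffies counter with uint32_t and uses C11 atomic_compare_exchange_strong() in place of cmpxchg(). time_after32() mirrors the wraparound-safe signed comparison done by the kernel's time_after(). With a last_probe of 0, that comparison is false for half of the 32-bit range: 2^31 ticks, roughly 24.9 days at HZ=1000. The helper name probe_allowed() and the interval values are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* True if @a is after @b, tolerating wraparound of a 32-bit counter. */
static bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/*
 * Returns true for exactly one caller per interval: the one whose
 * compare-and-exchange installs @now as the new last probe time.
 */
static bool probe_allowed(_Atomic uint32_t *last_probe, uint32_t now,
			  uint32_t interval)
{
	uint32_t last = atomic_load_explicit(last_probe, memory_order_relaxed);

	if (!time_after32(now, last + interval))
		return false;

	/* Only one CPU wins; losers see a changed value and back off. */
	return atomic_compare_exchange_strong(last_probe, &last, now);
}
```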
[1]
BUG: KCSAN: data-race in find_match / find_match
write to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 1:
rt6_probe net/ipv6/route.c:663 [inline]
find_match net/ipv6/route.c:757 [inline]
find_match+0x5bd/0x790 net/ipv6/route.c:733
__find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
find_rr_leaf net/ipv6/route.c:852 [inline]
rt6_select net/ipv6/route.c:896 [inline]
fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
tcp_transmit_skb net/ipv4/tcp_output.c:1185 [inline]
tcp_xmit_probe_skb+0x19b/0x1d0 net/ipv4/tcp_output.c:3735
read to 0xffff8880bb7aabe8 of 8 bytes by interrupt on cpu 0:
rt6_probe net/ipv6/route.c:657 [inline]
find_match net/ipv6/route.c:757 [inline]
find_match+0x521/0x790 net/ipv6/route.c:733
__find_rr_leaf+0xe3/0x780 net/ipv6/route.c:831
find_rr_leaf net/ipv6/route.c:852 [inline]
rt6_select net/ipv6/route.c:896 [inline]
fib6_table_lookup+0x383/0x650 net/ipv6/route.c:2164
ip6_pol_route+0xee/0x5c0 net/ipv6/route.c:2200
ip6_pol_route_output+0x48/0x60 net/ipv6/route.c:2452
fib6_rule_lookup+0x3d6/0x470 net/ipv6/fib6_rules.c:117
ip6_route_output_flags_noref+0x16b/0x230 net/ipv6/route.c:2484
ip6_route_output_flags+0x50/0x1a0 net/ipv6/route.c:2497
ip6_dst_lookup_tail+0x25d/0xc30 net/ipv6/ip6_output.c:1049
ip6_dst_lookup_flow+0x68/0x120 net/ipv6/ip6_output.c:1150
inet6_csk_route_socket+0x2f7/0x420 net/ipv6/inet6_connection_sock.c:106
inet6_csk_xmit+0x91/0x1f0 net/ipv6/inet6_connection_sock.c:121
__tcp_transmit_skb+0xe81/0x1d60 net/ipv4/tcp_output.c:1169
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 18894 Comm: udevd Not tainted 5.4.0-rc3+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: cc3a86c802 ("ipv6: Change rt6_probe to take a fib6_nh")
Fixes: f547fac624 ("ipv6: rate-limit probes for neighbourless routes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem where spin locks originally meant for the
netpoll path of the hns driver cause a deadlock in the normal NAPI poll
path. The issue is due to stray leftover netpoll-related spin lock code
(netpoll support was earlier removed from HNS [1]), which becomes active
when the NET_POLL_CONTROLLER option is enabled.
Earlier background:
The netpoll handling code originally had a bug (as identified by
Marc Zyngier [2]): it used the wrong spin lock API, one which does
not disable interrupts and hence can cause locking issues. That is,
if the lock is first acquired in thread context (e.g. by the 'ip'
utility) and an interrupt context such as TX/RX later tries to
acquire it again (interrupts can always preempt the lock-holding
task), the result is a deadlock.
Proposed Solution:
1. If netpoll were enabled in the HNS driver, which it is not right
now, we could simply have used spin_[un]lock_irqsave().
2. But as netpoll is disabled, it is best to get rid of the existing
locks and stray code for now. This should solve the problem reported
by Marc.
[1] https://git.kernel.org/torvalds/c/4bd2c03be7
[2] https://patchwork.ozlabs.org/patch/1189139/
Fixes: 4bd2c03be7 ("net: hns: remove ndo_poll_controller")
Cc: lipeng <lipeng321@huawei.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Reported-by: Marc Zyngier <maz@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>