Vasily Averin says:
====================
netdev: seq_file .next functions should increase position index
In Aug 2018 NeilBrown noticed
commit 1f4aace60b ("fs/seq_file.c: simplify seq_file iteration code and interface")
"Some ->next functions do not increment *pos when they return NULL...
Note that such ->next functions are buggy and should be fixed.
A simple demonstration is
dd if=/proc/swaps bs=1000 skip=1
Choose any block size larger than the size of /proc/swaps. This will
always show the whole last line of /proc/swaps"
Described problem is still actual. If you make lseek into middle of last output line
following read will output end of last line and whole last line once again.
$ dd if=/proc/swaps bs=1 # usual output
Filename Type Size Used Priority
/dev/dm-0 partition 4194812 97536 -2
104+0 records in
104+0 records out
104 bytes copied
$ dd if=/proc/swaps bs=40 skip=1 # last line was generated twice
dd: /proc/swaps: cannot skip to specified offset
v/dm-0 partition 4194812 97536 -2
/dev/dm-0 partition 4194812 97536 -2
3+1 records in
3+1 records out
131 bytes copied
There are lot of other affected files, I've found 30+ including
/proc/net/ip_tables_matches and /proc/sysvipc/*
This patch-set fixes files related to netdev@
https://bugzilla.kernel.org/show_bug.cgi?id=206283
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if seq_file .next fuction does not change position index,
read after some lseek can generate unexpected output.
https://bugzilla.kernel.org/show_bug.cgi?id=206283
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Latest commit 853697504d ("tcp: Fix highest_sack and highest_sack_seq")
apparently allowed syzbot to trigger various crashes in TCP stack [1]
I believe this commit only made things easier for syzbot to find
its way into triggering use-after-frees. But really the bugs
could lead to bad TCP behavior or even plain crashes even for
non malicious peers.
I have audited all calls to tcp_rtx_queue_unlink() and
tcp_rtx_queue_unlink_and_free() and made sure tp->highest_sack would be updated
if we are removing from rtx queue the skb that tp->highest_sack points to.
These updates were missing in three locations :
1) tcp_clean_rtx_queue() [This one seems quite serious,
I have no idea why this was not caught earlier]
2) tcp_rtx_queue_purge() [Probably not a big deal for normal operations]
3) tcp_send_synack() [Probably not a big deal for normal operations]
[1]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
BUG: KASAN: use-after-free in tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
Read of size 4 at addr ffff8880a488d068 by task ksoftirqd/1/16
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x197/0x210 lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374
__kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506
kasan_report+0x12/0x20 mm/kasan/common.c:639
__asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:134
tcp_highest_sack_seq include/net/tcp.h:1864 [inline]
tcp_highest_sack_seq include/net/tcp.h:1856 [inline]
tcp_check_sack_reordering+0x33c/0x3a0 net/ipv4/tcp_input.c:891
tcp_try_undo_partial net/ipv4/tcp_input.c:2730 [inline]
tcp_fastretrans_alert+0xf74/0x23f0 net/ipv4/tcp_input.c:2847
tcp_ack+0x2577/0x5bf0 net/ipv4/tcp_input.c:3710
tcp_rcv_established+0x6dd/0x1e90 net/ipv4/tcp_input.c:5706
tcp_v4_do_rcv+0x619/0x8d0 net/ipv4/tcp_ipv4.c:1619
tcp_v4_rcv+0x307f/0x3b40 net/ipv4/tcp_ipv4.c:2001
ip_protocol_deliver_rcu+0x5a/0x880 net/ipv4/ip_input.c:204
ip_local_deliver_finish+0x23b/0x380 net/ipv4/ip_input.c:231
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:442 [inline]
ip_rcv_finish+0x1db/0x2f0 net/ipv4/ip_input.c:428
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:538
__netif_receive_skb_one_core+0x113/0x1a0 net/core/dev.c:5148
__netif_receive_skb+0x2c/0x1d0 net/core/dev.c:5262
process_backlog+0x206/0x750 net/core/dev.c:6093
napi_poll net/core/dev.c:6530 [inline]
net_rx_action+0x508/0x1120 net/core/dev.c:6598
__do_softirq+0x262/0x98c kernel/softirq.c:292
run_ksoftirqd kernel/softirq.c:603 [inline]
run_ksoftirqd+0x8e/0x110 kernel/softirq.c:595
smpboot_thread_fn+0x6a3/0xa40 kernel/smpboot.c:165
kthread+0x361/0x430 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Allocated by task 10091:
save_stack+0x23/0x90 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
__kasan_kmalloc mm/kasan/common.c:513 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:486
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:521
slab_post_alloc_hook mm/slab.h:584 [inline]
slab_alloc_node mm/slab.c:3263 [inline]
kmem_cache_alloc_node+0x138/0x740 mm/slab.c:3575
__alloc_skb+0xd5/0x5e0 net/core/skbuff.c:198
alloc_skb_fclone include/linux/skbuff.h:1099 [inline]
sk_stream_alloc_skb net/ipv4/tcp.c:875 [inline]
sk_stream_alloc_skb+0x113/0xc90 net/ipv4/tcp.c:852
tcp_sendmsg_locked+0xcf9/0x3470 net/ipv4/tcp.c:1282
tcp_sendmsg+0x30/0x50 net/ipv4/tcp.c:1432
inet_sendmsg+0x9e/0xe0 net/ipv4/af_inet.c:807
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xd7/0x130 net/socket.c:672
__sys_sendto+0x262/0x380 net/socket.c:1998
__do_sys_sendto net/socket.c:2010 [inline]
__se_sys_sendto net/socket.c:2006 [inline]
__x64_sys_sendto+0xe1/0x1a0 net/socket.c:2006
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 10095:
save_stack+0x23/0x90 mm/kasan/common.c:72
set_track mm/kasan/common.c:80 [inline]
kasan_set_free_info mm/kasan/common.c:335 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:474
kasan_slab_free+0xe/0x10 mm/kasan/common.c:483
__cache_free mm/slab.c:3426 [inline]
kmem_cache_free+0x86/0x320 mm/slab.c:3694
kfree_skbmem+0x178/0x1c0 net/core/skbuff.c:645
__kfree_skb+0x1e/0x30 net/core/skbuff.c:681
sk_eat_skb include/net/sock.h:2453 [inline]
tcp_recvmsg+0x1252/0x2930 net/ipv4/tcp.c:2166
inet_recvmsg+0x136/0x610 net/ipv4/af_inet.c:838
sock_recvmsg_nosec net/socket.c:886 [inline]
sock_recvmsg net/socket.c:904 [inline]
sock_recvmsg+0xce/0x110 net/socket.c:900
__sys_recvfrom+0x1ff/0x350 net/socket.c:2055
__do_sys_recvfrom net/socket.c:2073 [inline]
__se_sys_recvfrom net/socket.c:2069 [inline]
__x64_sys_recvfrom+0xe1/0x1a0 net/socket.c:2069
do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff8880a488d040
which belongs to the cache skbuff_fclone_cache of size 456
The buggy address is located 40 bytes inside of
456-byte region [ffff8880a488d040, ffff8880a488d208)
The buggy address belongs to the page:
page:ffffea0002922340 refcount:1 mapcount:0 mapping:ffff88821b057000 index:0x0
raw: 00fffe0000000200 ffffea00022a5788 ffffea0002624a48 ffff88821b057000
raw: 0000000000000000 ffff8880a488d040 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880a488cf00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880a488cf80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880a488d000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
^
ffff8880a488d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880a488d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Fixes: 853697504d ("tcp: Fix highest_sack and highest_sack_seq")
Fixes: 50895b9de1 ("tcp: highest_sack fix")
Fixes: 737ff31456 ("tcp: use sequence distance to detect reordering")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cambda Zhu <cambda@linux.alibaba.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a printk message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a pr_warn message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a IP_VS_ERR_RL message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a hw_dbg message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reported an out-of-bound access in em_nbyte. As initially
analyzed by Eric, this is because em_nbyte sets its own em->datalen
in em_nbyte_change() other than the one specified by user, but this
value gets overwritten later by its caller tcf_em_validate().
We should leave em->datalen untouched to respect their choices.
I audit all the in-tree ematch users, all of those implement
->change() set em->datalen, so we can just avoid setting it twice
in this case.
Reported-and-tested-by: syzbot+5af9a90dad568aa9f611@syzkaller.appspotmail.com
Reported-by: syzbot+2f07903a5b05e7f36410@syzkaller.appspotmail.com
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain says:
====================
Fixes for SONIC ethernet driver
Various SONIC driver problems have become apparent over the years,
including tx watchdog timeouts, lost packets and duplicated packets.
The problems are mostly caused by bugs in buffer handling, locking and
(re-)initialization code.
This patch series resolves these problems.
This series has been tested on National Semiconductor hardware (macsonic),
qemu-system-m68k (macsonic) and qemu-system-mips64el (jazzsonic).
The emulated dp8393x device used in QEMU also has bugs.
I have fixed the bugs that I know of in a series of patches at,
https://github.com/fthain/qemu/commits/sonic
Changed since v1:
- Minor revisions as described in commit logs.
- Deferred net-next patches.
Changed since v2:
- Minor revisions as described in commit logs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Section 5.5.3.2 of the datasheet says,
If FIFO Underrun, Byte Count Mismatch, Excessive Collision, or
Excessive Deferral (if enabled) errors occur, transmission ceases.
In this situation, the chip asserts a TXER interrupt rather than TXDN.
But the handler for the TXDN is the only way that the transmit queue
gets restarted. Hence, an aborted transmission can result in a watchdog
timeout.
This problem can be reproduced on congested link, as that can result in
excessive transmitter collisions. Another way to reproduce this is with
a FIFO Underrun, which may be caused by DMA latency.
In event of a TXER interrupt, prevent a watchdog timeout by restarting
transmission.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Section 4.3.1 of the datasheet says,
This bit [TXP] must not be set if a Load CAM operation is in
progress (LCAM is set). The SONIC will lock up if both bits are
set simultaneously.
Testing has shown that the driver sometimes attempts to set LCAM
while TXP is set. Avoid this by waiting for command completion
before and after giving the LCAM command.
After issuing the Load CAM command, poll for !SONIC_CR_LCAM rather than
SONIC_INT_LCD, because the SONIC_CR_TXP bit can't be used until
!SONIC_CR_LCAM.
When in reset mode, take the opportunity to reset the CAM Enable
register.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several issues relating to command register usage during
chip initialization.
Firstly, the SONIC sometimes comes out of software reset with the
Start Timer bit set. This gets logged as,
macsonic macsonic eth0: sonic_init: status=24, i=101
Avoid this by giving the Stop Timer command earlier than later.
Secondly, the loop that waits for the Read RRA command to complete has
the break condition inverted. That's why the for loop iterates until
its termination condition. Call the helper for this instead.
Finally, give the Receiver Enable command after clearing interrupts,
not before, to avoid the possibility of losing an interrupt.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the SONIC's DMA engine is idle before altering the transmit
and receive descriptors. Add a helper for this as it will be needed
again.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As soon as the driver is finished with a receive buffer it allocs a new
one and overwrites the corresponding RRA entry with a new buffer pointer.
Problem is, the buffer pointer is split across two word-sized registers.
It can't be updated in one atomic store. So this operation races with the
chip while it stores received packets and advances its RRP register.
This could result in memory corruption by a DMA write.
Avoid this problem by adding buffers only at the location given by the
RWP register, in accordance with the National Semiconductor datasheet.
Re-factor this code into separate functions to calculate a RRA pointer
and to update the RWP.
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
After sonic_tx_timeout() calls sonic_init(), it can happen that
sonic_rx() will subsequently encounter a receive descriptor with no
flags set. Remove the comment that says that this can't happen.
When giving a receive descriptor to the SONIC, clear the descriptor
status field. That way, any rx descriptor with flags set can only be
a newly received packet.
Don't process a descriptor without the LPKT bit set. The buffer is
still in use by the SONIC.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The while loop in sonic_rx() traverses the rx descriptor ring. It stops
when it reaches a descriptor that the SONIC has not used. Each iteration
advances the EOL flag so the SONIC can keep using more descriptors.
Therefore, the while loop has no definite termination condition.
The algorithm described in the National Semiconductor literature is quite
different. It consumes descriptors up to the one with its EOL flag set
(which will also have its "in use" flag set). All freed descriptors are
then returned to the ring at once, by adjusting the EOL flags (and link
pointers).
Adopt the algorithm from datasheet as it's simpler, terminates quickly
and avoids a lot of pointless descriptor EOL flag changes.
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SONIC can sometimes advance its rx buffer pointer (RRP register)
without advancing its rx descriptor pointer (CRDA register). As a result
the index of the current rx descriptor may not equal that of the current
rx buffer. The driver mistakenly assumes that they are always equal.
This assumption leads to incorrect packet lengths and possible packet
duplication. Avoid this by calling a new function to locate the buffer
corresponding to a given descriptor.
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tx_aborted_errors statistic should count packets flagged with EXD,
EXC, FU, or BCM bits because those bits denote an aborted transmission.
That corresponds to the bitmask 0x0446, not 0x0642. Use macros for these
constants to avoid mistakes. Better to leave out FIFO Underruns (FU) as
there's a separate counter for that purpose.
Don't lump all these errors in with the general tx_errors counter as
that's used for tx timeout events.
On the rx side, don't count RDE and RBAE interrupts as dropped packets.
These interrupts don't indicate a lost packet, just a lack of resources.
When a lack of resources results in a lost packet, this gets reported
in the rx_missed_errors counter (along with RFO events).
Don't double-count rx_frame_errors and rx_crc_errors.
Don't use the general rx_errors counter for events that already have
special counters.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver accesses descriptor memory which is simultaneously accessed by
the chip, so the compiler must not be allowed to re-order CPU accesses.
sonic_buf_get() used 'volatile' to prevent that. sonic_buf_put() should
have done so too but was overlooked.
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip can change a packet's descriptor status flags at any time.
However, an active interrupt flag gets cleared rather late. This
allows a race condition that could theoretically lose an interrupt.
Fix this by clearing asserted interrupt flags immediately.
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netif_stop_queue() call in sonic_send_packet() races with the
netif_wake_queue() call in sonic_interrupt(). This causes issues
like "NETDEV WATCHDOG: eth0 (macsonic): transmit queue 0 timed out".
Fix this by disabling interrupts when accessing tx_skb[] and next_tx.
Update a comment to clarify the synchronization properties.
Fixes: efcce83936 ("[PATCH] macsonic/jazzsonic network drivers update")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the only 10G PHY interface type defined at the moment the code
was developed was XGMII, although the PHY interface mode used was
not XGMII, XGMII was used in the code to denote 10G. This patch
renames the 10G interface mode to remove the ambiguity.
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Madalin Bucur says:
====================
net: fsl/fman: address erratum A011043
This addresses a HW erratum on some QorIQ DPAA devices.
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.
When the issue was present, one could see such errors
during boot:
mdio_bus ffe4e5000: Error while reading PHY0 reg at 3.3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When fsl,erratum-a011043 is set, adjust for erratum A011043:
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add fsl,erratum-a011043 to internal MDIO buses.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an entry for erratum A011043: the MDIO_CFG[MDIO_RD_ER]
bit may be falsely set when reading internal PCS registers.
MDIO reads to internal PCS registers may result in having
the MDIO_CFG[MDIO_RD_ER] bit set, even when there is no
error and read data (MDIO_DATA[MDIO_DATA]) is correct.
Software may get false read error when reading internal
PCS registers through MDIO. As a workaround, all internal
MDIO accesses should ignore the MDIO_CFG[MDIO_RD_ER] bit.
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver while collecting firmware dump takes longer time to
collect/process some of the firmware dump entries/memories.
Bigger capture masks makes it worse as it results in larger
amount of data being collected and results in CPU soft lockup.
Place cond_resched() in some of the driver flows that are
expectedly time consuming to relinquish the CPU to avoid CPU
soft lockup panic.
Signed-off-by: Shahed Shaikh <shshaikh@marvell.com>
Tested-by: Yonggen Xu <Yonggen.Xu@dell.com>
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Primarily bugfixes, mostly around handling index wrap-around correctly.
A couple of doc fixes and adding missing APIs.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAl4pJv4ACgkQDpNsjXcp
gj5OXgf/Q52Am0tLS6iC7XtcojjidnSFzpAqn9OGo7zuTsP+QURsiZAXJpjvUF8y
kGIcD9R/tobEADkyDnj8D64yO+CqT1PlkaYvO5R+nYx/WTtmvr3/O/7Y4/ZTAwcf
tTvj3yXlKbsA+9YJa34t43Ul4EDCOrQTz+oIZUQsiaRNzIdmCCguLg5gOkIhoXt6
wJwp2Fc3YN6McBBpZH2z0GRwdJWPcixtKH0RNhOyGUiWBhkWYFXNvNEdxKJmTXdK
qqkoW9So8IkyyoXdW8ZpQqCCk9tPRv9+7adDFVwl/T8q/RBrHNtcyQqJL3EsScsi
9IqCrKldzRenk0v6lG+mgJEuTvYs8A==
=7cpi
-----END PGP SIGNATURE-----
Merge tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax
Pull XArray fixes from Matthew Wilcox:
"Primarily bugfixes, mostly around handling index wrap-around
correctly.
A couple of doc fixes and adding missing APIs.
I had an oops live on stage at linux.conf.au this year, and it turned
out to be a bug in xas_find() which I can't prove isn't triggerable in
the current codebase. Then in looking for the bug, I spotted two more
bugs.
The bots have had a few days to chew on this with no problems
reported, and it passes the test-suite (which now has more tests to
make sure these problems don't come back)"
* tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax:
XArray: Add xa_for_each_range
XArray: Fix xas_find returning too many entries
XArray: Fix xa_find_after with multi-index entries
XArray: Fix infinite loop with entry at ULONG_MAX
XArray: Add wrappers for nested spinlocks
XArray: Improve documentation of search marks
XArray: Fix xas_pause at ULONG_MAX
- Fix a function comparison warning for a xen trace event macro
- Fix a double perf_event linking to a trace_uprobe_filter for multiple events
- Fix suspicious RCU warnings in trace event code for using
list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.
- Fix a bug in the histogram code when using the same variable
- Fix a NULL pointer dereference when tracefs lockdown enabled and calling
trace_set_default_clock()
This v2 version contains:
- A fix to a bug found with the double perf_event linking patch
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXinakBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhNZAQCi86p9eW3f3w7hM2hZcirC+mQKVZgp
2rO4zIAK5V6G7gEAh6I7VZa50a6AE647ZjryE7ufTRUhmSFMWoG0kcJ7OAk=
=/J9n
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:
- Fix a function comparison warning for a xen trace event macro
- Fix a double perf_event linking to a trace_uprobe_filter for
multiple events
- Fix suspicious RCU warnings in trace event code for using
list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.
- Fix a bug in the histogram code when using the same variable
- Fix a NULL pointer dereference when tracefs lockdown enabled and
calling trace_set_default_clock()
- A fix to a bug found with the double perf_event linking patch"
* tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
tracing: Do not set trace clock if tracefs lockdown is in effect
tracing: Fix histogram code when expression has same var as value
tracing: trigger: Replace unneeded RCU-list traversals
tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
tracing: xen: Ordered comparison of function pointers
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl4p1+MTHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi4YtCACPHyE8aoDTHZF8UZ9bHKNFVt4C1bRx
ihFB6/PzmIfFw4Cbf+yTW85q3zqJ/6eJIOZF4dlwoFWK+osSk8sYRaOvlEovysbR
sYiAbcOxePj9tSPdrWLYB/5ELtwMTloxBo7mPiJYt127UntWlPGfiz4sdHJBt1zI
IBPOIeACJKGe0+Wtj0mGsXk+WhEB3nFk2DINnLuFc4tG6yXkFNq5/fnXrgVTlUTF
4EwDQgHBUIqKDJarSyIBzud6VVshS7VaMAu8h9kwPScN4sG1y4ucgFzXIc4JfqRN
TnEV48hdRQMVuQtsvuzAMPQvsjMlIXUSTGZzs4XPbEBjgAP8+MP+PJvL
=XVg1
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client
Pull ceph fix from Ilya Dryomov:
"A fix for a potential use-after-free from Jeff, marked for stable"
* tag 'ceph-for-5.5-rc8' of https://github.com/ceph/ceph-client:
ceph: hold extra reference to r_parent over life of request
Prevent the kernel from crashing during resume from hibernation
if free pages contain leftover data from the restore kernel and
init_on_free is set (Alexander Potapenko).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl4psjQSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxpOcP/1UTGUr+VdGfBBjG5WlgCY0Jrd50Y78b
RoVNDR/NvSVSuIs44AgrnfyQiz2Y8jG6qY1iSAbIXmFl37/4+kGhkNmd6pV/xFUc
TdZZotFwFlRQjmeQxxH0kNXuAY6nJ2RwELWrjXqM8PuNjNKIEpfS+0fSaWexHqIm
MDArxcDHkvZU5SnnRQM+LkT/EmbEheB7tgm7vGGqMLsSKc0gUsBmVCURe/lLAH5o
EUKX4FI2jCy+LlmSdZ3EDjf1cstm3YXLiegTLSq1Jh3mFHXkFTwJMmidiz21qXJh
Hc4r3iG0NZ37J8HXpwuq++KlhvNbhHJz+ZgC1IYls16RNYh5mUzxtMHWdSyyqlrW
+z8gBVUyeJUYos5Kjb/NKSt43gnz7Uhy0UVbQXD66hgajXe71CQZQq/D5CMeTdJL
jWNaeGYnhskz3IW2vnrs9Ucf6RHHWezXk51kVsyJXadiLhTdOv7DKahDKVwC/Hvf
kyN1W0F5PZpF50yYmnhJgqDfxkGBNKpwXxTAGk6X0WQFaWeh/2FkX045UdJzBbHu
fa1taTM/5RfPlbWq0wLPeHHSP4M2I0ndeWXZk88vUwwMfm9Wo+FNBhgs/EZfeuKD
16sVMsX0r7R1bG+hEj+mvNeLWqfgz7MpGludCkV1dHIDkn2esxx6JqaWxR/UnLHb
D3fZM/cKGdpY
=kDFG
-----END PGP SIGNATURE-----
Merge tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Prevent the kernel from crashing during resume from hibernation if
free pages contain leftover data from the restore kernel and
init_on_free is set (Alexander Potapenko)"
* tag 'pm-5.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: hibernate: fix crashes with init_on_free=1
In commit 9f79b78ef7 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef7 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8a23eb804c ("Make filldir[64]() verify the directory entry
filename is valid") added some minimal validity checks on the directory
entries passed to filldir[64](). But they really were pretty minimal.
This fleshes out at least the name length check: we used to disallow
zero-length names, but really, negative lengths or oevr-long names
aren't ok either. Both could happen if there is some filesystem
corruption going on.
Now, most filesystems tend to use just an "unsigned char" or similar for
the length of a directory entry name, so even with a corrupt filesystem
you should never see anything odd like that. But since we then use the
name length to create the directory entry record length, let's make sure
it actually is half-way sensible.
Note how POSIX states that the size of a path component is limited by
NAME_MAX, but we actually use PATH_MAX for the check here. That's
because while NAME_MAX is generally the correct maximum name length
(it's 255, for the same old "name length is usually just a byte on
disk"), there's nothing in the VFS layer that really cares.
So the real limitation at a VFS layer is the total pathname length you
can pass as a filename: PATH_MAX.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Should work properly with the latest sbios on 5.5 and newer
kernels.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Set d0 and d1 pin directions for spi0 and spi1 as per their pinmux.
Signed-off-by: Raag Jadav <raagjadav@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
When submitting v2 of "fou: Support binding FoU socket" (1713cb37bf),
I accidentally sent the wrong version of the patch and one fix was
missing. In the initial version of the patch, as well as the version 2
that I submitted, I incorrectly used ".type" for the two V6-attributes.
The correct is to use ".len".
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 1713cb37bf ("fou: Support binding FoU socket")
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Second set of fixes for v5.5. There are quite a few patches,
especially on iwlwifi, due to me being on a long break. Libertas also
has a security fix and mt76 a build fix.
iwlwifi
* don't send the PPAG command when PPAG is disabled, since it can cause problems
* a few fixes for a HW bug
* a fix for RS offload;
* a fix for 3168 devices where the NVM tables where the wrong tables were being read
* fix a couple of potential memory leaks in TXQ code
* disable L0S states in all hardware since our hardware doesn't
officially support them anymore (and older versions of the hardware
had instability in these states)
* remove lar_disable parameter since it has been causing issues for
some people who erroneously disable it
* force the debug monitor HW to stop also when debug is disabled,
since it sometimes stays on and prevents low system power states
* don't send IWL_MVM_RXQ_NSSN_SYNC notification due to DMA problems
libertas
* fix two buffer overflows
mt76
* build fix related to CONFIG_MT76_LEDS
* fix off by one in bitrates handling
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJeKX2wAAoJEG4XJFUm622bG8QH/3+QXaSJAVqzbQh2boy7aS88
xhrrnh+bSqWY0ChMk0Z73RF8Ek0WlO+os4uN1cbWIWujrdQUPbTBtOwX4d0TzTue
E3tBFPiHTlVtU43z1bsprA+6EE7fqt/H2lWtlxk0IHzeiQY9NcB6BlDKKCzk5Hib
aMb5HCQy4JmSK83E60HLM9L4nEmEP+yveaKL7uaAZw+qkmyk2mT6um0TlmOYVoNG
9V6k3OZto8LvyV6jKPZgVI6QBATnwHDxlWgooYRj54PuCj9hTbR2mcuUL2QyQeze
AX2QNI+1kWIrAiDaU/lOj8579SiUl36iqtuKmtLhDnSe1GxDkrzmawtz3aGDm4k=
=VZox
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-2020-01-23' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for v5.5
Second set of fixes for v5.5. There are quite a few patches,
especially on iwlwifi, due to me being on a long break. Libertas also
has a security fix and mt76 a build fix.
iwlwifi
* don't send the PPAG command when PPAG is disabled, since it can cause problems
* a few fixes for a HW bug
* a fix for RS offload;
* a fix for 3168 devices where the NVM tables where the wrong tables were being read
* fix a couple of potential memory leaks in TXQ code
* disable L0S states in all hardware since our hardware doesn't
officially support them anymore (and older versions of the hardware
had instability in these states)
* remove lar_disable parameter since it has been causing issues for
some people who erroneously disable it
* force the debug monitor HW to stop also when debug is disabled,
since it sometimes stays on and prevents low system power states
* don't send IWL_MVM_RXQ_NSSN_SYNC notification due to DMA problems
libertas
* fix two buffer overflows
mt76
* build fix related to CONFIG_MT76_LEDS
* fix off by one in bitrates handling
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If both IFF_NAPI_FRAGS mode and XDP are enabled, and the XDP program
consumes the skb, we need to clear the napi.skb (or risk
a use-after-free) and release the mutex (or risk a deadlock)
WARNING: lock held when returning to user space!
5.5.0-rc6-syzkaller #0 Not tainted
------------------------------------------------
syz-executor.0/455 is leaving the kernel with locks still held!
1 lock held by syz-executor.0/455:
#0: ffff888098f6e748 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x1604/0x3fc0 drivers/net/tun.c:1835
Fixes: 90e33d4594 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Petar Penkov <ppenkov@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During reload (or module unload), the router block is de-initialized.
Among other things, this results in the removal of a default multicast
route from each active virtual router (VRF). These default routes are
configured during initialization to trap packets to the CPU. In
Spectrum-2, unlike Spectrum-1, multicast routes are implemented using
ACL rules.
Since the router block is de-initialized before the ACL block, it is
possible that the ACL rules corresponding to the default routes are
deleted while being accessed by the ACL delayed work that queries rules'
activity from the device. This can result in a rare use-after-free [1].
Fix this by protecting the rules list accessed by the delayed work with
a lock. We cannot use a spinlock as the activity read operation is
blocking.
[1]
[ 123.331662] ==================================================================
[ 123.339920] BUG: KASAN: use-after-free in mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0
[ 123.349381] Read of size 8 at addr ffff8881f3bb4520 by task kworker/0:2/78
[ 123.357080]
[ 123.358773] CPU: 0 PID: 78 Comm: kworker/0:2 Not tainted 5.5.0-rc5-custom-33108-gf5df95d3ef41 #2209
[ 123.368898] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
[ 123.378456] Workqueue: mlxsw_core mlxsw_sp_acl_rule_activity_update_work
[ 123.385970] Call Trace:
[ 123.388734] dump_stack+0xc6/0x11e
[ 123.392568] print_address_description.constprop.4+0x21/0x340
[ 123.403236] __kasan_report.cold.8+0x76/0xb1
[ 123.414884] kasan_report+0xe/0x20
[ 123.418716] mlxsw_sp_acl_rule_activity_update_work+0x330/0x3b0
[ 123.444034] process_one_work+0xb06/0x19a0
[ 123.453731] worker_thread+0x91/0xe90
[ 123.467348] kthread+0x348/0x410
[ 123.476847] ret_from_fork+0x24/0x30
[ 123.480863]
[ 123.482545] Allocated by task 73:
[ 123.486273] save_stack+0x19/0x80
[ 123.490000] __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 123.495379] mlxsw_sp_acl_rule_create+0xa7/0x230
[ 123.500566] mlxsw_sp2_mr_tcam_route_create+0xf6/0x3e0
[ 123.506334] mlxsw_sp_mr_tcam_route_create+0x5b4/0x820
[ 123.512102] mlxsw_sp_mr_table_create+0x3b5/0x690
[ 123.517389] mlxsw_sp_vr_get+0x289/0x4d0
[ 123.521797] mlxsw_sp_fib_node_get+0xa2/0x990
[ 123.526692] mlxsw_sp_router_fib4_event_work+0x54c/0x2d60
[ 123.532752] process_one_work+0xb06/0x19a0
[ 123.537352] worker_thread+0x91/0xe90
[ 123.541471] kthread+0x348/0x410
[ 123.545103] ret_from_fork+0x24/0x30
[ 123.549113]
[ 123.550795] Freed by task 518:
[ 123.554231] save_stack+0x19/0x80
[ 123.557958] __kasan_slab_free+0x125/0x170
[ 123.562556] kfree+0xd7/0x3a0
[ 123.565895] mlxsw_sp_acl_rule_destroy+0x63/0xd0
[ 123.571081] mlxsw_sp2_mr_tcam_route_destroy+0xd5/0x130
[ 123.576946] mlxsw_sp_mr_tcam_route_destroy+0xba/0x260
[ 123.582714] mlxsw_sp_mr_table_destroy+0x1ab/0x290
[ 123.588091] mlxsw_sp_vr_put+0x1db/0x350
[ 123.592496] mlxsw_sp_fib_node_put+0x298/0x4c0
[ 123.597486] mlxsw_sp_vr_fib_flush+0x15b/0x360
[ 123.602476] mlxsw_sp_router_fib_flush+0xba/0x470
[ 123.607756] mlxsw_sp_vrs_fini+0xaa/0x120
[ 123.612260] mlxsw_sp_router_fini+0x137/0x384
[ 123.617152] mlxsw_sp_fini+0x30a/0x4a0
[ 123.621374] mlxsw_core_bus_device_unregister+0x159/0x600
[ 123.627435] mlxsw_devlink_core_bus_device_reload_down+0x7e/0xb0
[ 123.634176] devlink_reload+0xb4/0x380
[ 123.638391] devlink_nl_cmd_reload+0x610/0x700
[ 123.643382] genl_rcv_msg+0x6a8/0xdc0
[ 123.647497] netlink_rcv_skb+0x134/0x3a0
[ 123.651904] genl_rcv+0x29/0x40
[ 123.655436] netlink_unicast+0x4d4/0x700
[ 123.659843] netlink_sendmsg+0x7c0/0xc70
[ 123.664251] __sys_sendto+0x265/0x3c0
[ 123.668367] __x64_sys_sendto+0xe2/0x1b0
[ 123.672773] do_syscall_64+0xa0/0x530
[ 123.676892] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 123.682552]
[ 123.684238] The buggy address belongs to the object at ffff8881f3bb4500
[ 123.684238] which belongs to the cache kmalloc-128 of size 128
[ 123.698261] The buggy address is located 32 bytes inside of
[ 123.698261] 128-byte region [ffff8881f3bb4500, ffff8881f3bb4580)
[ 123.711303] The buggy address belongs to the page:
[ 123.716682] page:ffffea0007ceed00 refcount:1 mapcount:0 mapping:ffff888236403500 index:0x0
[ 123.725958] raw: 0200000000000200 dead000000000100 dead000000000122 ffff888236403500
[ 123.734646] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[ 123.743315] page dumped because: kasan: bad access detected
[ 123.749562]
[ 123.751241] Memory state around the buggy address:
[ 123.756620] ffff8881f3bb4400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 123.764716] ffff8881f3bb4480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 123.772812] >ffff8881f3bb4500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 123.780904] ^
[ 123.785697] ffff8881f3bb4580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 123.793793] ffff8881f3bb4600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 123.801883] ==================================================================
Fixes: cf7221a4f5 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0034d395f8 ("powerpc/mm/hash64: Map all the kernel regions in
the same 0xc range") has a bug in the definition of MIN_USER_CONTEXT.
The result is that the context id used for the vmemmap and the lowest
context id handed out to userspace are the same. The context id is
essentially the process identifier as far as the first stage of the
MMU translation is concerned.
This can result in multiple SLB entries with the same VSID (Virtual
Segment ID), accessible to the kernel and some random userspace
process that happens to get the overlapping id, which is not expected
eg:
07 c00c000008000000 40066bdea7000500 1T ESID= c00c00 VSID= 66bdea7 LLP:100
12 0002000008000000 40066bdea7000d80 1T ESID= 200 VSID= 66bdea7 LLP:100
Even though the user process and the kernel use the same VSID, the
permissions in the hash page table prevent the user process from
reading or writing to any kernel mappings.
It can also lead to SLB entries with different base page size
encodings (LLP), eg:
05 c00c000008000000 00006bde0053b500 256M ESID=c00c00000 VSID= 6bde0053b LLP:100
09 0000000008000000 00006bde0053bc80 256M ESID= 0 VSID= 6bde0053b LLP: 0
Such SLB entries can result in machine checks, eg. as seen on a G5:
Oops: Machine check, sig: 7 [#1]
BE PAGE SIZE=64K MU-Hash SMP NR_CPUS=4 NUMA Power Mac
NIP: c00000000026f248 LR: c000000000295e58 CTR: 0000000000000000
REGS: c0000000erfd3d70 TRAP: 0200 Tainted: G M (5.5.0-rcl-gcc-8.2.0-00010-g228b667d8ea1)
MSR: 9000000000109032 <SF,HV,EE,ME,IR,DR,RI> CR: 24282048 XER: 00000000
DAR: c00c000000612c80 DSISR: 00000400 IRQMASK: 0
...
NIP [c00000000026f248] .kmem_cache_free+0x58/0x140
LR [c088000008295e58] .putname 8x88/0xa
Call Trace:
.putname+0xB8/0xa
.filename_lookup.part.76+0xbe/0x160
.do_faccessat+0xe0/0x380
system_call+0x5c/ex68
This happens with 256MB segments and 64K pages, as the duplicate VSID
is hit with the first vmemmap segment and the first user segment, and
older 32-bit userspace maps things in the first user segment.
On other CPUs a machine check is not seen. Instead the userspace
process can get stuck continuously faulting, with the fault never
properly serviced, due to the kernel not understanding that there is
already a HPTE for the address but with inaccessible permissions.
On machines with 1T segments we've not seen the bug hit other than by
deliberately exercising it. That seems to be just a matter of luck
though, due to the typical layout of the user virtual address space
and the ranges of vmemmap that are typically populated.
To fix it we add 2 to MIN_USER_CONTEXT. This ensures the lowest
context given to userspace doesn't overlap with the VMEMMAP context,
or with the context for INVALID_REGION_ID.
Fixes: 0034d395f8 ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
Cc: stable@vger.kernel.org # v5.2+
Reported-by: Christian Marillat <marillat@debian.org>
Reported-by: Romain Dolbeau <romain@dolbeau.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Account for INVALID_REGION_ID, mostly rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200123102547.11623-1-mpe@ellerman.id.au
Hayes Wang says:
====================
r8152: serial fixes
v3:
1. Fix the typos for patch #5 and #6.
2. Modify the commit message of patch #9.
v2:
For patch #2, move declaring the variable "ocp_data".
v1:
These patches are used to fix some issues for RTL8153.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>