linux_dsm_epyc7002/fs
Linus Torvalds 45bcc21a50 pipe: do FASYNC notifications for every pipe IO, not just state changes
commit fe67f4dd8daa252eb9aa7acb61555f3cc3c1ce4c upstream.

It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.

Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable).  User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.

But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't actually
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).

The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.
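
For illustration only (this is not the stress-ng code, and not part of
the kernel change), here is a minimal user space sketch of the robust
pattern described above: request SIGIO on a pipe, then on each
notification read until the pipe is empty rather than assuming one
signal per write.  The helper names enable_sigio() and drain_fd() are
hypothetical; the sketch assumes Linux with glibc and uses only
fcntl(2) and read(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_ASYNC and F_SETOWN on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: make fd non-blocking and ask the kernel to send
 * SIGIO to the calling process when IO becomes possible on it.
 * Returns 0 on success, -1 on error (errno set by fcntl).
 */
int enable_sigio(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	if (fcntl(fd, F_SETOWN, getpid()) < 0)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) < 0)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: read until the descriptor reports "empty"
 * (EAGAIN) or end of file.  Returns the number of bytes consumed,
 * or -1 on any other error.  It only calls read(), which is
 * async-signal-safe, so it may be invoked from a SIGIO handler.
 */
long drain_fd(int fd)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;	/* more data may still be pending */
			continue;
		}
		if (n == 0)
			return total;	/* all writers closed */
		if (errno == EINTR)
			continue;	/* interrupted, retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;	/* pipe is empty for now */
		return -1;
	}
}
```

A SIGIO handler built on this would call drain_fd() on the read end
(saving and restoring errno around the call), so that data left by
several back-to-back writes is consumed even if only one signal
arrives.  Note that the default action for SIGIO is to terminate the
process, so a handler should be installed before enable_sigio() is
called.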

This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong.  What matters is that it used to work.

So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes.  It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.

Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a664 and 1b6b26ae70 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed.  Probably because the stress-ng problem case ends up
being timing-dependent too.

It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe
writes always wake up readers") only to be broken again by commit
3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").

And at that point the kernel test robot noticed the performance
regression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.

Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case.  FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.

Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05 19:00:51 +02:00
..
9p
adfs
affs
afs afs: Fix tracepoint string placement with built-in AFS 2021-07-28 14:35:41 +02:00
aufs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
autofs
befs
bfs
btrfs btrfs: fix race between marking inode needs to be logged and log syncing 2024-07-05 19:00:50 +02:00
cachefiles
ceph init: add dsm gpl source 2024-07-05 18:00:04 +02:00
cifs cifs: create sd context must be a multiple of 8 2024-07-05 18:53:59 +02:00
coda
configfs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
cramfs
crypto fscrypt: fix derivation of SipHash keys on big endian CPUs 2021-07-14 16:56:53 +02:00
debugfs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
devpts
dlm fs: dlm: fix memory leak when fenced 2021-07-14 16:55:59 +02:00
ecryptfs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
efivarfs
efs
erofs erofs: fix error return code in erofs_read_superblock() 2021-07-14 16:56:53 +02:00
exfat init: add dsm gpl source 2024-07-05 18:00:04 +02:00
exportfs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
ext2
ext4 ext4: fix potential htree corruption when growing large_dir directories 2024-07-05 18:52:29 +02:00
f2fs f2fs: Show casefolding support only when supported 2021-07-25 14:36:17 +02:00
fat init: add dsm gpl source 2024-07-05 18:00:04 +02:00
freevxfs
fscache
fuse init: add dsm gpl source 2024-07-05 18:00:04 +02:00
gfs2 gfs2: Fix error handling in init_statfs 2021-07-14 16:55:38 +02:00
hfs hfs: add lock nesting notation to hfs_find_init 2021-07-31 08:16:12 +02:00
hfsplus init: add dsm gpl source 2024-07-05 18:00:04 +02:00
hostfs
hpfs
hugetlbfs hugetlbfs: fix mount mode command line processing 2021-07-28 14:35:46 +02:00
iomap init: add dsm gpl source 2024-07-05 18:00:04 +02:00
isofs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
jbd2
jffs2
jfs fs/jfs: Fix missing error code in lmLogInit() 2021-07-20 16:05:40 +02:00
kernfs
lockd init: add dsm gpl source 2024-07-05 18:00:04 +02:00
minix
nfs init: add dsm gpl source 2024-07-05 18:00:04 +02:00
nfs_common
nfsd init: add dsm gpl source 2024-07-05 18:00:04 +02:00
nilfs2 nilfs2: fix memory leak in nilfs_sysfs_delete_device_group 2021-06-30 08:47:24 -04:00
nls
notify init: add dsm gpl source 2024-07-05 18:00:04 +02:00
ntfs ntfs: fix validity check for file name attribute 2021-07-14 16:55:38 +02:00
ocfs2 ocfs2: issue zeroout to EOF blocks 2024-07-05 18:03:15 +02:00
omfs
openpromfs
orangefs orangefs: fix orangefs df output. 2021-07-20 16:05:48 +02:00
overlayfs ovl: fix uninitialized pointer read in ovl_lookup_real_one() 2024-07-05 18:56:55 +02:00
proc init: add dsm gpl source 2024-07-05 18:00:04 +02:00
pstore init: add dsm gpl source 2024-07-05 18:00:04 +02:00
qnx4
qnx6
quota init: add dsm gpl source 2024-07-05 18:00:04 +02:00
ramfs
reiserfs reiserfs: check directory items on read from disk 2024-07-05 18:52:32 +02:00
romfs
squashfs
sysfs
sysv
tracefs
ubifs ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode 2021-07-20 16:05:51 +02:00
udf init: add dsm gpl source 2024-07-05 18:00:04 +02:00
ufs
unicode
vboxsf vboxsf: Add support for the atomic_open directory-inode op 2024-07-05 18:54:41 +02:00
verity
xfs
zonefs
aio.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
anon_inodes.c
attr.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c
buffer.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
char_dev.c
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c dax: fix ENOMEM handling in grab_mapping_entry() 2021-07-14 16:56:13 +02:00
dcache.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
dcookies.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
fcntl.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
fhandle.c
file_table.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
file.c
filesystems.c
fs_context.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
fsopen.c
init.c
inode.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
internal.h init: add dsm gpl source 2024-07-05 18:00:04 +02:00
io_uring.c io_uring: only assign io_uring_enter() SQPOLL error in actual error case 2024-07-05 18:56:00 +02:00
io-wq.c io_uring: fix false WARN_ONCE 2021-07-19 09:44:51 +02:00
io-wq.h
ioctl.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
Kconfig init: add dsm gpl source 2024-07-05 18:00:04 +02:00
Kconfig.binfmt
kernel_read_file.c
libfs.c
locks.c
Makefile init: add dsm gpl source 2024-07-05 18:00:04 +02:00
mbcache.c
mount.h init: add dsm gpl source 2024-07-05 18:00:04 +02:00
mpage.c
namei.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
namespace.c fs: warn about impending deprecation of mandatory locks 2024-07-05 18:56:00 +02:00
no-block.c
nsfs.c
open.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
pipe.c pipe: do FASYNC notifications for every pipe IO, not just state changes 2024-07-05 19:00:51 +02:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
read_write.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
readdir.c
remap_range.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
select.c
seq_file.c seq_file: disallow extremely large seq buffer allocations 2021-07-20 16:05:59 +02:00
signalfd.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
splice.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
stack.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
stat.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
statfs.c
super.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
sync.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
syno_acl_api.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
syno_acl.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
syno_acl.h init: add dsm gpl source 2024-07-05 18:00:04 +02:00
timerfd.c
userfaultfd.c userfaultfd: do not untag user pointers 2021-07-28 14:35:46 +02:00
utimes.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00
xattr.c init: add dsm gpl source 2024-07-05 18:00:04 +02:00