linux_dsm_epyc7002/fs
Dennis Zhou 3f93aef535 btrfs: add zstd compression level support
Zstd compression requires different amounts of memory for each level of
compression. The prior patches implemented indirection to allow for each
compression type to manage their workspaces independently. This patch
uses this indirection to implement compression level support for zstd.

To manage the additional memory require, each compression level has its
own queue of workspaces. A global LRU is used to help with reclaim.
Reclaim is done via a timer which provides a mechanism to decrease
memory utilization by keeping only workspaces around that are sized
appropriately. Forward progress is guaranteed by a preallocated max
workspace hidden from the LRU.

When getting a workspace, it uses a bitmap to identify the levels that
are populated and scans up. If it finds a workspace that is greater than
it, it uses it, but does not update the last_used time and the
corresponding place in the LRU. If we hit memory pressure, we sleep on
the max level workspace. We continue to rescan in case we can use a
smaller workspace, but eventually should be able to obtain the max level
workspace or allocate one again should memory pressure subside.

The memory requirement for decompression is the same as level 1, and
therefore can use any of available workspace.

The number of workspaces is bound by an upper limit of the workqueue's
limit which currently is 2 (percpu limit). The reclaim timer is used to
free inactive/improperly sized workspaces and is set to 307s to avoid
colliding with transaction commit (every 30s).

Repeating the experiment from v2 [1], the Silesia corpus was copied to a
btrfs filesystem 10 times and then read back after dropping the caches.
The btrfs filesystem was on an SSD.

Level   Ratio   Compression (MB/s)  Decompression (MB/s)  Memory (KB)
1       2.658        438.47                910.51            780
2       2.744        364.86                886.55           1004
3       2.801        336.33                828.41           1260
4       2.858        286.71                886.55           1260
5       2.916        212.77                556.84           1388
6       2.363        119.82                990.85           1516
7       3.000        154.06                849.30           1516
8       3.011        159.54                875.03           1772
9       3.025        100.51                940.15           1772
10      3.033        118.97                616.26           1772
11      3.036         94.19                802.11           1772
12      3.037         73.45                931.49           1772
13      3.041         55.17                835.26           2284
14      3.087         44.70                716.78           2547
15      3.126         37.30                878.84           2547

[1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@fb.com/

Cc: Nick Terrell <terrelln@fb.com>
Cc: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25 14:13:33 +01:00
..
9p Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-11-01 19:58:52 -07:00
adfs adfs: use timespec64 for time conversion 2018-08-22 10:52:51 -07:00
affs affs: fix potential memory leak when parsing option 'prefix' 2018-05-28 12:36:41 +02:00
afs afs: Fix race in async call refcounting 2019-01-17 15:17:28 +00:00
autofs autofs: fix error return in autofs_fill_super() 2019-02-01 15:46:24 -08:00
befs fix a series of Documentation/ broken file name references 2018-06-15 18:10:01 -03:00
bfs bfs: extra sanity checking and static inode bitmap 2019-01-04 13:13:47 -08:00
btrfs btrfs: add zstd compression level support 2019-02-25 14:13:33 +01:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-11-30 16:00:58 +00:00
ceph ceph: avoid repeatedly adding inode to mdsc->snap_flush_list 2019-02-18 18:08:29 +01:00
cifs cifs: update internal module version number 2019-01-31 07:05:06 -06:00
coda vfs: change inode times to use struct timespec64 2018-06-05 16:57:31 -07:00
configfs configfs: fix registered group removal 2018-07-17 06:14:07 -07:00
cramfs Make the Cramfs code more robust against filesystem corruptions, 2018-10-30 12:46:25 -07:00
crypto fscrypt: add Adiantum support 2019-01-06 08:36:21 -05:00
debugfs debugfs: debugfs_lookup() should return NULL if not found 2019-01-30 12:39:49 +01:00
devpts devpts: Convert to new IDA API 2018-08-21 23:54:17 -04:00
dlm dlm: fix invalid cluster name warning 2018-12-03 15:30:24 -06:00
ecryptfs ecryptfs_rename(): verify that lower dentries are still OK after lock_rename() 2018-10-09 23:33:17 -04:00
efivarfs efivars: Call guid_parse() against guid_t type of variable 2018-07-22 14:13:44 +02:00
efs
exofs exofs_mount(): fix leaks on failure exits 2018-12-17 18:36:33 -05:00
exportfs exportfs: do not read dentry after free 2018-11-23 09:08:17 -05:00
ext2 \n 2018-12-27 17:00:35 -08:00
ext4 Revert "ext4: use ext4_write_inode() when fsyncing w/o a journal" 2019-01-31 23:41:11 -05:00
f2fs f2fs-for-4.21-rc1 2018-12-31 09:41:37 -08:00
fat Merge branch 'akpm' (patches from Andrew) 2019-01-05 09:16:18 -08:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache fscache: fix race between enablement and dropping of object 2018-11-30 15:57:31 +00:00
fuse fuse: decrement NR_WRITEBACK_TEMP on the right page 2019-01-16 10:27:59 +01:00
gfs2 Revert "gfs2: read journal in large chunks to locate the head" 2019-02-14 09:52:51 -08:00
hfs hfs: do not free node before using 2018-11-30 14:56:14 -08:00
hfsplus hfsplus: return file attributes on statx 2019-01-04 13:13:47 -08:00
hostfs vfs: discard ATTR_ATTR_FLAG 2018-08-17 16:20:28 -07:00
hpfs hpfs: remove unnecessary checks on the value of r when assigning error code 2018-08-25 12:42:33 -07:00
hugetlbfs hugetlbfs: revert "Use i_mmap_rwsem to fix page fault/truncate race" 2019-01-08 17:15:11 -08:00
isofs Update email address 2018-09-29 22:47:48 -04:00
jbd2 jbd2: clean up indentation issue, replace spaces with tab 2018-12-04 00:20:10 -05:00
jffs2 jffs2: Fix use of uninitialized delayed_work, lockdep breakage 2018-12-02 09:20:34 +01:00
jfs jfs: remove redundant dquot_initialize() in jfs_evict_inode() 2018-09-20 09:28:49 -05:00
kernfs kernfs: Improve kernfs_notify() poll notification latency 2018-11-27 11:59:33 +01:00
lockd NFS client updates for Linux 4.21 2019-01-02 16:35:23 -08:00
minix minix_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
nfs Merge branch 'fixes-v5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-02-20 09:09:33 -08:00
nfs_common
nfsd Revert "nfsd4: return default lease period" 2019-02-14 12:33:19 -05:00
nilfs2 nilfs2: Use xa_erase_irq 2018-11-05 14:57:05 -05:00
nls
notify inotify: Fix fd refcount leak in inotify_add_watch(). 2019-01-02 18:28:37 +01:00
ntfs mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
ocfs2 Merge branch 'akpm' (patches from Andrew) 2019-01-05 09:16:18 -08:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs fs/openpromfs: Use of_node_name_eq for node name comparisons 2018-11-18 13:35:19 -08:00
orangefs fs: don't open code lru_to_page() 2019-01-04 13:13:48 -08:00
overlayfs Revert "ovl: relax permission checking on underlying layers" 2018-12-04 11:31:30 +01:00
proc proc, oom: do not report alien mms when setting oom_score_adj 2019-02-21 09:01:00 -08:00
pstore pstore/ram: Avoid allocation and leak of platform data 2019-01-20 14:44:52 -08:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. 2018-12-18 18:29:15 +01:00
ramfs
reiserfs reiserfs: remove workaround code for GCC 3.x 2018-10-31 08:54:14 -07:00
romfs romfs_lookup: switch to d_splice_alias() 2018-05-22 14:27:55 -04:00
squashfs Squashfs: Compute expected length from inode size rather than block length 2018-08-02 09:34:02 -07:00
sysfs sysfs: convert BUG_ON to WARN_ON 2019-01-07 08:53:32 +01:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-11-10 08:02:40 -05:00
tracefs tracefs: Annotate tracefs_ops with __ro_after_init 2018-07-31 11:32:44 -04:00
ubifs mm: migrate: drop unused argument of migrate_page_move_mapping() 2018-12-28 12:11:51 -08:00
udf \n 2018-12-27 17:00:35 -08:00
ufs fs/ufs: use ktime_get_real_seconds for sb and cg timestamps 2018-08-17 16:20:27 -07:00
xfs xfs: set buffer ops when repair probes for btree type 2019-02-03 14:03:59 -08:00
aio.c aio: initialize kiocb private in case any filesystems expect it. 2019-02-06 08:04:22 -07:00
anon_inodes.c anon_inode_getfile(): switch to alloc_file_pseudo() 2018-07-12 10:04:27 -04:00
attr.c fs: Fix attr.c kernel-doc 2018-07-03 16:44:45 -04:00
bad_inode.c get rid of 'opened' argument of ->atomic_open() - part 3 2018-07-12 10:04:20 -04:00
binfmt_aout.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
binfmt_elf_fdpic.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
binfmt_elf.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c turn filp_clone_open() into inline wrapper for dentry_open() 2018-07-10 23:29:03 -04:00
binfmt_script.c exec: load_script: Do not exec truncated interpreter path 2019-02-18 16:49:36 -08:00
block_dev.c blockdev: Fix livelocks on loop device 2019-01-15 07:30:56 -07:00
buffer.c fs: ratelimit __find_get_block_slow() failure message. 2019-02-06 12:58:56 -07:00
char_dev.c
compat_binfmt_elf.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
compat_ioctl.c media updates for v4.20-rc1 2018-10-29 14:29:58 -07:00
compat.c ncpfs: remove compat functionality 2018-06-05 19:23:26 +02:00
coredump.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
d_path.c
dax.c dax fix 4.21 2018-12-31 09:46:39 -08:00
dcache.c fs/dcache: Track & report number of negative dentries 2019-01-30 11:02:11 -08:00
dcookies.c
direct-io.c direct-io: allow direct writes to empty inodes 2019-01-22 08:26:44 -07:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-02-01 15:46:24 -08:00
eventfd.c Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL 2018-06-28 10:40:47 -07:00
eventpoll.c Merge branch 'akpm' (patches from Andrew) 2019-01-05 09:16:18 -08:00
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-01-05 13:18:59 -08:00
fcntl.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
fhandle.c
file_table.c mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
file.c Char/Misc driver patches for 4.21-rc1 2018-12-28 20:54:57 -08:00
filesystems.c
fs_pin.c
fs_struct.c
fs-writeback.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-01-22 14:39:38 -07:00
inode.c Revert "mm: don't reclaim inodes with many attached pages" 2019-02-12 16:33:18 -08:00
internal.h overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
ioctl.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
iomap.c iomap: fix a use after free in iomap_dio_rw 2019-01-27 08:47:42 -08:00
Kconfig autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
Kconfig.binfmt kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
libfs.c
locks.c locks: fix error in locks_move_blocks() 2019-01-02 20:14:50 -05:00
Makefile autofs: remove left-over autofs4 stubs 2018-06-11 08:22:34 -07:00
mbcache.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
mount.h
mpage.c mpage: mpage_readpages() should submit IO as read-ahead 2018-08-17 16:20:29 -07:00
namei.c Revert "vfs: Allow userns root to call mknod on owned filesystems." 2018-12-22 14:18:34 -08:00
namespace.c Merge branch 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-01-05 13:25:58 -08:00
no-block.c
nsfs.c
open.c overlayfs update for 4.19 2018-08-21 18:19:09 -07:00
pipe.c Merge branch 'work.open3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-08-13 19:58:36 -07:00
pnode.c vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled 2018-12-20 16:32:56 +00:00
pnode.h
posix_acl.c
proc_namespace.c
read_write.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
readdir.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
select.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
seq_file.c fs/seq_file.c: simplify seq_file iteration code and interface 2018-08-17 16:20:28 -07:00
signalfd.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
splice.c splice: don't read more than available pipe space 2018-12-04 08:50:49 -08:00
stack.c
stat.c y2038: Remove newstat family from default syscall set 2018-08-29 15:42:20 +02:00
statfs.c kernel: add kcompat_sys_{f,}statfs64() 2018-07-12 14:49:48 +01:00
super.c mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT 2018-12-21 11:51:23 -05:00
sync.c
timerfd.c y2038: globally rename compat_time to old_time32 2018-08-27 14:48:48 +02:00
userfaultfd.c userfaultfd: clear flag if remap event not enabled 2018-12-28 12:11:51 -08:00
utimes.c y2038: utimes: Rework #ifdef guards for compat syscalls 2018-08-29 15:42:23 +02:00
xattr.c sysfs: Do not return POSIX ACL xattrs via listxattr 2018-09-18 07:30:48 -04:00