Mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git, synced 2024-12-28 11:18:45 +07:00
8dcc1a9d90
zonefs is a very simple file system exposing each zone of a zoned block device as a file. Unlike a regular file system with zoned block device support (e.g. f2fs), zonefs does not hide the sequential write constraint of zoned block devices from the user. Files representing sequential write zones of the device must be written sequentially starting from the end of the file (append only writes).

As such, zonefs is in essence closer to a raw block device access interface than to a full featured POSIX file system. The goal of zonefs is to simplify the implementation of zoned block device support in applications by replacing raw block device file accesses with a richer file API, avoiding reliance on direct block device file ioctls, which may be more obscure to developers. One example of this approach is the implementation of LSM (log-structured merge) tree structures (such as used in RocksDB and LevelDB) on zoned block devices by allowing SSTables to be stored in a zone file similarly to a regular file system rather than as a range of sectors of a zoned device. The introduction of the higher level construct "one file is one zone" can help reduce the amount of changes needed in the application and make it easier to support different application programming languages.

Zonefs on-disk metadata is reduced to an immutable super block to persistently store a magic number and optional feature flags and values. On mount, zonefs uses blkdev_report_zones() to obtain the device zone configuration and populates the mount point with a static file tree solely based on this information. For example, file sizes are derived from the device zone type and from the write pointer offset managed by the device itself.

The zone files created on mount have the following characteristics.

1) Files representing zones of the same type are grouped together under a common sub-directory:
   * For conventional zones, the sub-directory "cnv" is used.
   * For sequential write zones, the sub-directory "seq" is used.
   These two directories are the only directories that exist in zonefs. Users cannot create other directories and can neither rename nor delete the "cnv" and "seq" sub-directories.

2) The name of each zone file is the number of the file within its zone type sub-directory, in order of increasing zone start sector.

3) The size of conventional zone files is fixed to the device zone size. Conventional zone files cannot be truncated.

4) The size of sequential zone files represents the position of the file's zone write pointer relative to the zone start sector. Truncating these files is allowed only down to 0, in which case the zone is reset to rewind the zone write pointer position to the start of the zone, or up to the zone size, in which case the file's zone is transitioned to the FULL state (finish zone operation).

5) Read and write operations to files are not allowed beyond the file zone size. Any access exceeding the zone size fails with the -EFBIG error.

6) Creating, deleting, renaming or modifying any attribute of files and sub-directories is not allowed.

7) There are no restrictions on the type of read and write operations that can be issued to conventional zone files. Buffered, direct and mmap read & write operations are accepted. For sequential zone files, there are no restrictions on read operations, but all write operations must be direct IO append writes. mmap write of sequential files is not allowed.

Several optional features of zonefs can be enabled at format time.
* Conventional zone aggregation: ranges of contiguous conventional zones can be aggregated into a single larger file instead of the default one file per zone.
* File ownership: the owner UID and GID of zone files are by default 0 (root) but can be changed to any valid UID/GID.
* File access permissions: the default 640 access permissions can be changed.

The mkzonefs tool is used to format zoned block devices for use with zonefs.
This tool is available on GitHub at: git@github.com:damien-lemoal/zonefs-tools.git. zonefs-tools also includes a test suite which can be run against any zoned block device, including null_blk block devices created in zoned mode.

Example: the following formats a 15TB host-managed SMR HDD with 256 MB zones with the conventional zones aggregation feature enabled.

    $ sudo mkzonefs -o aggr_cnv /dev/sdX
    $ sudo mount -t zonefs /dev/sdX /mnt
    $ ls -l /mnt/
    total 0
    dr-xr-xr-x 2 root root     1 Nov 25 13:23 cnv
    dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq

The size of each zone file sub-directory indicates the number of files existing for that type of zone. In this example, there is only one conventional zone file (all conventional zones are aggregated under a single file).

    $ ls -l /mnt/cnv
    total 137101312
    -rw-r----- 1 root root 140391743488 Nov 25 13:23 0

This aggregated conventional zone file can be used as a regular file.

    $ sudo mkfs.ext4 /mnt/cnv/0
    $ sudo mount -o loop /mnt/cnv/0 /data

The "seq" sub-directory grouping files for sequential write zones contains 55356 zone files in this example.

    $ ls -lv /mnt/seq
    total 14511243264
    -rw-r----- 1 root root 0 Nov 25 13:23 0
    -rw-r----- 1 root root 0 Nov 25 13:23 1
    -rw-r----- 1 root root 0 Nov 25 13:23 2
    ...
    -rw-r----- 1 root root 0 Nov 25 13:23 55354
    -rw-r----- 1 root root 0 Nov 25 13:23 55355

For sequential write zone files, the file size changes as data is appended at the end of the file, similarly to any regular file system.

    $ dd if=/dev/zero of=/mnt/seq/0 bs=4K count=1 conv=notrunc oflag=direct
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000452219 s, 9.1 MB/s

    $ ls -l /mnt/seq/0
    -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0

The written file can be truncated to the zone size, preventing any further write operation.
    $ truncate -s 268435456 /mnt/seq/0
    $ ls -l /mnt/seq/0
    -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0

Truncating the file to size 0 frees the file zone storage space and allows append writes to the file to restart.

    $ truncate -s 0 /mnt/seq/0
    $ ls -l /mnt/seq/0
    -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0

Since files are statically mapped to zones on the disk, the number of blocks of a file as reported by stat() and fstat() indicates the size of the file zone.

    $ stat /mnt/seq/0
      File: /mnt/seq/0
      Size: 0               Blocks: 524288     IO Block: 4096   regular empty file
    Device: 870h/2160d      Inode: 50431       Links: 1
    Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
    Access: 2019-11-25 13:23:57.048971997 +0900
    Modify: 2019-11-25 13:52:25.553805765 +0900
    Change: 2019-11-25 13:52:25.553805765 +0900
     Birth: -

The number of blocks of the file ("Blocks"), in units of 512 B blocks, gives the maximum file size: 524288 * 512 B = 268435456 B (256 MiB), which is the device zone size in this example. Note that the "IO Block" field always indicates the minimum IO size for writes and corresponds to the device physical sector size.

This code contains contributions from:
* Johannes Thumshirn <jthumshirn@suse.de>,
* Darrick J. Wong <darrick.wong@oracle.com>,
* Christoph Hellwig <hch@lst.de>,
* Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> and
* Ting Yao <tingyao@hust.edu.cn>.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
101 lines
3.6 KiB
C
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef __LINUX_MAGIC_H__
#define __LINUX_MAGIC_H__

#define ADFS_SUPER_MAGIC 0xadf5
#define AFFS_SUPER_MAGIC 0xadff
#define AFS_SUPER_MAGIC 0x5346414F
#define AUTOFS_SUPER_MAGIC 0x0187
#define CODA_SUPER_MAGIC 0x73757245
#define CRAMFS_MAGIC 0x28cd3d45 /* some random number */
#define CRAMFS_MAGIC_WEND 0x453dcd28 /* magic number with the wrong endianness */
#define DEBUGFS_MAGIC 0x64626720
#define SECURITYFS_MAGIC 0x73636673
#define SELINUX_MAGIC 0xf97cff8c
#define SMACK_MAGIC 0x43415d53 /* "SMAC" */
#define RAMFS_MAGIC 0x858458f6 /* some random number */
#define TMPFS_MAGIC 0x01021994
#define HUGETLBFS_MAGIC 0x958458f6 /* some random number */
#define SQUASHFS_MAGIC 0x73717368
#define ECRYPTFS_SUPER_MAGIC 0xf15f
#define EFS_SUPER_MAGIC 0x414A53
#define EROFS_SUPER_MAGIC_V1 0xE0F5E1E2
#define EXT2_SUPER_MAGIC 0xEF53
#define EXT3_SUPER_MAGIC 0xEF53
#define XENFS_SUPER_MAGIC 0xabba1974
#define EXT4_SUPER_MAGIC 0xEF53
#define BTRFS_SUPER_MAGIC 0x9123683E
#define NILFS_SUPER_MAGIC 0x3434
#define F2FS_SUPER_MAGIC 0xF2F52010
#define HPFS_SUPER_MAGIC 0xf995e849
#define ISOFS_SUPER_MAGIC 0x9660
#define JFFS2_SUPER_MAGIC 0x72b6
#define XFS_SUPER_MAGIC 0x58465342 /* "XFSB" */
#define PSTOREFS_MAGIC 0x6165676C
#define EFIVARFS_MAGIC 0xde5e81e4
#define HOSTFS_SUPER_MAGIC 0x00c0ffee
#define OVERLAYFS_SUPER_MAGIC 0x794c7630

#define MINIX_SUPER_MAGIC 0x137F /* minix v1 fs, 14 char names */
#define MINIX_SUPER_MAGIC2 0x138F /* minix v1 fs, 30 char names */
#define MINIX2_SUPER_MAGIC 0x2468 /* minix v2 fs, 14 char names */
#define MINIX2_SUPER_MAGIC2 0x2478 /* minix v2 fs, 30 char names */
#define MINIX3_SUPER_MAGIC 0x4d5a /* minix v3 fs, 60 char names */

#define MSDOS_SUPER_MAGIC 0x4d44 /* MD */
#define NCP_SUPER_MAGIC 0x564c /* Guess, what 0x564c is :-) */
#define NFS_SUPER_MAGIC 0x6969
#define OCFS2_SUPER_MAGIC 0x7461636f
#define OPENPROM_SUPER_MAGIC 0x9fa1
#define QNX4_SUPER_MAGIC 0x002f /* qnx4 fs detection */
#define QNX6_SUPER_MAGIC 0x68191122 /* qnx6 fs detection */
#define AFS_FS_MAGIC 0x6B414653

#define REISERFS_SUPER_MAGIC 0x52654973 /* used by gcc */
/* used by file system utilities that
   look at the superblock, etc. */
#define REISERFS_SUPER_MAGIC_STRING "ReIsErFs"
#define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
#define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"

#define SMB_SUPER_MAGIC 0x517B
#define CGROUP_SUPER_MAGIC 0x27e0eb
#define CGROUP2_SUPER_MAGIC 0x63677270

#define RDTGROUP_SUPER_MAGIC 0x7655821

#define STACK_END_MAGIC 0x57AC6E9D

#define TRACEFS_MAGIC 0x74726163

#define V9FS_MAGIC 0x01021997

#define BDEVFS_MAGIC 0x62646576
#define DAXFS_MAGIC 0x64646178
#define BINFMTFS_MAGIC 0x42494e4d
#define DEVPTS_SUPER_MAGIC 0x1cd1
#define BINDERFS_SUPER_MAGIC 0x6c6f6f70
#define FUTEXFS_SUPER_MAGIC 0xBAD1DEA
#define PIPEFS_MAGIC 0x50495045
#define PROC_SUPER_MAGIC 0x9fa0
#define SOCKFS_MAGIC 0x534F434B
#define SYSFS_MAGIC 0x62656572
#define USBDEVICE_SUPER_MAGIC 0x9fa2
#define MTD_INODE_FS_MAGIC 0x11307854
#define ANON_INODE_FS_MAGIC 0x09041934
#define BTRFS_TEST_MAGIC 0x73727279
#define NSFS_MAGIC 0x6e736673
#define BPF_FS_MAGIC 0xcafe4a11
#define AAFS_MAGIC 0x5a3c69f0
#define ZONEFS_MAGIC 0x5a4f4653

/* Since UDF 2.01 is ISO 13346 based... */
#define UDF_SUPER_MAGIC 0x15013346
#define BALLOON_KVM_MAGIC 0x13661366
#define ZSMALLOC_MAGIC 0x58295829
#define DMA_BUF_MAGIC 0x444d4142 /* "DMAB" */
#define Z3FOLD_MAGIC 0x33
#define PPC_CMM_MAGIC 0xc7571590

#endif /* __LINUX_MAGIC_H__ */