Mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git (synced 2025-04-04 16:37:57 +07:00)
docs: filesystems: convert orangefs.txt to ReST

- Add a SPDX header;
- Adjust document and section titles;
- Some whitespace fixes and new line breaks;
- Mark literal blocks as such;
- Add it to filesystems/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6f438eeff5b029d229197a602bd9b74004fe9b63.1581955849.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 7cbb468f0c
commit 18ccb2233f
@@ -79,6 +79,7 @@ Documentation for filesystem implementations.
 ocfs2
 ocfs2-online-filecheck
 omfs
+orangefs
 overlayfs
 virtiofs
 vfat
@@ -1,3 +1,6 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========
 ORANGEFS
 ========
 
@@ -21,25 +24,25 @@ Orangefs features include:
 * Stateless
 
 
-MAILING LIST ARCHIVES
+Mailing List Archives
 =====================
 
 http://lists.orangefs.org/pipermail/devel_lists.orangefs.org/
 
 
-MAILING LIST SUBMISSIONS
+Mailing List Submissions
 ========================
 
 devel@lists.orangefs.org
 
 
-DOCUMENTATION
+Documentation
 =============
 
 http://www.orangefs.org/documentation/
 
 
-USERSPACE FILESYSTEM SOURCE
+Userspace Filesystem Source
 ===========================
 
 http://www.orangefs.org/download
@@ -48,14 +51,14 @@ Orangefs versions prior to 2.9.3 would not be compatible with the
 upstream version of the kernel client.
 
 
-RUNNING ORANGEFS ON A SINGLE SERVER
+Running ORANGEFS On a Single Server
 ===================================
 
 OrangeFS is usually run in large installations with multiple servers and
 clients, but a complete filesystem can be run on a single machine for
 development and testing.
 
-On Fedora, install orangefs and orangefs-server.
+On Fedora, install orangefs and orangefs-server::
 
 dnf -y install orangefs orangefs-server
 
@@ -70,29 +73,29 @@ single line. Uncomment it and change the hostname if necessary. This
 controls clients which use libpvfs2. This does not control the
 pvfs2-client-core.
 
-Create the filesystem.
+Create the filesystem::
 
 pvfs2-server -f /etc/orangefs/orangefs.conf
 
-Start the server.
+Start the server::
 
 systemctl start orangefs-server
 
-Test the server.
+Test the server::
 
 pvfs2-ping -m /pvfsmnt
 
 Start the client. The module must be compiled in or loaded before this
-point.
+point::
 
 systemctl start orangefs-client
 
-Mount the filesystem.
+Mount the filesystem::
 
 mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
 
 
-BUILDING ORANGEFS ON A SINGLE SERVER
+Building ORANGEFS on a Single Server
 ====================================
 
 Where OrangeFS cannot be installed from distribution packages, it may be
@@ -102,49 +105,51 @@ You can omit --prefix if you don't care that things are sprinkled around
 in /usr/local. As of version 2.9.6, OrangeFS uses Berkeley DB by
 default, we will probably be changing the default to LMDB soon.
 
+::
+
 ./configure --prefix=/opt/ofs --with-db-backend=lmdb
 
 make
 
 make install
 
-Create an orangefs config file.
+Create an orangefs config file::
 
 /opt/ofs/bin/pvfs2-genconfig /etc/pvfs2.conf
 
-Create an /etc/pvfs2tab file.
+Create an /etc/pvfs2tab file::
 
 echo tcp://localhost:3334/orangefs /pvfsmnt pvfs2 defaults,noauto 0 0 > \
 /etc/pvfs2tab
 
-Create the mount point you specified in the tab file if needed.
+Create the mount point you specified in the tab file if needed::
 
 mkdir /pvfsmnt
 
-Bootstrap the server.
+Bootstrap the server::
 
 /opt/ofs/sbin/pvfs2-server -f /etc/pvfs2.conf
 
-Start the server.
+Start the server::
 
 /opt/osf/sbin/pvfs2-server /etc/pvfs2.conf
 
 Now the server should be running. Pvfs2-ls is a simple
-test to verify that the server is running.
+test to verify that the server is running::
 
 /opt/ofs/bin/pvfs2-ls /pvfsmnt
 
 If stuff seems to be working, load the kernel module and
-turn on the client core.
+turn on the client core::
 
 /opt/ofs/sbin/pvfs2-client -p /opt/osf/sbin/pvfs2-client-core
 
-Mount your filesystem.
+Mount your filesystem::
 
 mount -t pvfs2 tcp://localhost:3334/orangefs /pvfsmnt
 
 
-RUNNING XFSTESTS
+Running xfstests
 ================
 
 It is useful to use a scratch filesystem with xfstests. This can be
@@ -159,21 +164,23 @@ Then there are two FileSystem sections: orangefs and scratch.
 
 This change should be made before creating the filesystem.
 
+::
+
 pvfs2-server -f /etc/orangefs/orangefs.conf
 
-To run xfstests, create /etc/xfsqa.config.
+To run xfstests, create /etc/xfsqa.config::
 
 TEST_DIR=/orangefs
 TEST_DEV=tcp://localhost:3334/orangefs
 SCRATCH_MNT=/scratch
 SCRATCH_DEV=tcp://localhost:3334/scratch
 
-Then xfstests can be run
+Then xfstests can be run::
 
 ./check -pvfs2
 
 
-OPTIONS
+Options
 =======
 
 The following mount options are accepted:
@@ -193,32 +200,32 @@ The following mount options are accepted:
 Distributed locking is being worked on for the future.
 
 
-DEBUGGING
+Debugging
 =========
 
 If you want the debug (GOSSIP) statements in a particular
-source file (inode.c for example) go to syslog:
+source file (inode.c for example) go to syslog::
 
 echo inode > /sys/kernel/debug/orangefs/kernel-debug
 
-No debugging (the default):
+No debugging (the default)::
 
 echo none > /sys/kernel/debug/orangefs/kernel-debug
 
-Debugging from several source files:
+Debugging from several source files::
 
 echo inode,dir > /sys/kernel/debug/orangefs/kernel-debug
 
-All debugging:
+All debugging::
 
 echo all > /sys/kernel/debug/orangefs/kernel-debug
 
-Get a list of all debugging keywords:
+Get a list of all debugging keywords::
 
 cat /sys/kernel/debug/orangefs/debug-help
 
 
-PROTOCOL BETWEEN KERNEL MODULE AND USERSPACE
+Protocol between Kernel Module and Userspace
 ============================================
 
 Orangefs is a user space filesystem and an associated kernel module.
@@ -234,7 +241,8 @@ The kernel module implements a pseudo device that userspace
 can read from and write to. Userspace can also manipulate the
 kernel module through the pseudo device with ioctl.
 
-THE BUFMAP:
+The Bufmap
+----------
 
 At startup userspace allocates two page-size-aligned (posix_memalign)
 mlocked memory buffers, one is used for IO and one is used for readdir
@@ -250,7 +258,8 @@ copied from user space to kernel space with copy_from_user and is used
 to initialize the kernel module's "bufmap" (struct orangefs_bufmap), which
 then contains:
 
-* refcnt - a reference counter
+* refcnt
+- a reference counter
 * desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) - the IO buffer's
 partition size, which represents the filesystem's block size and
 is used for s_blocksize in super blocks.
@@ -259,15 +268,17 @@ then contains:
 * desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
 * total_size - the total size of the IO buffer.
 * page_count - the number of 4096 byte pages in the IO buffer.
-* page_array - a pointer to page_count * (sizeof(struct page*)) bytes
+* page_array - a pointer to ``page_count * (sizeof(struct page*))`` bytes
 of kcalloced memory. This memory is used as an array of pointers
 to each of the pages in the IO buffer through a call to get_user_pages.
-* desc_array - a pointer to desc_count * (sizeof(struct orangefs_bufmap_desc))
+* desc_array - a pointer to ``desc_count * (sizeof(struct orangefs_bufmap_desc))``
 bytes of kcalloced memory. This memory is further intialized:
 
 user_desc is the kernel's copy of the IO buffer's ORANGEFS_dev_map_desc
 structure. user_desc->ptr points to the IO buffer.
 
+::
+
 pages_per_desc = bufmap->desc_size / PAGE_SIZE
 offset = 0
 
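The size relationships quoted in this hunk can be checked with a little arithmetic. The following is a minimal userspace sketch, assuming the 4096-byte page size and the default desc_size of 4194304 that the text states; the names `sketch_log2`, `sketch_pages_per_desc` and the `SKETCH_*` constants are hypothetical and are not kernel symbols.

```c
#include <stddef.h>

/* Assumed values taken from the text above, not read from a live system. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_DESC_SIZE 4194304u /* PVFS2_BUFMAP_DEFAULT_DESC_SIZE */

/* log2 of a power of two, i.e. the value desc_shift holds for desc_size. */
static unsigned int sketch_log2(size_t n)
{
	unsigned int shift = 0;

	while (n > 1) {
		n >>= 1;
		shift++;
	}
	return shift;
}

/* Number of pages backing one partition of the IO buffer. */
static size_t sketch_pages_per_desc(size_t desc_size, size_t page_size)
{
	return desc_size / page_size;
}
```

With the default values, each partition spans 1024 pages and desc_shift comes out as 22.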
@@ -293,7 +304,8 @@ then contains:
 * readdir_index_lock - a spinlock to protect readdir_index_array during
 update.
 
-OPERATIONS:
+Operations
+----------
 
 The kernel module builds an "op" (struct orangefs_kernel_op_s) when it
 needs to communicate with userspace. Part of the op contains the "upcall"
@@ -308,13 +320,19 @@ in flight at any given time.
 
 Ops are stateful:
 
-* unknown - op was just initialized
-* waiting - op is on request_list (upward bound)
-* inprogr - op is in progress (waiting for downcall)
-* serviced - op has matching downcall; ok
-* purged - op has to start a timer since client-core
+* unknown
+- op was just initialized
+* waiting
+- op is on request_list (upward bound)
+* inprogr
+- op is in progress (waiting for downcall)
+* serviced
+- op has matching downcall; ok
+* purged
+- op has to start a timer since client-core
 exited uncleanly before servicing op
-* given up - submitter has given up waiting for it
+* given up
+- submitter has given up waiting for it
 
 When some arbitrary userspace program needs to perform a
 filesystem operation on Orangefs (readdir, I/O, create, whatever)
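The op states listed in this hunk map naturally onto an enumeration. The sketch below is illustrative only: the `sketch_op_state` type, its enumerator names and `sketch_op_state_desc` are hypothetical, and the numeric values are whatever the compiler assigns, not the values used by the kernel module.

```c
#include <stddef.h>

/* Hypothetical names; one enumerator per state in the list above. */
enum sketch_op_state {
	SKETCH_OP_UNKNOWN,
	SKETCH_OP_WAITING,
	SKETCH_OP_INPROGR,
	SKETCH_OP_SERVICED,
	SKETCH_OP_PURGED,
	SKETCH_OP_GIVEN_UP
};

/* Return the description the document gives for each state. */
static const char *sketch_op_state_desc(enum sketch_op_state state)
{
	switch (state) {
	case SKETCH_OP_UNKNOWN:
		return "op was just initialized";
	case SKETCH_OP_WAITING:
		return "op is on request_list (upward bound)";
	case SKETCH_OP_INPROGR:
		return "op is in progress (waiting for downcall)";
	case SKETCH_OP_SERVICED:
		return "op has matching downcall; ok";
	case SKETCH_OP_PURGED:
		return "op has to start a timer since client-core "
		       "exited uncleanly before servicing op";
	case SKETCH_OP_GIVEN_UP:
		return "submitter has given up waiting for it";
	}
	return NULL; /* not a known state */
}
```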
@@ -389,10 +407,15 @@ union of structs, each of which is associated with a particular
 response type.
 
 The several members outside of the union are:
-- int32_t type - type of operation.
-- int32_t status - return code for the operation.
-- int64_t trailer_size - 0 unless readdir operation.
-- char *trailer_buf - initialized to NULL, used during readdir operations.
+
+``int32_t type``
+- type of operation.
+``int32_t status``
+- return code for the operation.
+``int64_t trailer_size``
+- 0 unless readdir operation.
+``char *trailer_buf``
+- initialized to NULL, used during readdir operations.
 
 The appropriate member inside the union is filled out for any
 particular response.
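The four members outside the union can be pictured as a plain C struct. This is a sketch under stated assumptions: `sketch_downcall` and `sketch_downcall_init` are hypothetical names, the per-response union is left out entirely, and starting `status` at 0 is an assumption that the text does not state.

```c
#include <stdint.h>
#include <stddef.h>

/* Only the members the text lists; the per-response union is omitted. */
struct sketch_downcall {
	int32_t type;          /* type of operation */
	int32_t status;        /* return code for the operation */
	int64_t trailer_size;  /* 0 unless readdir operation */
	char *trailer_buf;     /* NULL unless readdir operation */
};

/* Put a downcall into the non-readdir starting state described above. */
static void sketch_downcall_init(struct sketch_downcall *dc, int32_t type)
{
	dc->type = type;
	dc->status = 0;        /* assumption: 0 until the operation reports a code */
	dc->trailer_size = 0;
	dc->trailer_buf = NULL;
}
```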
@@ -449,18 +472,20 @@ Userspace uses writev() on /dev/pvfs2-req to pass responses to the requests
 made by the kernel side.
 
 A buffer_list containing:
+
 - a pointer to the prepared response to the request from the
 kernel (struct pvfs2_downcall_t).
 - and also, in the case of a readdir request, a pointer to a
 buffer containing descriptors for the objects in the target
 directory.
+
 ... is sent to the function (PINT_dev_write_list) which performs
 the writev.
 
 PINT_dev_write_list has a local iovec array: struct iovec io_array[10];
 
 The first four elements of io_array are initialized like this for all
-responses:
+responses::
 
 io_array[0].iov_base = address of local variable "proto_ver" (int32_t)
 io_array[0].iov_len = sizeof(int32_t)
@@ -475,7 +500,7 @@ responses:
 of global variable vfs_request (vfs_request_t)
 io_array[3].iov_len = sizeof(pvfs2_downcall_t)
 
-Readdir responses initialize the fifth element io_array like this:
+Readdir responses initialize the fifth element io_array like this::
 
 io_array[4].iov_base = contents of member trailer_buf (char *)
 from out_downcall member of global variable
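The iovec layout described in the last two hunks can be sketched in userspace C. This is not the PINT_dev_write_list implementation: `sketch_fill_io_array` and `sketch_downcall` are hypothetical names, elements 1 and 2 are passed in as opaque buffers because their contents are not shown in these hunks, and using trailer_size as the length of the fifth element is an assumption.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

/* Stand-in for pvfs2_downcall_t, limited to the members the text names. */
struct sketch_downcall {
	int32_t type;
	int32_t status;
	int64_t trailer_size;
	char *trailer_buf;
};

/*
 * Fill io_array the way the text describes and return the element count:
 * four elements for every response, five when a readdir trailer is present.
 */
static int sketch_fill_io_array(struct iovec io_array[10],
				int32_t *proto_ver,
				void *hdr1, size_t hdr1_len,
				void *hdr2, size_t hdr2_len,
				struct sketch_downcall *dc)
{
	int count = 4;

	io_array[0].iov_base = proto_ver;
	io_array[0].iov_len = sizeof(int32_t);
	io_array[1].iov_base = hdr1;   /* opaque: not shown in these hunks */
	io_array[1].iov_len = hdr1_len;
	io_array[2].iov_base = hdr2;   /* opaque: not shown in these hunks */
	io_array[2].iov_len = hdr2_len;
	io_array[3].iov_base = dc;
	io_array[3].iov_len = sizeof(*dc);

	if (dc->trailer_size > 0) {
		io_array[4].iov_base = dc->trailer_buf;
		io_array[4].iov_len = (size_t)dc->trailer_size; /* assumed length */
		count = 5;
	}
	return count;
}
```

A caller would then hand the array and the returned count to `writev()` on the request device, which lets the header fields, the downcall and the optional trailer go out in a single system call without first copying them into one contiguous buffer.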
@@ -517,7 +542,7 @@ from a dentry is cheap, obtaining it from userspace is relatively expensive,
 hence the motivation to use the dentry when possible.
 
 The timeout values d_time and getattr_time are jiffy based, and the
-code is designed to avoid the jiffy-wrap problem:
+code is designed to avoid the jiffy-wrap problem::
 
 "In general, if the clock may have wrapped around more than once, there
 is no way to tell how much time has elapsed. However, if the times t1
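The hunk above is cut off by the diff context, but the jiffy-wrap problem it mentions can be illustrated on its own. One common way to compare wrapping tick counters is to look at the sign of their difference, which is the idea behind the kernel's time_after() helper. The sketch below is a userspace illustration with a hypothetical name (`sketch_time_after`) and an assumed 32-bit counter; it is not the OrangeFS code.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * True when tick value a is later than tick value b. The unsigned
 * subtraction wraps, and reading the result as a signed number gives the
 * right answer as long as the two values are less than half the counter
 * range apart.
 */
static bool sketch_time_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

A plain `a > b` comparison would report a timestamp taken just after the counter wrapped (for example 5) as earlier than one taken just before it (for example 0xFFFFFFFA); the signed-difference form orders them correctly.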