mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2025-01-21 14:29:05 +07:00
Merge branch 'mauro' into docs-next
Mauro says:

This is the second part of a series I wrote some time ago, in which I manually convert lots of files so that Sphinx can parse them properly as ReST files. As it touches a lot of stuff, this series is based on today's linux-next, at tag next-20190617.

The first version of this series had 57 patches. The first part, with 28 patches, was already merged. Right now, there are still ~76 patches pending (including this series), because I opted to do ~1 patch per converted directory. That sounds like too much to send in a single round, so I'm opting to split it into 3 parts for the conversion, plus a final patch adding orphaned books to existing ones.

Those patches should probably be good to be merged either by subsystem maintainers or via the docs tree.

I opted to mark new files not yet included in the main index.rst (directly or indirectly) with the :orphan: tag, in order to avoid adding warnings to the build system. This tag should be removed once we find a "home" for all the converted files within the new document tree arrangement, after I submit the third part.
commit
919e2bb8b6
@ -895,7 +895,7 @@ this sysctl interface anymore.
|
||||
pty
|
||||
===
|
||||
|
||||
See Documentation/filesystems/devpts.txt.
|
||||
See Documentation/filesystems/devpts.rst.
|
||||
|
||||
|
||||
randomize_va_space
|
||||
|
@ -1,3 +1,10 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=================
|
||||
Automount Support
|
||||
=================
|
||||
|
||||
|
||||
Support is available for filesystems that wish to do automounting
|
||||
support (such as kAFS which can be found in fs/afs/ and NFS in
|
||||
fs/nfs/). This facility includes allowing in-kernel mounts to be
|
||||
@ -5,13 +12,12 @@ performed and mountpoint degradation to be requested. The latter can
|
||||
also be requested by userspace.
|
||||
|
||||
|
||||
======================
|
||||
IN-KERNEL AUTOMOUNTING
|
||||
In-Kernel Automounting
|
||||
======================
|
||||
|
||||
See section "Mount Traps" of Documentation/filesystems/autofs.rst
|
||||
|
||||
Then from userspace, you can just do something like:
|
||||
Then from userspace, you can just do something like::
|
||||
|
||||
[root@andromeda root]# mount -t afs \#root.afs. /afs
|
||||
[root@andromeda root]# ls /afs
|
||||
@ -21,7 +27,7 @@ Then from userspace, you can just do something like:
|
||||
[root@andromeda root]# ls /afs/cambridge/afsdoc/
|
||||
ChangeLog html LICENSE pdf RELNOTES-1.2.2
|
||||
|
||||
And then if you look in the mountpoint catalogue, you'll see something like:
|
||||
And then if you look in the mountpoint catalogue, you'll see something like::
|
||||
|
||||
[root@andromeda root]# cat /proc/mounts
|
||||
...
|
||||
@ -30,8 +36,7 @@ And then if you look in the mountpoint catalogue, you'll see something like:
|
||||
#afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
|
||||
|
||||
|
||||
===========================
|
||||
AUTOMATIC MOUNTPOINT EXPIRY
|
||||
Automatic Mountpoint Expiry
|
||||
===========================
|
||||
|
||||
Automatic expiration of mountpoints is easy, provided you've mounted the
|
||||
@ -43,7 +48,8 @@ To do expiration, you need to follow these steps:
|
||||
hung.
|
||||
|
||||
(2) When a new mountpoint is created in the ->d_automount method, add
|
||||
the mnt to the list using mnt_set_expiry()
|
||||
the mnt to the list using mnt_set_expiry()::
|
||||
|
||||
mnt_set_expiry(newmnt, &afs_vfsmounts);
|
||||
|
||||
(3) When you want mountpoints to be expired, call mark_mounts_for_expiry()
|
||||
@ -70,8 +76,7 @@ and the copies of those that are on an expiration list will be added to the
|
||||
same expiration list.
|
||||
|
||||
|
||||
=======================
|
||||
USERSPACE DRIVEN EXPIRY
|
||||
Userspace Driven Expiry
|
||||
=======================
|
||||
|
||||
As an alternative, it is possible for userspace to request expiry of any
|
@ -1,6 +1,8 @@
|
||||
==========================
|
||||
FS-CACHE CACHE BACKEND API
|
||||
==========================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================
|
||||
FS-Cache Cache backend API
|
||||
==========================
|
||||
|
||||
The FS-Cache system provides an API by which actual caches can be supplied to
|
||||
FS-Cache for it to then serve out to network filesystems and other interested
|
||||
@ -9,15 +11,14 @@ parties.
|
||||
This API is declared in <linux/fscache-cache.h>.
|
||||
|
||||
|
||||
====================================
|
||||
INITIALISING AND REGISTERING A CACHE
|
||||
Initialising and Registering a Cache
|
||||
====================================
|
||||
|
||||
To start off, a cache definition must be initialised and registered for each
|
||||
cache the backend wants to make available. For instance, CacheFS does this in
|
||||
the fill_super() operation on mounting.
|
||||
|
||||
The cache definition (struct fscache_cache) should be initialised by calling:
|
||||
The cache definition (struct fscache_cache) should be initialised by calling::
|
||||
|
||||
void fscache_init_cache(struct fscache_cache *cache,
|
||||
struct fscache_cache_ops *ops,
|
||||
@ -26,17 +27,17 @@ The cache definition (struct fscache_cache) should be initialised by calling:
|
||||
|
||||
Where:
|
||||
|
||||
(*) "cache" is a pointer to the cache definition;
|
||||
* "cache" is a pointer to the cache definition;
|
||||
|
||||
(*) "ops" is a pointer to the table of operations that the backend supports on
|
||||
* "ops" is a pointer to the table of operations that the backend supports on
|
||||
this cache; and
|
||||
|
||||
(*) "idfmt" is a format and printf-style arguments for constructing a label
|
||||
* "idfmt" is a format and printf-style arguments for constructing a label
|
||||
for the cache.
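The "idfmt" argument follows the usual printf convention of a format string plus variadic arguments. A minimal userspace sketch of that convention is shown below. The helper name format_cache_label() is hypothetical and is not part of FS-Cache; it only illustrates how a label could be expanded from such arguments with vsnprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an FS-Cache function): expand a printf-style
 * format and its arguments into buf. Returns what vsnprintf() returns,
 * i.e. the length the full label would have had.
 */
int format_cache_label(char *buf, size_t size, const char *idfmt, ...)
{
    va_list ap;
    int len;

    assert(buf && size > 0 && idfmt);

    va_start(ap, idfmt);
    len = vsnprintf(buf, size, idfmt, ap);
    va_end(ap);

    return len;
}
```

A caller might build a label naming a block device with format_cache_label(label, sizeof(label), "%s", "sda3"); the device name here is only an example value.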
|
||||
|
||||
|
||||
The cache should then be registered with FS-Cache by passing a pointer to the
|
||||
previously initialised cache definition to:
|
||||
previously initialised cache definition to::
|
||||
|
||||
int fscache_add_cache(struct fscache_cache *cache,
|
||||
struct fscache_object *fsdef,
|
||||
@ -44,12 +45,12 @@ previously initialised cache definition to:
|
||||
|
||||
Two extra arguments should also be supplied:
|
||||
|
||||
(*) "fsdef" which should point to the object representation for the FS-Cache
|
||||
* "fsdef" which should point to the object representation for the FS-Cache
|
||||
master index in this cache. Netfs primary index entries will be created
|
||||
here. FS-Cache keeps the caller's reference to the index object if
|
||||
successful and will release it upon withdrawal of the cache.
|
||||
|
||||
(*) "tagname" which, if given, should be a text string naming this cache. If
|
||||
* "tagname" which, if given, should be a text string naming this cache. If
|
||||
this is NULL, the identifier will be used instead. For CacheFS, the
|
||||
identifier is set to name the underlying block device and the tag can be
|
||||
supplied by mount.
|
||||
@ -58,20 +59,18 @@ This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
|
||||
is already in use. 0 will be returned on success.
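The tag fallback and the return values described above can be modelled in userspace. The sketch below is not FS-Cache code: struct mock_cache, mock_add_cache() and the fixed-size tag table are all hypothetical stand-ins, and a full table is used to imitate the out-of-memory case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_CACHES 8

/* Hypothetical stand-in for a cache definition; not a kernel type. */
struct mock_cache {
    const char *identifier; /* label built when the cache was initialised */
};

static const char *mock_tags[MOCK_MAX_CACHES]; /* tags currently in use */
static int mock_nr_tags;

/*
 * Model of the documented results: 0 on success, -EEXIST if the tag is
 * already in use, -ENOMEM if there is no room left (assumption: a full
 * table stands in for memory exhaustion).
 */
int mock_add_cache(const struct mock_cache *cache, const char *tagname)
{
    const char *tag;
    int i;

    assert(cache && cache->identifier);

    /* If no tag name is given, the identifier is used instead. */
    tag = tagname ? tagname : cache->identifier;

    for (i = 0; i < mock_nr_tags; i++)
        if (strcmp(mock_tags[i], tag) == 0)
            return -EEXIST;

    if (mock_nr_tags == MOCK_MAX_CACHES)
        return -ENOMEM;

    mock_tags[mock_nr_tags++] = tag;
    return 0;
}
```

In this model, registering two caches under the same tag fails the second time with -EEXIST, while passing NULL falls back to the cache's own identifier.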
|
||||
|
||||
|
||||
=====================
|
||||
UNREGISTERING A CACHE
|
||||
Unregistering a Cache
|
||||
=====================
|
||||
|
||||
A cache can be withdrawn from the system by calling this function with a
|
||||
pointer to the cache definition:
|
||||
pointer to the cache definition::
|
||||
|
||||
void fscache_withdraw_cache(struct fscache_cache *cache);
|
||||
|
||||
In CacheFS's case, this is called by put_super().
|
||||
|
||||
|
||||
========
|
||||
SECURITY
|
||||
Security
|
||||
========
|
||||
|
||||
The cache methods are executed in one of two contexts:
|
||||
@ -89,8 +88,7 @@ be masqueraded for the duration of the cache driver's access to the cache.
|
||||
This is left to the cache to handle; FS-Cache makes no effort in this regard.
|
||||
|
||||
|
||||
===================================
|
||||
CONTROL AND STATISTICS PRESENTATION
|
||||
Control and Statistics Presentation
|
||||
===================================
|
||||
|
||||
The cache may present data to the outside world through FS-Cache's interfaces
|
||||
@ -101,11 +99,10 @@ is enabled. This is accessible through the kobject struct fscache_cache::kobj
|
||||
and is for use by the cache as it sees fit.
|
||||
|
||||
|
||||
========================
|
||||
RELEVANT DATA STRUCTURES
|
||||
Relevant Data Structures
|
||||
========================
|
||||
|
||||
(*) Index/Data file FS-Cache representation cookie:
|
||||
* Index/Data file FS-Cache representation cookie::
|
||||
|
||||
struct fscache_cookie {
|
||||
struct fscache_object_def *def;
|
||||
@ -121,7 +118,7 @@ RELEVANT DATA STRUCTURES
|
||||
cache operations.
|
||||
|
||||
|
||||
(*) In-cache object representation:
|
||||
* In-cache object representation::
|
||||
|
||||
struct fscache_object {
|
||||
int debug_id;
|
||||
@ -150,7 +147,7 @@ RELEVANT DATA STRUCTURES
|
||||
initialised by calling fscache_object_init(object).
|
||||
|
||||
|
||||
(*) FS-Cache operation record:
|
||||
* FS-Cache operation record::
|
||||
|
||||
struct fscache_operation {
|
||||
atomic_t usage;
|
||||
@ -173,7 +170,7 @@ RELEVANT DATA STRUCTURES
|
||||
an operation needs more processing time, it should be enqueued again.
|
||||
|
||||
|
||||
(*) FS-Cache retrieval operation record:
|
||||
* FS-Cache retrieval operation record::
|
||||
|
||||
struct fscache_retrieval {
|
||||
struct fscache_operation op;
|
||||
@ -198,7 +195,7 @@ RELEVANT DATA STRUCTURES
|
||||
it sees fit.
|
||||
|
||||
|
||||
(*) FS-Cache storage operation record:
|
||||
* FS-Cache storage operation record::
|
||||
|
||||
struct fscache_storage {
|
||||
struct fscache_operation op;
|
||||
@ -212,16 +209,17 @@ RELEVANT DATA STRUCTURES
|
||||
storage.
|
||||
|
||||
|
||||
================
|
||||
CACHE OPERATIONS
|
||||
Cache Operations
|
||||
================
|
||||
|
||||
The cache backend provides FS-Cache with a table of operations that can be
|
||||
performed on the denizens of the cache. These are held in a structure of type:
|
||||
|
||||
struct fscache_cache_ops
|
||||
::
|
||||
|
||||
(*) Name of cache provider [mandatory]:
|
||||
struct fscache_cache_ops
|
||||
|
||||
* Name of cache provider [mandatory]::
|
||||
|
||||
const char *name
|
||||
|
||||
@ -229,7 +227,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
the backend.
|
||||
|
||||
|
||||
(*) Allocate a new object [mandatory]:
|
||||
* Allocate a new object [mandatory]::
|
||||
|
||||
struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
|
||||
struct fscache_cookie *cookie)
|
||||
@ -244,7 +242,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
form once lookup is complete or aborted.
|
||||
|
||||
|
||||
(*) Look up and create object [mandatory]:
|
||||
* Look up and create object [mandatory]::
|
||||
|
||||
void (*lookup_object)(struct fscache_object *object)
|
||||
|
||||
@ -263,7 +261,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
to abort the lookup of that object.
|
||||
|
||||
|
||||
(*) Release lookup data [mandatory]:
|
||||
* Release lookup data [mandatory]::
|
||||
|
||||
void (*lookup_complete)(struct fscache_object *object)
|
||||
|
||||
@ -271,7 +269,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
using to perform a lookup.
|
||||
|
||||
|
||||
(*) Increment object refcount [mandatory]:
|
||||
* Increment object refcount [mandatory]::
|
||||
|
||||
struct fscache_object *(*grab_object)(struct fscache_object *object)
|
||||
|
||||
@ -280,7 +278,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
It should return the object pointer if successful.
|
||||
|
||||
|
||||
(*) Lock/Unlock object [mandatory]:
|
||||
* Lock/Unlock object [mandatory]::
|
||||
|
||||
void (*lock_object)(struct fscache_object *object)
|
||||
void (*unlock_object)(struct fscache_object *object)
|
||||
@ -289,7 +287,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
to schedule with the lock held, so a spinlock isn't sufficient.
|
||||
|
||||
|
||||
(*) Pin/Unpin object [optional]:
|
||||
* Pin/Unpin object [optional]::
|
||||
|
||||
int (*pin_object)(struct fscache_object *object)
|
||||
void (*unpin_object)(struct fscache_object *object)
|
||||
@ -299,7 +297,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
enough space in the cache to permit this.
|
||||
|
||||
|
||||
(*) Check coherency state of an object [mandatory]:
|
||||
* Check coherency state of an object [mandatory]::
|
||||
|
||||
int (*check_consistency)(struct fscache_object *object)
|
||||
|
||||
@ -308,7 +306,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
if they're consistent and -ESTALE otherwise. -ENOMEM and -ERESTARTSYS
|
||||
may also be returned.
|
||||
|
||||
(*) Update object [mandatory]:
|
||||
* Update object [mandatory]::
|
||||
|
||||
int (*update_object)(struct fscache_object *object)
|
||||
|
||||
@ -317,7 +315,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
obtained by calling object->cookie->def->get_aux()/get_attr().
|
||||
|
||||
|
||||
(*) Invalidate data object [mandatory]:
|
||||
* Invalidate data object [mandatory]::
|
||||
|
||||
int (*invalidate_object)(struct fscache_operation *op)
|
||||
|
||||
@ -329,7 +327,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
fscache_op_complete() must be called on op before returning.
|
||||
|
||||
|
||||
(*) Discard object [mandatory]:
|
||||
* Discard object [mandatory]::
|
||||
|
||||
void (*drop_object)(struct fscache_object *object)
|
||||
|
||||
@ -341,7 +339,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
caller. The caller will invoke the put_object() method as appropriate.
|
||||
|
||||
|
||||
(*) Release object reference [mandatory]:
|
||||
* Release object reference [mandatory]::
|
||||
|
||||
void (*put_object)(struct fscache_object *object)
|
||||
|
||||
@ -349,7 +347,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
be freed when all the references to it are released.
|
||||
|
||||
|
||||
(*) Synchronise a cache [mandatory]:
|
||||
* Synchronise a cache [mandatory]::
|
||||
|
||||
void (*sync)(struct fscache_cache *cache)
|
||||
|
||||
@ -357,7 +355,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
device.
|
||||
|
||||
|
||||
(*) Dissociate a cache [mandatory]:
|
||||
* Dissociate a cache [mandatory]::
|
||||
|
||||
void (*dissociate_pages)(struct fscache_cache *cache)
|
||||
|
||||
@ -365,7 +363,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
cache withdrawal.
|
||||
|
||||
|
||||
(*) Notification that the attributes on a netfs file changed [mandatory]:
|
||||
* Notification that the attributes on a netfs file changed [mandatory]::
|
||||
|
||||
int (*attr_changed)(struct fscache_object *object);
|
||||
|
||||
@ -386,7 +384,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
execution of this operation.
|
||||
|
||||
|
||||
(*) Reserve cache space for an object's data [optional]:
|
||||
* Reserve cache space for an object's data [optional]::
|
||||
|
||||
int (*reserve_space)(struct fscache_object *object, loff_t size);
|
||||
|
||||
@ -404,7 +402,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
size if larger than that already.
|
||||
|
||||
|
||||
(*) Request page be read from cache [mandatory]:
|
||||
* Request page be read from cache [mandatory]::
|
||||
|
||||
int (*read_or_alloc_page)(struct fscache_retrieval *op,
|
||||
struct page *page,
|
||||
@ -446,7 +444,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
with. This will complete the operation when all pages are dealt with.
|
||||
|
||||
|
||||
(*) Request pages be read from cache [mandatory]:
|
||||
* Request pages be read from cache [mandatory]::
|
||||
|
||||
int (*read_or_alloc_pages)(struct fscache_retrieval *op,
|
||||
struct list_head *pages,
|
||||
@ -457,7 +455,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
of pages instead of one page. Any pages on which a read operation is
|
||||
started must be added to the page cache for the specified mapping and also
|
||||
to the LRU. Such pages must also be removed from the pages list and
|
||||
*nr_pages decremented per page.
|
||||
``*nr_pages`` decremented per page.
|
||||
|
||||
If there was an error such as -ENOMEM, then that should be returned; else
|
||||
if one or more pages couldn't be read or allocated, then -ENOBUFS should
|
||||
@ -466,7 +464,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
returned.
|
||||
|
||||
|
||||
(*) Request page be allocated in the cache [mandatory]:
|
||||
* Request page be allocated in the cache [mandatory]::
|
||||
|
||||
int (*allocate_page)(struct fscache_retrieval *op,
|
||||
struct page *page,
|
||||
@ -482,7 +480,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
allocated, then the netfs page should be marked and 0 returned.
|
||||
|
||||
|
||||
(*) Request pages be allocated in the cache [mandatory]:
|
||||
* Request pages be allocated in the cache [mandatory]::
|
||||
|
||||
int (*allocate_pages)(struct fscache_retrieval *op,
|
||||
struct list_head *pages,
|
||||
@ -493,7 +491,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
nr_pages should be treated as for the read_or_alloc_pages() method.
|
||||
|
||||
|
||||
(*) Request page be written to cache [mandatory]:
|
||||
* Request page be written to cache [mandatory]::
|
||||
|
||||
int (*write_page)(struct fscache_storage *op,
|
||||
struct page *page);
|
||||
@ -514,7 +512,7 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
appropriately.
|
||||
|
||||
|
||||
(*) Discard retained per-page metadata [mandatory]:
|
||||
* Discard retained per-page metadata [mandatory]::
|
||||
|
||||
void (*uncache_page)(struct fscache_object *object, struct page *page)
|
||||
|
||||
@ -523,13 +521,12 @@ performed on the denizens of the cache. These are held in a structure of type:
|
||||
maintains for this page.
|
||||
|
||||
|
||||
==================
|
||||
FS-CACHE UTILITIES
|
||||
FS-Cache Utilities
|
||||
==================
|
||||
|
||||
FS-Cache provides some utilities that a cache backend may make use of:
|
||||
|
||||
(*) Note occurrence of an I/O error in a cache:
|
||||
* Note occurrence of an I/O error in a cache::
|
||||
|
||||
void fscache_io_error(struct fscache_cache *cache)
|
||||
|
||||
@ -541,7 +538,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
This does not actually withdraw the cache. That must be done separately.
|
||||
|
||||
|
||||
(*) Invoke the retrieval I/O completion function:
|
||||
* Invoke the retrieval I/O completion function::
|
||||
|
||||
void fscache_end_io(struct fscache_retrieval *op, struct page *page,
|
||||
int error);
|
||||
@ -550,8 +547,8 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
error value should be 0 if successful and an error otherwise.
|
||||
|
||||
|
||||
(*) Record that one or more pages being retrieved or allocated have been dealt
|
||||
with:
|
||||
* Record that one or more pages being retrieved or allocated have been dealt
|
||||
with::
|
||||
|
||||
void fscache_retrieval_complete(struct fscache_retrieval *op,
|
||||
int n_pages);
|
||||
@ -562,7 +559,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
completed.
|
||||
|
||||
|
||||
(*) Record operation completion:
|
||||
* Record operation completion::
|
||||
|
||||
void fscache_op_complete(struct fscache_operation *op);
|
||||
|
||||
@ -571,7 +568,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
one or more pending operations to start running.
|
||||
|
||||
|
||||
(*) Set highest store limit:
|
||||
* Set highest store limit::
|
||||
|
||||
void fscache_set_store_limit(struct fscache_object *object,
|
||||
loff_t i_size);
|
||||
@ -581,7 +578,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
rejected by fscache_read_or_alloc_page() and co with -ENOBUFS.
|
||||
|
||||
|
||||
(*) Mark pages as being cached:
|
||||
* Mark pages as being cached::
|
||||
|
||||
void fscache_mark_pages_cached(struct fscache_retrieval *op,
|
||||
struct pagevec *pagevec);
|
||||
@ -590,7 +587,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
the netfs must call fscache_uncache_page() to unmark the pages.
|
||||
|
||||
|
||||
(*) Perform coherency check on an object:
|
||||
* Perform coherency check on an object::
|
||||
|
||||
enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
|
||||
const void *data,
|
||||
@ -603,29 +600,26 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
|
||||
One of three values will be returned:
|
||||
|
||||
(*) FSCACHE_CHECKAUX_OKAY
|
||||
|
||||
FSCACHE_CHECKAUX_OKAY
|
||||
The coherency data indicates the object is valid as is.
|
||||
|
||||
(*) FSCACHE_CHECKAUX_NEEDS_UPDATE
|
||||
|
||||
FSCACHE_CHECKAUX_NEEDS_UPDATE
|
||||
The coherency data needs updating, but otherwise the object is
|
||||
valid.
|
||||
|
||||
(*) FSCACHE_CHECKAUX_OBSOLETE
|
||||
|
||||
FSCACHE_CHECKAUX_OBSOLETE
|
||||
The coherency data indicates that the object is obsolete and should
|
||||
be discarded.
|
||||
|
||||
|
||||
(*) Initialise a freshly allocated object:
|
||||
* Initialise a freshly allocated object::
|
||||
|
||||
void fscache_object_init(struct fscache_object *object);
|
||||
|
||||
This initialises all the fields in an object representation.
|
||||
|
||||
|
||||
(*) Indicate the destruction of an object:
|
||||
* Indicate the destruction of an object::
|
||||
|
||||
void fscache_object_destroyed(struct fscache_cache *cache);
|
||||
|
||||
@ -635,7 +629,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
all the objects.
|
||||
|
||||
|
||||
(*) Indicate negative lookup on an object:
|
||||
* Indicate negative lookup on an object::
|
||||
|
||||
void fscache_object_lookup_negative(struct fscache_object *object);
|
||||
|
||||
@ -650,7 +644,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
significant - all subsequent calls are ignored.
|
||||
|
||||
|
||||
(*) Indicate an object has been obtained:
|
||||
* Indicate an object has been obtained::
|
||||
|
||||
void fscache_obtained_object(struct fscache_object *object);
|
||||
|
||||
@ -667,7 +661,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
(2) that writes may now proceed against this object.
|
||||
|
||||
|
||||
(*) Indicate that object lookup failed:
|
||||
* Indicate that object lookup failed::
|
||||
|
||||
void fscache_object_lookup_error(struct fscache_object *object);
|
||||
|
||||
@ -676,7 +670,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
as possible.
|
||||
|
||||
|
||||
(*) Indicate that a stale object was found and discarded:
|
||||
* Indicate that a stale object was found and discarded::
|
||||
|
||||
void fscache_object_retrying_stale(struct fscache_object *object);
|
||||
|
||||
@ -685,7 +679,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
discarded from the cache and the lookup will be performed again.
|
||||
|
||||
|
||||
(*) Indicate that the caching backend killed an object:
|
||||
* Indicate that the caching backend killed an object::
|
||||
|
||||
void fscache_object_mark_killed(struct fscache_object *object,
|
||||
enum fscache_why_object_killed why);
|
||||
@ -693,13 +687,20 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
This is called to indicate that the cache backend preemptively killed an
|
||||
object. The why parameter should be set to indicate the reason:
|
||||
|
||||
FSCACHE_OBJECT_IS_STALE - the object was stale and needs discarding.
|
||||
FSCACHE_OBJECT_NO_SPACE - there was insufficient cache space
|
||||
FSCACHE_OBJECT_WAS_RETIRED - the object was retired when relinquished.
|
||||
FSCACHE_OBJECT_WAS_CULLED - the object was culled to make space.
|
||||
FSCACHE_OBJECT_IS_STALE
|
||||
- the object was stale and needs discarding.
|
||||
|
||||
FSCACHE_OBJECT_NO_SPACE
|
||||
- there was insufficient cache space
|
||||
|
||||
FSCACHE_OBJECT_WAS_RETIRED
|
||||
- the object was retired when relinquished.
|
||||
|
||||
FSCACHE_OBJECT_WAS_CULLED
|
||||
- the object was culled to make space.
|
||||
|
||||
|
||||
(*) Get and release references on a retrieval record:
|
||||
* Get and release references on a retrieval record::
|
||||
|
||||
void fscache_get_retrieval(struct fscache_retrieval *op);
|
||||
void fscache_put_retrieval(struct fscache_retrieval *op);
|
||||
@ -708,7 +709,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
asynchronous data retrieval and block allocation.
|
||||
|
||||
|
||||
(*) Enqueue a retrieval record for processing.
|
||||
* Enqueue a retrieval record for processing::
|
||||
|
||||
void fscache_enqueue_retrieval(struct fscache_retrieval *op);
|
||||
|
||||
@ -718,7 +719,7 @@ FS-Cache provides some utilities that a cache backend may make use of:
|
||||
within the callback function.
|
||||
|
||||
|
||||
(*) List of object state names:
|
||||
* List of object state names::
|
||||
|
||||
const char *fscache_object_states[];
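As a reading aid for the three fscache_check_aux() results listed earlier in this file, the following userspace sketch maps each result to the action the text describes. The enum is redeclared locally so that the example compiles on its own; its numeric values are arbitrary here, and enum lookup_action and action_for_checkaux() are hypothetical names, not part of the FS-Cache API.

```c
#include <assert.h>

/*
 * Local copy of the three result names from the text. The real enum is
 * declared in <linux/fscache-cache.h>; the values here are arbitrary.
 */
enum fscache_checkaux {
    FSCACHE_CHECKAUX_OKAY,
    FSCACHE_CHECKAUX_NEEDS_UPDATE,
    FSCACHE_CHECKAUX_OBSOLETE,
};

/* Hypothetical summary of what a backend does next. */
enum lookup_action {
    ACTION_USE_AS_IS,       /* object is valid as is */
    ACTION_UPDATE_THEN_USE, /* refresh coherency data, object still valid */
    ACTION_DISCARD,         /* object is obsolete */
};

enum lookup_action action_for_checkaux(enum fscache_checkaux result)
{
    switch (result) {
    case FSCACHE_CHECKAUX_OKAY:
        return ACTION_USE_AS_IS;
    case FSCACHE_CHECKAUX_NEEDS_UPDATE:
        return ACTION_UPDATE_THEN_USE;
    case FSCACHE_CHECKAUX_OBSOLETE:
        return ACTION_DISCARD;
    }

    /* Not reached for the three documented values. */
    assert(!"unexpected fscache_checkaux value");
    return ACTION_DISCARD;
}
```

Treating an unknown value as "discard" is a conservative choice made for this sketch only.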
|
||||
|
@ -1,8 +1,10 @@
|
||||
===============================================
|
||||
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
|
||||
===============================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Contents:
|
||||
===============================================
|
||||
CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
|
||||
===============================================
|
||||
|
||||
.. Contents:
|
||||
|
||||
(*) Overview.
|
||||
|
||||
@ -27,8 +29,8 @@ Contents:
|
||||
(*) Debugging.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
CacheFiles is a caching backend that's meant to use as a cache a directory on
|
||||
@ -58,8 +60,8 @@ spare space and automatically contract when the set of data requires more
|
||||
space.
|
||||
|
||||
|
||||
============
|
||||
REQUIREMENTS
|
||||
|
||||
Requirements
|
||||
============
|
||||
|
||||
The use of CacheFiles and its daemon requires the following features to be
|
||||
@ -79,84 +81,70 @@ It is strongly recommended that the "dir_index" option is enabled on Ext3
|
||||
filesystems being used as a cache.
|
||||
|
||||
|
||||
=============
|
||||
CONFIGURATION
|
||||
Configuration
|
||||
=============
|
||||
|
||||
The cache is configured by a script in /etc/cachefilesd.conf. These commands
|
||||
set up the cache ready for use. The following script commands are available:
|
||||
|
||||
(*) brun <N>%
|
||||
(*) bcull <N>%
|
||||
(*) bstop <N>%
|
||||
(*) frun <N>%
|
||||
(*) fcull <N>%
|
||||
(*) fstop <N>%
|
||||
|
||||
brun <N>%, bcull <N>%, bstop <N>%, frun <N>%, fcull <N>%, fstop <N>%
|
||||
Configure the culling limits. Optional. See the section on culling.
|
||||
The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
|
||||
|
||||
The commands beginning with a 'b' are file space (block) limits, those
|
||||
beginning with an 'f' are file count limits.
|
||||
|
||||
(*) dir <path>
|
||||
|
||||
dir <path>
|
||||
Specify the directory containing the root of the cache. Mandatory.
|
||||
|
||||
(*) tag <name>
|
||||
|
||||
tag <name>
|
||||
Specify a tag to FS-Cache to use in distinguishing multiple caches.
|
||||
Optional. The default is "CacheFiles".
|
||||
|
||||
(*) debug <mask>
|
||||
|
||||
debug <mask>
|
||||
Specify a numeric bitmask to control debugging in the kernel module.
|
||||
Optional. The default is zero (all off). The following values can be
|
||||
OR'd into the mask to collect various information:
|
||||
|
||||
== =================================================
1  Turn on trace of function entry (_enter() macros)
2  Turn on trace of function exit (_leave() macros)
4  Turn on trace of internal debug points (_debug())
== =================================================
|
||||
|
||||
This mask can also be set through sysfs, eg:
|
||||
This mask can also be set through sysfs, eg::
|
||||
|
||||
echo 5 >/sys/module/cachefiles/parameters/debug
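The example value 5 is the OR of bits 1 and 4, that is, function entry tracing plus internal debug points. A small sketch of how such a mask is composed follows. The constant names and build_debug_mask() are hypothetical; the text only defines the numeric values 1, 2 and 4.

```c
#include <assert.h>

/* Hypothetical names for the three documented bit values. */
enum {
    DEBUG_TRACE_ENTER = 1, /* function entry (_enter() macros) */
    DEBUG_TRACE_LEAVE = 2, /* function exit (_leave() macros) */
    DEBUG_TRACE_DEBUG = 4, /* internal debug points (_debug()) */
};

/* OR together the bits for whichever kinds of tracing are wanted. */
unsigned int build_debug_mask(int enter, int leave, int debug)
{
    unsigned int mask = 0;

    if (enter)
        mask |= DEBUG_TRACE_ENTER;
    if (leave)
        mask |= DEBUG_TRACE_LEAVE;
    if (debug)
        mask |= DEBUG_TRACE_DEBUG;

    /* Only the three documented bits can ever be set here. */
    assert((mask & ~7u) == 0);
    return mask;
}
```

Calling build_debug_mask(1, 0, 1) yields 5, the value used in the echo command, and passing all zeroes yields the default of zero (all off).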
|
||||
|
||||
|
||||
==================
|
||||
STARTING THE CACHE
|
||||
Starting the Cache
|
||||
==================
|
||||
|
||||
The cache is started by running the daemon. The daemon opens the cache device,
|
||||
configures the cache and tells it to begin caching. At that point the cache
|
||||
binds to fscache and the cache becomes live.
|
||||
|
||||
The daemon is run as follows:
|
||||
The daemon is run as follows::
|
||||
|
||||
/sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
|
||||
|
||||
The flags are:
|
||||
|
||||
(*) -d
|
||||
|
||||
``-d``
|
||||
Increase the debugging level. This can be specified multiple times and
|
||||
is cumulative with itself.
|
||||
|
||||
(*) -s
|
||||
|
||||
``-s``
|
||||
Send messages to stderr instead of syslog.
|
||||
|
||||
(*) -n
|
||||
|
||||
``-n``
|
||||
Don't daemonise and go into background.
|
||||
|
||||
(*) -f <configfile>
|
||||
|
||||
``-f <configfile>``
|
||||
Use an alternative configuration file rather than the default one.
|
||||
|
||||
|
||||
===============
|
||||
THINGS TO AVOID
|
||||
Things to Avoid
|
||||
===============
|
||||
|
||||
Do not mount other things within the cache as this will cause problems. The
|
||||
@ -179,8 +167,7 @@ Do not chmod files in the cache. The module creates things with minimal
|
||||
permissions to prevent random users being able to access them directly.
|
||||
|
||||
|
||||
=============
|
||||
CACHE CULLING
|
||||
Cache Culling
|
||||
=============
|
||||
|
||||
The cache may need culling occasionally to make space. This involves
|
||||
@ -192,27 +179,21 @@ Cache culling is done on the basis of the percentage of blocks and the
|
||||
percentage of files available in the underlying filesystem. There are six
|
||||
"limits":
|
||||
|
||||
(*) brun
|
||||
(*) frun
|
||||
|
||||
brun, frun
|
||||
If the amount of free space and the number of available files in the cache
|
||||
rise above both these limits, then culling is turned off.
|
||||
|
||||
(*) bcull
|
||||
(*) fcull
|
||||
|
||||
bcull, fcull
|
||||
If the amount of available space or the number of available files in the
|
||||
cache falls below either of these limits, then culling is started.
|
||||
|
||||
(*) bstop
|
||||
(*) fstop
|
||||
|
||||
bstop, fstop
|
||||
If the amount of available space or the number of available files in the
|
||||
cache falls below either of these limits, then no further allocation of
|
||||
disk space or files is permitted until culling has raised things above
|
||||
these limits again.
|
||||
|
||||
These must be configured thusly:
|
||||
These must be configured thusly::
|
||||
|
||||
0 <= bstop < bcull < brun < 100
|
||||
0 <= fstop < fcull < frun < 100
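The ordering constraint and the three thresholds can be expressed as a small userspace model. This is a sketch under stated assumptions, not the daemon's or the kernel module's logic: the type and function names are hypothetical, values are whole percentages, and the exact boundary behaviour (strictly below, strictly above) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* One set of limits, as percentages: either the b* or the f* values. */
struct cull_limits {
    int run;
    int cull;
    int stop;
};

/* Checks 0 <= stop < cull < run < 100. */
bool cull_limits_valid(const struct cull_limits *l)
{
    assert(l);
    return 0 <= l->stop && l->stop < l->cull &&
           l->cull < l->run && l->run < 100;
}

enum cull_state {
    CULL_OFF,      /* no culling in progress */
    CULL_ON,       /* culling, allocation still allowed */
    CULL_NO_ALLOC, /* culling, no further allocation permitted */
};

/*
 * Hypothetical state update. bfree and ffree are the percentages of
 * blocks and files currently available in the underlying filesystem.
 */
enum cull_state next_cull_state(enum cull_state cur,
                                const struct cull_limits *b,
                                const struct cull_limits *f,
                                int bfree, int ffree)
{
    if (bfree < b->stop || ffree < f->stop)
        return CULL_NO_ALLOC; /* below either stop limit */
    if (bfree < b->cull || ffree < f->cull)
        return CULL_ON;       /* below either cull limit */
    if (bfree > b->run && ffree > f->run)
        return CULL_OFF;      /* above both run limits */

    /* Between cull and run: keep culling if it was already running. */
    return cur == CULL_OFF ? CULL_OFF : CULL_ON;
}
```

With the default limits of 7% (run), 5% (cull) and 1% (stop), a cache that drops to 4% free blocks starts culling and keeps culling at 6%, stopping only once both blocks and files are back above 7%.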
|
||||
@ -226,16 +207,14 @@ started as soon as space is made in the table. Objects will be skipped if
|
||||
their atimes have changed or if the kernel module says it is still using them.
|
||||
|
||||
|
||||
===============
|
||||
CACHE STRUCTURE
|
||||
Cache Structure
|
||||
===============
|
||||
|
||||
The CacheFiles module will create two directories in the directory it was
|
||||
given:
|
||||
|
||||
(*) cache/
|
||||
|
||||
(*) graveyard/
|
||||
* cache/
|
||||
* graveyard/
|
||||
|
||||
The active cache objects all reside in the first directory. The CacheFiles
|
||||
kernel module moves any retired or culled objects that it can't simply unlink
|
||||
@ -261,10 +240,10 @@ If an object has children, then it will be represented as a directory.
|
||||
Immediately in the representative directory are a collection of directories
|
||||
named for hash values of the child object keys with an '@' prepended. Into
|
||||
this directory, if possible, will be placed the representations of the child
|
||||
objects:
|
||||
objects::
|
||||
|
||||
INDEX INDEX INDEX DATA FILES
|
||||
========= ========== ================================= ================
|
||||
/INDEX /INDEX /INDEX /DATA FILES
|
||||
/=========/==========/=================================/================
|
||||
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
|
||||
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
|
||||
cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
|
||||
@ -275,7 +254,7 @@ If the key is so long that it exceeds NAME_MAX with the decorations added on to
|
||||
it, then it will be cut into pieces, the first few of which will be used to
|
||||
make a nest of directories, and the last one of which will be the objects
|
||||
inside the last directory. The names of the intermediate directories will have
|
||||
'+' prepended:
|
||||
'+' prepended::
|
||||
|
||||
J1223/@23/+xy...z/+kl...m/Epqr
|
||||
|
||||
@ -288,11 +267,13 @@ To handle this, CacheFiles will use a suitably printable filename directly and
|
||||
"base-64" encode ones that aren't directly suitable. The two versions of
|
||||
object filenames indicate the encoding:
|
||||
|
||||
=============== =============== ===============
OBJECT TYPE     PRINTABLE       ENCODED
=============== =============== ===============
Index           "I..."          "J..."
Data            "D..."          "E..."
Special         "S..."          "T..."
=============== =============== ===============
|
||||
|
||||
Intermediate directories are always "@" or "+" as appropriate.
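The naming scheme can be illustrated with a lookup keyed on the first character of each path component. This is a sketch for reading cache paths by eye; the `PREFIXES` table and `classify` helper are names invented for this example and are not part of CacheFiles.

```python
# First character of a name inside the cache -> (object type, encoding),
# following the table and the '@' / '+' rules described above.
PREFIXES = {
    "I": ("index", "printable"),
    "J": ("index", "encoded"),
    "D": ("data", "printable"),
    "E": ("data", "encoded"),
    "S": ("special", "printable"),
    "T": ("special", "encoded"),
    "@": ("intermediate directory", "hash of child keys"),
    "+": ("intermediate directory", "piece of a long key"),
}


def classify(name):
    """Return (object type, encoding) for one path component."""
    if not name or name[0] not in PREFIXES:
        raise ValueError(f"unrecognised cache name: {name!r}")
    return PREFIXES[name[0]]
```

Applied to the components of `@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400`, this yields an intermediate directory, a printable index, another intermediate directory and an encoded index.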
|
||||
|
||||
@ -307,8 +288,7 @@ Note that CacheFiles will erase from the cache any file it doesn't recognise or
|
||||
any file of an incorrect type (such as a FIFO file or a device file).
|
||||
|
||||
|
||||
==========================
|
||||
SECURITY MODEL AND SELINUX
|
||||
Security Model and SELinux
|
||||
==========================
|
||||
|
||||
CacheFiles is implemented to deal properly with the LSM security features of
|
||||
@ -331,26 +311,26 @@ When the CacheFiles module is asked to bind to its cache, it:
|
||||
|
||||
(1) Finds the security label attached to the root cache directory and uses
|
||||
that as the security label with which it will create files. By default,
|
||||
this is:
|
||||
this is::
|
||||
|
||||
cachefiles_var_t
|
||||
|
||||
(2) Finds the security label of the process which issued the bind request
|
||||
(presumed to be the cachefilesd daemon), which by default will be:
|
||||
(presumed to be the cachefilesd daemon), which by default will be::
|
||||
|
||||
cachefilesd_t
|
||||
|
||||
and asks LSM to supply a security ID as which it should act given the
|
||||
daemon's label. By default, this will be:
|
||||
daemon's label. By default, this will be::
|
||||
|
||||
cachefiles_kernel_t
|
||||
|
||||
SELinux transitions the daemon's security ID to the module's security ID
|
||||
based on a rule of this form in the policy.
|
||||
based on a rule of this form in the policy::
|
||||
|
||||
type_transition <daemon's-ID> kernel_t : process <module's-ID>;
|
||||
|
||||
For instance:
|
||||
For instance::
|
||||
|
||||
type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
|
||||
|
||||
@ -370,7 +350,7 @@ There are policy source files available in:
|
||||
|
||||
http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
|
||||
|
||||
and later versions. In that tarball, see the files:
|
||||
and later versions. In that tarball, see the files::
|
||||
|
||||
cachefilesd.te
|
||||
cachefilesd.fc
|
||||
@ -379,7 +359,7 @@ and later versions. In that tarball, see the files:
|
||||
They are built and installed directly by the RPM.
|
||||
|
||||
If a non-RPM based system is being used, then copy the above files to their own
|
||||
directory and run:
|
||||
directory and run::
|
||||
|
||||
make -f /usr/share/selinux/devel/Makefile
|
||||
semodule -i cachefilesd.pp
|
||||
@ -394,7 +374,7 @@ an auxiliary policy must be installed to label the alternate location of the
|
||||
cache.
|
||||
|
||||
For instructions on how to add an auxiliary policy to enable the cache to be
|
||||
located elsewhere when SELinux is in enforcing mode, please see:
|
||||
located elsewhere when SELinux is in enforcing mode, please see::
|
||||
|
||||
/usr/share/doc/cachefilesd-*/move-cache.txt
|
||||
|
||||
@ -402,8 +382,7 @@ When the cachefilesd rpm is installed; alternatively, the document can be found
|
||||
in the sources.
|
||||
|
||||
|
||||
==================
|
||||
A NOTE ON SECURITY
|
||||
A Note on Security
|
||||
==================
|
||||
|
||||
CacheFiles makes use of the split security in the task_struct. It allocates
|
||||
@ -445,17 +424,18 @@ for CacheFiles to run in a context of a specific security label, or to create
|
||||
files and directories with another security label.
|
||||
|
||||
|
||||
=======================
|
||||
STATISTICAL INFORMATION
|
||||
Statistical Information
|
||||
=======================
|
||||
|
||||
If FS-Cache is compiled with the following option enabled:
|
||||
If FS-Cache is compiled with the following option enabled::
|
||||
|
||||
CONFIG_CACHEFILES_HISTOGRAM=y
|
||||
|
||||
then it will gather certain statistics and display them through a proc file.
|
||||
|
||||
(*) /proc/fs/cachefiles/histogram
|
||||
/proc/fs/cachefiles/histogram
|
||||
|
||||
::
|
||||
|
||||
cat /proc/fs/cachefiles/histogram
|
||||
JIFS SECS LOOKUPS MKDIRS CREATES
|
||||
@ -465,36 +445,39 @@ then it will gather certain statistics and display them through a proc file.
|
||||
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
|
||||
columns are as follows:
|
||||
|
||||
======= =======================================================
COLUMN  TIME MEASUREMENT
======= =======================================================
LOOKUPS Length of time to perform a lookup on the backing fs
MKDIRS  Length of time to perform a mkdir on the backing fs
CREATES Length of time to perform a create on the backing fs
======= =======================================================
|
||||
|
||||
Each row shows the number of events that took a particular range of times.
|
||||
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
||||
jiffy range covered, and the SECS field the equivalent number of seconds.
|
||||
|
||||
|
||||
=========
|
||||
DEBUGGING
|
||||
Debugging
|
||||
=========
|
||||
|
||||
If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
|
||||
debugging enabled by adjusting the value in:
|
||||
debugging enabled by adjusting the value in::
|
||||
|
||||
/sys/module/cachefiles/parameters/debug
|
||||
|
||||
This is a bitmask of debugging streams to enable:
|
||||
|
||||
======= ======= =============================== =======================
BIT     VALUE   STREAM                          POINT
======= ======= =============================== =======================
0       1       General                         Function entry trace
1       2                                       Function exit trace
2       4                                       General
======= ======= =============================== =======================
|
||||
|
||||
The appropriate set of values should be OR'd together and the result written to
|
||||
the control file. For example:
|
||||
the control file. For example::
|
||||
|
||||
echo $((1|4|8)) >/sys/module/cachefiles/parameters/debug
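The same mask arithmetic can be sketched in Python. The stream labels used as dictionary keys are made up for this example; only the bit numbers and values come from the table above. The sketch computes the number and leaves writing it to the control file (which needs root) to the caller.

```python
# Bit values from the table above; the key names are illustrative labels.
STREAMS = {
    "entry": 1 << 0,    # function entry trace
    "exit": 1 << 1,     # function exit trace
    "general": 1 << 2,  # general debugging
}


def debug_mask(*names):
    """OR together the values of the requested debugging streams."""
    mask = 0
    for name in names:
        mask |= STREAMS[name]
    return mask
```

For instance, `debug_mask("entry", "general")` gives 5, which is the value that would be written to the control file to enable those two streams.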
|
||||
|
565
Documentation/filesystems/caching/fscache.rst
Normal file
@ -0,0 +1,565 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================
|
||||
General Filesystem Caching
|
||||
==========================
|
||||
|
||||
Overview
|
||||
========
|
||||
|
||||
This facility is a general purpose cache for network filesystems, though it
|
||||
could be used for caching other things such as ISO9660 filesystems too.
|
||||
|
||||
FS-Cache mediates between cache backends (such as CacheFS) and network
|
||||
filesystems::
|
||||
|
||||
    +---------+
    |         |                        +--------------+
    |   NFS   |--+                     |              |
    |         |  |                 +-->|   CacheFS    |
    +---------+  |   +----------+  |   |  /dev/hda5   |
                 |   |          |  |   +--------------+
    +---------+  +-->|          |  |
    |         |      |          |--+
    |   AFS   |----->| FS-Cache |
    |         |      |          |--+
    +---------+  +-->|          |  |
                 |   |          |  |   +--------------+
    +---------+  |   +----------+  |   |              |
    |         |  |                 +-->|  CacheFiles  |
    |  ISOFS  |--+                     |  /var/cache  |
    |         |                        +--------------+
    +---------+
|
||||
|
||||
Or to look at it another way, FS-Cache is a module that provides a caching
|
||||
facility to a network filesystem such that the cache is transparent to the
|
||||
user::
|
||||
|
||||
    +---------+
    |         |
    | Server  |
    |         |
    +---------+
         |                  NETWORK
    ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |
         |           +----------+
         V           |          |
    +---------+      |          |
    |         |      |          |
    |   NFS   |----->| FS-Cache |
    |         |      |          |--+
    +---------+      |          |  |   +--------------+   +--------------+
         |           |          |  |   |              |   |              |
         V           +----------+  +-->|  CacheFiles  |-->|  Ext3        |
    +---------+                        |  /var/cache  |   |  /dev/sda6   |
    |         |                        +--------------+   +--------------+
    |   VFS   |                                ^                     ^
    |         |                                |                     |
    +---------+                                +--------------+      |
         |                  KERNEL SPACE                      |      |
    ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
         |                  USER SPACE                        |      |
         V                                                    |      |
    +---------+                                           +--------------+
    |         |                                           |              |
    | Process |                                           | cachefilesd  |
    |         |                                           |              |
    +---------+                                           +--------------+
|
||||
|
||||
|
||||
FS-Cache does not follow the idea of completely loading every netfs file
|
||||
opened in its entirety into a cache before permitting it to be accessed and
|
||||
then serving the pages out of that cache rather than the netfs inode because:
|
||||
|
||||
(1) It must be practical to operate without a cache.
|
||||
|
||||
(2) The size of any accessible file must not be limited to the size of the
|
||||
cache.
|
||||
|
||||
(3) The combined size of all opened files (this includes mapped libraries)
|
||||
must not be limited to the size of the cache.
|
||||
|
||||
(4) The user should not be forced to download an entire file just to do a
|
||||
one-off access of a small portion of it (such as might be done with the
|
||||
"file" program).
|
||||
|
||||
It instead serves the cache out in PAGE_SIZE chunks as and when requested by
|
||||
the netfs('s) using it.
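To see why page-granular caching satisfies point (4), consider which pages a small read actually touches. The sketch below is illustrative only: the helper name is invented and the 4096-byte page size is an assumption, since PAGE_SIZE depends on the architecture and kernel configuration.

```python
def pages_for_range(offset, length, page_size=4096):
    """Return the page indices covered by a byte range of a file.

    page_size stands in for the kernel's PAGE_SIZE; 4096 is an assumed
    value for the sake of the example.
    """
    if length <= 0:
        return range(0)
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)
```

Reading a few hundred bytes from the start of a multi-gigabyte file touches only page 0, so only that page needs to be fetched and cached.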
|
||||
|
||||
|
||||
FS-Cache provides the following facilities:
|
||||
|
||||
(1) More than one cache can be used at once. Caches can be selected
|
||||
explicitly by use of tags.
|
||||
|
||||
(2) Caches can be added / removed at any time.
|
||||
|
||||
(3) The netfs is provided with an interface that allows either party to
|
||||
withdraw caching facilities from a file (required for (2)).
|
||||
|
||||
(4) The interface to the netfs returns as few errors as possible, preferring
|
||||
rather to let the netfs remain oblivious.
|
||||
|
||||
(5) Cookies are used to represent indices, files and other objects to the
|
||||
netfs. The simplest cookie is just a NULL pointer - indicating nothing
|
||||
cached there.
|
||||
|
||||
(6) The netfs is allowed to propose - dynamically - any index hierarchy it
|
||||
desires, though it must be aware that the index search function is
|
||||
recursive, stack space is limited, and indices can only be children of
|
||||
indices.
|
||||
|
||||
(7) Data I/O is done direct to and from the netfs's pages. The netfs
|
||||
indicates that page A is at index B of the data-file represented by cookie
|
||||
C, and that it should be read or written. The cache backend may or may
|
||||
not start I/O on that page, but if it does, a netfs callback will be
|
||||
invoked to indicate completion. The I/O may be either synchronous or
|
||||
asynchronous.
|
||||
|
||||
(8) Cookies can be "retired" upon release. At this point FS-Cache will mark
|
||||
them as obsolete and the index hierarchy rooted at that point will get
|
||||
recycled.
|
||||
|
||||
(9) The netfs provides a "match" function for index searches. In addition to
|
||||
saying whether a match was made or not, this can also specify that an
|
||||
entry should be updated or deleted.
|
||||
|
||||
(10) As much as possible is done asynchronously.
|
||||
|
||||
|
||||
FS-Cache maintains a virtual indexing tree in which all indices, files, objects
|
||||
and pages are kept. Bits of this tree may actually reside in one or more
|
||||
caches::
|
||||
|
||||
                                               FSDEF
                                                 |
                            +------------------------------------+
                            |                                    |
                           NFS                                  AFS
                            |                                    |
               +--------------------------+                +-----------+
               |                          |                |           |
            homedir                     mirror          afs.org   redhat.com
               |                          |                            |
         +------------+           +---------------+              +----------+
         |            |           |               |              |          |
       00001        00002       00007           00125        vol00001   vol00002
         |            |           |               |                         |
     +---+---+     +-----+      +---+      +------+------+            +-----+----+
     |   |   |     |     |      |   |      |      |      |            |     |    |
    PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
                         |                                            |
                        PG0                                       +-------+
                                                                  |       |
                                                                00001   00003
                                                                  |
                                                              +---+---+
                                                              |   |   |
                                                             PG0 PG1 PG2
|
||||
|
||||
In the example above, you can see two netfs's being backed: NFS and AFS. These
|
||||
have different index hierarchies:
|
||||
|
||||
* The NFS primary index contains per-server indices. Each server index is
|
||||
indexed by NFS file handles to get data file objects. Each data file
|
||||
object can have an array of pages, but may also have further child
|
||||
objects, such as extended attributes and directory entries. Extended
|
||||
attribute objects themselves have page-array contents.
|
||||
|
||||
* The AFS primary index contains per-cell indices. Each cell index contains
|
||||
per-logical-volume indices. Each volume index contains up to three
|
||||
indices for the read-write, read-only and backup mirrors of those volumes.
|
||||
Each of these contains vnode data file objects, each of which contains an
|
||||
array of pages.
|
||||
|
||||
The very top index is the FS-Cache master index in which individual netfs's
|
||||
have entries.
|
||||
|
||||
Any index object may reside in more than one cache, provided it only has index
|
||||
children. Any index with non-index object children will be assumed to only
|
||||
reside in one cache.
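The placement rule can be modelled with a toy tree. This is a sketch of the rule as stated here, not of FS-Cache's internal data structures; `CacheObject` and `may_span_caches` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CacheObject:
    name: str
    is_index: bool
    children: list = field(default_factory=list)

    def add(self, child):
        # Indices can only be children of indices.
        if child.is_index and not self.is_index:
            raise ValueError("an index must have an index as its parent")
        self.children.append(child)
        return child


def may_span_caches(obj):
    """True if the object is an index whose children are all indices."""
    return obj.is_index and all(c.is_index for c in obj.children)
```

In the example tree, the NFS primary index holds only server indices and so may be spread over several caches, whereas a server index that holds data files is assumed to live in exactly one.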
|
||||
|
||||
|
||||
The netfs API to FS-Cache can be found in:
|
||||
|
||||
Documentation/filesystems/caching/netfs-api.rst
|
||||
|
||||
The cache backend API to FS-Cache can be found in:
|
||||
|
||||
Documentation/filesystems/caching/backend-api.rst
|
||||
|
||||
A description of the internal representations and object state machine can be
|
||||
found in:
|
||||
|
||||
Documentation/filesystems/caching/object.rst
|
||||
|
||||
|
||||
Statistical Information
|
||||
=======================
|
||||
|
||||
If FS-Cache is compiled with the following options enabled::
|
||||
|
||||
CONFIG_FSCACHE_STATS=y
|
||||
CONFIG_FSCACHE_HISTOGRAM=y
|
||||
|
||||
then it will gather certain statistics and display them through a number of
|
||||
proc files.
|
||||
|
||||
/proc/fs/fscache/stats
|
||||
----------------------
|
||||
|
||||
This shows counts of a number of events that can happen in FS-Cache:
|
||||
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|CLASS |EVENT |MEANING |
|
||||
+==============+=======+=======================================================+
|
||||
|Cookies |idx=N |Number of index cookies allocated |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |dat=N |Number of data storage cookies allocated |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |spc=N |Number of special cookies allocated |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Objects |alc=N |Number of objects allocated |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nal=N |Number of object allocation failures |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |avl=N |Number of objects that reached the available state |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ded=N |Number of objects that reached the dead state |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|ChkAux |non=N |Number of objects that didn't have a coherency check |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ok=N |Number of objects that passed a coherency check |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |upd=N |Number of objects that needed a coherency data update |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |obs=N |Number of objects that were declared obsolete |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Pages |mrk=N |Number of pages marked as being cached |
|
||||
| |unc=N |Number of uncache page requests seen |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Acquire |n=N |Number of acquire cookie requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nul=N |Number of acq reqs given a NULL parent |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |noc=N |Number of acq reqs rejected due to no cache available |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ok=N |Number of acq reqs succeeded |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nbf=N |Number of acq reqs rejected due to error |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |oom=N |Number of acq reqs failed on ENOMEM |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Lookups |n=N |Number of lookup calls made on cache backends |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |neg=N |Number of negative lookups made |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |pos=N |Number of positive lookups made |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |crt=N |Number of objects created by lookup |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |tmo=N |Number of lookups timed out and requeued |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Updates |n=N |Number of update cookie requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nul=N |Number of upd reqs given a NULL parent |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |run=N |Number of upd reqs granted CPU time |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Relinqs |n=N |Number of relinquish cookie requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nul=N |Number of rlq reqs given a NULL parent |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |wcr=N |Number of rlq reqs waited on completion of creation |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|AttrChg |n=N |Number of attribute changed requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ok=N |Number of attr changed requests queued |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nbf=N |Number of attr changed rejected -ENOBUFS |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |oom=N |Number of attr changed failed -ENOMEM |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |run=N |Number of attr changed ops given CPU time |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Allocs |n=N |Number of allocation requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ok=N |Number of successful alloc reqs |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |wt=N |Number of alloc reqs that waited on lookup completion |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nbf=N |Number of alloc reqs rejected -ENOBUFS |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |int=N |Number of alloc reqs aborted -ERESTARTSYS |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ops=N |Number of alloc reqs submitted |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |owt=N |Number of alloc reqs waited for CPU time |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |abt=N |Number of alloc reqs aborted due to object death |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Retrvls |n=N |Number of retrieval (read) requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ok=N |Number of successful retr reqs |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |wt=N |Number of retr reqs that waited on lookup completion |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nod=N |Number of retr reqs returned -ENODATA |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nbf=N |Number of retr reqs rejected -ENOBUFS |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |int=N |Number of retr reqs aborted -ERESTARTSYS |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |oom=N |Number of retr reqs failed -ENOMEM |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ops=N |Number of retr reqs submitted |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |owt=N |Number of retr reqs waited for CPU time |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |abt=N |Number of retr reqs aborted due to object death |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Stores |n=N |Number of storage (write) requests seen |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ok=N |Number of successful store reqs |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |agn=N |Number of store reqs on a page already pending storage |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |nbf=N |Number of store reqs rejected -ENOBUFS |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |oom=N |Number of store reqs failed -ENOMEM |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ops=N |Number of store reqs submitted |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |run=N |Number of store reqs granted CPU time |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |pgs=N |Number of pages given store req processing time |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |rxd=N |Number of store reqs deleted from tracking tree |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |olm=N |Number of store reqs over store limit |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|VmScan |nos=N |Number of release reqs against pages with no |
|
||||
| | |pending store |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |gon=N |Number of release reqs against pages stored by |
|
||||
| | |time lock granted |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |bsy=N |Number of release reqs ignored due to in-progress store|
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |can=N |Number of page stores cancelled due to release req |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|Ops |pend=N |Number of times async ops added to pending queues |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |run=N |Number of times async ops given CPU time |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |enq=N |Number of times async ops queued for processing |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |can=N |Number of async ops cancelled |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |rej=N |Number of async ops rejected due to object |
|
||||
| | |lookup/create failure |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ini=N |Number of async ops initialised |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |dfr=N |Number of async ops queued for deferred release |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |rel=N |Number of async ops released |
|
||||
| | |(should equal ini=N when idle) |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |gc=N |Number of deferred-release async ops garbage collected |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|CacheOp |alo=N |Number of in-progress alloc_object() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |luo=N |Number of in-progress lookup_object() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |luc=N |Number of in-progress lookup_complete() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |gro=N |Number of in-progress grab_object() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |upo=N |Number of in-progress update_object() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |dro=N |Number of in-progress drop_object() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |pto=N |Number of in-progress put_object() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |syn=N |Number of in-progress sync_cache() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |atc=N |Number of in-progress attr_changed() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |rap=N |Number of in-progress read_or_alloc_page() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ras=N |Number of in-progress read_or_alloc_pages() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |alp=N |Number of in-progress allocate_page() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |als=N |Number of in-progress allocate_pages() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |wrp=N |Number of in-progress write_page() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |ucp=N |Number of in-progress uncache_page() cache ops |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |dsp=N |Number of in-progress dissociate_pages() cache ops |
|
||||
+--------------+-------+-------------------------------------------------------+
|
||||
|CacheEv |nsp=N |Number of object lookups/creations rejected due to |
|
||||
| | |lack of space |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |stl=N |Number of stale objects deleted |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |rtr=N |Number of objects retired when relinquished |
|
||||
+ +-------+-------------------------------------------------------+
|
||||
| |cul=N |Number of objects culled |
|
||||
+--------------+-------+-------------------------------------------------------+
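A reader for this file could be sketched as below. The line layout is an assumption: the table above lists only the classes and events, and the sketch supposes each line looks like `Class: key=N key=N ...`, possibly with one class spread over several lines. The function name is hypothetical.

```python
def parse_stats(text):
    """Parse 'Class: key=N key=N' lines into {class: {event: count}}.

    The assumed line format is not specified in this document; lines
    without a colon (such as a title line) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        cls, _, rest = line.partition(":")
        counters = stats.setdefault(cls.strip(), {})
        for item in rest.split():
            key, sep, value = item.partition("=")
            if sep and value.isdigit():
                counters[key] = int(value)
    return stats
```

With the file contents read into a string, `parse_stats(text)["Cookies"]["idx"]` would then give the number of index cookies allocated, provided the assumed format matches the running kernel.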
|
||||
|
||||
|
||||
|
||||
/proc/fs/fscache/histogram
|
||||
--------------------------
|
||||
|
||||
::
|
||||
|
||||
cat /proc/fs/fscache/histogram
|
||||
JIFS SECS OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
|
||||
===== ===== ========= ========= ========= ========= =========
|
||||
|
||||
This shows the breakdown of the number of times each amount of time
|
||||
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
|
||||
columns are as follows:
|
||||
|
||||
========= =======================================================
COLUMN    TIME MEASUREMENT
========= =======================================================
OBJ INST  Length of time to instantiate an object
OP RUNS   Length of time a call to process an operation took
OBJ RUNS  Length of time a call to process an object event took
RETRV DLY Time between requesting a read and lookup completing
RETRIEVLS Time between beginning and end of a retrieval
========= =======================================================
|
||||
|
||||
Each row shows the number of events that took a particular range of times.
|
||||
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
||||
jiffy range covered, and the SECS field the equivalent number of seconds.
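The relationship between the JIFS and SECS columns can be sketched as follows. HZ is fixed when the kernel is built, so the value passed in here is an assumption, and both helper names are invented for the example.

```python
def jiffies_to_seconds(jiffies, hz):
    """Convert a jiffy count to seconds for a kernel built with the given HZ."""
    return jiffies / hz


def histogram_buckets(hz):
    """Yield (jiffies, seconds) for each 1-jiffy step from 0 to HZ-1."""
    for jif in range(hz):
        yield jif, jiffies_to_seconds(jif, hz)
```

With an assumed HZ of 250, the histogram would have 250 rows and the row for jiffy 125 would correspond to 0.5 seconds.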
|
||||
|
||||
|
||||
|
||||
Object List
|
||||
===========
|
||||
|
||||
If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
|
||||
list of all the objects currently allocated and allow them to be viewed
|
||||
through::
|
||||
|
||||
/proc/fs/fscache/objects
|
||||
|
||||
This will look something like::
|
||||
|
||||
[root@andromeda ~]# head /proc/fs/fscache/objects
|
||||
OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA
|
||||
======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
|
||||
17e4b 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
|
||||
1693a 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
|
||||
|
||||
where the first set of columns before the '|' describe the object:
|
||||
|
||||
======= ===============================================================
COLUMN  DESCRIPTION
======= ===============================================================
OBJECT  Object debugging ID (appears as OBJ%x in some debug messages)
PARENT  Debugging ID of parent object
STAT    Object state
CHLDN   Number of child objects of this object
OPS     Number of outstanding operations on this object
OOP     Number of outstanding child object management operations
IPR
EX      Number of outstanding exclusive operations
READS   Number of outstanding read operations
EM      Object's event mask
EV      Events raised on this object
F       Object flags
S       Object work item busy state mask (1:pending 2:running)
======= ===============================================================
|
||||
|
||||
and the second set of columns describe the object's cookie, if present:
|
||||
|
||||
================ ======================================================
COLUMN           DESCRIPTION
================ ======================================================
NETFS_COOKIE_DEF Name of netfs cookie definition
TY               Cookie type (IX - index, DT - data, hex - special)
FL               Cookie flags
NETFS_DATA       Netfs private data stored in the cookie
OBJECT_KEY       Object key      } 1 column, with separating comma
AUX_DATA         Object aux data } presence may be configured
================ ======================================================
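
The split at the '|' separator can be exploited when post-processing the
listing. The following is a minimal sketch, not part of FS-Cache itself: the
function name objlist_cookie_summary is hypothetical, and it assumes the
layout matches the sample output above (a header row, a ruler row, then one
object per line).

```shell
# Hypothetical helper: read an objects listing on stdin and print, for each
# object, its debugging ID, its state, the cookie definition name and the
# cookie type.
objlist_cookie_summary() {
    awk -F'|' '
        # Skip the header row and the ==== ruler beneath it.
        $1 ~ /^(OBJECT|=)/ { next }
        NF >= 2 {
            split($1, obj, " ")      # object half: ID, parent, state, ...
            split($2, cookie, " ")   # cookie half: definition, type, ...
            print obj[1], obj[3], cookie[1], cookie[2]
        }
    '
}
```

It could be used as ``objlist_cookie_summary </proc/fs/fscache/objects``,
which for the first sample row above would print ``17e4b ACTV NFS.fh DT``.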
|
||||
|
||||
The data shown may be filtered by attaching a key to an appropriate keyring
before viewing the file. Something like::
|
||||
|
||||
keyctl add user fscache:objlist <restrictions> @s
|
||||
|
||||
where <restrictions> are a selection of the following letters:
|
||||
|
||||
== =========================================================
K  Show hexdump of object key (don't show if not given)
A  Show hexdump of object aux data (don't show if not given)
== =========================================================
|
||||
|
||||
and the following paired letters:
|
||||
|
||||
== =========================================================
C  Show objects that have a cookie
c  Show objects that don't have a cookie
B  Show objects that are busy
b  Show objects that aren't busy
W  Show objects that have pending writes
w  Show objects that don't have pending writes
R  Show objects that have outstanding reads
r  Show objects that don't have outstanding reads
S  Show objects that have work queued
s  Show objects that don't have work queued
== =========================================================
|
||||
|
||||
If neither side of a letter pair is given, then both are implied. For example::
|
||||
|
||||
keyctl add user fscache:objlist KB @s
|
||||
|
||||
shows objects that are busy, and lists their object keys, but does not dump
|
||||
their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
|
||||
not implied.
|
||||
|
||||
By default all objects and all fields will be shown.
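
The pairing rule can be made concrete with a small sketch that expands a
restriction string into the full set of letters it is equivalent to. The
function name expand_objlist_filter is hypothetical and the logic is only a
reading of the rules described above, not code taken from FS-Cache.

```shell
# Hypothetical helper: expand a restriction string (such as "KB") into the
# letters that are effectively in force once implied pairs are added.
expand_objlist_filter() {
    given=$1
    out=

    # K and A are plain switches: they apply only if given.
    for flag in K A; do
        case $given in *"$flag"*) out=$out$flag ;; esac
    done

    # Paired letters: if neither side is given, both are implied.
    for pair in Cc Bb Ww Rr Ss; do
        upper=${pair%?}
        lower=${pair#?}
        has_upper=no
        has_lower=no
        case $given in *"$upper"*) has_upper=yes ;; esac
        case $given in *"$lower"*) has_lower=yes ;; esac

        if [ "$has_upper" = no ] && [ "$has_lower" = no ]; then
            out=$out$pair
        else
            if [ "$has_upper" = yes ]; then out=$out$upper; fi
            if [ "$has_lower" = yes ]; then out=$out$lower; fi
        fi
    done

    printf '%s\n' "$out"
}

expand_objlist_filter KB
```

For the "KB" example this prints ``KCcBWwRrSs``: 'K' and 'B' as given, 'b'
left out, and every other pair implied in full.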
|
||||
|
||||
|
||||
Debugging
|
||||
=========
|
||||
|
||||
If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
|
||||
debugging enabled by adjusting the value in::
|
||||
|
||||
/sys/module/fscache/parameters/debug
|
||||
|
||||
This is a bitmask of debugging streams to enable:
|
||||
|
||||
======= ======= =============================== =======================
BIT     VALUE   STREAM                          POINT
======= ======= =============================== =======================
0       1       Cache management                Function entry trace
1       2                                       Function exit trace
2       4                                       General
3       8       Cookie management               Function entry trace
4       16                                      Function exit trace
5       32                                      General
6       64      Page handling                   Function entry trace
7       128                                     Function exit trace
8       256                                     General
9       512     Operation management            Function entry trace
10      1024                                    Function exit trace
11      2048                                    General
======= ======= =============================== =======================
|
||||
|
||||
The appropriate set of values should be OR'd together and the result written to
|
||||
the control file. For example::
|
||||
|
||||
echo $((1|8|64)) >/sys/module/fscache/parameters/debug
|
||||
|
||||
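will turn on function entry debugging for the cache management, cookie
management and page handling streams. OR in 512 as well to include operation
management.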
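
When working from the BIT column rather than the VALUE column, the mask can be
computed from bit numbers. This is a minimal sketch; the function name
fscache_debug_mask is hypothetical and it simply ORs together 1 << bit for
each argument.

```shell
# Hypothetical helper: build a debug bitmask from a list of bit numbers
# taken from the BIT column of the table above.
fscache_debug_mask() {
    mask=0
    for bit in "$@"; do
        mask=$((mask | (1 << bit)))
    done
    printf '%s\n' "$mask"
}

# Bits 0, 3, 6 and 9 are the four "Function entry trace" points.
fscache_debug_mask 0 3 6 9
```

The result (585 for the four entry-trace bits) would then be written to
/sys/module/fscache/parameters/debug in the same way as the echo example
above.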
|
@ -1,448 +0,0 @@
|
||||
==========================
|
||||
General Filesystem Caching
|
||||
==========================
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
========
|
||||
|
||||
This facility is a general purpose cache for network filesystems, though it
|
||||
could be used for caching other things such as ISO9660 filesystems too.
|
||||
|
||||
FS-Cache mediates between cache backends (such as CacheFS) and network
|
||||
filesystems:
|
||||
|
||||
+---------+
|         |                        +--------------+
|   NFS   |--+                     |              |
|         |  |                 +-->|   CacheFS    |
+---------+  |   +----------+  |   |  /dev/hda5   |
             |   |          |  |   +--------------+
+---------+  +-->|          |  |
|         |      |          |--+
|   AFS   |----->| FS-Cache |
|         |      |          |--+
+---------+  +-->|          |  |
             |   |          |  |   +--------------+
+---------+  |   +----------+  |   |              |
|         |  |                 +-->|  CacheFiles  |
|  ISOFS  |--+                     |  /var/cache  |
|         |                        +--------------+
+---------+
|
||||
|
||||
Or to look at it another way, FS-Cache is a module that provides a caching
|
||||
facility to a network filesystem such that the cache is transparent to the
|
||||
user:
|
||||
|
||||
+---------+
|
||||
| |
|
||||
| Server |
|
||||
| |
|
||||
+---------+
|
||||
| NETWORK
|
||||
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
|
||||
| +----------+
|
||||
V | |
|
||||
+---------+ | |
|
||||
| | | |
|
||||
| NFS |----->| FS-Cache |
|
||||
| | | |--+
|
||||
+---------+ | | | +--------------+ +--------------+
|
||||
| | | | | | | |
|
||||
V +----------+ +-->| CacheFiles |-->| Ext3 |
|
||||
+---------+ | /var/cache | | /dev/sda6 |
|
||||
| | +--------------+ +--------------+
|
||||
| VFS | ^ ^
|
||||
| | | |
|
||||
+---------+ +--------------+ |
|
||||
| KERNEL SPACE | |
|
||||
~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
|
||||
| USER SPACE | |
|
||||
V | |
|
||||
+---------+ +--------------+
|
||||
| | | |
|
||||
| Process | | cachefilesd |
|
||||
| | | |
|
||||
+---------+ +--------------+
|
||||
|
||||
|
||||
FS-Cache does not follow the idea of completely loading every netfs file
|
||||
opened in its entirety into a cache before permitting it to be accessed and
|
||||
then serving the pages out of that cache rather than the netfs inode because:
|
||||
|
||||
(1) It must be practical to operate without a cache.
|
||||
|
||||
(2) The size of any accessible file must not be limited to the size of the
|
||||
cache.
|
||||
|
||||
(3) The combined size of all opened files (this includes mapped libraries)
|
||||
must not be limited to the size of the cache.
|
||||
|
||||
(4) The user should not be forced to download an entire file just to do a
|
||||
one-off access of a small portion of it (such as might be done with the
|
||||
"file" program).
|
||||
|
||||
It instead serves the cache out in PAGE_SIZE chunks as and when requested by
|
||||
the netfs('s) using it.
|
||||
|
||||
|
||||
FS-Cache provides the following facilities:
|
||||
|
||||
(1) More than one cache can be used at once. Caches can be selected
|
||||
explicitly by use of tags.
|
||||
|
||||
(2) Caches can be added / removed at any time.
|
||||
|
||||
(3) The netfs is provided with an interface that allows either party to
|
||||
withdraw caching facilities from a file (required for (2)).
|
||||
|
||||
(4) The interface to the netfs returns as few errors as possible, preferring
|
||||
rather to let the netfs remain oblivious.
|
||||
|
||||
(5) Cookies are used to represent indices, files and other objects to the
|
||||
netfs. The simplest cookie is just a NULL pointer - indicating nothing
|
||||
cached there.
|
||||
|
||||
(6) The netfs is allowed to propose - dynamically - any index hierarchy it
|
||||
desires, though it must be aware that the index search function is
|
||||
recursive, stack space is limited, and indices can only be children of
|
||||
indices.
|
||||
|
||||
(7) Data I/O is done direct to and from the netfs's pages. The netfs
|
||||
indicates that page A is at index B of the data-file represented by cookie
|
||||
C, and that it should be read or written. The cache backend may or may
|
||||
not start I/O on that page, but if it does, a netfs callback will be
|
||||
invoked to indicate completion. The I/O may be either synchronous or
|
||||
asynchronous.
|
||||
|
||||
(8) Cookies can be "retired" upon release. At this point FS-Cache will mark
|
||||
them as obsolete and the index hierarchy rooted at that point will get
|
||||
recycled.
|
||||
|
||||
(9) The netfs provides a "match" function for index searches. In addition to
|
||||
saying whether a match was made or not, this can also specify that an
|
||||
entry should be updated or deleted.
|
||||
|
||||
(10) As much as possible is done asynchronously.
|
||||
|
||||
|
||||
FS-Cache maintains a virtual indexing tree in which all indices, files, objects
|
||||
and pages are kept. Bits of this tree may actually reside in one or more
|
||||
caches.
|
||||
|
||||
FSDEF
|
||||
|
|
||||
+------------------------------------+
|
||||
| |
|
||||
NFS AFS
|
||||
| |
|
||||
+--------------------------+ +-----------+
|
||||
| | | |
|
||||
homedir mirror afs.org redhat.com
|
||||
| | |
|
||||
+------------+ +---------------+ +----------+
|
||||
| | | | | |
|
||||
00001 00002 00007 00125 vol00001 vol00002
|
||||
| | | | |
|
||||
+---+---+ +-----+ +---+ +------+------+ +-----+----+
|
||||
| | | | | | | | | | | | |
|
||||
PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
|
||||
| |
|
||||
PG0 +-------+
|
||||
| |
|
||||
00001 00003
|
||||
|
|
||||
+---+---+
|
||||
| | |
|
||||
PG0 PG1 PG2
|
||||
|
||||
In the example above, you can see two netfs's being backed: NFS and AFS. These
|
||||
have different index hierarchies:
|
||||
|
||||
(*) The NFS primary index contains per-server indices. Each server index is
|
||||
indexed by NFS file handles to get data file objects. Each data file
object can have an array of pages, but may also have further child
|
||||
objects, such as extended attributes and directory entries. Extended
|
||||
attribute objects themselves have page-array contents.
|
||||
|
||||
(*) The AFS primary index contains per-cell indices. Each cell index contains
|
||||
per-logical-volume indices. Each volume index contains up to three
indices for the read-write, read-only and backup mirrors of that volume.
|
||||
Each of these contains vnode data file objects, each of which contains an
|
||||
array of pages.
|
||||
|
||||
The very top index is the FS-Cache master index in which individual netfs's
|
||||
have entries.
|
||||
|
||||
Any index object may reside in more than one cache, provided it only has index
|
||||
children. Any index with non-index object children will be assumed to only
|
||||
reside in one cache.
|
||||
|
||||
|
||||
The netfs API to FS-Cache can be found in:
|
||||
|
||||
Documentation/filesystems/caching/netfs-api.txt
|
||||
|
||||
The cache backend API to FS-Cache can be found in:
|
||||
|
||||
Documentation/filesystems/caching/backend-api.txt
|
||||
|
||||
A description of the internal representations and object state machine can be
|
||||
found in:
|
||||
|
||||
Documentation/filesystems/caching/object.txt
|
||||
|
||||
|
||||
=======================
|
||||
STATISTICAL INFORMATION
|
||||
=======================
|
||||
|
||||
If FS-Cache is compiled with the following options enabled:
|
||||
|
||||
CONFIG_FSCACHE_STATS=y
|
||||
CONFIG_FSCACHE_HISTOGRAM=y
|
||||
|
||||
then it will gather certain statistics and display them through a number of
|
||||
proc files.
|
||||
|
||||
(*) /proc/fs/fscache/stats
|
||||
|
||||
This shows counts of a number of events that can happen in FS-Cache:
|
||||
|
||||
CLASS EVENT MEANING
|
||||
======= ======= =======================================================
|
||||
Cookies idx=N Number of index cookies allocated
|
||||
dat=N Number of data storage cookies allocated
|
||||
spc=N Number of special cookies allocated
|
||||
Objects alc=N Number of objects allocated
|
||||
nal=N Number of object allocation failures
|
||||
avl=N Number of objects that reached the available state
|
||||
ded=N Number of objects that reached the dead state
|
||||
ChkAux non=N Number of objects that didn't have a coherency check
|
||||
ok=N Number of objects that passed a coherency check
|
||||
upd=N Number of objects that needed a coherency data update
|
||||
obs=N Number of objects that were declared obsolete
|
||||
Pages mrk=N Number of pages marked as being cached
|
||||
unc=N Number of uncache page requests seen
|
||||
Acquire n=N Number of acquire cookie requests seen
|
||||
nul=N Number of acq reqs given a NULL parent
|
||||
noc=N Number of acq reqs rejected due to no cache available
|
||||
ok=N Number of acq reqs succeeded
|
||||
nbf=N Number of acq reqs rejected due to error
|
||||
oom=N Number of acq reqs failed on ENOMEM
|
||||
Lookups n=N Number of lookup calls made on cache backends
|
||||
neg=N Number of negative lookups made
|
||||
pos=N Number of positive lookups made
|
||||
crt=N Number of objects created by lookup
|
||||
tmo=N Number of lookups timed out and requeued
|
||||
Updates n=N Number of update cookie requests seen
|
||||
nul=N Number of upd reqs given a NULL parent
|
||||
run=N Number of upd reqs granted CPU time
|
||||
Relinqs n=N Number of relinquish cookie requests seen
|
||||
nul=N Number of rlq reqs given a NULL parent
|
||||
wcr=N Number of rlq reqs waited on completion of creation
|
||||
AttrChg n=N Number of attribute changed requests seen
|
||||
ok=N Number of attr changed requests queued
|
||||
nbf=N Number of attr changed rejected -ENOBUFS
|
||||
oom=N Number of attr changed failed -ENOMEM
|
||||
run=N Number of attr changed ops given CPU time
|
||||
Allocs n=N Number of allocation requests seen
|
||||
ok=N Number of successful alloc reqs
|
||||
wt=N Number of alloc reqs that waited on lookup completion
|
||||
nbf=N Number of alloc reqs rejected -ENOBUFS
|
||||
int=N Number of alloc reqs aborted -ERESTARTSYS
|
||||
ops=N Number of alloc reqs submitted
|
||||
owt=N Number of alloc reqs waited for CPU time
|
||||
abt=N Number of alloc reqs aborted due to object death
|
||||
Retrvls n=N Number of retrieval (read) requests seen
|
||||
ok=N Number of successful retr reqs
|
||||
wt=N Number of retr reqs that waited on lookup completion
|
||||
nod=N Number of retr reqs returned -ENODATA
|
||||
nbf=N Number of retr reqs rejected -ENOBUFS
|
||||
int=N Number of retr reqs aborted -ERESTARTSYS
|
||||
oom=N Number of retr reqs failed -ENOMEM
|
||||
ops=N Number of retr reqs submitted
|
||||
owt=N Number of retr reqs waited for CPU time
|
||||
abt=N Number of retr reqs aborted due to object death
|
||||
Stores n=N Number of storage (write) requests seen
|
||||
ok=N Number of successful store reqs
|
||||
agn=N Number of store reqs on a page already pending storage
|
||||
nbf=N Number of store reqs rejected -ENOBUFS
|
||||
oom=N Number of store reqs failed -ENOMEM
|
||||
ops=N Number of store reqs submitted
|
||||
run=N Number of store reqs granted CPU time
|
||||
pgs=N Number of pages given store req processing time
|
||||
rxd=N Number of store reqs deleted from tracking tree
|
||||
olm=N Number of store reqs over store limit
|
||||
VmScan nos=N Number of release reqs against pages with no pending store
|
||||
gon=N Number of release reqs against pages stored by time lock granted
|
||||
bsy=N Number of release reqs ignored due to in-progress store
|
||||
can=N Number of page stores cancelled due to release req
|
||||
Ops pend=N Number of times async ops added to pending queues
|
||||
run=N Number of times async ops given CPU time
|
||||
enq=N Number of times async ops queued for processing
|
||||
can=N Number of async ops cancelled
|
||||
rej=N Number of async ops rejected due to object lookup/create failure
|
||||
ini=N Number of async ops initialised
|
||||
dfr=N Number of async ops queued for deferred release
|
||||
rel=N Number of async ops released (should equal ini=N when idle)
|
||||
gc=N Number of deferred-release async ops garbage collected
|
||||
CacheOp alo=N Number of in-progress alloc_object() cache ops
|
||||
luo=N Number of in-progress lookup_object() cache ops
|
||||
luc=N Number of in-progress lookup_complete() cache ops
|
||||
gro=N Number of in-progress grab_object() cache ops
|
||||
upo=N Number of in-progress update_object() cache ops
|
||||
dro=N Number of in-progress drop_object() cache ops
|
||||
pto=N Number of in-progress put_object() cache ops
|
||||
syn=N Number of in-progress sync_cache() cache ops
|
||||
atc=N Number of in-progress attr_changed() cache ops
|
||||
rap=N Number of in-progress read_or_alloc_page() cache ops
|
||||
ras=N Number of in-progress read_or_alloc_pages() cache ops
|
||||
alp=N Number of in-progress allocate_page() cache ops
|
||||
als=N Number of in-progress allocate_pages() cache ops
|
||||
wrp=N Number of in-progress write_page() cache ops
|
||||
ucp=N Number of in-progress uncache_page() cache ops
|
||||
dsp=N Number of in-progress dissociate_pages() cache ops
|
||||
CacheEv nsp=N Number of object lookups/creations rejected due to lack of space
|
||||
stl=N Number of stale objects deleted
|
||||
rtr=N Number of objects retired when relinquished
|
||||
cul=N Number of objects culled
|
||||
|
||||
|
||||
(*) /proc/fs/fscache/histogram
|
||||
|
||||
cat /proc/fs/fscache/histogram
|
||||
JIFS SECS OBJ INST OP RUNS OBJ RUNS RETRV DLY RETRIEVLS
|
||||
===== ===== ========= ========= ========= ========= =========
|
||||
|
||||
This shows the breakdown of the number of times each amount of time
|
||||
between 0 jiffies and HZ-1 jiffies a variety of tasks took to run. The
|
||||
columns are as follows:
|
||||
|
||||
COLUMN TIME MEASUREMENT
|
||||
======= =======================================================
|
||||
OBJ INST Length of time to instantiate an object
|
||||
OP RUNS Length of time a call to process an operation took
|
||||
OBJ RUNS Length of time a call to process an object event took
|
||||
RETRV DLY Time between requesting a read and lookup completing
|
||||
RETRIEVLS Time between beginning and end of a retrieval
|
||||
|
||||
Each row shows the number of events that took a particular range of times.
|
||||
Each step is 1 jiffy in size. The JIFS column indicates the particular
|
||||
jiffy range covered, and the SECS field the equivalent number of seconds.
|
||||
|
||||
|
||||
===========
|
||||
OBJECT LIST
|
||||
===========
|
||||
|
||||
If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a
|
||||
list of all the objects currently allocated and allow them to be viewed
|
||||
through:
|
||||
|
||||
/proc/fs/fscache/objects
|
||||
|
||||
This will look something like:
|
||||
|
||||
[root@andromeda ~]# head /proc/fs/fscache/objects
|
||||
OBJECT PARENT STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA OBJECT_KEY, AUX_DATA
|
||||
======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================
|
||||
17e4b 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a
|
||||
1693a 2 ACTV 0 0 0 0 0 0 7b 4 0 0 | NFS.fh DT 0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a
|
||||
|
||||
where the first set of columns before the '|' describe the object:
|
||||
|
||||
COLUMN DESCRIPTION
|
||||
======= ===============================================================
|
||||
OBJECT Object debugging ID (appears as OBJ%x in some debug messages)
|
||||
PARENT Debugging ID of parent object
|
||||
STAT Object state
|
||||
CHLDN Number of child objects of this object
|
||||
OPS Number of outstanding operations on this object
|
||||
OOP Number of outstanding child object management operations
|
||||
IPR
|
||||
EX Number of outstanding exclusive operations
|
||||
READS Number of outstanding read operations
|
||||
EM Object's event mask
|
||||
EV Events raised on this object
|
||||
F Object flags
|
||||
S Object work item busy state mask (1:pending 2:running)
|
||||
|
||||
and the second set of columns describe the object's cookie, if present:
|
||||
|
||||
COLUMN DESCRIPTION
|
||||
=============== =======================================================
|
||||
NETFS_COOKIE_DEF Name of netfs cookie definition
|
||||
TY Cookie type (IX - index, DT - data, hex - special)
|
||||
FL Cookie flags
|
||||
NETFS_DATA Netfs private data stored in the cookie
|
||||
OBJECT_KEY Object key } 1 column, with separating comma
|
||||
AUX_DATA Object aux data } presence may be configured
|
||||
|
||||
The data shown may be filtered by attaching a key to an appropriate keyring
|
||||
before viewing the file. Something like:
|
||||
|
||||
keyctl add user fscache:objlist <restrictions> @s
|
||||
|
||||
where <restrictions> are a selection of the following letters:
|
||||
|
||||
K Show hexdump of object key (don't show if not given)
|
||||
A Show hexdump of object aux data (don't show if not given)
|
||||
|
||||
and the following paired letters:
|
||||
|
||||
C Show objects that have a cookie
|
||||
c Show objects that don't have a cookie
|
||||
B Show objects that are busy
|
||||
b Show objects that aren't busy
|
||||
W Show objects that have pending writes
|
||||
w Show objects that don't have pending writes
|
||||
R Show objects that have outstanding reads
|
||||
r Show objects that don't have outstanding reads
|
||||
S Show objects that have work queued
|
||||
s Show objects that don't have work queued
|
||||
|
||||
If neither side of a letter pair is given, then both are implied. For example:
|
||||
|
||||
keyctl add user fscache:objlist KB @s
|
||||
|
||||
shows objects that are busy, and lists their object keys, but does not dump
|
||||
their auxiliary data. It also implies "CcWwRrSs", but as 'B' is given, 'b' is
|
||||
not implied.
|
||||
|
||||
By default all objects and all fields will be shown.
|
||||
|
||||
|
||||
=========
|
||||
DEBUGGING
|
||||
=========
|
||||
|
||||
If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
|
||||
debugging enabled by adjusting the value in:
|
||||
|
||||
/sys/module/fscache/parameters/debug
|
||||
|
||||
This is a bitmask of debugging streams to enable:
|
||||
|
||||
BIT VALUE STREAM POINT
|
||||
======= ======= =============================== =======================
|
||||
0 1 Cache management Function entry trace
|
||||
1 2 Function exit trace
|
||||
2 4 General
|
||||
3 8 Cookie management Function entry trace
|
||||
4 16 Function exit trace
|
||||
5 32 General
|
||||
6 64 Page handling Function entry trace
|
||||
7 128 Function exit trace
|
||||
8 256 General
|
||||
9 512 Operation management Function entry trace
|
||||
10 1024 Function exit trace
|
||||
11 2048 General
|
||||
|
||||
The appropriate set of values should be OR'd together and the result written to
|
||||
the control file. For example:
|
||||
|
||||
echo $((1|8|64)) >/sys/module/fscache/parameters/debug
|
||||
|
||||
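will turn on function entry debugging for the cache management, cookie
management and page handling streams. OR in 512 as well to include operation
management.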
|
14
Documentation/filesystems/caching/index.rst
Normal file
@ -0,0 +1,14 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Filesystem Caching
|
||||
==================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
fscache
|
||||
object
|
||||
backend-api
|
||||
cachefiles
|
||||
netfs-api
|
||||
operations
|
@ -1,6 +1,8 @@
|
||||
===============================
|
||||
FS-CACHE NETWORK FILESYSTEM API
|
||||
===============================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============================
|
||||
FS-Cache Network Filesystem API
|
||||
===============================
|
||||
|
||||
There's an API by which a network filesystem can make use of the FS-Cache
|
||||
facilities. This is based around a number of principles:
|
||||
@ -19,7 +21,7 @@ facilities. This is based around a number of principles:
|
||||
|
||||
This API is declared in <linux/fscache.h>.
|
||||
|
||||
This document contains the following sections:
|
||||
.. This document contains the following sections:
|
||||
|
||||
(1) Network filesystem definition
|
||||
(2) Index definition
|
||||
@ -41,12 +43,11 @@ This document contains the following sections:
|
||||
(18) FS-Cache specific page flags.
|
||||
|
||||
|
||||
=============================
|
||||
NETWORK FILESYSTEM DEFINITION
|
||||
Network Filesystem Definition
|
||||
=============================
|
||||
|
||||
FS-Cache needs a description of the network filesystem. This is specified
|
||||
using a record of the following structure:
|
||||
using a record of the following structure::
|
||||
|
||||
struct fscache_netfs {
|
||||
uint32_t version;
|
||||
@ -71,7 +72,7 @@ The fields are:
|
||||
another parameter passed into the registration function.
|
||||
|
||||
For example, kAFS (linux/fs/afs/) uses the following definitions to describe
|
||||
itself:
|
||||
itself::
|
||||
|
||||
struct fscache_netfs afs_cache_netfs = {
|
||||
.version = 0,
|
||||
@ -79,8 +80,7 @@ itself:
|
||||
};
|
||||
|
||||
|
||||
================
|
||||
INDEX DEFINITION
|
||||
Index Definition
|
||||
================
|
||||
|
||||
Indices are used for two purposes:
|
||||
@ -114,11 +114,10 @@ There are some limits on indices:
|
||||
function is recursive. Too many layers will run the kernel out of stack.
|
||||
|
||||
|
||||
=================
|
||||
OBJECT DEFINITION
|
||||
Object Definition
|
||||
=================
|
||||
|
||||
To define an object, a structure of the following type should be filled out:
|
||||
To define an object, a structure of the following type should be filled out::
|
||||
|
||||
struct fscache_cookie_def
|
||||
{
|
||||
@ -149,16 +148,13 @@ This has the following fields:
|
||||
|
||||
This is one of the following values:
|
||||
|
||||
(*) FSCACHE_COOKIE_TYPE_INDEX
|
||||
|
||||
FSCACHE_COOKIE_TYPE_INDEX
|
||||
This defines an index, which is a special FS-Cache type.
|
||||
|
||||
(*) FSCACHE_COOKIE_TYPE_DATAFILE
|
||||
|
||||
FSCACHE_COOKIE_TYPE_DATAFILE
|
||||
This defines an ordinary data file.
|
||||
|
||||
(*) Any other value between 2 and 255
|
||||
|
||||
Any other value between 2 and 255
|
||||
This defines an extraordinary object such as an XATTR.
|
||||
|
||||
(2) The name of the object type (NUL terminated unless all 16 chars are used)
|
||||
@ -192,9 +188,14 @@ This has the following fields:
|
||||
|
||||
If present, the function should return one of the following values:
|
||||
|
||||
(*) FSCACHE_CHECKAUX_OKAY - the entry is okay as is
|
||||
(*) FSCACHE_CHECKAUX_NEEDS_UPDATE - the entry requires update
|
||||
(*) FSCACHE_CHECKAUX_OBSOLETE - the entry should be deleted
|
||||
FSCACHE_CHECKAUX_OKAY
|
||||
- the entry is okay as is
|
||||
|
||||
FSCACHE_CHECKAUX_NEEDS_UPDATE
|
||||
- the entry requires update
|
||||
|
||||
FSCACHE_CHECKAUX_OBSOLETE
|
||||
- the entry should be deleted
|
||||
|
||||
This function can also be used to extract data from the auxiliary data in
|
||||
the cache and copy it into the netfs's structures.
|
||||
@ -236,32 +237,30 @@ This has the following fields:
|
||||
This function is not required for indices as they're not permitted data.
|
||||
|
||||
|
||||
===================================
|
||||
NETWORK FILESYSTEM (UN)REGISTRATION
|
||||
Network Filesystem (Un)registration
|
||||
===================================
|
||||
|
||||
The first step is to declare the network filesystem to the cache. This also
|
||||
involves specifying the layout of the primary index (for AFS, this would be the
|
||||
"cell" level).
|
||||
|
||||
The registration function is:
|
||||
The registration function is::
|
||||
|
||||
int fscache_register_netfs(struct fscache_netfs *netfs);
|
||||
|
||||
It just takes a pointer to the netfs definition. It returns 0 or an error as
|
||||
appropriate.
|
||||
|
||||
For kAFS, registration is done as follows:
|
||||
For kAFS, registration is done as follows::
|
||||
|
||||
ret = fscache_register_netfs(&afs_cache_netfs);
|
||||
|
||||
The last step is, of course, unregistration:
|
||||
The last step is, of course, unregistration::
|
||||
|
||||
void fscache_unregister_netfs(struct fscache_netfs *netfs);
|
||||
|
||||
|
||||
================
|
||||
CACHE TAG LOOKUP
|
||||
Cache Tag Lookup
|
||||
================
|
||||
|
||||
FS-Cache permits the use of more than one cache. To permit particular index
|
||||
@ -270,7 +269,7 @@ representation tags. This step is optional; it can be left entirely up to
|
||||
FS-Cache as to which cache should be used. The problem with doing that is that
|
||||
FS-Cache will always pick the first cache that was registered.
|
||||
|
||||
To get the representation for a named tag:
|
||||
To get the representation for a named tag::
|
||||
|
||||
struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
|
||||
|
||||
@ -278,7 +277,7 @@ This takes a text string as the name and returns a representation of a tag. It
|
||||
will never return an error. It may return a dummy tag, however, if it runs out
|
||||
of memory; this will inhibit caching with this tag.
|
||||
|
||||
Any representation so obtained must be released by passing it to this function:
|
||||
Any representation so obtained must be released by passing it to this function::
|
||||
|
||||
void fscache_release_cache_tag(struct fscache_cache_tag *tag);
|
||||
|
||||
@ -286,13 +285,12 @@ The tag will be retrieved by FS-Cache when it calls the object definition
|
||||
operation select_cache().
|
||||
|
||||
|
||||
==================
|
||||
INDEX REGISTRATION
|
||||
Index Registration
|
||||
==================
|
||||
|
||||
The third step is to inform FS-Cache about part of an index hierarchy that can
|
||||
be used to locate files. This is done by requesting a cookie for each index in
|
||||
the path to the file:
|
||||
the path to the file::
|
||||
|
||||
struct fscache_cookie *
|
||||
fscache_acquire_cookie(struct fscache_cookie *parent,
|
||||
@ -339,7 +337,7 @@ must be enabled to do anything with it. A disabled cookie can be enabled by
|
||||
calling fscache_enable_cookie() (see below).
|
||||
|
||||
For example, with AFS, a cell would be added to the primary index. This index
|
||||
entry would have a dependent inode containing volume mappings within this cell:
|
||||
entry would have a dependent inode containing volume mappings within this cell::
|
||||
|
||||
cell->cache =
|
||||
fscache_acquire_cookie(afs_cache_netfs.primary_index,
|
||||
@ -349,7 +347,7 @@ entry would have a dependent inode containing volume mappings within this cell:
|
||||
cell, 0, true);
|
||||
|
||||
And then a particular volume could be added to that index by ID, creating
|
||||
another index for vnodes (AFS inode equivalents):
|
||||
another index for vnodes (AFS inode equivalents)::
|
||||
|
||||
volume->cache =
|
||||
fscache_acquire_cookie(volume->cell->cache,
|
||||
@ -359,13 +357,12 @@ another index for vnodes (AFS inode equivalents):
|
||||
volume, 0, true);
|
||||
|
||||
|
||||
======================
|
||||
DATA FILE REGISTRATION
|
||||
Data File Registration
|
||||
======================
|
||||
|
||||
The fourth step is to request a data file be created in the cache. This is
|
||||
identical to index cookie acquisition. The only difference is that the type in
|
||||
the object definition should be something other than index type.
|
||||
the object definition should be something other than index type::
|
||||
|
||||
vnode->cache =
|
||||
fscache_acquire_cookie(volume->cache,
|
||||
@ -375,15 +372,14 @@ the object definition should be something other than index type.
|
||||
vnode, vnode->status.size, true);
|
||||
|
||||
|
||||
=================================
|
||||
MISCELLANEOUS OBJECT REGISTRATION
|
||||
Miscellaneous Object Registration
|
||||
=================================
|
||||
|
||||
An optional step is to request an object of miscellaneous type be created in
|
||||
the cache. This is almost identical to index cookie acquisition. The only
|
||||
difference is that the type in the object definition should be something other
|
||||
than index type. While the parent object could be an index, it's more likely
|
||||
it would be some other type of object such as a data file.
|
||||
it would be some other type of object such as a data file::
|
||||
|
||||
xattr->cache =
|
||||
fscache_acquire_cookie(vnode->cache,
|
||||
@ -396,13 +392,12 @@ Miscellaneous objects might be used to store extended attributes or directory
|
||||
entries for example.
|
||||
|
||||
|
||||
==========================
|
||||
SETTING THE DATA FILE SIZE
|
||||
Setting the Data File Size
|
||||
==========================
|
||||
|
||||
The fifth step is to set the physical attributes of the file, such as its size.
|
||||
This doesn't automatically reserve any space in the cache, but permits the
|
||||
cache to adjust its metadata for data tracking appropriately:
|
||||
cache to adjust its metadata for data tracking appropriately::
|
||||
|
||||
int fscache_attr_changed(struct fscache_cookie *cookie);
|
||||
|
||||
@ -417,8 +412,7 @@ some point in the future, and as such, it may happen after the function returns
|
||||
to the caller. The attribute adjustment excludes read and write operations.
|
||||
|
||||
|
||||
=====================
|
||||
PAGE ALLOC/READ/WRITE
|
||||
Page Alloc/Read/Write
|
||||
=====================
|
||||
|
||||
And the sixth step is to store and retrieve pages in the cache. There are
|
||||
@ -441,7 +435,7 @@ PAGE READ
|
||||
|
||||
Firstly, the netfs should ask FS-Cache to examine the caches and read the
|
||||
contents cached for a particular page of a particular file if present, or else
|
||||
allocate space to store the contents if not:
|
||||
allocate space to store the contents if not::
|
||||
|
||||
typedef
|
||||
void (*fscache_rw_complete_t)(struct page *page,
|
||||
@ -474,14 +468,14 @@ Else if there's a copy of the page resident in the cache:
|
||||
|
||||
(4) When the read is complete, end_io_func() will be invoked with:
|
||||
|
||||
(*) The netfs data supplied when the cookie was created.
|
||||
* The netfs data supplied when the cookie was created.
|
||||
|
||||
(*) The page descriptor.
|
||||
* The page descriptor.
|
||||
|
||||
(*) The context argument passed to the above function. This will be
|
||||
* The context argument passed to the above function. This will be
|
||||
maintained with the get_context/put_context functions mentioned above.
|
||||
|
||||
(*) An argument that's 0 on success or negative for an error code.
|
||||
* An argument that's 0 on success or negative for an error code.
|
||||
|
||||
If an error occurs, it should be assumed that the page contains no usable
|
||||
data. fscache_readpages_cancel() may need to be called.
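The completion contract can be sketched in userspace as follows. This is only an illustration, not FS-Cache code: struct ex_page, struct ex_ctx and ex_end_io() are hypothetical stand-ins, modelled on the page, context and error arguments listed above. The one point it shows is that the page is treated as holding valid data only when the error argument is 0.

```c
/* Hypothetical stand-in for struct page; tracks only what the sketch needs. */
struct ex_page {
	int uptodate;	/* non-zero once the contents may be used */
};

/* Hypothetical per-request context supplied by the caller. */
struct ex_ctx {
	int completed;	/* number of completions seen */
	int failed;	/* number of completions that reported an error */
};

/*
 * Same shape as the completion callback: the page, the caller's context,
 * and an error argument that is 0 on success or a negative error code.
 */
static void ex_end_io(struct ex_page *page, void *context, int error)
{
	struct ex_ctx *ctx = context;

	ctx->completed++;
	if (error < 0) {
		/* On error, assume the page holds no usable data. */
		page->uptodate = 0;
		ctx->failed++;
		return;
	}
	page->uptodate = 1;
}
```

A real netfs would do its own page state handling at the points where the sketch sets or clears the uptodate field.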
|
||||
@ -504,11 +498,11 @@ This function may also return -ENOMEM or -EINTR, in which case it won't have
|
||||
read any data from the cache.
|
||||
|
||||
|
||||
PAGE ALLOCATE
|
||||
Page Allocate
|
||||
-------------
|
||||
|
||||
Alternatively, if there's not expected to be any data in the cache for a page
|
||||
because the file has been extended, a block can simply be allocated instead:
|
||||
because the file has been extended, a block can simply be allocated instead::
|
||||
|
||||
int fscache_alloc_page(struct fscache_cookie *cookie,
|
||||
struct page *page,
|
||||
@ -523,12 +517,12 @@ The mark_pages_cached() cookie operation will be called on the page if
|
||||
successful.
|
||||
|
||||
|
||||
PAGE WRITE
|
||||
Page Write
|
||||
----------
|
||||
|
||||
Secondly, if the netfs changes the contents of the page (either due to an
|
||||
initial download or if a user performs a write), then the page should be
|
||||
written back to the cache:
|
||||
written back to the cache::
|
||||
|
||||
int fscache_write_page(struct fscache_cookie *cookie,
|
||||
struct page *page,
|
||||
@ -566,11 +560,11 @@ place if unforeseen circumstances arose (such as a disk error).
|
||||
Writing takes place asynchronously.
|
||||
|
||||
|
||||
MULTIPLE PAGE READ
|
||||
Multiple Page Read
|
||||
------------------
|
||||
|
||||
A facility is provided to read several pages at once, as requested by the
|
||||
readpages() address space operation:
|
||||
readpages() address space operation::
|
||||
|
||||
int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
|
||||
struct address_space *mapping,
|
||||
@ -598,7 +592,7 @@ This works in a similar way to fscache_read_or_alloc_page(), except:
|
||||
be returned.
|
||||
|
||||
Otherwise, if all pages had reads dispatched, then 0 will be returned, the
|
||||
list will be empty and *nr_pages will be 0.
|
||||
list will be empty and ``*nr_pages`` will be 0.
|
||||
|
||||
(4) end_io_func will be called once for each page being read as the reads
|
||||
complete. It will be called in process context if error != 0, but it may
|
||||
@ -609,13 +603,13 @@ some of the pages being read and some being allocated. Those pages will have
|
||||
been marked appropriately and will need uncaching.
|
||||
|
||||
|
||||
CANCELLATION OF UNREAD PAGES
|
||||
Cancellation of Unread Pages
|
||||
----------------------------
|
||||
|
||||
If one or more pages are passed to fscache_read_or_alloc_pages() but not then
|
||||
read from the cache and also not read from the underlying filesystem then
|
||||
those pages will need to have any marks and reservations removed. This can be
|
||||
done by calling:
|
||||
done by calling::
|
||||
|
||||
void fscache_readpages_cancel(struct fscache_cookie *cookie,
|
||||
struct list_head *pages);
|
||||
@ -625,11 +619,10 @@ fscache_read_or_alloc_pages(). Every page in the pages list will be examined
|
||||
and any that have PG_fscache set will be uncached.
|
||||
|
||||
|
||||
==============
|
||||
PAGE UNCACHING
|
||||
Page Uncaching
|
||||
==============
|
||||
|
||||
To uncache a page, this function should be called:
|
||||
To uncache a page, this function should be called::
|
||||
|
||||
void fscache_uncache_page(struct fscache_cookie *cookie,
|
||||
struct page *page);
|
||||
@ -644,12 +637,12 @@ data file must be retired (see the relinquish cookie function below).
|
||||
|
||||
Furthermore, note that this does not cancel the asynchronous read or write
|
||||
operation started by the read/alloc and write functions, so the page
|
||||
invalidation functions must use:
|
||||
invalidation functions must use::
|
||||
|
||||
bool fscache_check_page_write(struct fscache_cookie *cookie,
|
||||
struct page *page);
|
||||
|
||||
to see if a page is being written to the cache, and:
|
||||
to see if a page is being written to the cache, and::
|
||||
|
||||
void fscache_wait_on_page_write(struct fscache_cookie *cookie,
|
||||
struct page *page);
|
||||
@ -660,7 +653,7 @@ to wait for it to finish if it is.
|
||||
When releasepage() is being implemented, a special FS-Cache function exists to
|
||||
manage the heuristics of coping with vmscan trying to eject pages, which may
|
||||
conflict with the cache trying to write pages to the cache (which may itself
|
||||
need to allocate memory):
|
||||
need to allocate memory)::
|
||||
|
||||
bool fscache_maybe_release_page(struct fscache_cookie *cookie,
|
||||
struct page *page,
|
||||
@ -676,12 +669,12 @@ storage request to complete, or it may attempt to cancel the storage request -
|
||||
in which case the page will not be stored in the cache this time.
|
||||
|
||||
|
||||
BULK INODE PAGE UNCACHE
|
||||
Bulk Inode Page Uncache
|
||||
-----------------------
|
||||
|
||||
A convenience routine is provided to perform an uncache on all the pages
|
||||
attached to an inode. This assumes that the pages on the inode correspond on a
|
||||
1:1 basis with the pages in the cache.
|
||||
1:1 basis with the pages in the cache::
|
||||
|
||||
void fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
|
||||
struct inode *inode);
|
||||
@ -692,12 +685,11 @@ written to the cache and for the cache to finish with the page generally. No
|
||||
error is returned.
|
||||
|
||||
|
||||
===============================
|
||||
INDEX AND DATA FILE CONSISTENCY
|
||||
Index and Data File Consistency
|
||||
===============================
|
||||
|
||||
To find out whether auxiliary data for an object is up to date within the
|
||||
cache, the following function can be called:
|
||||
cache, the following function can be called::
|
||||
|
||||
int fscache_check_consistency(struct fscache_cookie *cookie,
|
||||
const void *aux_data);
|
||||
@ -708,7 +700,7 @@ data buffer first. It returns 0 if it is and -ESTALE if it isn't; it may also
|
||||
return -ENOMEM and -ERESTARTSYS.
|
||||
|
||||
To request an update of the index data for an index or other object, the
|
||||
following function should be called:
|
||||
following function should be called::
|
||||
|
||||
void fscache_update_cookie(struct fscache_cookie *cookie,
|
||||
const void *aux_data);
|
||||
@ -721,8 +713,7 @@ Note that partial updates may happen automatically at other times, such as when
|
||||
data blocks are added to a data file object.
|
||||
|
||||
|
||||
=================
|
||||
COOKIE ENABLEMENT
|
||||
Cookie Enablement
|
||||
=================
|
||||
|
||||
Cookies exist in one of two states: enabled and disabled. If a cookie is
|
||||
@ -731,7 +722,7 @@ invalidate its state; allocate, read or write backing pages - though it is
|
||||
still possible to uncache pages and relinquish the cookie.
|
||||
|
||||
The initial enablement state is set by fscache_acquire_cookie(), but the cookie
|
||||
can be enabled or disabled later. To disable a cookie, call:
|
||||
can be enabled or disabled later. To disable a cookie, call::
|
||||
|
||||
void fscache_disable_cookie(struct fscache_cookie *cookie,
|
||||
const void *aux_data,
|
||||
@ -746,7 +737,7 @@ All possible failures are handled internally. The caller should consider
|
||||
calling fscache_uncache_all_inode_pages() afterwards to make sure all page
|
||||
markings are cleared up.
|
||||
|
||||
Cookies can be enabled or reenabled with:
|
||||
Cookies can be enabled or reenabled with::
|
||||
|
||||
void fscache_enable_cookie(struct fscache_cookie *cookie,
|
||||
const void *aux_data,
|
||||
@ -771,13 +762,12 @@ In both cases, the cookie's auxiliary data buffer is updated from aux_data if
|
||||
that is non-NULL inside the enablement lock before proceeding.
|
||||
|
||||
|
||||
===============================
|
||||
MISCELLANEOUS COOKIE OPERATIONS
|
||||
Miscellaneous Cookie Operations
|
||||
===============================
|
||||
|
||||
There are a number of operations that can be used to control cookies:
|
||||
|
||||
(*) Cookie pinning:
|
||||
* Cookie pinning::
|
||||
|
||||
int fscache_pin_cookie(struct fscache_cookie *cookie);
|
||||
void fscache_unpin_cookie(struct fscache_cookie *cookie);
|
||||
@ -790,7 +780,7 @@ There are a number of operations that can be used to control cookies:
|
||||
-ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
|
||||
-EIO if there's any other problem.
|
||||
|
||||
(*) Data space reservation:
|
||||
* Data space reservation::
|
||||
|
||||
int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
|
||||
|
||||
@ -809,11 +799,10 @@ There are a number of operations that can be used to control cookies:
|
||||
make space if it's not in use.
|
||||
|
||||
|
||||
=====================
|
||||
COOKIE UNREGISTRATION
|
||||
Cookie Unregistration
|
||||
=====================
|
||||
|
||||
To get rid of a cookie, this function should be called.
|
||||
To get rid of a cookie, this function should be called::
|
||||
|
||||
void fscache_relinquish_cookie(struct fscache_cookie *cookie,
|
||||
const void *aux_data,
|
||||
@ -835,16 +824,14 @@ the cookies for "child" indices, objects and pages have been relinquished
|
||||
first.
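The ordering rule (children are relinquished before their parent) can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: struct ex_cookie, ex_acquire() and ex_relinquish() are hypothetical and are not the kernel's types or functions. The real fscache_relinquish_cookie() returns void and leaves the ordering to the caller; the error return here exists only so the model can flag an ordering mistake.

```c
#include <stddef.h>

/* Hypothetical model of a cookie; not the kernel's struct fscache_cookie. */
struct ex_cookie {
	struct ex_cookie *parent;	/* NULL for the top of the tree */
	unsigned int n_children;	/* children not yet relinquished */
	int live;			/* non-zero between acquire and relinquish */
};

/* Attach a cookie below its parent, as each acquisition step does. */
static void ex_acquire(struct ex_cookie *cookie, struct ex_cookie *parent)
{
	cookie->parent = parent;
	cookie->n_children = 0;
	cookie->live = 1;
	if (parent)
		parent->n_children++;
}

/*
 * Returns 0 on success, or -1 if the cookie is not live or still has
 * children outstanding.
 */
static int ex_relinquish(struct ex_cookie *cookie)
{
	if (!cookie->live || cookie->n_children > 0)
		return -1;
	cookie->live = 0;
	if (cookie->parent)
		cookie->parent->n_children--;
	return 0;
}
```

With a cell, volume and vnode chain like the AFS examples, the vnode goes first, then the volume, then the cell.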
|
||||
|
||||
|
||||
==================
|
||||
INDEX INVALIDATION
|
||||
Index Invalidation
|
||||
==================
|
||||
|
||||
There is no direct way to invalidate an index subtree. To do this, the caller
|
||||
should relinquish and retire the cookie they have, and then acquire a new one.
|
||||
|
||||
|
||||
======================
|
||||
DATA FILE INVALIDATION
|
||||
Data File Invalidation
|
||||
======================
|
||||
|
||||
Sometimes it will be necessary to invalidate an object that contains data.
|
||||
@ -853,7 +840,7 @@ change - at which point the netfs has to throw away all the state it had for an
|
||||
inode and reload from the server.
|
||||
|
||||
To indicate that a cache object should be invalidated, the following function
|
||||
can be called:
|
||||
can be called::
|
||||
|
||||
void fscache_invalidate(struct fscache_cookie *cookie);
|
||||
|
||||
@ -868,13 +855,12 @@ auxiliary data update operation as it is very likely these will have changed.
|
||||
|
||||
Using the following function, the netfs can wait for the invalidation operation
|
||||
to have reached a point at which it can start submitting ordinary operations
|
||||
once again:
|
||||
once again::
|
||||
|
||||
void fscache_wait_on_invalidate(struct fscache_cookie *cookie);
|
||||
|
||||
|
||||
===========================
|
||||
FS-CACHE SPECIFIC PAGE FLAG
|
||||
FS-Cache Specific Page Flag
|
||||
===========================
|
||||
|
||||
FS-Cache makes use of a page flag, PG_private_2, for its own purpose. This is
|
||||
@ -898,7 +884,7 @@ was given under certain circumstances.
|
||||
This bit does not overlap with other flags such as PG_private. This means that FS-Cache
|
||||
can be used with a filesystem that uses the block buffering code.
|
||||
|
||||
There are a number of operations defined on this flag:
|
||||
There are a number of operations defined on this flag::
|
||||
|
||||
int PageFsCache(struct page *page);
|
||||
void SetPageFsCache(struct page *page)
|
@ -1,10 +1,12 @@
|
||||
====================================================
|
||||
IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
|
||||
====================================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
====================================================
|
||||
In-Kernel Cache Object Representation and Management
|
||||
====================================================
|
||||
|
||||
By: David Howells <dhowells@redhat.com>
|
||||
|
||||
Contents:
|
||||
.. Contents:
|
||||
|
||||
(*) Representation
|
||||
|
||||
@ -18,8 +20,7 @@ Contents:
|
||||
(*) The set of events.
|
||||
|
||||
|
||||
==============
|
||||
REPRESENTATION
|
||||
Representation
|
||||
==============
|
||||
|
||||
FS-Cache maintains an in-kernel representation of each object that a netfs is
|
||||
@ -38,7 +39,7 @@ or even by no objects (it may not be cached).
|
||||
|
||||
Furthermore, both cookies and objects are hierarchical. The two hierarchies
|
||||
correspond, but the cookies tree is a superset of the union of the object trees
|
||||
of multiple caches:
|
||||
of multiple caches::
|
||||
|
||||
NETFS INDEX TREE : CACHE 1 : CACHE 2
|
||||
: :
|
||||
@ -89,8 +90,7 @@ pointers to the cookies. The cookies themselves and any objects attached to
|
||||
those cookies are hidden from it.
|
||||
|
||||
|
||||
===============================
|
||||
OBJECT MANAGEMENT STATE MACHINE
|
||||
Object Management State Machine
|
||||
===============================
|
||||
|
||||
Within FS-Cache, each active object is managed by its own individual state
|
||||
@ -124,7 +124,7 @@ is not masked, the object will be queued for processing (by calling
|
||||
fscache_enqueue_object()).
|
||||
|
||||
|
||||
PROVISION OF CPU TIME
|
||||
Provision of CPU Time
|
||||
---------------------
|
||||
|
||||
The work to be done by the various states was given CPU time by the threads of
|
||||
@ -141,7 +141,7 @@ because:
|
||||
workqueues don't necessarily have the right numbers of threads.
|
||||
|
||||
|
||||
LOCKING SIMPLIFICATION
|
||||
Locking Simplification
|
||||
----------------------
|
||||
|
||||
Because only one worker thread may be operating on any particular object's
|
||||
@ -151,8 +151,7 @@ from the cache backend's representation (fscache_object) - which may be
|
||||
requested from either end.
|
||||
|
||||
|
||||
=================
|
||||
THE SET OF STATES
|
||||
The Set of States
|
||||
=================
|
||||
|
||||
The object state machine has a set of states that it can be in. There are
|
||||
@ -275,19 +274,17 @@ memory and potentially deletes stuff from disk:
|
||||
this state.
|
||||
|
||||
|
||||
THE SET OF EVENTS
|
||||
The Set of Events
|
||||
-----------------
|
||||
|
||||
There are a number of events that can be raised to an object state machine:
|
||||
|
||||
(*) FSCACHE_OBJECT_EV_UPDATE
|
||||
|
||||
FSCACHE_OBJECT_EV_UPDATE
|
||||
The netfs requested that an object be updated. The state machine will ask
|
||||
the cache backend to update the object, and the cache backend will ask the
|
||||
netfs for details of the change through its cookie definition ops.
|
||||
|
||||
(*) FSCACHE_OBJECT_EV_CLEARED
|
||||
|
||||
FSCACHE_OBJECT_EV_CLEARED
|
||||
This is signalled in two circumstances:
|
||||
|
||||
(a) when an object's last child object is dropped and
|
||||
@ -296,20 +293,16 @@ There are a number of events that can be raised to an object state machine:
|
||||
|
||||
This is used to proceed from the dying state.
|
||||
|
||||
(*) FSCACHE_OBJECT_EV_ERROR
|
||||
|
||||
FSCACHE_OBJECT_EV_ERROR
|
||||
This is signalled when an I/O error occurs during the processing of some
|
||||
object.
|
||||
|
||||
(*) FSCACHE_OBJECT_EV_RELEASE
|
||||
(*) FSCACHE_OBJECT_EV_RETIRE
|
||||
|
||||
FSCACHE_OBJECT_EV_RELEASE, FSCACHE_OBJECT_EV_RETIRE
|
||||
These are signalled when the netfs relinquishes a cookie it was using.
|
||||
The event selected depends on whether the netfs asks for the backing
|
||||
object to be retired (deleted) or retained.
|
||||
|
||||
(*) FSCACHE_OBJECT_EV_WITHDRAW
|
||||
|
||||
FSCACHE_OBJECT_EV_WITHDRAW
|
||||
This is signalled when the cache backend wants to withdraw an object.
|
||||
This means that the object will have to be detached from the netfs's
|
||||
cookie.
|
@ -1,10 +1,12 @@
|
||||
================================
|
||||
ASYNCHRONOUS OPERATIONS HANDLING
|
||||
================================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
================================
|
||||
Asynchronous Operations Handling
|
||||
================================
|
||||
|
||||
By: David Howells <dhowells@redhat.com>
|
||||
|
||||
Contents:
|
||||
.. Contents:
|
||||
|
||||
(*) Overview.
|
||||
|
||||
@ -17,8 +19,7 @@ Contents:
|
||||
(*) Asynchronous callback.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
Overview
|
||||
========
|
||||
|
||||
FS-Cache has an asynchronous operations handling facility that it uses for its
|
||||
@ -33,11 +34,10 @@ backend for completion.
|
||||
To make use of this facility, <linux/fscache-cache.h> should be #included.
|
||||
|
||||
|
||||
===============================
|
||||
OPERATION RECORD INITIALISATION
|
||||
Operation Record Initialisation
|
||||
===============================
|
||||
|
||||
An operation is recorded in an fscache_operation struct:
|
||||
An operation is recorded in an fscache_operation struct::
|
||||
|
||||
struct fscache_operation {
|
||||
union {
|
||||
@ -50,7 +50,7 @@ An operation is recorded in an fscache_operation struct:
|
||||
};
|
||||
|
||||
Someone wanting to issue an operation should allocate something with this
|
||||
struct embedded in it. They should initialise it by calling:
|
||||
struct embedded in it. They should initialise it by calling::
|
||||
|
||||
void fscache_operation_init(struct fscache_operation *op,
|
||||
fscache_operation_release_t release);
|
||||
@ -67,8 +67,7 @@ FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
|
||||
operation and waited for afterwards.
|
||||
|
||||
|
||||
==========
|
||||
PARAMETERS
|
||||
Parameters
|
||||
==========
|
||||
|
||||
There are a number of parameters that can be set in the operation record's flag
|
||||
@ -87,7 +86,7 @@ operations:
|
||||
|
||||
If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
|
||||
before submitting the operation, and the operating thread must wait for it
|
||||
to be cleared before proceeding:
|
||||
to be cleared before proceeding::
|
||||
|
||||
wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
|
||||
TASK_UNINTERRUPTIBLE);
|
||||
@ -101,7 +100,7 @@ operations:
|
||||
page to a netfs page after the backing fs has read the page in.
|
||||
|
||||
If this option is used, op->fast_work and op->processor must be
|
||||
initialised before submitting the operation:
|
||||
initialised before submitting the operation::
|
||||
|
||||
INIT_WORK(&op->fast_work, do_some_work);
|
||||
|
||||
@ -114,7 +113,7 @@ operations:
|
||||
pages that have just been fetched from a remote server.
|
||||
|
||||
If this option is used, op->slow_work and op->processor must be
|
||||
initialised before submitting the operation:
|
||||
initialised before submitting the operation::
|
||||
|
||||
fscache_operation_init_slow(op, processor)
|
||||
|
||||
@ -132,8 +131,7 @@ Furthermore, operations may be one of two types:
|
||||
operations running at the same time.
|
||||
|
||||
|
||||
=========
|
||||
PROCEDURE
|
||||
Procedure
|
||||
=========
|
||||
|
||||
Operations are used through the following procedure:
|
||||
@ -143,7 +141,7 @@ Operations are used through the following procedure:
|
||||
generic op embedded within.
|
||||
|
||||
(2) The submitting thread must then submit the operation for processing using
|
||||
one of the following two functions:
|
||||
one of the following two functions::
|
||||
|
||||
int fscache_submit_op(struct fscache_object *object,
|
||||
struct fscache_operation *op);
|
||||
@ -164,7 +162,7 @@ Operations are used through the following procedure:
|
||||
operation of conflicting exclusivity is in progress on the object.
|
||||
|
||||
If the operation is asynchronous, the manager will retain a reference to
|
||||
it, so the caller should put their reference to it by passing it to:
|
||||
it, so the caller should put their reference to it by passing it to::
|
||||
|
||||
void fscache_put_operation(struct fscache_operation *op);
|
||||
|
||||
@ -179,12 +177,12 @@ Operations are used through the following procedure:
|
||||
(4) The operation holds an effective lock upon the object, preventing other
|
||||
exclusive ops conflicting until it is released. The operation can be
|
||||
enqueued for further immediate asynchronous processing by adjusting the
|
||||
CPU time provisioning option if necessary, eg:
|
||||
CPU time provisioning option if necessary, eg::
|
||||
|
||||
op->flags &= ~FSCACHE_OP_TYPE;
|
||||
op->flags |= FSCACHE_OP_FAST;
|
||||
|
||||
and calling:
|
||||
and calling::
|
||||
|
||||
void fscache_enqueue_operation(struct fscache_operation *op)
|
||||
|
||||
@ -192,13 +190,12 @@ Operations are used through the following procedure:
|
||||
pools.
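The flag adjustment in step (4) is a clear-then-set on a multi-bit field. The sketch below shows it in isolation; the EX_* values are made up for illustration and are not the kernel's definitions, which live in <linux/fscache-cache.h>. The new type bit itself is OR-ed in: OR-ing in its complement would set every other bit in the word.

```c
/* Illustrative values only; not the definitions from the kernel header. */
#define EX_OP_TYPE	0x000fUL	/* mask covering the provisioning field */
#define EX_OP_FAST	0x0002UL	/* one choice within that field */
#define EX_OP_WAITING	0x0010UL	/* an unrelated flag outside the field */

/* Switch the provisioning field to "fast" and leave other flags alone. */
static unsigned long ex_select_fast(unsigned long flags)
{
	flags &= ~EX_OP_TYPE;	/* clear whatever type was selected before */
	flags |= EX_OP_FAST;	/* then set the new type */
	return flags;
}
```

Clearing the whole field first matters because the previous type may have used different bits from the new one.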
|
||||
|
||||
|
||||
=====================
|
||||
ASYNCHRONOUS CALLBACK
|
||||
Asynchronous Callback
|
||||
=====================
|
||||
|
||||
When used in asynchronous mode, the worker thread pool will invoke the
|
||||
processor method with a pointer to the operation. This should then get at the
|
||||
container struct by using container_of():
|
||||
container struct by using container_of()::
|
||||
|
||||
static void fscache_write_op(struct fscache_operation *_op)
|
||||
{
|
@ -1,7 +1,11 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===========================================
|
||||
Mounting root file system via SMB (cifs.ko)
|
||||
===========================================
|
||||
|
||||
Written 2019 by Paulo Alcantara <palcantara@suse.de>
|
||||
|
||||
Written 2019 by Aurelien Aptel <aaptel@suse.com>
|
||||
|
||||
The CONFIG_CIFS_ROOT option enables experimental root file system
|
||||
@ -32,7 +36,7 @@ Server configuration
|
||||
====================
|
||||
|
||||
To enable SMB1+UNIX extensions you will need to set these global
|
||||
settings in Samba smb.conf:
|
||||
settings in Samba smb.conf::
|
||||
|
||||
[global]
|
||||
server min protocol = NT1
|
||||
@ -41,12 +45,16 @@ settings in Samba smb.conf:
|
||||
Kernel command line
|
||||
===================
|
||||
|
||||
root=/dev/cifs
|
||||
::
|
||||
|
||||
root=/dev/cifs
|
||||
|
||||
This is just a virtual device that basically tells the kernel to mount
|
||||
the root file system via SMB protocol.
|
||||
|
||||
cifsroot=//<server-ip>/<share>[,options]
|
||||
::
|
||||
|
||||
cifsroot=//<server-ip>/<share>[,options]
|
||||
|
||||
Enables the kernel to mount the root file system via SMB from the
|
||||
<server-ip> and <share> specified in this option.
|
||||
@ -65,33 +73,33 @@ options
|
||||
Examples
|
||||
========
|
||||
|
||||
Export root file system as a Samba share in smb.conf file.
|
||||
Export root file system as a Samba share in smb.conf file::
|
||||
|
||||
...
|
||||
[linux]
|
||||
path = /path/to/rootfs
|
||||
read only = no
|
||||
guest ok = yes
|
||||
force user = root
|
||||
force group = root
|
||||
browseable = yes
|
||||
writeable = yes
|
||||
admin users = root
|
||||
public = yes
|
||||
create mask = 0777
|
||||
directory mask = 0777
|
||||
...
|
||||
...
|
||||
[linux]
|
||||
path = /path/to/rootfs
|
||||
read only = no
|
||||
guest ok = yes
|
||||
force user = root
|
||||
force group = root
|
||||
browseable = yes
|
||||
writeable = yes
|
||||
admin users = root
|
||||
public = yes
|
||||
create mask = 0777
|
||||
directory mask = 0777
|
||||
...
|
||||
|
||||
Restart smb service.
|
||||
Restart smb service::
|
||||
|
||||
# systemctl restart smb
|
||||
# systemctl restart smb
|
||||
|
||||
Test it under QEMU on a kernel built with CONFIG_CIFS_ROOT and
|
||||
CONFIG_IP_PNP options enabled.
|
||||
CONFIG_IP_PNP options enabled::
|
||||
|
||||
# qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
|
||||
-kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
|
||||
-append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
|
||||
# qemu-system-x86_64 -enable-kvm -cpu host -m 1024 \
|
||||
-kernel /path/to/linux/arch/x86/boot/bzImage -nographic \
|
||||
-append "root=/dev/cifs rw ip=dhcp cifsroot=//10.0.2.2/linux,username=foo,password=bar console=ttyS0 3"
|
||||
|
||||
|
||||
1: https://wiki.samba.org/index.php/UNIX_Extensions
|
1670
Documentation/filesystems/coda.rst
Normal file
File diff suppressed because it is too large
@ -1,5 +1,6 @@
|
||||
|
||||
configfs - Userspace-driven kernel object configuration.
|
||||
=======================================================
|
||||
Configfs - Userspace-driven Kernel Object Configuration
|
||||
=======================================================
|
||||
|
||||
Joel Becker <joel.becker@oracle.com>
|
||||
|
||||
@ -9,7 +10,8 @@ Copyright (c) 2005 Oracle Corporation,
|
||||
Joel Becker <joel.becker@oracle.com>
|
||||
|
||||
|
||||
[What is configfs?]
|
||||
What is configfs?
|
||||
=================
|
||||
|
||||
configfs is a ram-based filesystem that provides the converse of
|
||||
sysfs's functionality. Where sysfs is a filesystem-based view of
|
||||
@ -35,10 +37,11 @@ kernel modules backing the items must respond to this.
|
||||
Both sysfs and configfs can and should exist together on the same
|
||||
system. One is not a replacement for the other.
|
||||
|
||||
[Using configfs]
|
||||
Using configfs
|
||||
==============
|
||||
|
||||
configfs can be compiled as a module or into the kernel. You can access
|
||||
it by doing
|
||||
it by doing::
|
||||
|
||||
mount -t configfs none /config
|
||||
|
||||
@ -56,28 +59,29 @@ values. Don't mix more than one attribute in one attribute file.
|
||||
There are two types of configfs attributes:
|
||||
|
||||
* Normal attributes, which, similar to sysfs attributes, are small ASCII text
|
||||
files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
|
||||
only one value per file should be used, and the same caveats from sysfs apply.
|
||||
Configfs expects write(2) to store the entire buffer at once. When writing to
|
||||
normal configfs attributes, userspace processes should first read the entire
|
||||
file, modify the portions they wish to change, and then write the entire
|
||||
buffer back.
|
||||
files, with a maximum size of one page (PAGE_SIZE, 4096 on i386). Preferably
|
||||
only one value per file should be used, and the same caveats from sysfs apply.
|
||||
Configfs expects write(2) to store the entire buffer at once. When writing to
|
||||
normal configfs attributes, userspace processes should first read the entire
|
||||
file, modify the portions they wish to change, and then write the entire
|
||||
buffer back.
|
||||
|
||||
* Binary attributes, which are somewhat similar to sysfs binary attributes,
|
||||
but with a few slight changes to semantics. The PAGE_SIZE limitation does not
|
||||
apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.
|
||||
The write(2) calls from user space are buffered, and the attributes'
|
||||
write_bin_attribute method will be invoked on the final close, therefore it is
|
||||
imperative for user-space to check the return code of close(2) in order to
|
||||
verify that the operation finished successfully.
|
||||
To avoid a malicious user OOMing the kernel, there's a per-binary attribute
|
||||
maximum buffer value.
|
||||
but with a few slight changes to semantics. The PAGE_SIZE limitation does not
|
||||
apply, but the whole binary item must fit in a single kernel vmalloc'ed buffer.
|
||||
The write(2) calls from user space are buffered, and the attributes'
|
||||
write_bin_attribute method will be invoked on the final close, therefore it is
|
||||
imperative for user-space to check the return code of close(2) in order to
|
||||
verify that the operation finished successfully.
|
||||
To avoid a malicious user OOMing the kernel, there's a per-binary attribute
|
||||
maximum buffer value.
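From userspace, both rules above reduce to the same pattern: hand the whole buffer to a single write(2), and treat the return code of close(2) as part of the result. A minimal sketch, assuming a hypothetical helper name ex_write_attr() and an attribute file that already exists:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write a complete attribute value in one write(2) call and report any
 * failure from close(2). Returns 0 on success, -1 on error with errno set.
 */
static int ex_write_attr(const char *path, const void *buf, size_t len)
{
	ssize_t n;
	int saved, rc;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	n = write(fd, buf, len);	/* the entire buffer, not piecemeal */
	saved = errno;
	rc = close(fd);			/* binary attributes are committed here */

	if (n < 0) {
		errno = saved;
		return -1;
	}
	if ((size_t)n != len) {
		errno = EIO;		/* short write: value not stored whole */
		return -1;
	}
	return rc < 0 ? -1 : 0;
}
```

For a normal attribute the caller would first read the current contents, modify them, and pass the full new value to this helper.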
|
||||
|
||||
When an item needs to be destroyed, remove it with rmdir(2). An
|
||||
item cannot be destroyed if any other item has a link to it (via
|
||||
symlink(2)). Links can be removed via unlink(2).
|
||||
|
||||
[Configuring FakeNBD: an Example]
|
||||
Configuring FakeNBD: an Example
|
||||
===============================
|
||||
|
||||
Imagine there's a Network Block Device (NBD) driver that allows you to
|
||||
access remote block devices. Call it FakeNBD. FakeNBD uses configfs
|
||||
@ -86,14 +90,14 @@ sysadmins use to configure FakeNBD, but somehow that program has to tell
|
||||
the driver about it. Here's where configfs comes in.
|
||||
|
||||
When the FakeNBD driver is loaded, it registers itself with configfs.
|
||||
readdir(3) sees this just fine:
|
||||
readdir(3) sees this just fine::
|
||||
|
||||
# ls /config
|
||||
fakenbd
|
||||
|
||||
A fakenbd connection can be created with mkdir(2). The name is
|
||||
arbitrary, but likely the tool will make some use of the name. Perhaps
|
||||
it is a uuid or a disk name:
|
||||
it is a uuid or a disk name::
|
||||
|
||||
# mkdir /config/fakenbd/disk1
|
||||
# ls /config/fakenbd/disk1
|
||||
@ -102,7 +106,7 @@ it is a uuid or a disk name:
|
||||
The target attribute contains the IP address of the server FakeNBD will
|
||||
connect to. The device attribute is the device on the server.
|
||||
Predictably, the rw attribute determines whether the connection is
|
||||
read-only or read-write.
|
||||
read-only or read-write::
|
||||
|
||||
# echo 10.0.0.1 > /config/fakenbd/disk1/target
|
||||
# echo /dev/sda1 > /config/fakenbd/disk1/device
|
||||
@ -111,7 +115,8 @@ read-only or read-write.
|
||||
That's it. That's all there is. Now the device is configured, via the
|
||||
shell no less.
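
The same configuration can be driven from a program instead of the shell. The following is a rough sketch, assuming the FakeNBD layout described above (an item directory containing ``target`` and ``device`` attribute files). The helper name ``write_attr`` is hypothetical, and FakeNBD itself is the document's imaginary driver.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a string value to the attribute file
 * <item_dir>/<attr>. The attribute file is expected to exist already
 * (configfs creates it when the item directory is made), so the file
 * is opened without O_CREAT. Returns 0 or a negative errno value.
 */
static int write_attr(const char *item_dir, const char *attr, const char *value)
{
    char path[4096];
    size_t len = strlen(value);
    ssize_t n;
    int fd, err = 0;

    if (snprintf(path, sizeof(path), "%s/%s", item_dir, attr) >= (int)sizeof(path))
        return -ENAMETOOLONG;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    n = write(fd, value, len);
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO;         /* short write: treat as failure */

    if (close(fd) < 0 && !err)
        err = -errno;
    return err;
}
```

With this helper, the shell session above corresponds to calling mkdir(2) on ``/config/fakenbd/disk1`` and then ``write_attr("/config/fakenbd/disk1", "target", "10.0.0.1\n")``, followed by the same call for ``device``.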
|
||||
|
||||
[Coding With configfs]
|
||||
Coding With configfs
|
||||
====================
|
||||
|
||||
Every object in configfs is a config_item. A config_item reflects an
|
||||
object in the subsystem. It has attributes that match values on that
|
||||
@ -130,7 +135,10 @@ appears as a directory at the top of the configfs filesystem. A
|
||||
subsystem is also a config_group, and can do everything a config_group
|
||||
can.
|
||||
|
||||
[struct config_item]
|
||||
struct config_item
|
||||
==================
|
||||
|
||||
::
|
||||
|
||||
struct config_item {
|
||||
char *ci_name;
|
||||
@ -168,7 +176,10 @@ By itself, a config_item cannot do much more than appear in configfs.
|
||||
Usually a subsystem wants the item to display and/or store attributes,
|
||||
among other things. For that, it needs a type.
|
||||
|
||||
[struct config_item_type]
|
||||
struct config_item_type
|
||||
=======================
|
||||
|
||||
::
|
||||
|
||||
struct configfs_item_operations {
|
||||
void (*release)(struct config_item *);
|
||||
@ -192,7 +203,10 @@ allocated dynamically will need to provide the ct_item_ops->release()
|
||||
method. This method is called when the config_item's reference count
|
||||
reaches zero.
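
The release-on-last-put rule can be modelled in plain user-space C. This is only a sketch of the idea: every name below (``model_item``, ``model_item_get``, ``model_item_put``) is hypothetical, the counter is not thread-safe, and none of it is the kernel's config_item implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dynamically allocated item. */
struct model_item {
    int refcount;
    void (*release)(struct model_item *item);
};

static int released_count;  /* number of items released so far */

/* Plays the role of ct_item_ops->release(): free the container. */
static void model_release(struct model_item *item)
{
    released_count++;
    free(item);
}

/* Allocate an item holding one reference for the caller. */
static struct model_item *model_item_new(void)
{
    struct model_item *item = malloc(sizeof(*item));

    if (!item)
        return NULL;
    item->refcount = 1;
    item->release = model_release;
    return item;
}

static void model_item_get(struct model_item *item)
{
    item->refcount++;
}

/* Drop one reference; the release method runs only on the last put. */
static void model_item_put(struct model_item *item)
{
    if (--item->refcount == 0 && item->release)
        item->release(item);
}
```

The point of the model is that the memory is freed from the release method, never directly by whoever happens to drop a reference.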
|
||||
|
||||
[struct configfs_attribute]
|
||||
struct configfs_attribute
|
||||
=========================
|
||||
|
||||
::
|
||||
|
||||
struct configfs_attribute {
|
||||
char *ca_name;
|
||||
@ -214,7 +228,10 @@ be called whenever userspace asks for a read(2) on the attribute. If an
|
||||
attribute is writable and provides a ->store method, that method will be
|
||||
called whenever userspace asks for a write(2) on the attribute.
|
||||
|
||||
[struct configfs_bin_attribute]
|
||||
struct configfs_bin_attribute
|
||||
=============================
|
||||
|
||||
::
|
||||
|
||||
struct configfs_bin_attribute {
|
||||
struct configfs_attribute cb_attr;
|
||||
@ -240,11 +257,12 @@ will happen for write(2). The reads/writes are buffered so only a
|
||||
single read/write will occur; the attribute need not concern itself
|
||||
with it.
|
||||
|
||||
[struct config_group]
|
||||
struct config_group
|
||||
===================
|
||||
|
||||
A config_item cannot live in a vacuum. The only way one can be created
|
||||
is via mkdir(2) on a config_group. This will trigger creation of a
|
||||
child item.
|
||||
child item::
|
||||
|
||||
struct config_group {
|
||||
struct config_item cg_item;
|
||||
@ -264,7 +282,7 @@ The config_group structure contains a config_item. Properly configuring
|
||||
that item means that a group can behave as an item in its own right.
|
||||
However, it can do more: it can create child items or groups. This is
|
||||
accomplished via the group operations specified on the group's
|
||||
config_item_type.
|
||||
config_item_type::
|
||||
|
||||
struct configfs_group_operations {
|
||||
struct config_item *(*make_item)(struct config_group *group,
|
||||
@ -279,7 +297,8 @@ config_item_type.
|
||||
};
|
||||
|
||||
A group creates child items by providing the
|
||||
ct_group_ops->make_item() method. If provided, this method is called from mkdir(2) in the group's directory. The subsystem allocates a new
|
||||
ct_group_ops->make_item() method. If provided, this method is called from
|
||||
mkdir(2) in the group's directory. The subsystem allocates a new
|
||||
config_item (or more likely, its container structure), initializes it,
|
||||
and returns it to configfs. Configfs will then populate the filesystem
|
||||
tree to reflect the new item.
|
||||
@ -296,13 +315,14 @@ upon item allocation. If a subsystem has no work to do, it may omit
|
||||
the ct_group_ops->drop_item() method, and configfs will call
|
||||
config_item_put() on the item on behalf of the subsystem.
|
||||
|
||||
IMPORTANT: drop_item() is void, and as such cannot fail. When rmdir(2)
|
||||
is called, configfs WILL remove the item from the filesystem tree
|
||||
(assuming that it has no children to keep it busy). The subsystem is
|
||||
responsible for responding to this. If the subsystem has references to
|
||||
the item in other threads, the memory is safe. It may take some time
|
||||
for the item to actually disappear from the subsystem's usage. But it
|
||||
is gone from configfs.
|
||||
Important:
|
||||
drop_item() is void, and as such cannot fail. When rmdir(2)
|
||||
is called, configfs WILL remove the item from the filesystem tree
|
||||
(assuming that it has no children to keep it busy). The subsystem is
|
||||
responsible for responding to this. If the subsystem has references to
|
||||
the item in other threads, the memory is safe. It may take some time
|
||||
for the item to actually disappear from the subsystem's usage. But it
|
||||
is gone from configfs.
|
||||
|
||||
When drop_item() is called, the item's linkage has already been torn
|
||||
down. It no longer has a reference on its parent and has no place in
|
||||
@ -319,10 +339,11 @@ is implemented in the configfs rmdir(2) code. ->drop_item() will not be
|
||||
called, as the item has not been dropped. rmdir(2) will fail, as the
|
||||
directory is not empty.
|
||||
|
||||
[struct configfs_subsystem]
|
||||
struct configfs_subsystem
|
||||
=========================
|
||||
|
||||
A subsystem must register itself, usually at module_init time. This
|
||||
tells configfs to make the subsystem appear in the file tree.
|
||||
tells configfs to make the subsystem appear in the file tree::
|
||||
|
||||
struct configfs_subsystem {
|
||||
struct config_group su_group;
|
||||
@ -332,17 +353,19 @@ tells configfs to make the subsystem appear in the file tree.
|
||||
int configfs_register_subsystem(struct configfs_subsystem *subsys);
|
||||
void configfs_unregister_subsystem(struct configfs_subsystem *subsys);
|
||||
|
||||
A subsystem consists of a toplevel config_group and a mutex.
|
||||
A subsystem consists of a toplevel config_group and a mutex.
|
||||
The group is where child config_items are created. For a subsystem,
|
||||
this group is usually defined statically. Before calling
|
||||
configfs_register_subsystem(), the subsystem must have initialized the
|
||||
group via the usual group _init() functions, and it must also have
|
||||
initialized the mutex.
|
||||
When the register call returns, the subsystem is live, and it
|
||||
|
||||
When the register call returns, the subsystem is live, and it
|
||||
will be visible via configfs. At that point, mkdir(2) can be called and
|
||||
the subsystem must be ready for it.
|
||||
|
||||
[An Example]
|
||||
An Example
|
||||
==========
|
||||
|
||||
The best example of these basic concepts is the simple_children
|
||||
subsystem/group and the simple_child item in
|
||||
@ -350,7 +373,8 @@ samples/configfs/configfs_sample.c. It shows a trivial object displaying
|
||||
and storing an attribute, and a simple group creating and destroying
|
||||
these children.
|
||||
|
||||
[Hierarchy Navigation and the Subsystem Mutex]
|
||||
Hierarchy Navigation and the Subsystem Mutex
|
||||
============================================
|
||||
|
||||
There is an extra bonus that configfs provides. The config_groups and
|
||||
config_items are arranged in a hierarchy due to the fact that they
|
||||
@ -375,7 +399,8 @@ be in its parent's cg_children list for the same duration. This allows
|
||||
a subsystem to trust ci_parent and cg_children while they hold the
|
||||
mutex.
|
||||
|
||||
[Item Aggregation Via symlink(2)]
|
||||
Item Aggregation Via symlink(2)
|
||||
===============================
|
||||
|
||||
configfs provides a simple group via the group->item parent/child
|
||||
relationship. Often, however, a larger environment requires aggregation
|
||||
@ -403,7 +428,8 @@ A config_item cannot be removed while it links to any other item, nor
|
||||
can it be removed while an item links to it. Dangling symlinks are not
|
||||
allowed in configfs.
|
||||
|
||||
[Automatically Created Subgroups]
|
||||
Automatically Created Subgroups
|
||||
===============================
|
||||
|
||||
A new config_group may want to have two types of child config_items.
|
||||
While this could be codified by magic names in ->make_item(), it is much
|
||||
@ -433,7 +459,8 @@ As a consequence of this, default groups cannot be removed directly via
|
||||
rmdir(2). They also are not considered when rmdir(2) on the parent
|
||||
group is checking for children.
|
||||
|
||||
[Dependent Subsystems]
|
||||
Dependent Subsystems
|
||||
====================
|
||||
|
||||
Sometimes other drivers depend on particular configfs items. For
|
||||
example, ocfs2 mounts depend on a heartbeat region item. If that
|
||||
@ -460,9 +487,11 @@ succeeds, then heartbeat knows the region is safe to give to ocfs2.
|
||||
If it fails, it was being torn down anyway, and heartbeat can gracefully
|
||||
pass up an error.
|
||||
|
||||
[Committable Items]
|
||||
Committable Items
|
||||
=================
|
||||
|
||||
NOTE: Committable items are currently unimplemented.
|
||||
Note:
|
||||
Committable items are currently unimplemented.
|
||||
|
||||
Some config_items cannot have a valid initial state. That is, no
|
||||
default values can be specified for the item's attributes such that the
|
||||
@ -504,5 +533,3 @@ As rmdir(2) does not work in the "live" directory, an item must be
|
||||
shutdown, or "uncommitted". Again, this is done via rename(2), this
|
||||
time from the "live" directory back to the "pending" one. The subsystem
|
||||
is notified by the ct_group_ops->uncommit_object() method.
|
||||
|
||||
|
36
Documentation/filesystems/devpts.rst
Normal file
@ -0,0 +1,36 @@
.. SPDX-License-Identifier: GPL-2.0

=====================
The Devpts Filesystem
=====================

Each mount of the devpts filesystem is now distinct such that ptys
and their indices allocated in one mount are independent from ptys
and their indices in all other mounts.

All mounts of the devpts filesystem now create a ``/dev/pts/ptmx`` node
with permissions ``0000``.

To retain backwards compatibility, a ptmx device node (aka any node
created with ``mknod name c 5 2``) when opened will look for an instance
of devpts under the name ``pts`` in the same directory as the ptmx device
node.

As an option, instead of placing a ``/dev/ptmx`` device node at ``/dev/ptmx``
it is possible to place a symlink to ``/dev/pts/ptmx`` at ``/dev/ptmx`` or
to bind mount ``/dev/pts/ptmx`` to ``/dev/ptmx``. If you opt for using
the devpts filesystem in this manner, devpts should be mounted with
the ``ptmxmode=0666`` option, or ``chmod 0666 /dev/pts/ptmx`` should be called.

The total count of pty pairs in all instances is limited by sysctls::

    kernel.pty.max = 4096 - global limit
    kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
    kernel.pty.nr - current count of ptys

A per-instance limit can be set by adding the mount option ``max=<count>``.

This feature was added in kernel 3.4 together with
``sysctl kernel.pty.reserve``.

In kernels older than 3.4 sysctl ``kernel.pty.max`` works as per-instance limit.
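
The pty sysctls can also be read programmatically through procfs. The sketch below assumes the usual mapping of ``kernel.pty.max`` to ``/proc/sys/kernel/pty/max`` (and likewise for ``reserve`` and ``nr``). The helper name ``read_long_from_file`` is made up for this example.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one decimal integer from a file such as
 * /proc/sys/kernel/pty/max. Returns 0 on success, a negative errno
 * value if the file cannot be opened, or -EINVAL if no number is found.
 */
static int read_long_from_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -errno;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -EINVAL;
}
```

Comparing the value read from ``/proc/sys/kernel/pty/nr`` against the one from ``/proc/sys/kernel/pty/max`` gives a rough view of how close the system is to the global limit.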
|
@ -1,26 +0,0 @@
|
||||
Each mount of the devpts filesystem is now distinct such that ptys
|
||||
and their indicies allocated in one mount are independent from ptys
|
||||
and their indicies in all other mounts.
|
||||
|
||||
All mounts of the devpts filesystem now create a /dev/pts/ptmx node
|
||||
with permissions 0000.
|
||||
|
||||
To retain backwards compatibility the a ptmx device node (aka any node
|
||||
created with "mknod name c 5 2") when opened will look for an instance
|
||||
of devpts under the name "pts" in the same directory as the ptmx device
|
||||
node.
|
||||
|
||||
As an option instead of placing a /dev/ptmx device node at /dev/ptmx
|
||||
it is possible to place a symlink to /dev/pts/ptmx at /dev/ptmx or
|
||||
to bind mount /dev/ptx/ptmx to /dev/ptmx. If you opt for using
|
||||
the devpts filesystem in this manner devpts should be mounted with
|
||||
the ptmxmode=0666, or chmod 0666 /dev/pts/ptmx should be called.
|
||||
|
||||
Total count of pty pairs in all instances is limited by sysctls:
|
||||
kernel.pty.max = 4096 - global limit
|
||||
kernel.pty.reserve = 1024 - reserved for filesystems mounted from the initial mount namespace
|
||||
kernel.pty.nr - current count of ptys
|
||||
|
||||
Per-instance limit could be set by adding mount option "max=<count>".
|
||||
This feature was added in kernel 3.4 together with sysctl kernel.pty.reserve.
|
||||
In kernels older than 3.4 sysctl kernel.pty.max works as per-instance limit.
|
@ -1,5 +1,8 @@
|
||||
Linux Directory Notification
|
||||
============================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================
|
||||
Linux Directory Notification
|
||||
============================
|
||||
|
||||
Stephen Rothwell <sfr@canb.auug.org.au>
|
||||
|
||||
@ -12,6 +15,7 @@ being delivered using signals.
|
||||
The application decides which "events" it wants to be notified about.
|
||||
The currently defined events are:
|
||||
|
||||
========= =====================================================
|
||||
DN_ACCESS A file in the directory was accessed (read)
|
||||
DN_MODIFY A file in the directory was modified (write,truncate)
|
||||
DN_CREATE A file was created in the directory
|
||||
@ -19,6 +23,7 @@ The currently defined events are:
|
||||
DN_RENAME A file in the directory was renamed
|
||||
DN_ATTRIB A file in the directory had its attributes
|
||||
changed (chmod,chown)
|
||||
========= =====================================================
|
||||
|
||||
Usually, the application must reregister after each notification, but
|
||||
if DN_MULTISHOT is or'ed with the event mask, then the registration will
|
||||
@ -36,7 +41,7 @@ especially important if DN_MULTISHOT is specified. Note that SIGRTMIN
|
||||
is often blocked, so it is better to use (at least) SIGRTMIN + 1.
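
A minimal registration sketch, assuming glibc with ``_GNU_SOURCE`` (which exposes ``F_NOTIFY``, ``F_SETSIG`` and the ``DN_*`` masks). The function name ``watch_directory`` is an assumption for this example. The caller is expected to have installed a handler for the chosen signal beforehand.

```c
#define _GNU_SOURCE     /* exposes F_NOTIFY, F_SETSIG and the DN_* masks */
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: register for modify/create notifications on a
 * directory, delivered as signal signo. DN_MULTISHOT keeps the
 * registration active after each notification. Returns the directory
 * file descriptor (which must stay open while watching) or -1.
 */
static int watch_directory(const char *dir, int signo)
{
    int fd = open(dir, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;

    if (fcntl(fd, F_SETSIG, signo) < 0 ||
        fcntl(fd, F_NOTIFY, DN_MODIFY | DN_CREATE | DN_MULTISHOT) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Following the advice above, a caller would typically pass ``SIGRTMIN + 1`` as the signal number rather than ``SIGRTMIN`` itself.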
|
||||
|
||||
Implementation expectations (features and bugs :-))
|
||||
---------------------------
|
||||
---------------------------------------------------
|
||||
|
||||
The notification should work for any local access to files even if the
|
||||
actual file system is on a remote server. This implies that remote
|
@ -1,3 +1,5 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============
|
||||
Fiemap Ioctl
|
||||
============
|
||||
@ -10,9 +12,9 @@ returns a list of extents.
|
||||
Request Basics
|
||||
--------------
|
||||
|
||||
A fiemap request is encoded within struct fiemap:
|
||||
A fiemap request is encoded within struct fiemap::
|
||||
|
||||
struct fiemap {
|
||||
struct fiemap {
|
||||
__u64 fm_start; /* logical offset (inclusive) at
|
||||
* which to start mapping (in) */
|
||||
__u64 fm_length; /* logical length of mapping which
|
||||
@ -23,7 +25,7 @@ struct fiemap {
|
||||
__u32 fm_extent_count; /* size of fm_extents array (in) */
|
||||
__u32 fm_reserved;
|
||||
struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
|
||||
};
|
||||
};
|
||||
|
||||
|
||||
fm_start and fm_length specify the logical range within the file
|
||||
@ -51,12 +53,12 @@ nothing to prevent the file from changing between calls to FIEMAP.
|
||||
|
||||
The following flags can be set in fm_flags:
|
||||
|
||||
* FIEMAP_FLAG_SYNC
|
||||
If this flag is set, the kernel will sync the file before mapping extents.
|
||||
FIEMAP_FLAG_SYNC
|
||||
If this flag is set, the kernel will sync the file before mapping extents.
|
||||
|
||||
* FIEMAP_FLAG_XATTR
|
||||
If this flag is set, the extents returned will describe the inodes
|
||||
extended attribute lookup tree, instead of its data tree.
|
||||
FIEMAP_FLAG_XATTR
|
||||
If this flag is set, the extents returned will describe the inode's
|
||||
extended attribute lookup tree, instead of its data tree.
|
||||
|
||||
|
||||
Extent Mapping
|
||||
@ -75,18 +77,18 @@ complete the requested range and will not have the FIEMAP_EXTENT_LAST
|
||||
flag set (see the next section on extent flags).
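
From user space, a request is issued with the ``FS_IOC_FIEMAP`` ioctl. The sketch below only counts extents. It relies on the behaviour described in the full fiemap documentation that, when fm_extent_count is zero, the fm_extents array is ignored and fm_mapped_extents reports how many extents would be needed. The function name ``count_extents`` is an assumption for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fiemap.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: return the number of extents that map the whole
 * file at path, or a negative errno value (for example -EOPNOTSUPP on
 * a filesystem that does not implement fiemap).
 */
static long count_extents(const char *path)
{
    struct fiemap fm;
    int fd = open(path, O_RDONLY);
    long ret;

    if (fd < 0)
        return -errno;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = 0;
    fm.fm_length = FIEMAP_MAX_OFFSET;   /* map through to the end of the file */
    fm.fm_flags = FIEMAP_FLAG_SYNC;     /* sync the file before mapping */
    fm.fm_extent_count = 0;             /* count only, return no extents */

    if (ioctl(fd, FS_IOC_FIEMAP, &fm) < 0)
        ret = -errno;
    else
        ret = fm.fm_mapped_extents;

    close(fd);
    return ret;
}
```

A caller that wants the extents themselves would allocate room for that many struct fiemap_extent entries after the header and repeat the call with fm_extent_count set accordingly. The file may change between the two calls.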
|
||||
|
||||
Each extent is described by a single fiemap_extent structure as
|
||||
returned in fm_extents.
|
||||
returned in fm_extents::
|
||||
|
||||
struct fiemap_extent {
|
||||
__u64 fe_logical; /* logical offset in bytes for the start of
|
||||
* the extent */
|
||||
__u64 fe_physical; /* physical offset in bytes for the start
|
||||
* of the extent */
|
||||
__u64 fe_length; /* length in bytes for the extent */
|
||||
__u64 fe_reserved64[2];
|
||||
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
|
||||
__u32 fe_reserved[3];
|
||||
};
|
||||
struct fiemap_extent {
|
||||
__u64 fe_logical; /* logical offset in bytes for the start of
|
||||
* the extent */
|
||||
__u64 fe_physical; /* physical offset in bytes for the start
|
||||
* of the extent */
|
||||
__u64 fe_length; /* length in bytes for the extent */
|
||||
__u64 fe_reserved64[2];
|
||||
__u32 fe_flags; /* FIEMAP_EXTENT_* flags for this extent */
|
||||
__u32 fe_reserved[3];
|
||||
};
|
||||
|
||||
All offsets and lengths are in bytes and mirror those on disk. It is valid
|
||||
for an extent's logical offset to start before the request or its logical
|
||||
@ -114,26 +116,27 @@ worry about all present and future flags which might imply unaligned
|
||||
data. Note that the opposite is not true - it would be valid for
|
||||
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
|
||||
|
||||
* FIEMAP_EXTENT_LAST
|
||||
This is generally the last extent in the file. A mapping attempt past
|
||||
this extent may return nothing. Some implementations set this flag to
|
||||
indicate this extent is the last one in the range queried by the user
|
||||
(via fiemap->fm_length).
|
||||
FIEMAP_EXTENT_LAST
|
||||
This is generally the last extent in the file. A mapping attempt past
|
||||
this extent may return nothing. Some implementations set this flag to
|
||||
indicate this extent is the last one in the range queried by the user
|
||||
(via fiemap->fm_length).
|
||||
|
||||
* FIEMAP_EXTENT_UNKNOWN
|
||||
The location of this extent is currently unknown. This may indicate
|
||||
the data is stored on an inaccessible volume or that no storage has
|
||||
been allocated for the file yet.
|
||||
FIEMAP_EXTENT_UNKNOWN
|
||||
The location of this extent is currently unknown. This may indicate
|
||||
the data is stored on an inaccessible volume or that no storage has
|
||||
been allocated for the file yet.
|
||||
|
||||
* FIEMAP_EXTENT_DELALLOC
|
||||
- This will also set FIEMAP_EXTENT_UNKNOWN.
|
||||
Delayed allocation - while there is data for this extent, its
|
||||
physical location has not been allocated yet.
|
||||
FIEMAP_EXTENT_DELALLOC
|
||||
This will also set FIEMAP_EXTENT_UNKNOWN.
|
||||
|
||||
* FIEMAP_EXTENT_ENCODED
|
||||
This extent does not consist of plain filesystem blocks but is
|
||||
encoded (e.g. encrypted or compressed). Reading the data in this
|
||||
extent via I/O to the block device will have undefined results.
|
||||
Delayed allocation - while there is data for this extent, its
|
||||
physical location has not been allocated yet.
|
||||
|
||||
FIEMAP_EXTENT_ENCODED
|
||||
This extent does not consist of plain filesystem blocks but is
|
||||
encoded (e.g. encrypted or compressed). Reading the data in this
|
||||
extent via I/O to the block device will have undefined results.
|
||||
|
||||
Note that it is *always* undefined to try to update the data
|
||||
in-place by writing to the indicated location without the
|
||||
@ -145,32 +148,32 @@ unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
|
||||
clear; user applications must not try reading or writing to the
|
||||
filesystem via the block device under any other circumstances.
|
||||
|
||||
* FIEMAP_EXTENT_DATA_ENCRYPTED
|
||||
- This will also set FIEMAP_EXTENT_ENCODED
|
||||
The data in this extent has been encrypted by the file system.
|
||||
FIEMAP_EXTENT_DATA_ENCRYPTED
|
||||
This will also set FIEMAP_EXTENT_ENCODED
|
||||
The data in this extent has been encrypted by the file system.
|
||||
|
||||
* FIEMAP_EXTENT_NOT_ALIGNED
|
||||
Extent offsets and length are not guaranteed to be block aligned.
|
||||
FIEMAP_EXTENT_NOT_ALIGNED
|
||||
Extent offsets and length are not guaranteed to be block aligned.
|
||||
|
||||
* FIEMAP_EXTENT_DATA_INLINE
|
||||
FIEMAP_EXTENT_DATA_INLINE
|
||||
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
||||
Data is located within a meta data block.
|
||||
Data is located within a metadata block.
|
||||
|
||||
* FIEMAP_EXTENT_DATA_TAIL
|
||||
FIEMAP_EXTENT_DATA_TAIL
|
||||
This will also set FIEMAP_EXTENT_NOT_ALIGNED
|
||||
Data is packed into a block with data from other files.
|
||||
Data is packed into a block with data from other files.
|
||||
|
||||
* FIEMAP_EXTENT_UNWRITTEN
|
||||
Unwritten extent - the extent is allocated but its data has not been
|
||||
initialized. This indicates the extent's data will be all zero if read
|
||||
through the filesystem but the contents are undefined if read directly from
|
||||
the device.
|
||||
FIEMAP_EXTENT_UNWRITTEN
|
||||
Unwritten extent - the extent is allocated but its data has not been
|
||||
initialized. This indicates the extent's data will be all zero if read
|
||||
through the filesystem but the contents are undefined if read directly from
|
||||
the device.
|
||||
|
||||
* FIEMAP_EXTENT_MERGED
|
||||
This will be set when a file does not support extents, i.e., it uses a block
|
||||
based addressing scheme. Since returning an extent for each block back to
|
||||
userspace would be highly inefficient, the kernel will try to merge most
|
||||
adjacent blocks into 'extents'.
|
||||
FIEMAP_EXTENT_MERGED
|
||||
This will be set when a file does not support extents, i.e., it uses a block
|
||||
based addressing scheme. Since returning an extent for each block back to
|
||||
userspace would be highly inefficient, the kernel will try to merge most
|
||||
adjacent blocks into 'extents'.
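
The "this will also set" relationships among the flags above can be written down as a small consistency check. This is a sketch only: the function name ``fiemap_flags_consistent`` is hypothetical, and the flag constants are taken from ``<linux/fiemap.h>``.

```c
#include <stdbool.h>
#include <stdint.h>
#include <linux/fiemap.h>

/*
 * Hypothetical helper: return true if fe_flags respects the
 * implications listed above:
 *   DELALLOC               implies UNKNOWN
 *   DATA_ENCRYPTED         implies ENCODED
 *   DATA_INLINE, DATA_TAIL imply   NOT_ALIGNED
 * The reverse directions are not required.
 */
static bool fiemap_flags_consistent(uint32_t flags)
{
    if ((flags & FIEMAP_EXTENT_DELALLOC) &&
        !(flags & FIEMAP_EXTENT_UNKNOWN))
        return false;

    if ((flags & FIEMAP_EXTENT_DATA_ENCRYPTED) &&
        !(flags & FIEMAP_EXTENT_ENCODED))
        return false;

    if ((flags & (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_DATA_TAIL)) &&
        !(flags & FIEMAP_EXTENT_NOT_ALIGNED))
        return false;

    return true;
}
```

Because of these implications, an application that cannot handle unaligned data only needs to test for FIEMAP_EXTENT_NOT_ALIGNED rather than for each specific flag.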
|
||||
|
||||
|
||||
VFS -> File System Implementation
|
||||
@ -179,23 +182,23 @@ VFS -> File System Implementation
|
||||
File systems wishing to support fiemap must implement a ->fiemap callback on
|
||||
their inode_operations structure. The fs ->fiemap call is responsible for
|
||||
defining its set of supported fiemap flags, and calling a helper function on
|
||||
each discovered extent:
|
||||
each discovered extent::
|
||||
|
||||
struct inode_operations {
|
||||
struct inode_operations {
|
||||
...
|
||||
|
||||
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
|
||||
u64 len);
|
||||
|
||||
->fiemap is passed struct fiemap_extent_info which describes the
|
||||
fiemap request:
|
||||
fiemap request::
|
||||
|
||||
struct fiemap_extent_info {
|
||||
struct fiemap_extent_info {
|
||||
unsigned int fi_flags; /* Flags as passed from user */
|
||||
unsigned int fi_extents_mapped; /* Number of mapped extents */
|
||||
unsigned int fi_extents_max; /* Size of fiemap_extent array */
|
||||
struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
|
||||
};
|
||||
};
|
||||
|
||||
It is intended that the file system should not need to access any of this
|
||||
structure directly. Filesystem handlers should be tolerant to signals and return
|
||||
@ -203,9 +206,9 @@ EINTR once fatal signal received.
|
||||
|
||||
|
||||
Flag checking should be done at the beginning of the ->fiemap callback via the
|
||||
fiemap_check_flags() helper:
|
||||
fiemap_check_flags() helper::
|
||||
|
||||
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
||||
int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
|
||||
|
||||
The struct fieinfo should be passed in as received from ioctl_fiemap(). The
|
||||
set of fiemap flags which the fs understands should be passed via fs_flags. If
|
||||
@ -216,10 +219,10 @@ ioctl_fiemap().
|
||||
|
||||
|
||||
For each extent in the request range, the file system should call
|
||||
the helper function, fiemap_fill_next_extent():
|
||||
the helper function, fiemap_fill_next_extent()::
|
||||
|
||||
int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
|
||||
u64 phys, u64 len, u32 flags, u32 dev);
|
||||
int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
|
||||
u64 phys, u64 len, u32 flags, u32 dev);
|
||||
|
||||
fiemap_fill_next_extent() will use the passed values to populate the
|
||||
next free extent in the fm_extents array. 'General' extent flags will
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===================================
|
||||
File management in the Linux kernel
|
||||
-----------------------------------
|
||||
===================================
|
||||
|
||||
This document describes how locking for files (struct file)
|
||||
and file descriptor table (struct files) works.
|
||||
@ -34,7 +37,7 @@ appear atomic. Here are the locking rules for
|
||||
the fdtable structure -
|
||||
|
||||
1. All references to the fdtable must be done through
|
||||
the files_fdtable() macro :
|
||||
the files_fdtable() macro::
|
||||
|
||||
struct fdtable *fdt;
|
||||
|
||||
@ -61,7 +64,8 @@ the fdtable structure -
|
||||
4. To look up the file structure given an fd, a reader
|
||||
must use either fcheck() or fcheck_files() APIs. These
|
||||
take care of barrier requirements due to lock-free lookup.
|
||||
An example :
|
||||
|
||||
An example::
|
||||
|
||||
struct file *file;
|
||||
|
||||
@ -77,7 +81,7 @@ the fdtable structure -
|
||||
of the fd (fget()/fget_light()) are lock-free, it is possible
|
||||
that look-up may race with the last put() operation on the
|
||||
file structure. This is avoided using atomic_long_inc_not_zero()
|
||||
on ->f_count :
|
||||
on ->f_count::
|
||||
|
||||
rcu_read_lock();
|
||||
file = fcheck_files(files, fd);
|
||||
@ -106,7 +110,8 @@ the fdtable structure -
|
||||
holding files->file_lock. If ->file_lock is dropped, then
|
||||
another thread may expand the files, thereby creating a new
|
||||
fdtable and making the earlier fdtable pointer stale.
|
||||
For example :
|
||||
|
||||
For example::
|
||||
|
||||
spin_lock(&files->file_lock);
|
||||
fd = locate_fd(files, file, start);
|
@ -1,3 +1,9 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============
|
||||
Fuse I/O Modes
|
||||
==============
|
||||
|
||||
Fuse supports the following I/O modes:
|
||||
|
||||
- direct-io
|
@ -24,6 +24,22 @@ algorithms work.
|
||||
splice
|
||||
locking
|
||||
directory-locking
|
||||
devpts
|
||||
dnotify
|
||||
fiemap
|
||||
files
|
||||
locks
|
||||
mandatory-locking
|
||||
mount_api
|
||||
quota
|
||||
seq_file
|
||||
sharedsubtree
|
||||
sysfs-pci
|
||||
sysfs-tagging
|
||||
|
||||
automount-support
|
||||
|
||||
caching/index
|
||||
|
||||
porting
|
||||
|
||||
@ -57,7 +73,10 @@ Documentation for filesystem implementations.
|
||||
befs
|
||||
bfs
|
||||
btrfs
|
||||
cifs/cifsroot
|
||||
ceph
|
||||
coda
|
||||
configfs
|
||||
cramfs
|
||||
debugfs
|
||||
dlmfs
|
||||
@ -73,6 +92,7 @@ Documentation for filesystem implementations.
|
||||
hfsplus
|
||||
hpfs
|
||||
fuse
|
||||
fuse-io
|
||||
inotify
|
||||
isofs
|
||||
nilfs2
|
||||
@ -88,6 +108,7 @@ Documentation for filesystem implementations.
|
||||
ramfs-rootfs-initramfs
|
||||
relay
|
||||
romfs
|
||||
spufs/index
|
||||
squashfs
|
||||
sysfs
|
||||
sysv-fs
|
||||
@ -97,4 +118,6 @@ Documentation for filesystem implementations.
|
||||
udf
|
||||
virtiofs
|
||||
vfat
|
||||
xfs-delayed-logging-design
|
||||
xfs-self-describing-metadata
|
||||
zonefs
|
||||
|
@ -1,4 +1,8 @@
|
||||
File Locking Release Notes
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================
|
||||
File Locking Release Notes
|
||||
==========================
|
||||
|
||||
Andy Walker <andy@lysaker.kvaerner.no>
|
||||
|
||||
@ -6,7 +10,7 @@
|
||||
|
||||
|
||||
1. What's New?
|
||||
--------------
|
||||
==============
|
||||
|
||||
1.1 Broken Flock Emulation
|
||||
--------------------------
|
||||
@ -25,7 +29,7 @@ anyway (see the file "Documentation/process/changes.rst".)
|
||||
---------------------------
|
||||
|
||||
1.2.1 Typical Problems - Sendmail
|
||||
---------------------------------
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Because sendmail was unable to use the old flock() emulation, many sendmail
|
||||
installations use fcntl() instead of flock(). This is true of Slackware 3.0
|
||||
for example. This gave rise to some other subtle problems if sendmail was
|
||||
@ -37,7 +41,7 @@ to lock solid with deadlocked processes.
|
||||
|
||||
|
||||
1.2.2 The Solution
|
||||
------------------
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
The solution I have chosen, after much experimentation and discussion,
|
||||
is to make flock() and fcntl() locks oblivious to each other. Both can
|
||||
exist, and neither will have any effect on the other.
|
||||
@ -54,7 +58,7 @@ fcntl(), with all the problems that implies.
|
||||
---------------------------------------
|
||||
|
||||
Mandatory locking, as described in
|
||||
'Documentation/filesystems/mandatory-locking.txt' was prior to this release a
|
||||
'Documentation/filesystems/mandatory-locking.rst' was prior to this release a
|
||||
general configuration option that was valid for all mounted filesystems. This
|
||||
had a number of inherent dangers, not the least of which was the ability to
|
||||
freeze an NFS server by asking it to read a file for which a mandatory lock
|
@ -1,8 +1,13 @@
|
||||
Mandatory File Locking For The Linux Operating System
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====================================================
|
||||
Mandatory File Locking For The Linux Operating System
|
||||
=====================================================
|
||||
|
||||
Andy Walker <andy@lysaker.kvaerner.no>
|
||||
|
||||
15 April 1996
|
||||
|
||||
(Updated September 2007)
|
||||
|
||||
0. Why you should avoid mandatory locking
|
||||
@ -53,15 +58,17 @@ possible on existing user code. The scheme is based on marking individual files
|
||||
as candidates for mandatory locking, and using the existing fcntl()/lockf()
|
||||
interface for applying locks just as if they were normal, advisory locks.
|
||||
|
||||
Note 1: In saying "file" in the paragraphs above I am actually not telling
|
||||
the whole truth. System V locking is based on fcntl(). The granularity of
|
||||
fcntl() is such that it allows the locking of byte ranges in files, in addition
|
||||
to entire files, so the mandatory locking rules also have byte level
|
||||
granularity.
|
||||
.. Note::
|
||||
|
||||
Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
|
||||
borrowing the fcntl() locking scheme from System V. The mandatory locking
|
||||
scheme is defined by the System V Interface Definition (SVID) Version 3.
|
||||
1. In saying "file" in the paragraphs above I am actually not telling
|
||||
the whole truth. System V locking is based on fcntl(). The granularity of
|
||||
fcntl() is such that it allows the locking of byte ranges in files, in
|
||||
addition to entire files, so the mandatory locking rules also have byte
|
||||
level granularity.
|
||||
|
||||
2. POSIX.1 does not specify any scheme for mandatory locking, despite
|
||||
borrowing the fcntl() locking scheme from System V. The mandatory locking
|
||||
scheme is defined by the System V Interface Definition (SVID) Version 3.
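The byte-range granularity mentioned in the note can be illustrated with a short userspace sketch. It uses only the standard fcntl() record-locking interface; the helper names lock_range() and unlock_range() are hypothetical. Whether such a lock is enforced as mandatory rather than advisory depends on the file marking described in this document, which the sketch does not attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: set or clear a lock on bytes [start, start + len). */
static int set_range(int fd, short type, off_t start, off_t len)
{
	struct flock fl = {0};

	fl.l_type = type;        /* F_WRLCK to lock, F_UNLCK to release */
	fl.l_whence = SEEK_SET;  /* l_start is relative to the start of the file */
	fl.l_start = start;
	fl.l_len = len;
	return fcntl(fd, F_SETLK, &fl);  /* non-blocking request */
}

/* Take an exclusive (write) lock on a byte range. */
static int lock_range(int fd, off_t start, off_t len)
{
	return set_range(fd, F_WRLCK, start, len);
}

/* Release a previously taken lock on the same byte range. */
static int unlock_range(int fd, off_t start, off_t len)
{
	return set_range(fd, F_UNLCK, start, len);
}
```

Locking the whole file is the special case of start 0 and len 0, since an l_len of zero means "to end of file".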
|
||||
|
||||
2. Marking a file for mandatory locking
|
||||
---------------------------------------
|
@ -1,8 +1,10 @@
|
||||
====================
|
||||
FILESYSTEM MOUNT API
|
||||
====================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
CONTENTS
|
||||
====================
|
||||
Filesystem Mount API
|
||||
====================
|
||||
|
||||
.. CONTENTS
|
||||
|
||||
(1) Overview.
|
||||
|
||||
@ -21,8 +23,7 @@ CONTENTS
|
||||
(8) Parameter helper functions.
|
||||
|
||||
|
||||
========
|
||||
OVERVIEW
|
||||
Overview
|
||||
========
|
||||
|
||||
The creation of new mounts is now to be done in a multistep process:
|
||||
@ -43,7 +44,7 @@ The creation of new mounts is now to be done in a multistep process:
|
||||
|
||||
(7) Destroy the context.
|
||||
|
||||
To support this, the file_system_type struct gains two new fields:
|
||||
To support this, the file_system_type struct gains two new fields::
|
||||
|
||||
int (*init_fs_context)(struct fs_context *fc);
|
||||
const struct fs_parameter_description *parameters;
|
||||
@ -57,12 +58,11 @@ Note that security initialisation is done *after* the filesystem is called so
|
||||
that the namespaces may be adjusted first.
|
||||
|
||||
|
||||
======================
|
||||
THE FILESYSTEM CONTEXT
|
||||
The Filesystem context
|
||||
======================
|
||||
|
||||
The creation and reconfiguration of a superblock is governed by a filesystem
|
||||
context. This is represented by the fs_context structure:
|
||||
context. This is represented by the fs_context structure::
|
||||
|
||||
struct fs_context {
|
||||
const struct fs_context_operations *ops;
|
||||
@ -86,78 +86,106 @@ context. This is represented by the fs_context structure:
|
||||
|
||||
The fs_context fields are as follows:
|
||||
|
||||
(*) const struct fs_context_operations *ops
|
||||
* ::
|
||||
|
||||
const struct fs_context_operations *ops
|
||||
|
||||
These are operations that can be done on a filesystem context (see
|
||||
below). This must be set by the ->init_fs_context() file_system_type
|
||||
operation.
|
||||
|
||||
(*) struct file_system_type *fs_type
|
||||
* ::
|
||||
|
||||
struct file_system_type *fs_type
|
||||
|
||||
A pointer to the file_system_type of the filesystem that is being
|
||||
constructed or reconfigured. This retains a reference on the type owner.
|
||||
|
||||
(*) void *fs_private
|
||||
* ::
|
||||
|
||||
void *fs_private
|
||||
|
||||
A pointer to the file system's private data. This is where the filesystem
|
||||
will need to store any options it parses.
|
||||
|
||||
(*) struct dentry *root
|
||||
* ::
|
||||
|
||||
struct dentry *root
|
||||
|
||||
A pointer to the root of the mountable tree (and indirectly, the
|
||||
superblock thereof). This is filled in by the ->get_tree() op. If this
|
||||
is set, an active reference on root->d_sb must also be held.
|
||||
|
||||
(*) struct user_namespace *user_ns
|
||||
(*) struct net *net_ns
|
||||
* ::
|
||||
|
||||
struct user_namespace *user_ns
|
||||
struct net *net_ns
|
||||
|
||||
These are a subset of the namespaces in use by the invoking process. They
|
||||
retain references on each namespace. The subscribed namespaces may be
|
||||
replaced by the filesystem to reflect other sources, such as the parent
|
||||
mount superblock on an automount.
|
||||
|
||||
(*) const struct cred *cred
|
||||
* ::
|
||||
|
||||
const struct cred *cred
|
||||
|
||||
The mounter's credentials. This retains a reference on the credentials.
|
||||
|
||||
(*) char *source
|
||||
* ::
|
||||
|
||||
char *source
|
||||
|
||||
This specifies the source. It may be a block device (e.g. /dev/sda1) or
|
||||
something more exotic, such as the "host:/path" that NFS desires.
|
||||
|
||||
(*) char *subtype
|
||||
* ::
|
||||
|
||||
char *subtype
|
||||
|
||||
This is a string to be added to the type displayed in /proc/mounts to
|
||||
qualify it (used by FUSE). This is available for the filesystem to set if
|
||||
desired.
|
||||
|
||||
(*) void *security
|
||||
* ::
|
||||
|
||||
void *security
|
||||
|
||||
A place for the LSMs to hang their security data for the superblock. The
|
||||
relevant security operations are described below.
|
||||
|
||||
(*) void *s_fs_info
|
||||
* ::
|
||||
|
||||
void *s_fs_info
|
||||
|
||||
The proposed s_fs_info for a new superblock, set in the superblock by
|
||||
sget_fc(). This can be used to distinguish superblocks.
|
||||
|
||||
(*) unsigned int sb_flags
|
||||
(*) unsigned int sb_flags_mask
|
||||
* ::
|
||||
|
||||
unsigned int sb_flags
|
||||
unsigned int sb_flags_mask
|
||||
|
||||
Which SB_* flag bits are to be set/cleared in super_block::s_flags.
|
||||
|
||||
(*) unsigned int s_iflags
|
||||
* ::
|
||||
|
||||
unsigned int s_iflags
|
||||
|
||||
These will be bitwise-OR'd with s->s_iflags when a superblock is created.
|
||||
|
||||
(*) enum fs_context_purpose
|
||||
* ::
|
||||
|
||||
enum fs_context_purpose
|
||||
|
||||
This indicates the purpose for which the context is intended. The
|
||||
available values are:
|
||||
|
||||
FS_CONTEXT_FOR_MOUNT, -- New superblock for explicit mount
|
||||
FS_CONTEXT_FOR_SUBMOUNT -- New automatic submount of extant mount
|
||||
FS_CONTEXT_FOR_RECONFIGURE -- Change an existing mount
|
||||
========================== ======================================
|
||||
FS_CONTEXT_FOR_MOUNT, New superblock for explicit mount
|
||||
FS_CONTEXT_FOR_SUBMOUNT New automatic submount of extant mount
|
||||
FS_CONTEXT_FOR_RECONFIGURE Change an existing mount
|
||||
========================== ======================================
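The sb_flags and sb_flags_mask pair in the field list above can be read as "new values" plus "which bits to touch". The following is a minimal userspace sketch of that reading, assuming the mask selects the bits taken from sb_flags and every other bit of s_flags is left alone. The DEMO_* bit values and the helper name are hypothetical and are not the kernel's SB_* definitions.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration only. */
#define DEMO_SB_RDONLY  0x1u
#define DEMO_SB_NOATIME 0x2u
#define DEMO_SB_SYNC    0x4u

/*
 * Return s_flags with the bits named in sb_flags_mask replaced by the
 * corresponding bits of sb_flags; bits outside the mask are preserved.
 */
static unsigned int demo_apply_sb_flags(unsigned int s_flags,
					unsigned int sb_flags,
					unsigned int sb_flags_mask)
{
	return (s_flags & ~sb_flags_mask) | (sb_flags & sb_flags_mask);
}
```

With this scheme a bit is set by placing it in both sb_flags and sb_flags_mask, and cleared by placing it in sb_flags_mask only.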
|
||||
|
||||
The mount context is created by calling vfs_new_fs_context() or
|
||||
vfs_dup_fs_context() and is destroyed with put_fs_context(). Note that the
|
||||
@ -176,11 +204,10 @@ mount context. For instance, NFS might pin the appropriate protocol version
|
||||
module.
|
||||
|
||||
|
||||
=================================
|
||||
THE FILESYSTEM CONTEXT OPERATIONS
|
||||
The Filesystem Context Operations
|
||||
=================================
|
||||
|
||||
The filesystem context points to a table of operations:
|
||||
The filesystem context points to a table of operations::
|
||||
|
||||
struct fs_context_operations {
|
||||
void (*free)(struct fs_context *fc);
|
||||
@ -195,24 +222,32 @@ The filesystem context points to a table of operations:
|
||||
These operations are invoked by the various stages of the mount procedure to
|
||||
manage the filesystem context. They are as follows:
|
||||
|
||||
(*) void (*free)(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
void (*free)(struct fs_context *fc);
|
||||
|
||||
Called to clean up the filesystem-specific part of the filesystem context
|
||||
when the context is destroyed. It should be aware that parts of the
|
||||
context may have been removed and NULL'd out by ->get_tree().
|
||||
|
||||
(*) int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
|
||||
* ::
|
||||
|
||||
int (*dup)(struct fs_context *fc, struct fs_context *src_fc);
|
||||
|
||||
Called when a filesystem context has been duplicated to duplicate the
|
||||
filesystem-private data. An error may be returned to indicate failure to
|
||||
do this.
|
||||
|
||||
[!] Note that even if this fails, put_fs_context() will be called
|
||||
.. Warning::
|
||||
|
||||
Note that even if this fails, put_fs_context() will be called
|
||||
immediately thereafter, so ->dup() *must* make the
|
||||
filesystem-private data safe for ->free().
|
||||
|
||||
(*) int (*parse_param)(struct fs_context *fc,
|
||||
struct fs_parameter *param);
|
||||
* ::
|
||||
|
||||
int (*parse_param)(struct fs_context *fc,
|
||||
struct fs_parameter *param);
|
||||
|
||||
Called when a parameter is being added to the filesystem context. param
|
||||
points to the key name and maybe a value object. VFS-specific options
|
||||
@ -224,7 +259,9 @@ manage the filesystem context. They are as follows:
|
||||
|
||||
If successful, 0 should be returned or a negative error code otherwise.
|
||||
|
||||
(*) int (*parse_monolithic)(struct fs_context *fc, void *data);
|
||||
* ::
|
||||
|
||||
int (*parse_monolithic)(struct fs_context *fc, void *data);
|
||||
|
||||
Called when the mount(2) system call is invoked to pass the entire data
|
||||
page in one go. If this is expected to be just a list of "key[=val]"
|
||||
@ -236,7 +273,9 @@ manage the filesystem context. They are as follows:
|
||||
finds it's the standard key-val list then it may pass it off to
|
||||
generic_parse_monolithic().
|
||||
|
||||
(*) int (*get_tree)(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
int (*get_tree)(struct fs_context *fc);
|
||||
|
||||
Called to get or create the mountable root and superblock, using the
|
||||
information stored in the filesystem context (reconfiguration goes via a
|
||||
@ -249,7 +288,9 @@ manage the filesystem context. They are as follows:
|
||||
The phase on a userspace-driven context will be set to only allow this to
|
||||
be called once on any particular context.
|
||||
|
||||
(*) int (*reconfigure)(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
int (*reconfigure)(struct fs_context *fc);
|
||||
|
||||
Called to effect reconfiguration of a superblock using information stored
|
||||
in the filesystem context. It may detach any resources it desires from
|
||||
@ -259,19 +300,20 @@ manage the filesystem context. They are as follows:
|
||||
On success it should return 0. In the case of an error, it should return
|
||||
a negative error code.
|
||||
|
||||
[NOTE] reconfigure is intended as a replacement for remount_fs.
|
||||
.. Note:: reconfigure is intended as a replacement for remount_fs.
|
||||
|
||||
|
||||
===========================
|
||||
FILESYSTEM CONTEXT SECURITY
|
||||
Filesystem context Security
|
||||
===========================
|
||||
|
||||
The filesystem context contains a security pointer that the LSMs can use for
|
||||
building up a security context for the superblock to be mounted. There are a
|
||||
number of operations used by the new mount code for this purpose:
|
||||
|
||||
(*) int security_fs_context_alloc(struct fs_context *fc,
|
||||
struct dentry *reference);
|
||||
* ::
|
||||
|
||||
int security_fs_context_alloc(struct fs_context *fc,
|
||||
struct dentry *reference);
|
||||
|
||||
Called to initialise fc->security (which is preset to NULL) and allocate
|
||||
any resources needed. It should return 0 on success or a negative error
|
||||
@ -283,22 +325,28 @@ number of operations used by the new mount code for this purpose:
|
||||
non-NULL in the case of a submount (FS_CONTEXT_FOR_SUBMOUNT) in which case
|
||||
it indicates the automount point.
|
||||
|
||||
(*) int security_fs_context_dup(struct fs_context *fc,
|
||||
struct fs_context *src_fc);
|
||||
* ::
|
||||
|
||||
int security_fs_context_dup(struct fs_context *fc,
|
||||
struct fs_context *src_fc);
|
||||
|
||||
Called to initialise fc->security (which is preset to NULL) and allocate
|
||||
any resources needed. The original filesystem context is pointed to by
|
||||
src_fc and may be used for reference. It should return 0 on success or a
|
||||
negative error code on failure.
|
||||
|
||||
(*) void security_fs_context_free(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
void security_fs_context_free(struct fs_context *fc);
|
||||
|
||||
Called to clean up anything attached to fc->security. Note that the
|
||||
contents may have been transferred to a superblock and the pointer cleared
|
||||
during get_tree.
|
||||
|
||||
(*) int security_fs_context_parse_param(struct fs_context *fc,
|
||||
struct fs_parameter *param);
|
||||
* ::
|
||||
|
||||
int security_fs_context_parse_param(struct fs_context *fc,
|
||||
struct fs_parameter *param);
|
||||
|
||||
Called for each mount parameter, including the source. The arguments are
|
||||
as for the ->parse_param() method. It should return 0 to indicate that
|
||||
@ -310,7 +358,9 @@ number of operations used by the new mount code for this purpose:
|
||||
(provided the value pointer is NULL'd out). If it is stolen, 1 must be
|
||||
returned to prevent it being passed to the filesystem.
|
||||
|
||||
(*) int security_fs_context_validate(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
int security_fs_context_validate(struct fs_context *fc);
|
||||
|
||||
Called after all the options have been parsed to validate the collection
|
||||
as a whole and to do any necessary allocation so that
|
||||
@ -320,36 +370,43 @@ number of operations used by the new mount code for this purpose:
|
||||
In the case of reconfiguration, the target superblock will be accessible
|
||||
via fc->root.
|
||||
|
||||
(*) int security_sb_get_tree(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
int security_sb_get_tree(struct fs_context *fc);
|
||||
|
||||
Called during the mount procedure to verify that the specified superblock
|
||||
is allowed to be mounted and to transfer the security data there. It
|
||||
should return 0 or a negative error code.
|
||||
|
||||
(*) void security_sb_reconfigure(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
void security_sb_reconfigure(struct fs_context *fc);
|
||||
|
||||
Called to apply any reconfiguration to an LSM's context. It must not
|
||||
fail. Error checking and resource allocation must be done in advance by
|
||||
the parameter parsing and validation hooks.
|
||||
|
||||
(*) int security_sb_mountpoint(struct fs_context *fc, struct path *mountpoint,
|
||||
unsigned int mnt_flags);
|
||||
* ::
|
||||
|
||||
int security_sb_mountpoint(struct fs_context *fc,
|
||||
struct path *mountpoint,
|
||||
unsigned int mnt_flags);
|
||||
|
||||
Called during the mount procedure to verify that the root dentry attached
|
||||
to the context is permitted to be attached to the specified mountpoint.
|
||||
It should return 0 on success or a negative error code on failure.
|
||||
|
||||
|
||||
==========================
|
||||
VFS FILESYSTEM CONTEXT API
|
||||
VFS Filesystem context API
|
||||
==========================
|
||||
|
||||
There are four operations for creating a filesystem context and one for
|
||||
destroying a context:
|
||||
|
||||
(*) struct fs_context *fs_context_for_mount(
|
||||
struct file_system_type *fs_type,
|
||||
unsigned int sb_flags);
|
||||
* ::
|
||||
|
||||
struct fs_context *fs_context_for_mount(struct file_system_type *fs_type,
|
||||
unsigned int sb_flags);
|
||||
|
||||
Allocate a filesystem context for the purpose of setting up a new mount,
|
||||
whether that be with a new superblock or sharing an existing one. This
|
||||
@ -359,7 +416,9 @@ destroying a context:
|
||||
fs_type specifies the filesystem type that will manage the context and
|
||||
sb_flags presets the superblock flags stored therein.
|
||||
|
||||
(*) struct fs_context *fs_context_for_reconfigure(
|
||||
* ::
|
||||
|
||||
struct fs_context *fs_context_for_reconfigure(
|
||||
struct dentry *dentry,
|
||||
unsigned int sb_flags,
|
||||
unsigned int sb_flags_mask);
|
||||
@ -369,7 +428,9 @@ destroying a context:
|
||||
configured. sb_flags and sb_flags_mask indicate which superblock flags
|
||||
need changing and to what.
|
||||
|
||||
(*) struct fs_context *fs_context_for_submount(
|
||||
* ::
|
||||
|
||||
struct fs_context *fs_context_for_submount(
|
||||
struct file_system_type *fs_type,
|
||||
struct dentry *reference);
|
||||
|
||||
@ -382,7 +443,9 @@ destroying a context:
|
||||
Note that it's not a requirement that the reference dentry be of the same
|
||||
filesystem type as fs_type.
|
||||
|
||||
(*) struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
|
||||
* ::
|
||||
|
||||
struct fs_context *vfs_dup_fs_context(struct fs_context *src_fc);
|
||||
|
||||
Duplicate a filesystem context, copying any options noted and duplicating
|
||||
or additionally referencing any resources held therein. This is available
|
||||
@ -392,14 +455,18 @@ destroying a context:
|
||||
|
||||
The purpose in the new context is inherited from the old one.
|
||||
|
||||
(*) void put_fs_context(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
void put_fs_context(struct fs_context *fc);
|
||||
|
||||
Destroy a filesystem context, releasing any resources it holds. This
|
||||
calls the ->free() operation. This is intended to be called by anyone who
|
||||
created a filesystem context.
|
||||
|
||||
[!] filesystem contexts are not refcounted, so this causes unconditional
|
||||
destruction.
|
||||
.. Warning::
|
||||
|
||||
Filesystem contexts are not refcounted, so this causes unconditional
|
||||
destruction.
|
||||
|
||||
In all the above operations, apart from the put op, the return is a mount
|
||||
context pointer or a negative error code.
|
||||
@ -407,8 +474,10 @@ context pointer or a negative error code.
|
||||
For the remaining operations, if an error occurs, a negative error code will be
|
||||
returned.
|
||||
|
||||
(*) int vfs_parse_fs_param(struct fs_context *fc,
|
||||
struct fs_parameter *param);
|
||||
* ::
|
||||
|
||||
int vfs_parse_fs_param(struct fs_context *fc,
|
||||
struct fs_parameter *param);
|
||||
|
||||
Supply a single mount parameter to the filesystem context. This includes
|
||||
the specification of the source/device which is specified as the "source"
|
||||
@ -423,53 +492,64 @@ returned.
|
||||
|
||||
The parameter value is typed and can be one of:
|
||||
|
||||
fs_value_is_flag, Parameter not given a value.
|
||||
fs_value_is_string, Value is a string
|
||||
fs_value_is_blob, Value is a binary blob
|
||||
fs_value_is_filename, Value is a filename* + dirfd
|
||||
fs_value_is_file, Value is an open file (file*)
|
||||
==================== =============================
|
||||
fs_value_is_flag Parameter not given a value
|
||||
fs_value_is_string Value is a string
|
||||
fs_value_is_blob Value is a binary blob
|
||||
fs_value_is_filename Value is a filename* + dirfd
|
||||
fs_value_is_file Value is an open file (file*)
|
||||
==================== =============================
|
||||
|
||||
If there is a value, that value is stored in a union in the struct in one
|
||||
of param->{string,blob,name,file}. Note that the function may steal and
|
||||
clear the pointer, but then becomes responsible for disposing of the
|
||||
object.
|
||||
|
||||
(*) int vfs_parse_fs_string(struct fs_context *fc, const char *key,
|
||||
const char *value, size_t v_size);
|
||||
* ::
|
||||
|
||||
int vfs_parse_fs_string(struct fs_context *fc, const char *key,
|
||||
const char *value, size_t v_size);
|
||||
|
||||
A wrapper around vfs_parse_fs_param() that copies the value string it is
|
||||
passed.
|
||||
|
||||
(*) int generic_parse_monolithic(struct fs_context *fc, void *data);
|
||||
* ::
|
||||
|
||||
int generic_parse_monolithic(struct fs_context *fc, void *data);
|
||||
|
||||
Parse a sys_mount() data page, assuming the form to be a text list
|
||||
consisting of key[=val] options separated by commas. Each item in the
|
||||
list is passed to vfs_mount_option(). This is the default when the
|
||||
->parse_monolithic() method is NULL.
|
||||
|
||||
(*) int vfs_get_tree(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
int vfs_get_tree(struct fs_context *fc);
|
||||
|
||||
Get or create the mountable root and superblock, using the parameters in
|
||||
the filesystem context to select/configure the superblock. This invokes
|
||||
the ->get_tree() method.
|
||||
|
||||
(*) struct vfsmount *vfs_create_mount(struct fs_context *fc);
|
||||
* ::
|
||||
|
||||
struct vfsmount *vfs_create_mount(struct fs_context *fc);
|
||||
|
||||
Create a mount given the parameters in the specified filesystem context.
|
||||
Note that this does not attach the mount to anything.
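The "key[=val]" list format that generic_parse_monolithic() is described as handling above can be sketched in plain userspace C. This is only an illustration of the text format, not the kernel routine: struct demo_option and demo_parse_monolithic() are hypothetical names, and the sketch assumes that empty items are skipped and that the value starts after the first '='.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one parsed option. */
struct demo_option {
	const char *key;
	const char *value;  /* NULL when the option had no "=val" part */
};

/*
 * Split a comma-separated "key[=val]" list in place.
 * Returns the number of options stored, or -1 if there are more than max.
 */
static int demo_parse_monolithic(char *data, struct demo_option *opts, int max)
{
	int n = 0;
	char *p = data;

	while (p && *p) {
		char *next = strchr(p, ',');

		if (next)
			*next++ = '\0';  /* terminate this item */
		if (*p) {                /* skip empty items such as ",," */
			char *eq;

			if (n == max)
				return -1;
			eq = strchr(p, '=');
			if (eq)
				*eq++ = '\0';  /* split key from value */
			opts[n].key = p;
			opts[n].value = eq;
			n++;
		}
		p = next;
	}
	return n;
}
```

Each resulting key/value pair corresponds to one parameter that would then be handed on individually, in the way the text describes for the items of the list.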
|
||||
|
||||
|
||||
===========================
|
||||
SUPERBLOCK CREATION HELPERS
|
||||
Superblock Creation Helpers
|
||||
===========================
|
||||
|
||||
A number of VFS helpers are available for use by filesystems for the creation
|
||||
or looking up of superblocks.
|
||||
|
||||
(*) struct super_block *
|
||||
sget_fc(struct fs_context *fc,
|
||||
int (*test)(struct super_block *sb, struct fs_context *fc),
|
||||
int (*set)(struct super_block *sb, struct fs_context *fc));
|
||||
* ::
|
||||
|
||||
struct super_block *
|
||||
sget_fc(struct fs_context *fc,
|
||||
int (*test)(struct super_block *sb, struct fs_context *fc),
|
||||
int (*set)(struct super_block *sb, struct fs_context *fc));
|
||||
|
||||
This is the core routine. If test is non-NULL, it searches for an
|
||||
existing superblock matching the criteria held in the fs_context, using
|
||||
@ -482,10 +562,12 @@ or looking up of superblocks.
|
||||
|
||||
The following helpers all wrap sget_fc():
|
||||
|
||||
(*) int vfs_get_super(struct fs_context *fc,
|
||||
enum vfs_get_super_keying keying,
|
||||
int (*fill_super)(struct super_block *sb,
|
||||
struct fs_context *fc))
|
||||
* ::
|
||||
|
||||
int vfs_get_super(struct fs_context *fc,
|
||||
enum vfs_get_super_keying keying,
|
||||
int (*fill_super)(struct super_block *sb,
|
||||
struct fs_context *fc))
|
||||
|
||||
This creates/looks up a deviceless superblock. The keying indicates how
|
||||
many superblocks of this type may exist and in what manner they may be
|
||||
@ -515,14 +597,14 @@ PARAMETER DESCRIPTION
|
||||
=====================
|
||||
|
||||
Parameters are described using structures defined in linux/fs_parser.h.
|
||||
There's a core description struct that links everything together:
|
||||
There's a core description struct that links everything together::
|
||||
|
||||
struct fs_parameter_description {
|
||||
const struct fs_parameter_spec *specs;
|
||||
const struct fs_parameter_enum *enums;
|
||||
};
|
||||
|
||||
For example:
|
||||
For example::
|
||||
|
||||
enum {
|
||||
Opt_autocell,
|
||||
@ -539,10 +621,12 @@ For example:
|
||||
|
||||
The members are as follows:
|
||||
|
||||
(1) const struct fs_parameter_specification *specs;
|
||||
(1) ::
|
||||
|
||||
const struct fs_parameter_specification *specs;
|
||||
|
||||
Table of parameter specifications, terminated with a null entry, where the
|
||||
entries are of type:
|
||||
entries are of type::
|
||||
|
||||
struct fs_parameter_spec {
|
||||
const char *name;
|
||||
@ -558,6 +642,7 @@ The members are as follows:
|
||||
|
||||
The 'type' field indicates the desired value type and must be one of:
|
||||
|
||||
======================= ======================= =====================
|
||||
TYPE NAME EXPECTED VALUE RESULT IN
|
||||
======================= ======================= =====================
|
||||
fs_param_is_flag No value n/a
|
||||
@ -573,19 +658,23 @@ The members are as follows:
|
||||
fs_param_is_blockdev Blockdev path * Needs lookup
|
||||
fs_param_is_path Path * Needs lookup
|
||||
fs_param_is_fd File descriptor result->int_32
|
||||
======================= ======================= =====================
|
||||
|
||||
Note that if the value is of fs_param_is_bool type, fs_parse() will try
|
||||
to match any string value against "0", "1", "no", "yes", "false", "true".
|
||||
|
||||
Each parameter can also be qualified with 'flags':
|
||||
|
||||
======================= ================================================
|
||||
fs_param_v_optional The value is optional
|
||||
fs_param_neg_with_no result->negated set if key is prefixed with "no"
|
||||
fs_param_neg_with_empty result->negated set if value is ""
|
||||
fs_param_deprecated The parameter is deprecated.
|
||||
======================= ================================================
|
||||
|
||||
These are wrapped with a number of convenience wrappers:
|
||||
|
||||
======================= ===============================================
|
||||
MACRO SPECIFIES
|
||||
======================= ===============================================
|
||||
fsparam_flag() fs_param_is_flag
|
||||
@ -602,9 +691,10 @@ The members are as follows:
|
||||
fsparam_bdev() fs_param_is_blockdev
|
||||
fsparam_path() fs_param_is_path
|
||||
fsparam_fd() fs_param_is_fd
|
||||
======================= ===============================================
|
||||
|
||||
all of which take two arguments, name string and option number - for
|
||||
example:
|
||||
example::
|
||||
|
||||
static const struct fs_parameter_spec afs_param_specs[] = {
|
||||
fsparam_flag ("autocell", Opt_autocell),
|
||||
@ -618,10 +708,12 @@ The members are as follows:
|
||||
of arguments to specify the type and the flags for anything that doesn't
|
||||
match one of the above macros.
|
||||
|
||||
(2) const struct fs_parameter_enum *enums;
|
||||
(2) ::
|
||||
|
||||
const struct fs_parameter_enum *enums;
|
||||
|
||||
Table of enum value names to integer mappings, terminated with a null
|
||||
entry. This is of type:
|
||||
entry. This is of type::
|
||||
|
||||
struct fs_parameter_enum {
|
||||
u8 opt;
|
||||
@ -630,7 +722,7 @@ The members are as follows:
|
||||
};
|
||||
|
||||
Where the array is an unsorted list of { parameter ID, name }-keyed
|
||||
elements that indicate the value to map to, e.g.:
|
||||
elements that indicate the value to map to, e.g.::
|
||||
|
||||
static const struct fs_parameter_enum afs_param_enums[] = {
|
||||
{ Opt_bar, "x", 1},
|
||||
@ -648,18 +740,19 @@ CONFIG_VALIDATE_FS_PARSER=y) and will allow the description to be queried from
|
||||
userspace using the fsinfo() syscall.
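The boolean matching noted for fs_param_is_bool above ("0", "1", "no", "yes", "false", "true") amounts to a small string-to-value table. A minimal sketch follows, assuming an exact, case-sensitive comparison; demo_match_bool() is a hypothetical name and not part of the parser API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map one of the accepted boolean spellings to 0 or 1.
 * Returns -1 if the string is not one of the six recognised values.
 */
static int demo_match_bool(const char *s)
{
	static const struct {
		const char *name;
		int value;
	} tbl[] = {
		{ "0", 0 }, { "1", 1 },
		{ "no", 0 }, { "yes", 1 },
		{ "false", 0 }, { "true", 1 },
	};
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		if (strcmp(s, tbl[i].name) == 0)
			return tbl[i].value;
	return -1;
}
```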
|
||||
|
||||
|
||||
==========================
|
||||
PARAMETER HELPER FUNCTIONS
|
||||
Parameter Helper Functions
|
||||
==========================
|
||||
|
||||
A number of helper functions are provided to help a filesystem or an LSM
|
||||
process the parameters it is given.
|
||||
|
||||
(*) int lookup_constant(const struct constant_table tbl[],
|
||||
const char *name, int not_found);
|
||||
* ::
|
||||
|
||||
int lookup_constant(const struct constant_table tbl[],
|
||||
const char *name, int not_found);
|
||||
|
||||
Look up a constant by name in a table of name -> integer mappings. The
|
||||
table is an array of elements of the following type:
|
||||
table is an array of elements of the following type::
|
||||
|
||||
struct constant_table {
|
||||
const char *name;
|
||||
@ -669,9 +762,11 @@ process the parameters it is given.
|
||||
If a match is found, the corresponding value is returned. If a match
|
||||
isn't found, the not_found value is returned instead.
|
||||
|
||||
(*) bool validate_constant_table(const struct constant_table *tbl,
|
||||
size_t tbl_size,
|
||||
int low, int high, int special);
|
||||
* ::
|
||||
|
||||
bool validate_constant_table(const struct constant_table *tbl,
|
||||
size_t tbl_size,
|
||||
int low, int high, int special);
|
||||
|
||||
Validate a constant table. Checks that all the elements are appropriately
|
||||
ordered, that there are no duplicates and that the values are between low
|
||||
@ -682,16 +777,20 @@ process the parameters it is given.
|
||||
If all is good, true is returned. If the table is invalid, errors are
|
||||
logged to dmesg and false is returned.
|
||||
|
||||
(*) bool fs_validate_description(const struct fs_parameter_description *desc);
|
||||
* ::
|
||||
|
||||
bool fs_validate_description(const struct fs_parameter_description *desc);
|
||||
|
||||
This performs some validation checks on a parameter description. It
|
||||
returns true if the description is good and false if it is not. It will
|
||||
log errors to dmesg if validation fails.
|
||||
|
||||
(*) int fs_parse(struct fs_context *fc,
|
||||
const struct fs_parameter_description *desc,
|
||||
struct fs_parameter *param,
|
||||
struct fs_parse_result *result);
|
||||
* ::
|
||||
|
||||
int fs_parse(struct fs_context *fc,
|
||||
const struct fs_parameter_description *desc,
|
||||
struct fs_parameter *param,
|
||||
struct fs_parse_result *result);
|
||||
|
||||
This is the main interpreter of parameters. It uses the parameter
|
||||
description to look up a parameter by key name and to convert that to an
|
||||
@ -711,14 +810,16 @@ process the parameters it is given.
|
||||
parameter is matched, but the value is erroneous, -EINVAL will be
|
||||
returned; otherwise the parameter's option number will be returned.
|
||||
|
||||
(*) int fs_lookup_param(struct fs_context *fc,
|
||||
struct fs_parameter *value,
|
||||
bool want_bdev,
|
||||
struct path *_path);
|
||||
* ::
|
||||
|
||||
int fs_lookup_param(struct fs_context *fc,
|
||||
struct fs_parameter *value,
|
||||
bool want_bdev,
|
||||
struct path *_path);
|
||||
|
||||
This takes a parameter that carries a string or filename type and attempts
|
||||
to do a path lookup on it. If the parameter expects a blockdev, a check
|
||||
is made that the inode actually represents one.
|
||||
|
||||
Returns 0 if successful and *_path will be set; returns a negative error
|
||||
code if not.
|
||||
Returns 0 if successful and ``*_path`` will be set; returns a negative
|
||||
error code if not.
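The name -> integer lookup described for lookup_constant() above can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the kernel code: the demo_* names are hypothetical, the value member of the table element is assumed, and the table is assumed to be sorted by name so that a binary search is valid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the table element; "value" is an assumed member. */
struct demo_constant_table {
	const char *name;
	int value;
};

/* bsearch() comparator: compare the search key against an element's name. */
static int demo_cmp(const void *key, const void *elem)
{
	const struct demo_constant_table *e = elem;

	return strcmp((const char *)key, e->name);
}

/*
 * Look up name in a table sorted by name. Returns the mapped value,
 * or not_found when there is no matching entry.
 */
static int demo_lookup_constant(const struct demo_constant_table *tbl,
				size_t tbl_size, const char *name,
				int not_found)
{
	const struct demo_constant_table *e;

	e = bsearch(name, tbl, tbl_size, sizeof(tbl[0]), demo_cmp);
	return e ? e->value : not_found;
}
```

Passing a not_found value that cannot occur in the table, such as -1 for a table of non-negative values, lets the caller tell a miss from a hit.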
|
@ -1870,7 +1870,7 @@ unbindable mount is unbindable
|
||||
|
||||
For more information on mount propagation see:
|
||||
|
||||
Documentation/filesystems/sharedsubtree.txt
|
||||
Documentation/filesystems/sharedsubtree.rst
|
||||
|
||||
|
||||
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
|
||||
|
@ -1,4 +1,6 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
===============
|
||||
Quota subsystem
|
||||
===============
|
||||
|
||||
@ -39,6 +41,7 @@ Currently, the interface supports only one message type QUOTA_NL_C_WARNING.
|
||||
This command is used to send a notification about any of the above mentioned
|
||||
events. Each message has six attributes. These are (type of the argument is
|
||||
in parentheses):
|
||||
|
||||
QUOTA_NL_A_QTYPE (u32)
|
||||
- type of quota being exceeded (one of USRQUOTA, GRPQUOTA)
|
||||
QUOTA_NL_A_EXCESS_ID (u64)
|
||||
@ -48,20 +51,34 @@ in parentheses):
|
||||
- UID of a user who caused the event
|
||||
QUOTA_NL_A_WARNING (u32)
|
||||
- what kind of limit is exceeded:
|
||||
QUOTA_NL_IHARDWARN - inode hardlimit
|
||||
QUOTA_NL_ISOFTLONGWARN - inode softlimit is exceeded longer
|
||||
than given grace period
|
||||
QUOTA_NL_ISOFTWARN - inode softlimit
|
||||
QUOTA_NL_BHARDWARN - space (block) hardlimit
|
||||
QUOTA_NL_BSOFTLONGWARN - space (block) softlimit is exceeded
|
||||
longer than given grace period.
|
||||
QUOTA_NL_BSOFTWARN - space (block) softlimit
|
||||
|
||||
QUOTA_NL_IHARDWARN
|
||||
inode hardlimit
|
||||
QUOTA_NL_ISOFTLONGWARN
|
||||
inode softlimit is exceeded longer
|
||||
than given grace period
|
||||
QUOTA_NL_ISOFTWARN
|
||||
inode softlimit
|
||||
QUOTA_NL_BHARDWARN
|
||||
space (block) hardlimit
|
||||
QUOTA_NL_BSOFTLONGWARN
|
||||
space (block) softlimit is exceeded
|
||||
longer than given grace period.
|
||||
QUOTA_NL_BSOFTWARN
|
||||
space (block) softlimit
|
||||
|
||||
- four warnings are also defined for the event when user stops
|
||||
exceeding some limit:
|
||||
QUOTA_NL_IHARDBELOW - inode hardlimit
|
||||
QUOTA_NL_ISOFTBELOW - inode softlimit
|
||||
QUOTA_NL_BHARDBELOW - space (block) hardlimit
|
||||
QUOTA_NL_BSOFTBELOW - space (block) softlimit
|
||||
|
||||
QUOTA_NL_IHARDBELOW
|
||||
inode hardlimit
|
||||
QUOTA_NL_ISOFTBELOW
|
||||
inode softlimit
|
||||
QUOTA_NL_BHARDBELOW
|
||||
space (block) hardlimit
|
||||
QUOTA_NL_BSOFTBELOW
|
||||
space (block) softlimit
|
||||
|
||||
QUOTA_NL_A_DEV_MAJOR (u32)
|
||||
- major number of a device with the affected filesystem
|
||||
QUOTA_NL_A_DEV_MINOR (u32)
|
@ -1,6 +1,11 @@
|
||||
The seq_file interface
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
======================
|
||||
The seq_file Interface
|
||||
======================
|
||||
|
||||
Copyright 2003 Jonathan Corbet <corbet@lwn.net>
|
||||
|
||||
This file is originally from the LWN.net Driver Porting series at
|
||||
http://lwn.net/Articles/driver-porting/
|
||||
|
||||
@ -43,7 +48,7 @@ loadable module which creates a file called /proc/sequence. The file, when
|
||||
read, simply produces a set of increasing integer values, one per line. The
|
||||
sequence will continue until the user loses patience and finds something
|
||||
better to do. The file is seekable, in that one can do something like the
|
||||
following:
|
||||
following::
|
||||
|
||||
dd if=/proc/sequence of=out1 count=1
|
||||
dd if=/proc/sequence skip=1 of=out2 count=1
|
||||
@ -55,16 +60,18 @@ wanting to see the full source for this module can find it at
|
||||
http://lwn.net/Articles/22359/).
|
||||
|
||||
Deprecated create_proc_entry
|
||||
============================
|
||||
|
||||
Note that the above article uses create_proc_entry which was removed in
|
||||
kernel 3.10. Current versions require the following update
|
||||
kernel 3.10. Current versions require the following update::
|
||||
|
||||
- entry = create_proc_entry("sequence", 0, NULL);
|
||||
- if (entry)
|
||||
- entry->proc_fops = &ct_file_ops;
|
||||
+ entry = proc_create("sequence", 0, NULL, &ct_file_ops);
|
||||
- entry = create_proc_entry("sequence", 0, NULL);
|
||||
- if (entry)
|
||||
- entry->proc_fops = &ct_file_ops;
|
||||
+ entry = proc_create("sequence", 0, NULL, &ct_file_ops);
|
||||
|
||||
The iterator interface
|
||||
======================
|
||||
|
||||
Modules implementing a virtual file with seq_file must implement an
|
||||
iterator object that allows stepping through the data of interest
|
||||
@ -99,7 +106,7 @@ position. The pos passed to start() will always be either zero, or
|
||||
the most recent pos used in the previous session.
|
||||
|
||||
For our simple sequence example,
|
||||
the start() function looks like:
|
||||
the start() function looks like::
|
||||
|
||||
static void *ct_seq_start(struct seq_file *s, loff_t *pos)
|
||||
{
|
||||
@ -129,7 +136,7 @@ move the iterator forward to the next position in the sequence. The
|
||||
example module can simply increment the position by one; more useful
|
||||
modules will do what is needed to step through some data structure. The
|
||||
next() function returns a new iterator, or NULL if the sequence is
|
||||
complete. Here's the example version:
|
||||
complete. Here's the example version::
|
||||
|
||||
static void *ct_seq_next(struct seq_file *s, void *v, loff_t *pos)
|
||||
{
|
||||
@ -141,10 +148,10 @@ complete. Here's the example version:
|
||||
The stop() function closes a session; its job, of course, is to clean
|
||||
up. If dynamic memory is allocated for the iterator, stop() is the
|
||||
place to free it; if a lock was taken by start(), stop() must release
|
||||
that lock. The value that *pos was set to by the last next() call
|
||||
that lock. The value that ``*pos`` was set to by the last next() call
|
||||
before stop() is remembered, and used for the first start() call of
|
||||
the next session unless lseek() has been called on the file; in that
|
||||
case next start() will be asked to start at position zero.
|
||||
case next start() will be asked to start at position zero::
|
||||
|
||||
static void ct_seq_stop(struct seq_file *s, void *v)
|
||||
{
|
||||
@ -152,7 +159,7 @@ case next start() will be asked to start at position zero.
|
||||
}
|
||||
|
||||
Finally, the show() function should format the object currently pointed to
|
||||
by the iterator for output. The example module's show() function is:
|
||||
by the iterator for output. The example module's show() function is::
|
||||
|
||||
static int ct_seq_show(struct seq_file *s, void *v)
|
||||
{
|
||||
@ -169,7 +176,7 @@ generated output before returning SEQ_SKIP, that output will be dropped.
|
||||
|
||||
We will look at seq_printf() in a moment. But first, the definition of the
|
||||
seq_file iterator is finished by creating a seq_operations structure with
|
||||
the four functions we have just defined:
|
||||
the four functions we have just defined::
|
||||
|
||||
static const struct seq_operations ct_seq_ops = {
|
||||
.start = ct_seq_start,
|
||||
@ -194,6 +201,7 @@ other locks while the iterator is active.
|
||||
|
||||
|
||||
Formatted output
|
||||
================
|
||||
|
||||
The seq_file code manages positioning within the output created by the
|
||||
iterator and getting it into the user's buffer. But, for that to work, that
|
||||
@ -203,7 +211,7 @@ been defined which make this task easy.
|
||||
Most code will simply use seq_printf(), which works pretty much like
|
||||
printk(), but which requires the seq_file pointer as an argument.
|
||||
|
||||
For straight character output, the following functions may be used:
|
||||
For straight character output, the following functions may be used::
|
||||
|
||||
seq_putc(struct seq_file *m, char c);
|
||||
seq_puts(struct seq_file *m, const char *s);
|
||||
@ -213,7 +221,7 @@ The first two output a single character and a string, just like one would
|
||||
expect. seq_escape() is like seq_puts(), except that any character in s
|
||||
which is in the string esc will be represented in octal form in the output.
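
The escaping rule described above can be sketched in userspace. This is only an illustrative Python model, not kernel code: the name `seq_escape_model` is hypothetical, and the sketch assumes ASCII input, with each escaped character written as a backslash followed by three octal digits.

```python
def seq_escape_model(s: str, esc: str) -> str:
    """Model of the rule stated above: every character of s that also
    appears in esc is emitted as a backslash plus its 3-digit octal code.
    Hypothetical helper for illustration; assumes ASCII input."""
    out = []
    for ch in s:
        if ch in esc:
            # %03o zero-pads the octal value to three digits
            out.append("\\%03o" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# A space (decimal 32, octal 040) in a path is escaped; other characters
# pass through unchanged.
print(seq_escape_model("/mnt/my disk", " \t\n\\"))  # prints /mnt/my\040disk
```

Octal escapes keep each record on one line and free of separators, so whitespace-delimited parsers can still split the output reliably.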
|
||||
|
||||
There are also a pair of functions for printing filenames:
|
||||
There are also a pair of functions for printing filenames::
|
||||
|
||||
int seq_path(struct seq_file *m, const struct path *path,
|
||||
const char *esc);
|
||||
@ -226,8 +234,10 @@ the path relative to the current process's filesystem root. If a different
|
||||
root is desired, it can be used with seq_path_root(). If it turns out that
|
||||
path cannot be reached from root, seq_path_root() returns SEQ_SKIP.
|
||||
|
||||
A function producing complicated output may want to check
|
||||
A function producing complicated output may want to check::
|
||||
|
||||
bool seq_has_overflowed(struct seq_file *m);
|
||||
|
||||
and avoid further seq_<output> calls if true is returned.
|
||||
|
||||
A true return from seq_has_overflowed means that the seq_file buffer will
|
||||
@ -236,6 +246,7 @@ buffer and retry printing.
|
||||
|
||||
|
||||
Making it all work
|
||||
==================
|
||||
|
||||
So far, we have a nice set of functions which can produce output within the
|
||||
seq_file system, but we have not yet turned them into a file that a user
|
||||
@ -244,7 +255,7 @@ creation of a set of file_operations which implement the operations on that
|
||||
file. The seq_file interface provides a set of canned operations which do
|
||||
most of the work. The virtual file author still must implement the open()
|
||||
method, however, to hook everything up. The open function is often a single
|
||||
line, as in the example module:
|
||||
line, as in the example module::
|
||||
|
||||
static int ct_open(struct inode *inode, struct file *file)
|
||||
{
|
||||
@ -263,7 +274,7 @@ by the iterator functions.
|
||||
There is also a wrapper function to seq_open() called seq_open_private(). It
|
||||
kmallocs a zero filled block of memory and stores a pointer to it in the
|
||||
private field of the seq_file structure, returning 0 on success. The
|
||||
block size is specified in a third parameter to the function, e.g.:
|
||||
block size is specified in a third parameter to the function, e.g.::
|
||||
|
||||
static int ct_open(struct inode *inode, struct file *file)
|
||||
{
|
||||
@ -273,7 +284,7 @@ block size is specified in a third parameter to the function, e.g.:
|
||||
|
||||
There is also a variant function, __seq_open_private(), which is functionally
|
||||
identical except that, if successful, it returns the pointer to the allocated
|
||||
memory block, allowing further initialisation e.g.:
|
||||
memory block, allowing further initialisation e.g.::
|
||||
|
||||
static int ct_open(struct inode *inode, struct file *file)
|
||||
{
|
||||
@ -295,7 +306,7 @@ frees the memory allocated in the corresponding open.
|
||||
|
||||
The other operations of interest - read(), llseek(), and release() - are
|
||||
all implemented by the seq_file code itself. So a virtual file's
|
||||
file_operations structure will look like:
|
||||
file_operations structure will look like::
|
||||
|
||||
static const struct file_operations ct_file_ops = {
|
||||
.owner = THIS_MODULE,
|
||||
@ -309,7 +320,7 @@ There is also a seq_release_private() which passes the contents of the
|
||||
seq_file private field to kfree() before releasing the structure.
|
||||
|
||||
The final step is the creation of the /proc file itself. In the example
|
||||
code, that is done in the initialization code in the usual way:
|
||||
code, that is done in the initialization code in the usual way::
|
||||
|
||||
static int ct_init(void)
|
||||
{
|
||||
@ -325,9 +336,10 @@ And that is pretty much it.
|
||||
|
||||
|
||||
seq_list
|
||||
========
|
||||
|
||||
If your file will be iterating through a linked list, you may find these
|
||||
routines useful:
|
||||
routines useful::
|
||||
|
||||
struct list_head *seq_list_start(struct list_head *head,
|
||||
loff_t pos);
|
||||
@ -338,15 +350,16 @@ routines useful:
|
||||
|
||||
These helpers will interpret pos as a position within the list and iterate
|
||||
accordingly. Your start() and next() functions need only invoke the
|
||||
seq_list_* helpers with a pointer to the appropriate list_head structure.
|
||||
``seq_list_*`` helpers with a pointer to the appropriate list_head structure.
|
||||
|
||||
|
||||
The extra-simple version
|
||||
========================
|
||||
|
||||
For extremely simple virtual files, there is an even easier interface. A
|
||||
module can define only the show() function, which should create all the
|
||||
output that the virtual file will contain. The file's open() method then
|
||||
calls:
|
||||
calls::
|
||||
|
||||
int single_open(struct file *file,
|
||||
int (*show)(struct seq_file *m, void *p),
|
@ -1,7 +1,10 @@
|
||||
Shared Subtrees
|
||||
---------------
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
Contents:
|
||||
===============
|
||||
Shared Subtrees
|
||||
===============
|
||||
|
||||
.. Contents:
|
||||
1) Overview
|
||||
2) Features
|
||||
3) Setting mount states
|
||||
@ -41,31 +44,38 @@ replicas continue to be exactly same.
|
||||
|
||||
Here is an example:
|
||||
|
||||
Let's say /mnt has a mount that is shared.
|
||||
mount --make-shared /mnt
|
||||
Let's say /mnt has a mount that is shared::
|
||||
|
||||
mount --make-shared /mnt
|
||||
|
||||
Note: mount(8) command now supports the --make-shared flag,
|
||||
so the sample 'smount' program is no longer needed and has been
|
||||
removed.
|
||||
|
||||
# mount --bind /mnt /tmp
|
||||
::
|
||||
|
||||
# mount --bind /mnt /tmp
|
||||
|
||||
The above command replicates the mount at /mnt to the mountpoint /tmp
|
||||
and the contents of both the mounts remain identical.
|
||||
|
||||
#ls /mnt
|
||||
a b c
|
||||
::
|
||||
|
||||
#ls /tmp
|
||||
a b c
|
||||
#ls /mnt
|
||||
a b c
|
||||
|
||||
Now let's say we mount a device at /tmp/a
|
||||
# mount /dev/sd0 /tmp/a
|
||||
#ls /tmp
|
||||
a b c
|
||||
|
||||
#ls /tmp/a
|
||||
t1 t2 t3
|
||||
Now let's say we mount a device at /tmp/a::
|
||||
|
||||
#ls /mnt/a
|
||||
t1 t2 t3
|
||||
# mount /dev/sd0 /tmp/a
|
||||
|
||||
#ls /tmp/a
|
||||
t1 t2 t3
|
||||
|
||||
#ls /mnt/a
|
||||
t1 t2 t3
|
||||
|
||||
Note that the mount has propagated to the mount at /mnt as well.
|
||||
|
||||
@ -123,14 +133,15 @@ replicas continue to be exactly same.
|
||||
|
||||
2d) An unbindable mount is an unbindable private mount
|
||||
|
||||
let's say we have a mount at /mnt and we make it unbindable
|
||||
let's say we have a mount at /mnt and we make it unbindable::
|
||||
|
||||
# mount --make-unbindable /mnt
|
||||
# mount --make-unbindable /mnt
|
||||
|
||||
Let's try to bind mount this mount somewhere else.
|
||||
# mount --bind /mnt /tmp
|
||||
mount: wrong fs type, bad option, bad superblock on /mnt,
|
||||
or too many mounted file systems
|
||||
Let's try to bind mount this mount somewhere else::
|
||||
|
||||
# mount --bind /mnt /tmp
|
||||
mount: wrong fs type, bad option, bad superblock on /mnt,
|
||||
or too many mounted file systems
|
||||
|
||||
Binding an unbindable mount is an invalid operation.
|
||||
|
||||
@ -138,12 +149,12 @@ replicas continue to be exactly same.
|
||||
3) Setting mount states
|
||||
|
||||
The mount command (util-linux package) can be used to set mount
|
||||
states:
|
||||
states::
|
||||
|
||||
mount --make-shared mountpoint
|
||||
mount --make-slave mountpoint
|
||||
mount --make-private mountpoint
|
||||
mount --make-unbindable mountpoint
|
||||
mount --make-shared mountpoint
|
||||
mount --make-slave mountpoint
|
||||
mount --make-private mountpoint
|
||||
mount --make-unbindable mountpoint
|
||||
|
||||
|
||||
4) Use cases
|
||||
@ -154,9 +165,10 @@ replicas continue to be exactly same.
|
||||
|
||||
Solution:
|
||||
|
||||
The system administrator can make the mount at /cdrom shared
|
||||
mount --bind /cdrom /cdrom
|
||||
mount --make-shared /cdrom
|
||||
The system administrator can make the mount at /cdrom shared::
|
||||
|
||||
mount --bind /cdrom /cdrom
|
||||
mount --make-shared /cdrom
|
||||
|
||||
Now any process that clones off a new namespace will have a
|
||||
mount at /cdrom which is a replica of the same mount in the
|
||||
@ -172,14 +184,14 @@ replicas continue to be exactly same.
|
||||
Solution:
|
||||
|
||||
To begin with, the administrator can mark the entire mount tree
|
||||
as shareable.
|
||||
as shareable::
|
||||
|
||||
mount --make-rshared /
|
||||
mount --make-rshared /
|
||||
|
||||
A new process can clone off a new namespace. And mark some part
|
||||
of its namespace as slave
|
||||
of its namespace as slave::
|
||||
|
||||
mount --make-rslave /myprivatetree
|
||||
mount --make-rslave /myprivatetree
|
||||
|
||||
Henceforth any mounts within the /myprivatetree done by the
|
||||
process will not show up in any other namespace. However mounts
|
||||
@ -206,13 +218,13 @@ replicas continue to be exactly same.
|
||||
versions of the file depending on the path used to access that
|
||||
file.
|
||||
|
||||
An example is:
|
||||
An example is::
|
||||
|
||||
mount --make-shared /
|
||||
mount --rbind / /view/v1
|
||||
mount --rbind / /view/v2
|
||||
mount --rbind / /view/v3
|
||||
mount --rbind / /view/v4
|
||||
mount --make-shared /
|
||||
mount --rbind / /view/v1
|
||||
mount --rbind / /view/v2
|
||||
mount --rbind / /view/v3
|
||||
mount --rbind / /view/v4
|
||||
|
||||
and if /usr has a versioning filesystem mounted, then that
|
||||
mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
|
||||
@ -224,8 +236,8 @@ replicas continue to be exactly same.
|
||||
filesystem is being requested and return the corresponding
|
||||
inode.
|
||||
|
||||
5) Detailed semantics:
|
||||
-------------------
|
||||
5) Detailed semantics
|
||||
---------------------
|
||||
The section below explains the detailed semantics of
|
||||
bind, rbind, move, mount, umount and clone-namespace operations.
|
||||
|
||||
@ -235,6 +247,7 @@ replicas continue to be exactly same.
|
||||
5a) Mount states
|
||||
|
||||
A given mount can be in one of the following states
|
||||
|
||||
1) shared
|
||||
2) slave
|
||||
3) shared and slave
|
||||
@ -252,7 +265,8 @@ replicas continue to be exactly same.
|
||||
A 'shared mount' is defined as a vfsmount that belongs to a
|
||||
'peer group'.
|
||||
|
||||
For example:
|
||||
For example::
|
||||
|
||||
mount --make-shared /mnt
|
||||
mount --bind /mnt /tmp
|
||||
|
||||
@ -270,7 +284,7 @@ replicas continue to be exactly same.
|
||||
A slave mount as the name implies has a master mount from which
|
||||
mount/unmount events are received. Events do not propagate from
|
||||
the slave mount to the master. Only a shared mount can be made
|
||||
a slave by executing the following command
|
||||
a slave by executing the following command::
|
||||
|
||||
mount --make-slave mount
|
||||
|
||||
@ -290,8 +304,10 @@ replicas continue to be exactly same.
|
||||
peer group.
|
||||
|
||||
Only a slave vfsmount can be made as 'shared and slave' by
|
||||
either executing the following command
|
||||
either executing the following command::
|
||||
|
||||
mount --make-shared mount
|
||||
|
||||
or by moving the slave vfsmount under a shared vfsmount.
|
||||
|
||||
(4) Private mount
|
||||
@ -307,30 +323,32 @@ replicas continue to be exactly same.
|
||||
|
||||
|
||||
State diagram:
|
||||
|
||||
The state diagram below explains the state transition of a mount,
|
||||
in response to various commands.
|
||||
------------------------------------------------------------------------
|
||||
| |make-shared | make-slave | make-private |make-unbindab|
|
||||
--------------|------------|--------------|--------------|-------------|
|
||||
|shared |shared |*slave/private| private | unbindable |
|
||||
| | | | | |
|
||||
|-------------|------------|--------------|--------------|-------------|
|
||||
|slave |shared | **slave | private | unbindable |
|
||||
| |and slave | | | |
|
||||
|-------------|------------|--------------|--------------|-------------|
|
||||
|shared |shared | slave | private | unbindable |
|
||||
|and slave |and slave | | | |
|
||||
|-------------|------------|--------------|--------------|-------------|
|
||||
|private |shared | **private | private | unbindable |
|
||||
|-------------|------------|--------------|--------------|-------------|
|
||||
|unbindable |shared |**unbindable | private | unbindable |
|
||||
------------------------------------------------------------------------
|
||||
in response to various commands::
|
||||
|
||||
* if the shared mount is the only mount in its peer group, making it
|
||||
slave, makes it private automatically. Note that there is no master to
|
||||
which it can be slaved to.
|
||||
-----------------------------------------------------------------------
|             |make-shared |  make-slave  | make-private |make-unbindab|
--------------|------------|--------------|--------------|-------------|
|shared       |shared      |*slave/private|   private    | unbindable  |
|             |            |              |              |             |
|-------------|------------|--------------|--------------|-------------|
|slave        |shared      |   **slave    |   private    | unbindable  |
|             |and slave   |              |              |             |
|-------------|------------|--------------|--------------|-------------|
|shared       |shared      |    slave     |   private    | unbindable  |
|and slave    |and slave   |              |              |             |
|-------------|------------|--------------|--------------|-------------|
|private      |shared      |  **private   |   private    | unbindable  |
|-------------|------------|--------------|--------------|-------------|
|unbindable   |shared      |**unbindable  |   private    | unbindable  |
------------------------------------------------------------------------
|
||||
|
||||
** slaving a non-shared mount has no effect on the mount.
|
||||
* if the shared mount is the only mount in its peer group, making it
|
||||
slave, makes it private automatically. Note that there is no master to
|
||||
which it can be slaved to.
|
||||
|
||||
** slaving a non-shared mount has no effect on the mount.
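
The state diagram and its two footnotes can be expressed as a small lookup. The following is a minimal Python sketch for illustration only: `next_state` and the `only_peer` flag are hypothetical names, not part of mount(8) or the kernel, and the table is transcribed directly from the diagram above.

```python
# Rows: current state. Columns: the mount(8) option applied to it.
# None marks the one cell ("*slave/private") that depends on the peer group.
STATE_TABLE = {
    "shared": {
        "make-shared": "shared", "make-slave": None,
        "make-private": "private", "make-unbindable": "unbindable"},
    "slave": {
        "make-shared": "shared and slave", "make-slave": "slave",
        "make-private": "private", "make-unbindable": "unbindable"},
    "shared and slave": {
        "make-shared": "shared and slave", "make-slave": "slave",
        "make-private": "private", "make-unbindable": "unbindable"},
    "private": {
        "make-shared": "shared", "make-slave": "private",
        "make-private": "private", "make-unbindable": "unbindable"},
    "unbindable": {
        "make-shared": "shared", "make-slave": "unbindable",
        "make-private": "private", "make-unbindable": "unbindable"},
}


def next_state(state, command, only_peer=False):
    """Return the state a mount ends up in after `command`.

    only_peer models footnote (*): a shared mount that is alone in its
    peer group has no master to be slaved to, so it becomes private.
    """
    result = STATE_TABLE[state][command]
    if result is None:
        return "private" if only_peer else "slave"
    return result


print(next_state("shared", "make-slave"))                  # prints slave
print(next_state("shared", "make-slave", only_peer=True))  # prints private
print(next_state("private", "make-slave"))                 # prints private
```

Keeping the table as data instead of nested conditionals makes it easy to compare cell by cell against the diagram.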
|
||||
|
||||
Apart from the commands listed below, the 'move' operation also changes
|
||||
the state of a mount depending on type of the destination mount. Its
|
||||
@ -338,31 +356,32 @@ replicas continue to be exactly same.
|
||||
|
||||
5b) Bind semantics
|
||||
|
||||
Consider the following command
|
||||
Consider the following command::
|
||||
|
||||
mount --bind A/a B/b
|
||||
mount --bind A/a B/b
|
||||
|
||||
where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
|
||||
is the destination mount and 'b' is the dentry in the destination mount.
|
||||
|
||||
The outcome depends on the type of mount of 'A' and 'B'. The table
|
||||
below contains quick reference.
|
||||
---------------------------------------------------------------------------
|
||||
| BIND MOUNT OPERATION |
|
||||
|**************************************************************************
|
||||
|source(A)->| shared | private | slave | unbindable |
|
||||
| dest(B) | | | | |
|
||||
| | | | | | |
|
||||
| v | | | | |
|
||||
|**************************************************************************
|
||||
| shared | shared | shared | shared & slave | invalid |
|
||||
| | | | | |
|
||||
|non-shared| shared | private | slave | invalid |
|
||||
***************************************************************************
|
||||
below contains quick reference::
|
||||
|
||||
--------------------------------------------------------------------------
|                          BIND MOUNT OPERATION                          |
|************************************************************************|
|source(A)->|    shared    |    private   |     slave      | unbindable  |
| dest(B)   |              |              |                |             |
|    |      |              |              |                |             |
|    v      |              |              |                |             |
|************************************************************************|
| shared    |    shared    |    shared    | shared & slave |   invalid   |
|           |              |              |                |             |
|non-shared |    shared    |    private   |     slave      |   invalid   |
**************************************************************************
|
||||
|
||||
Details:
|
||||
|
||||
1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
|
||||
1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
|
||||
which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
|
||||
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
|
||||
are created and mounted at the dentry 'b' on all mounts where 'B'
|
||||
@ -371,7 +390,7 @@ replicas continue to be exactly same.
|
||||
'B'. And finally the peer-group of 'C' is merged with the peer group
|
||||
of 'A'.
|
||||
|
||||
2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
|
||||
2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
|
||||
which is a clone of 'A', is created. Its root dentry is 'a'. 'C' is
|
||||
mounted on mount 'B' at dentry 'b'. Also new mount 'C1', 'C2', 'C3' ...
|
||||
are created and mounted at the dentry 'b' on all mounts where 'B'
|
||||
@ -379,7 +398,7 @@ replicas continue to be exactly same.
|
||||
'C', 'C1', .., 'Cn' with exactly the same configuration as the
|
||||
propagation tree for 'B'.
|
||||
|
||||
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
|
||||
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
|
||||
mount 'C' which is a clone of 'A', is created. Its root dentry is 'a'.
|
||||
'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
|
||||
'C3' ... are created and mounted at the dentry 'b' on all mounts where
|
||||
@ -389,19 +408,19 @@ replicas continue to be exactly same.
|
||||
is made the slave of mount 'Z'. In other words, mount 'C' is in the
|
||||
state 'slave and shared'.
|
||||
|
||||
4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
|
||||
4. 'A' is a unbindable mount and 'B' is a shared mount. This is a
|
||||
invalid operation.
|
||||
|
||||
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
||||
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
||||
unbindable) mount. A new mount 'C' which is clone of 'A', is created.
|
||||
Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.
|
||||
|
||||
6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
|
||||
6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
|
||||
which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
|
||||
mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
|
||||
peer-group of 'A'.
|
||||
|
||||
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
|
||||
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
|
||||
new mount 'C' which is a clone of 'A' is created. Its root dentry is
|
||||
'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
|
||||
slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
|
||||
@ -409,7 +428,7 @@ replicas continue to be exactly same.
|
||||
mount/unmount on 'A' do not propagate anywhere else. Similarly
|
||||
mount/unmount on 'C' do not propagate anywhere else.
|
||||
|
||||
8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
|
||||
8. 'A' is a unbindable mount and 'B' is a non-shared mount. This is a
|
||||
invalid operation. A unbindable mount cannot be bind mounted.
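
The eight bind cases reduce to a lookup on the source state and on whether the destination is shared. The sketch below is an illustrative Python model only: `bind_result` is a hypothetical name, and it reports just the state of the new mount 'C', not the additional mounts 'C1', 'C2', ... created by propagation.

```python
# (destination is shared?, state of source 'A') -> state of new mount 'C'
BIND_TABLE = {
    (True, "shared"): "shared",
    (True, "private"): "shared",
    (True, "slave"): "shared and slave",
    (False, "shared"): "shared",
    (False, "private"): "private",
    (False, "slave"): "slave",
}


def bind_result(source_state, dest_is_shared):
    """State of mount 'C' created by `mount --bind A/a B/b`.

    Raises ValueError for an unbindable source, which the table marks
    as invalid for every destination (cases 4 and 8).
    """
    if source_state == "unbindable":
        raise ValueError("an unbindable mount cannot be bind mounted")
    return BIND_TABLE[(dest_is_shared, source_state)]


print(bind_result("slave", True))     # prints shared and slave
print(bind_result("private", False))  # prints private
```

A shared destination always yields a shared result, because the new mount must take part in propagation from 'B'. A non-shared destination leaves the source's state unchanged in the clone.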
|
||||
|
||||
5c) Rbind semantics
|
||||
@ -422,7 +441,9 @@ replicas continue to be exactly same.
|
||||
then the subtree under the unbindable mount is pruned in the new
|
||||
location.
|
||||
|
||||
eg: let's say we have the following mount tree.
|
||||
eg:
|
||||
|
||||
let's say we have the following mount tree::
|
||||
|
||||
A
|
||||
/ \
|
||||
@ -430,12 +451,12 @@ replicas continue to be exactly same.
|
||||
/ \ / \
|
||||
D E F G
|
||||
|
||||
Let's say all the mount except the mount C in the tree are
|
||||
of a type other than unbindable.
|
||||
Let's say all the mount except the mount C in the tree are
|
||||
of a type other than unbindable.
|
||||
|
||||
If this tree is rbound to say Z
|
||||
If this tree is rbound to say Z
|
||||
|
||||
We will have the following tree at the new location.
|
||||
We will have the following tree at the new location::
|
||||
|
||||
Z
|
||||
|
|
||||
@ -457,24 +478,26 @@ replicas continue to be exactly same.
|
||||
the dentry in the destination mount.
|
||||
|
||||
The outcome depends on the type of the mount of 'A' and 'B'. The table
|
||||
below is a quick reference.
|
||||
---------------------------------------------------------------------------
|
||||
| MOVE MOUNT OPERATION |
|
||||
|**************************************************************************
|
||||
| source(A)->| shared | private | slave | unbindable |
|
||||
| dest(B) | | | | |
|
||||
| | | | | | |
|
||||
| v | | | | |
|
||||
|**************************************************************************
|
||||
| shared | shared | shared |shared and slave| invalid |
|
||||
| | | | | |
|
||||
|non-shared| shared | private | slave | unbindable |
|
||||
***************************************************************************
|
||||
NOTE: moving a mount residing under a shared mount is invalid.
|
||||
below is a quick reference::
|
||||
|
||||
--------------------------------------------------------------------------
|                          MOVE MOUNT OPERATION                          |
|************************************************************************|
|source(A)->|    shared    |    private   |     slave      | unbindable  |
| dest(B)   |              |              |                |             |
|    |      |              |              |                |             |
|    v      |              |              |                |             |
|************************************************************************|
| shared    |    shared    |    shared    |shared and slave|   invalid   |
|           |              |              |                |             |
|non-shared |    shared    |    private   |     slave      | unbindable  |
**************************************************************************
|
||||
|
||||
.. Note:: moving a mount residing under a shared mount is invalid.
|
||||
|
||||
Details follow:
|
||||
|
||||
1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
|
||||
1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
|
||||
mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'...'An'
|
||||
are created and mounted at dentry 'b' on all mounts that receive
|
||||
propagation from mount 'B'. A new propagation tree is created in the
|
||||
@ -483,7 +506,7 @@ replicas continue to be exactly same.
|
||||
propagation tree is appended to the already existing propagation tree
|
||||
of 'A'.
|
||||
|
||||
2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
|
||||
2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
|
||||
mounted on mount 'B' at dentry 'b'. Also new mount 'A1', 'A2'... 'An'
|
||||
are created and mounted at dentry 'b' on all mounts that receive
|
||||
propagation from mount 'B'. The mount 'A' becomes a shared mount and a
|
||||
@ -491,7 +514,7 @@ replicas continue to be exactly same.
|
||||
'B'. This new propagation tree contains all the new mounts 'A1',
|
||||
'A2'... 'An'.
|
||||
|
||||
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
|
||||
3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
|
||||
mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts 'A1',
|
||||
'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
|
||||
receive propagation from mount 'B'. A new propagation tree is created
|
||||
@ -501,32 +524,32 @@ replicas continue to be exactly same.
|
||||
'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
|
||||
becomes 'shared'.
|
||||
|
||||
4. 'A' is a unbindable mount and 'B' is a shared mount. The operation
|
||||
4. 'A' is a unbindable mount and 'B' is a shared mount. The operation
|
||||
is invalid. Because mounting anything on the shared mount 'B' can
|
||||
create new mounts that get mounted on the mounts that receive
|
||||
propagation from 'B'. And since the mount 'A' is unbindable, cloning
|
||||
it to mount at other mountpoints is not possible.
|
||||
|
||||
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
||||
5. 'A' is a private mount and 'B' is a non-shared(private or slave or
|
||||
unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.
|
||||
|
||||
6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
|
||||
6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
|
||||
is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
|
||||
shared mount.
|
||||
|
||||
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
|
||||
7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
|
||||
The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
|
||||
continues to be a slave mount of mount 'Z'.
|
||||
|
||||
8. 'A' is a unbindable mount and 'B' is a non-shared mount. The mount
|
||||
8. 'A' is a unbindable mount and 'B' is a non-shared mount. The mount
|
||||
'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
|
||||
unbindable mount.
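
The move cases can be modelled the same way as the bind cases. This is a minimal Python sketch for illustration: `move_result` and `parent_is_shared` are hypothetical names, and the function returns only the resulting state of mount 'A', not the propagated copies 'A1' ... 'An'.

```python
# (destination is shared?, state of source 'A') -> state of 'A' after move
MOVE_TABLE = {
    (True, "shared"): "shared",
    (True, "private"): "shared",
    (True, "slave"): "shared and slave",
    (False, "shared"): "shared",
    (False, "private"): "private",
    (False, "slave"): "slave",
    (False, "unbindable"): "unbindable",
}


def move_result(source_state, dest_is_shared, parent_is_shared=False):
    """State of mount 'A' after `mount --move A B/b`.

    parent_is_shared models the note above: moving a mount that resides
    under a shared mount is invalid.
    """
    if parent_is_shared:
        raise ValueError("cannot move a mount residing under a shared mount")
    if dest_is_shared and source_state == "unbindable":
        # Case 4: propagation from 'B' would require cloning 'A'.
        raise ValueError("cannot move an unbindable mount onto a shared mount")
    return MOVE_TABLE[(dest_is_shared, source_state)]


print(move_result("unbindable", False))  # prints unbindable
print(move_result("slave", True))        # prints shared and slave
```

The only difference from the bind table is the unbindable source on a non-shared destination: a move needs no clone, so the operation is valid and the mount stays unbindable.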
|
||||
|
||||
5e) Mount semantics
|
||||
|
||||
Consider the following command
|
||||
Consider the following command::
|
||||
|
||||
mount device B/b
|
||||
mount device B/b
|
||||
|
||||
'B' is the destination mount and 'b' is the dentry in the destination
|
||||
mount.
|
||||
@ -537,9 +560,9 @@ replicas continue to be exactly same.
|
||||
|
||||
5f) Unmount semantics
|
||||
|
||||
Consider the following command
|
||||
Consider the following command::
|
||||
|
||||
umount A
|
||||
umount A
|
||||
|
||||
where 'A' is a mount mounted on mount 'B' at dentry 'b'.
|
||||
|
||||
@ -592,10 +615,12 @@ replicas continue to be exactly same.
|
||||
|
||||
A. What is the result of the following command sequence?
|
||||
|
||||
mount --bind /mnt /mnt
|
||||
mount --make-shared /mnt
|
||||
mount --bind /mnt /tmp
|
||||
mount --move /tmp /mnt/1
|
||||
::
|
||||
|
||||
mount --bind /mnt /mnt
|
||||
mount --make-shared /mnt
|
||||
mount --bind /mnt /tmp
|
||||
mount --move /tmp /mnt/1
|
||||
|
||||
what should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
|
||||
Should they all be identical? or should /mnt and /mnt/1 be
|
||||
@ -604,23 +629,27 @@ replicas continue to be exactly same.
|
||||
|
||||
B. What is the result of the following command sequence?
|
||||
|
||||
mount --make-rshared /
|
||||
mkdir -p /v/1
|
||||
mount --rbind / /v/1
|
||||
::
|
||||
|
||||
mount --make-rshared /
|
||||
mkdir -p /v/1
|
||||
mount --rbind / /v/1
|
||||
|
||||
what should the content of /v/1/v/1 be?
|
||||
|
||||
|
||||
C. What is the result of the following command sequence?
|
||||
|
||||
mount --bind /mnt /mnt
|
||||
mount --make-shared /mnt
|
||||
mkdir -p /mnt/1/2/3 /mnt/1/test
|
||||
mount --bind /mnt/1 /tmp
|
||||
mount --make-slave /mnt
|
||||
mount --make-shared /mnt
|
||||
mount --bind /mnt/1/2 /tmp1
|
||||
mount --make-slave /mnt
|
||||
::
|
||||
|
||||
mount --bind /mnt /mnt
|
||||
mount --make-shared /mnt
|
||||
mkdir -p /mnt/1/2/3 /mnt/1/test
|
||||
mount --bind /mnt/1 /tmp
|
||||
mount --make-slave /mnt
|
||||
mount --make-shared /mnt
|
||||
mount --bind /mnt/1/2 /tmp1
|
||||
mount --make-slave /mnt
|
||||
|
||||
At this point we have the first mount at /tmp and
|
||||
its root dentry is 1. Let's call this mount 'A'
|
||||
@ -668,7 +697,8 @@ replicas continue to be exactly same.
|
||||
|
||||
step 1:
|
||||
let's say the root tree has just two directories with
|
||||
one vfsmount.
|
||||
one vfsmount::
|
||||
|
||||
root
|
||||
/ \
|
||||
tmp usr
|
||||
@ -676,14 +706,17 @@ replicas continue to be exactly same.
|
||||
And we want to replicate the tree at multiple
|
||||
mountpoints under /root/tmp
|
||||
|
||||
step2:
|
||||
mount --make-shared /root
|
||||
step 2:
|
||||
::
|
||||
|
||||
mkdir -p /tmp/m1
|
||||
|
||||
mount --rbind /root /tmp/m1
|
||||
mount --make-shared /root
|
||||
|
||||
the new tree now looks like this:
|
||||
mkdir -p /tmp/m1
|
||||
|
||||
mount --rbind /root /tmp/m1
|
||||
|
||||
the new tree now looks like this::
|
||||
|
||||
root
|
||||
/ \
|
||||
@ -697,11 +730,13 @@ replicas continue to be exactly same.
|
||||
|
||||
it has two vfsmounts
|
||||
|
||||
step3:
|
||||
step 3:
|
||||
::
|
||||
|
||||
mkdir -p /tmp/m2
|
||||
mount --rbind /root /tmp/m2
|
||||
|
||||
the new tree now looks like this:
|
||||
the new tree now looks like this::
|
||||
|
||||
root
|
||||
/ \
|
||||
@ -724,6 +759,7 @@ replicas continue to be exactly same.
|
||||
it has 6 vfsmounts
|
||||
|
||||
step 4:
|
||||
::
|
||||
mkdir -p /tmp/m3
|
||||
mount --rbind /root /tmp/m3
|
||||
|
||||
@ -740,7 +776,8 @@ replicas continue to be exactly same.
|
||||
|
||||
step 1:
|
||||
let's say the root tree has just two directories with
|
||||
one vfsmount.
|
||||
one vfsmount::
|
||||
|
||||
root
|
||||
/ \
|
||||
tmp usr
|
||||
@ -748,17 +785,20 @@ replicas continue to be exactly same.
|
||||
How do we set up the same tree at multiple locations under
|
||||
/root/tmp
|
||||
|
||||
step2:
|
||||
mount --bind /root/tmp /root/tmp
|
||||
step 2:
|
||||
::
|
||||
|
||||
mount --make-rshared /root
|
||||
mount --make-unbindable /root/tmp
|
||||
|
||||
mkdir -p /tmp/m1
|
||||
mount --bind /root/tmp /root/tmp
|
||||
|
||||
mount --rbind /root /tmp/m1
|
||||
mount --make-rshared /root
|
||||
mount --make-unbindable /root/tmp
|
||||
|
||||
the new tree now looks like this:
|
||||
mkdir -p /tmp/m1
|
||||
|
||||
mount --rbind /root /tmp/m1
|
||||
|
||||
the new tree now looks like this::
|
||||
|
||||
root
|
||||
/ \
|
||||
@ -768,11 +808,13 @@ replicas continue to be exactly same.
|
||||
/ \
|
||||
tmp usr
|
||||
|
||||
step3:
|
||||
step 3:
|
||||
::
|
||||
|
||||
mkdir -p /tmp/m2
|
||||
mount --rbind /root /tmp/m2
|
||||
|
||||
the new tree now looks like this:
|
||||
the new tree now looks like this::
|
||||
|
||||
root
|
||||
/ \
|
||||
@ -782,12 +824,13 @@ replicas continue to be exactly same.
|
||||
/ \ / \
|
||||
tmp usr tmp usr
|
||||
|
||||
step4:
|
||||
step 4:
|
||||
::
|
||||
|
||||
mkdir -p /tmp/m3
|
||||
mount --rbind /root /tmp/m3
|
||||
|
||||
the new tree now looks like this:
|
||||
the new tree now looks like this::
|
||||
|
||||
root
|
||||
/ \
|
||||
@ -801,25 +844,31 @@ replicas continue to be exactly same.
|
||||
|
||||
8A) Datastructure
|
||||
|
||||
4 new fields are introduced to struct vfsmount
|
||||
->mnt_share
|
||||
->mnt_slave_list
|
||||
->mnt_slave
|
||||
->mnt_master
|
||||
4 new fields are introduced to struct vfsmount:
|
||||
|
||||
->mnt_share links together all the mount to/from which this vfsmount
|
||||
* ->mnt_share
|
||||
* ->mnt_slave_list
|
||||
* ->mnt_slave
|
||||
* ->mnt_master
|
||||
|
||||
->mnt_share
|
||||
links together all the mounts to/from which this vfsmount
|
||||
sends/receives propagation events.
|
||||
|
||||
->mnt_slave_list links all the mounts to which this vfsmount propagates
|
||||
->mnt_slave_list
|
||||
links all the mounts to which this vfsmount propagates
|
||||
to.
|
||||
|
||||
->mnt_slave links together all the slaves that its master vfsmount
|
||||
->mnt_slave
|
||||
links together all the slaves that its master vfsmount
|
||||
propagates to.
|
||||
|
||||
->mnt_master points to the master vfsmount from which this vfsmount
|
||||
->mnt_master
|
||||
points to the master vfsmount from which this vfsmount
|
||||
receives propagation.
|
||||
|
||||
->mnt_flags takes two more flags to indicate the propagation status of
|
||||
->mnt_flags
|
||||
takes two more flags to indicate the propagation status of
|
||||
the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
|
||||
vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
|
||||
replicated.
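
The relationship expressed by ->mnt_master can be pictured with a small userspace model. This is only a sketch: ``struct toy_mount`` and both helper functions are hypothetical names invented for illustration, they are not the kernel's struct vfsmount or any function from pnode.c, and the list fields (->mnt_share, ->mnt_slave_list, ->mnt_slave) are left out to keep it short. It assumes that a mount receives propagation from every master above it in the chain.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT the kernel's struct vfsmount. */
struct toy_mount {
    const char *name;
    struct toy_mount *mnt_master; /* mount we receive propagation from, or NULL */
    int shared;                   /* stand-in for the MNT_SHARE flag */
    int unclonable;               /* stand-in for the MNT_UNCLONABLE flag */
};

/* Count how many masters sit above 'm' in the propagation tree. */
int toy_master_depth(const struct toy_mount *m)
{
    int depth = 0;

    assert(m != NULL);
    while (m->mnt_master != NULL) {
        m = m->mnt_master;
        depth++;
    }
    return depth;
}

/* Return 1 if 'm' has 'master' somewhere in its ->mnt_master chain. */
int toy_receives_from(const struct toy_mount *m, const struct toy_mount *master)
{
    assert(m != NULL && master != NULL);
    for (m = m->mnt_master; m != NULL; m = m->mnt_master) {
        if (m == master)
            return 1;
    }
    return 0;
}
```

With three mounts where 'M' is a slave of 'K' and 'K' is a slave of 'A', ``toy_master_depth`` gives 2 for 'M' and ``toy_receives_from`` reports that 'M' receives from 'A' but not the other way round.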
|
||||
@ -842,7 +891,7 @@ replicas continue to be exactly same.
|
||||
|
||||
An example propagation tree looks as shown in the figure below.
|
||||
[ NOTE: Though it looks like a forest, if we consider all the shared
|
||||
mounts as a conceptual entity called 'pnode', it becomes a tree]
|
||||
mounts as a conceptual entity called 'pnode', it becomes a tree]::
|
||||
|
||||
|
||||
A <--> B <--> C <---> D
|
||||
@ -864,14 +913,19 @@ replicas continue to be exactly same.
|
||||
A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'
|
||||
|
||||
E's ->mnt_share links with ->mnt_share of K
|
||||
'E', 'K', 'F', 'G' have their ->mnt_master point to struct
|
||||
vfsmount of 'A'
|
||||
|
||||
'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'
|
||||
|
||||
'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'
|
||||
|
||||
K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'
|
||||
|
||||
C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'
|
||||
|
||||
J and K's ->mnt_master points to struct vfsmount of C
|
||||
|
||||
and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'
|
||||
|
||||
'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.
|
||||
|
||||
|
||||
@ -903,6 +957,7 @@ replicas continue to be exactly same.
|
||||
Prepare phase:
|
||||
|
||||
for each mount in the source tree:
|
||||
|
||||
a) Create the necessary number of mount trees to
|
||||
be attached to each of the mounts that receive
|
||||
propagation from the destination mount.
|
||||
@ -929,11 +984,12 @@ replicas continue to be exactly same.
|
||||
Abort phase
|
||||
delete all the newly created trees.
|
||||
|
||||
NOTE: all the propagation related functionality resides in the file
|
||||
pnode.c
|
||||
.. Note::
|
||||
all the propagation related functionality resides in the file pnode.c
|
||||
|
||||
|
||||
------------------------------------------------------------------------
|
||||
|
||||
version 0.1 (created the initial document, Ram Pai linuxram@us.ibm.com)
|
||||
|
||||
version 0.2 (Incorporated comments from Al Viro)
|
13
Documentation/filesystems/spufs/index.rst
Normal file
@ -0,0 +1,13 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==============
|
||||
SPU Filesystem
|
||||
==============
|
||||
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
spufs
|
||||
spu_create
|
||||
spu_run
|
131
Documentation/filesystems/spufs/spu_create.rst
Normal file
@ -0,0 +1,131 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========
|
||||
spu_create
|
||||
==========
|
||||
|
||||
Name
|
||||
====
|
||||
spu_create - create a new spu context
|
||||
|
||||
|
||||
Synopsis
|
||||
========
|
||||
|
||||
::
|
||||
|
||||
#include <sys/types.h>
|
||||
#include <sys/spu.h>
|
||||
|
||||
int spu_create(const char *pathname, int flags, mode_t mode);
|
||||
|
||||
Description
|
||||
===========
|
||||
The spu_create system call is used on PowerPC machines that implement
|
||||
the Cell Broadband Engine Architecture in order to access Synergistic
|
||||
Processor Units (SPUs). It creates a new logical context for an SPU in
|
||||
pathname and returns a handle associated with it. pathname must
|
||||
point to a non-existing directory in the mount point of the SPU file
|
||||
system (spufs). When spu_create is successful, a directory gets
|
||||
created on pathname and it is populated with files.
|
||||
|
||||
The returned file handle can only be passed to spu_run(2) or closed,
|
||||
other operations are not defined on it. When it is closed, all
|
||||
associated directory entries in spufs are removed. When the last file handle
|
||||
pointing either inside of the context directory or to this file
|
||||
descriptor is closed, the logical SPU context is destroyed.
|
||||
|
||||
The parameter flags can be zero or any bitwise or'd combination of the
|
||||
following constants:
|
||||
|
||||
SPU_RAWIO
|
||||
Allow mapping of some of the hardware registers of the SPU into
|
||||
user space. This flag requires the CAP_SYS_RAWIO capability, see
|
||||
capabilities(7).
|
||||
|
||||
The mode parameter specifies the permissions used for creating the new
|
||||
directory in spufs. mode is modified with the user's umask(2) value
|
||||
and then used for both the directory and the files contained in it. The
|
||||
file permissions mask out some more bits of mode because they typically
|
||||
support only read or write access. See stat(2) for a full list of the
|
||||
possible mode values.
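
The effect of the umask on mode can be sketched as follows. This is a minimal illustration, assuming the usual ``mode & ~umask`` rule; ``toy_masked_mode`` and ``toy_readonly_file_mode`` are hypothetical helper names, and the read-only case is only one example of the extra bits that the text says are masked out for individual files.

```c
#include <sys/types.h>

/* Hypothetical helper: permission bits left after applying a umask. */
mode_t toy_masked_mode(mode_t mode, mode_t mask)
{
    return (mode & ~mask) & 0777;
}

/*
 * Hypothetical helper: a file that supports only read access keeps
 * just the read bits of the already masked mode (an assumption made
 * for illustration, not taken from the spufs sources).
 */
mode_t toy_readonly_file_mode(mode_t masked)
{
    return masked & 0444;
}
```

For example, a mode of 0777 with a umask of 027 leaves 0750 for the directory, and a read-only file inside it would end up with 0440 under this assumption.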
|
||||
|
||||
|
||||
Return Value
|
||||
============
|
||||
spu_create returns a new file descriptor. It may return -1 to indicate
|
||||
an error condition and set errno to one of the error codes listed
|
||||
below.
|
||||
|
||||
|
||||
Errors
|
||||
======
|
||||
EACCES
|
||||
The current user does not have write access on the spufs mount
|
||||
point.
|
||||
|
||||
EEXIST An SPU context already exists at the given path name.
|
||||
|
||||
EFAULT pathname is not a valid string pointer in the current address
|
||||
space.
|
||||
|
||||
EINVAL pathname is not a directory in the spufs mount point.
|
||||
|
||||
ELOOP Too many symlinks were found while resolving pathname.
|
||||
|
||||
EMFILE The process has reached its maximum open file limit.
|
||||
|
||||
ENAMETOOLONG
|
||||
pathname was too long.
|
||||
|
||||
ENFILE The system has reached the global open file limit.
|
||||
|
||||
ENOENT Part of pathname could not be resolved.
|
||||
|
||||
ENOMEM The kernel could not allocate all resources required.
|
||||
|
||||
ENOSPC There are not enough SPU resources available to create a new
|
||||
context or the user specific limit for the number of SPU
|
||||
contexts has been reached.
|
||||
|
||||
ENOSYS the functionality is not provided by the current system, because
|
||||
either the hardware does not provide SPUs or the spufs module is
|
||||
not loaded.
|
||||
|
||||
ENOTDIR
|
||||
A part of pathname is not a directory.
|
||||
|
||||
|
||||
|
||||
Notes
|
||||
=====
|
||||
spu_create is meant to be used from libraries that implement a more
|
||||
abstract interface to SPUs, not to be used from regular applications.
|
||||
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the
|
||||
recommended libraries.
|
||||
|
||||
|
||||
Files
|
||||
=====
|
||||
pathname must point to a location beneath the mount point of spufs. By
|
||||
convention, it gets mounted in /spu.
|
||||
|
||||
|
||||
Conforming to
|
||||
=============
|
||||
This call is Linux specific and only implemented by the ppc64
|
||||
architecture. Programs using this system call are not portable.
|
||||
|
||||
|
||||
Bugs
|
||||
====
|
||||
The code does not yet fully implement all features outlined here.
|
||||
|
||||
|
||||
Author
|
||||
======
|
||||
Arnd Bergmann <arndb@de.ibm.com>
|
||||
|
||||
See Also
|
||||
========
|
||||
capabilities(7), close(2), spu_run(2), spufs(7)
|
138
Documentation/filesystems/spufs/spu_run.rst
Normal file
@ -0,0 +1,138 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======
|
||||
spu_run
|
||||
=======
|
||||
|
||||
|
||||
Name
|
||||
====
|
||||
spu_run - execute an spu context
|
||||
|
||||
|
||||
Synopsis
|
||||
========
|
||||
|
||||
::
|
||||
|
||||
#include <sys/spu.h>
|
||||
|
||||
int spu_run(int fd, unsigned int *npc, unsigned int *event);
|
||||
|
||||
Description
|
||||
===========
|
||||
The spu_run system call is used on PowerPC machines that implement the
|
||||
Cell Broadband Engine Architecture in order to access Synergistic
|
||||
Processor Units (SPUs). It uses the fd that was returned from
|
||||
spu_create(2) to address a specific SPU context. When the context gets
|
||||
scheduled to a physical SPU, it starts execution at the instruction pointer
|
||||
passed in npc.
|
||||
|
||||
Execution of SPU code happens synchronously, meaning that spu_run does
|
||||
not return while the SPU is still running. If there is a need to
|
||||
execute SPU code in parallel with other code on either the main CPU or
|
||||
other SPUs, you need to create a new thread of execution first, e.g.
|
||||
using the pthread_create(3) call.
|
||||
|
||||
When spu_run returns, the current value of the SPU instruction pointer
|
||||
is written back to npc, so you can call spu_run again without updating
|
||||
the pointers.
|
||||
|
||||
event can be a NULL pointer or point to an extended status code that
|
||||
gets filled when spu_run returns. It can be one of the following
|
||||
constants:
|
||||
|
||||
SPE_EVENT_DMA_ALIGNMENT
|
||||
A DMA alignment error
|
||||
|
||||
SPE_EVENT_SPE_DATA_SEGMENT
|
||||
A DMA segmentation error
|
||||
|
||||
SPE_EVENT_SPE_DATA_STORAGE
|
||||
A DMA storage error
|
||||
|
||||
If NULL is passed as the event argument, these errors will result in a
|
||||
signal delivered to the calling process.
|
||||
|
||||
Return Value
|
||||
============
|
||||
spu_run returns the value of the spu_status register or -1 to indicate
|
||||
an error and set errno to one of the error codes listed below. The
|
||||
spu_status register value contains a bit mask of status codes and
|
||||
optionally a 14 bit code returned from the stop-and-signal instruction
|
||||
on the SPU. The bit masks for the status codes are:
|
||||
|
||||
0x02
|
||||
SPU was stopped by stop-and-signal.
|
||||
|
||||
0x04
|
||||
SPU was stopped by halt.
|
||||
|
||||
0x08
|
||||
SPU is waiting for a channel.
|
||||
|
||||
0x10
|
||||
SPU is in single-step mode.
|
||||
|
||||
0x20
|
||||
SPU has tried to execute an invalid instruction.
|
||||
|
||||
0x40
|
||||
SPU has tried to access an invalid channel.
|
||||
|
||||
0x3fff0000
|
||||
The bits masked with this value contain the code returned from
|
||||
stop-and-signal.
|
||||
|
||||
There are always one or more of the lower eight bits set or an error
|
||||
code is returned from spu_run.
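
The bit masks listed above can be decoded as in the following sketch. The mask values are taken from the list; the ``TOY_`` macro names and the function names are made up for this example and are not defined by sys/spu.h.

```c
#include <stdio.h>

/* Mask values from the list above; the names are hypothetical. */
#define TOY_SPU_STOPPED_BY_STOP   0x02u
#define TOY_SPU_STOPPED_BY_HALT   0x04u
#define TOY_SPU_WAITING_CHANNEL   0x08u
#define TOY_SPU_SINGLE_STEP       0x10u
#define TOY_SPU_INVALID_INSTR     0x20u
#define TOY_SPU_INVALID_CHANNEL   0x40u
#define TOY_SPU_STOP_CODE_MASK    0x3fff0000u

/* Extract the 14 bit code returned from stop-and-signal. */
unsigned int toy_spu_stop_code(unsigned int status)
{
    return (status & TOY_SPU_STOP_CODE_MASK) >> 16;
}

/* Return 1 if the SPU was stopped by stop-and-signal, 0 otherwise. */
int toy_spu_stopped_by_signal(unsigned int status)
{
    return (status & TOY_SPU_STOPPED_BY_STOP) != 0;
}

/* Print one line per status bit that is set. */
void toy_spu_print_status(unsigned int status)
{
    if (status & TOY_SPU_STOPPED_BY_STOP)
        printf("stopped by stop-and-signal, code 0x%x\n",
               toy_spu_stop_code(status));
    if (status & TOY_SPU_STOPPED_BY_HALT)
        printf("stopped by halt\n");
    if (status & TOY_SPU_WAITING_CHANNEL)
        printf("waiting for a channel\n");
    if (status & TOY_SPU_SINGLE_STEP)
        printf("single-step mode\n");
    if (status & TOY_SPU_INVALID_INSTR)
        printf("invalid instruction\n");
    if (status & TOY_SPU_INVALID_CHANNEL)
        printf("invalid channel\n");
}
```

A caller would first check for a return value of -1 and consult errno, and only pass other return values to helpers like these.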
|
||||
|
||||
Errors
|
||||
======
|
||||
EAGAIN or EWOULDBLOCK
|
||||
fd is in non-blocking mode and spu_run would block.
|
||||
|
||||
EBADF fd is not a valid file descriptor.
|
||||
|
||||
EFAULT npc is not a valid pointer or event is neither NULL nor a valid
|
||||
pointer.
|
||||
|
||||
EINTR A signal occurred while spu_run was in progress. The npc value
|
||||
has been updated to the new program counter value if necessary.
|
||||
|
||||
EINVAL fd is not a file descriptor returned from spu_create(2).
|
||||
|
||||
ENOMEM Insufficient memory was available to handle a page fault
|
||||
resulting from an MFC direct memory access.
|
||||
|
||||
ENOSYS the functionality is not provided by the current system, because
|
||||
either the hardware does not provide SPUs or the spufs module is
|
||||
not loaded.
|
||||
|
||||
|
||||
Notes
|
||||
=====
|
||||
spu_run is meant to be used from libraries that implement a more
|
||||
abstract interface to SPUs, not to be used from regular applications.
|
||||
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the
|
||||
recommended libraries.
|
||||
|
||||
|
||||
Conforming to
|
||||
=============
|
||||
This call is Linux specific and only implemented by the ppc64
|
||||
architecture. Programs using this system call are not portable.
|
||||
|
||||
|
||||
Bugs
|
||||
====
|
||||
The code does not yet fully implement all features outlined here.
|
||||
|
||||
|
||||
Author
|
||||
======
|
||||
Arnd Bergmann <arndb@de.ibm.com>
|
||||
|
||||
See Also
|
||||
========
|
||||
capabilities(7), close(2), spu_create(2), spufs(7)
|
@ -1,12 +1,18 @@
|
||||
SPUFS(2) Linux Programmer's Manual SPUFS(2)
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=====
|
||||
spufs
|
||||
=====
|
||||
|
||||
Name
|
||||
====
|
||||
|
||||
NAME
|
||||
spufs - the SPU file system
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
Description
|
||||
===========
|
||||
|
||||
The SPU file system is used on PowerPC machines that implement the Cell
|
||||
Broadband Engine Architecture in order to access Synergistic Processor
|
||||
Units (SPUs).
|
||||
@ -21,7 +27,9 @@ DESCRIPTION
|
||||
ally add or remove files.
|
||||
|
||||
|
||||
MOUNT OPTIONS
|
||||
Mount Options
|
||||
=============
|
||||
|
||||
uid=<uid>
|
||||
set the user owning the mount point, the default is 0 (root).
|
||||
|
||||
@ -29,7 +37,9 @@ MOUNT OPTIONS
|
||||
set the group owning the mount point, the default is 0 (root).
|
||||
|
||||
|
||||
FILES
|
||||
Files
|
||||
=====
|
||||
|
||||
The files in spufs mostly follow the standard behavior for regular
|
||||
system calls like read(2) or write(2), but often support only a subset of
|
||||
the operations supported on regular file systems. This list details the
|
||||
@ -125,14 +135,12 @@ FILES
|
||||
space is available for writing.
|
||||
|
||||
|
||||
/mbox_stat
|
||||
/ibox_stat
|
||||
/wbox_stat
|
||||
/mbox_stat, /ibox_stat, /wbox_stat
|
||||
Read-only files that contain the length of the current queue, i.e. how
|
||||
many words can be read from mbox or ibox or how many words can be
|
||||
written to wbox without blocking. The files can be read only in 4-byte
|
||||
units and return a big-endian binary integer number. The possible
|
||||
operations on an open *box_stat file are:
|
||||
operations on an open ``*box_stat`` file are:
|
||||
|
||||
read(2)
|
||||
If a count smaller than four is requested, read returns -1 and
|
||||
@ -143,12 +151,7 @@ FILES
|
||||
in EAGAIN.
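
Since the ``*box_stat`` files return a 4-byte big-endian binary integer, a reader has to convert the buffer before using the value. The following is a minimal sketch of that conversion; ``toy_be32_to_host`` is a hypothetical helper name, and the code that opens the file and calls read(2) with a count of four is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert four bytes in big-endian order, as they would be read
 * from a *box_stat file, into a host integer.
 */
uint32_t toy_be32_to_host(const unsigned char *buf)
{
    assert(buf != NULL);
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Building the value byte by byte keeps the helper independent of the endianness of the machine it runs on.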
|
||||
|
||||
|
||||
/npc
|
||||
/decr
|
||||
/decr_status
|
||||
/spu_tag_mask
|
||||
/event_mask
|
||||
/srr0
|
||||
/npc, /decr, /decr_status, /spu_tag_mask, /event_mask, /srr0
|
||||
Internal registers of the SPU. The representation is an ASCII string
|
||||
with the numeric value of the next instruction to be executed. These
|
||||
can be used in read/write mode for debugging, but normal operation of
|
||||
@ -157,17 +160,14 @@ FILES
|
||||
|
||||
The contents of these files are:
|
||||
|
||||
=================== ===================================
|
||||
npc Next Program Counter
|
||||
|
||||
decr SPU Decrementer
|
||||
|
||||
decr_status Decrementer Status
|
||||
|
||||
spu_tag_mask MFC tag mask for SPU DMA
|
||||
|
||||
event_mask Event mask for SPU interrupts
|
||||
|
||||
srr0 Interrupt Return address register
|
||||
=================== ===================================
|
||||
|
||||
|
||||
The possible operations on an open npc, decr, decr_status,
|
||||
@ -206,8 +206,7 @@ FILES
|
||||
from the data buffer, updating the value of the fpcr register.
|
||||
|
||||
|
||||
/signal1
|
||||
/signal2
|
||||
/signal1, /signal2
|
||||
The two signal notification channels of an SPU. These are read-write
|
||||
files that operate on a 32 bit word. Writing to one of these files
|
||||
triggers an interrupt on the SPU. The value written to the signal
|
||||
@ -233,8 +232,7 @@ FILES
|
||||
file.
|
||||
|
||||
|
||||
/signal1_type
|
||||
/signal2_type
|
||||
/signal1_type, /signal2_type
|
||||
These two files change the behavior of the signal1 and signal2
|
||||
notification files. They contain a numerical ASCII string which is read as
|
||||
either "1" or "0". In mode 0 (overwrite), the hardware replaces the
|
||||
@ -259,263 +257,17 @@ FILES
|
||||
the previous setting.
|
||||
|
||||
|
||||
EXAMPLES
|
||||
Examples
|
||||
========
|
||||
/etc/fstab entry
|
||||
none /spu spufs gid=spu 0 0
|
||||
|
||||
|
||||
AUTHORS
|
||||
Authors
|
||||
=======
|
||||
Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>,
|
||||
Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
|
||||
|
||||
SEE ALSO
|
||||
See Also
|
||||
========
|
||||
capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7)
|
||||
|
||||
|
||||
|
||||
Linux 2005-09-28 SPUFS(2)
|
||||
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2)
|
||||
|
||||
|
||||
|
||||
NAME
|
||||
spu_run - execute an spu context
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
#include <sys/spu.h>
|
||||
|
||||
int spu_run(int fd, unsigned int *npc, unsigned int *event);
|
||||
|
||||
DESCRIPTION
|
||||
The spu_run system call is used on PowerPC machines that implement the
|
||||
Cell Broadband Engine Architecture in order to access Synergistic Pro-
|
||||
cessor Units (SPUs). It uses the fd that was returned from spu_cre-
|
||||
ate(2) to address a specific SPU context. When the context gets sched-
|
||||
uled to a physical SPU, it starts execution at the instruction pointer
|
||||
passed in npc.
|
||||
|
||||
Execution of SPU code happens synchronously, meaning that spu_run does
|
||||
not return while the SPU is still running. If there is a need to exe-
|
||||
cute SPU code in parallel with other code on either the main CPU or
|
||||
other SPUs, you need to create a new thread of execution first, e.g.
|
||||
using the pthread_create(3) call.
|
||||
|
||||
When spu_run returns, the current value of the SPU instruction pointer
|
||||
is written back to npc, so you can call spu_run again without updating
|
||||
the pointers.
|
||||
|
||||
event can be a NULL pointer or point to an extended status code that
|
||||
gets filled when spu_run returns. It can be one of the following con-
|
||||
stants:
|
||||
|
||||
SPE_EVENT_DMA_ALIGNMENT
|
||||
A DMA alignment error
|
||||
|
||||
SPE_EVENT_SPE_DATA_SEGMENT
|
||||
A DMA segmentation error
|
||||
|
||||
SPE_EVENT_SPE_DATA_STORAGE
|
||||
A DMA storage error
|
||||
|
||||
If NULL is passed as the event argument, these errors will result in a
|
||||
signal delivered to the calling process.
|
||||
|
||||
RETURN VALUE
|
||||
spu_run returns the value of the spu_status register or -1 to indicate
|
||||
an error and set errno to one of the error codes listed below. The
|
||||
spu_status register value contains a bit mask of status codes and
|
||||
optionally a 14 bit code returned from the stop-and-signal instruction
|
||||
on the SPU. The bit masks for the status codes are:
|
||||
|
||||
0x02 SPU was stopped by stop-and-signal.
|
||||
|
||||
0x04 SPU was stopped by halt.
|
||||
|
||||
0x08 SPU is waiting for a channel.
|
||||
|
||||
0x10 SPU is in single-step mode.
|
||||
|
||||
0x20 SPU has tried to execute an invalid instruction.
|
||||
|
||||
0x40 SPU has tried to access an invalid channel.
|
||||
|
||||
0x3fff0000
|
||||
The bits masked with this value contain the code returned from
|
||||
stop-and-signal.
|
||||
|
||||
There are always one or more of the lower eight bits set or an error
|
||||
code is returned from spu_run.
|
||||
|
||||
ERRORS
|
||||
EAGAIN or EWOULDBLOCK
|
||||
fd is in non-blocking mode and spu_run would block.
|
||||
|
||||
EBADF fd is not a valid file descriptor.
|
||||
|
||||
EFAULT npc is not a valid pointer or status is neither NULL nor a valid
|
||||
pointer.
|
||||
|
||||
EINTR A signal occurred while spu_run was in progress. The npc value
|
||||
has been updated to the new program counter value if necessary.
|
||||
|
||||
EINVAL fd is not a file descriptor returned from spu_create(2).
|
||||
|
||||
ENOMEM Insufficient memory was available to handle a page fault result-
|
||||
ing from an MFC direct memory access.
|
||||
|
||||
ENOSYS the functionality is not provided by the current system, because
|
||||
either the hardware does not provide SPUs or the spufs module is
|
||||
not loaded.
|
||||
|
||||
|
||||
NOTES
|
||||
spu_run is meant to be used from libraries that implement a more
|
||||
abstract interface to SPUs, not to be used from regular applications.
|
||||
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
|
||||
ommended libraries.
|
||||
|
||||
|
||||
CONFORMING TO
|
||||
This call is Linux specific and only implemented by the ppc64 architec-
|
||||
ture. Programs using this system call are not portable.
|
||||
|
||||
|
||||
BUGS
|
||||
The code does not yet fully implement all features lined out here.
|
||||
|
||||
|
||||
AUTHOR
|
||||
Arnd Bergmann <arndb@de.ibm.com>
|
||||
|
||||
SEE ALSO
|
||||
capabilities(7), close(2), spu_create(2), spufs(7)
|
||||
|
||||
|
||||
|
||||
Linux 2005-09-28 SPU_RUN(2)
|
||||
|
||||
------------------------------------------------------------------------------
|
||||
|
||||
SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2)
|
||||
|
||||
|
||||
|
||||
NAME
|
||||
spu_create - create a new spu context
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
#include <sys/types.h>
|
||||
#include <sys/spu.h>
|
||||
|
||||
int spu_create(const char *pathname, int flags, mode_t mode);
|
||||
|
||||
DESCRIPTION
|
||||
The spu_create system call is used on PowerPC machines that implement
|
||||
the Cell Broadband Engine Architecture in order to access Synergistic
|
||||
Processor Units (SPUs). It creates a new logical context for an SPU in
|
||||
pathname and returns a handle to associated with it. pathname must
|
||||
point to a non-existing directory in the mount point of the SPU file
|
||||
system (spufs). When spu_create is successful, a directory gets cre-
|
||||
ated on pathname and it is populated with files.
|
||||
|
||||
The returned file handle can only be passed to spu_run(2) or closed,
|
||||
other operations are not defined on it. When it is closed, all associ-
|
||||
ated directory entries in spufs are removed. When the last file handle
|
||||
pointing either inside of the context directory or to this file
|
||||
descriptor is closed, the logical SPU context is destroyed.
|
||||
|
||||
The parameter flags can be zero or any bitwise or'd combination of the
|
||||
following constants:
|
||||
|
||||
SPU_RAWIO
|
||||
Allow mapping of some of the hardware registers of the SPU into
|
||||
user space. This flag requires the CAP_SYS_RAWIO capability, see
|
||||
capabilities(7).
|
||||
|
||||
The mode parameter specifies the permissions used for creating the new
|
||||
directory in spufs. mode is modified with the user's umask(2) value
|
||||
and then used for both the directory and the files contained in it. The
|
||||
file permissions mask out some more bits of mode because they typically
|
||||
support only read or write access. See stat(2) for a full list of the
|
||||
possible mode values.
|
||||
|
||||
|
||||
RETURN VALUE
|
||||
spu_create returns a new file descriptor. It may return -1 to indicate
|
||||
an error condition and set errno to one of the error codes listed
|
||||
below.
|
||||
|
||||
|
||||
ERRORS
|
||||
EACCES
|
||||
The current user does not have write access on the spufs mount
|
||||
point.
|
||||
|
||||
EEXIST An SPU context already exists at the given path name.
|
||||
|
||||
EFAULT pathname is not a valid string pointer in the current address
|
||||
space.
|
||||
|
||||
EINVAL pathname is not a directory in the spufs mount point.
|
||||
|
||||
ELOOP Too many symlinks were found while resolving pathname.
|
||||
|
||||
EMFILE The process has reached its maximum open file limit.
|
||||
|
||||
ENAMETOOLONG
|
||||
pathname was too long.
|
||||
|
||||
ENFILE The system has reached the global open file limit.
|
||||
|
||||
ENOENT Part of pathname could not be resolved.
|
||||
|
||||
ENOMEM The kernel could not allocate all resources required.
|
||||
|
||||
ENOSPC There are not enough SPU resources available to create a new
|
||||
context or the user specific limit for the number of SPU con-
|
||||
texts has been reached.
|
||||
|
||||
ENOSYS the functionality is not provided by the current system, because
|
||||
either the hardware does not provide SPUs or the spufs module is
|
||||
not loaded.
|
||||
|
||||
ENOTDIR
|
||||
A part of pathname is not a directory.
|
||||
|
||||
|
||||
|
||||
NOTES
|
||||
spu_create is meant to be used from libraries that implement a more
|
||||
abstract interface to SPUs, not to be used from regular applications.
|
||||
See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec-
|
||||
ommended libraries.
|
||||
|
||||
|
||||
FILES
|
||||
pathname must point to a location beneath the mount point of spufs. By
|
||||
convention, it gets mounted in /spu.
|
||||
|
||||
|
||||
CONFORMING TO
|
||||
This call is Linux specific and only implemented by the ppc64 architec-
|
||||
ture. Programs using this system call are not portable.
|
||||
|
||||
|
||||
BUGS
|
||||
The code does not yet fully implement all features lined out here.
|
||||
|
||||
|
||||
AUTHOR
|
||||
Arnd Bergmann <arndb@de.ibm.com>
|
||||
|
||||
SEE ALSO
|
||||
capabilities(7), close(2), spu_run(2), spufs(7)
|
||||
|
||||
|
||||
|
||||
Linux 2005-09-28 SPU_CREATE(2)
|
@ -1,8 +1,11 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================================
|
||||
Accessing PCI device resources through sysfs
|
||||
--------------------------------------------
|
||||
============================================
|
||||
|
||||
sysfs, usually mounted at /sys, provides access to PCI resources on platforms
|
||||
that support it. For example, a given bus might look like this:
|
||||
that support it. For example, a given bus might look like this::
|
||||
|
||||
/sys/devices/pci0000:17
|
||||
|-- 0000:17:00.0
|
||||
@ -30,8 +33,9 @@ This bus contains a single function device in slot 0. The domain and bus
|
||||
numbers are reproduced for convenience. Under the device directory are several
|
||||
files, each with their own function.
|
||||
|
||||
=================== =====================================================
|
||||
file function
|
||||
---- --------
|
||||
=================== =====================================================
|
||||
class PCI class (ascii, ro)
|
||||
config PCI config space (binary, rw)
|
||||
device PCI device (ascii, ro)
|
||||
@ -40,13 +44,16 @@ files, each with their own function.
|
||||
local_cpus nearby CPU mask (cpumask, ro)
|
||||
remove remove device from kernel's list (ascii, wo)
|
||||
resource PCI resource host addresses (ascii, ro)
|
||||
resource0..N PCI resource N, if present (binary, mmap, rw[1])
|
||||
resource0..N PCI resource N, if present (binary, mmap, rw\ [1]_)
|
||||
resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
|
||||
revision PCI revision (ascii, ro)
|
||||
rom PCI ROM resource, if present (binary, ro)
|
||||
subsystem_device PCI subsystem device (ascii, ro)
|
||||
subsystem_vendor PCI subsystem vendor (ascii, ro)
|
||||
vendor PCI vendor (ascii, ro)
|
||||
=================== =====================================================
|
||||
|
||||
::
|
||||
|
||||
ro - read only file
|
||||
rw - file is readable and writable
|
||||
@ -56,7 +63,7 @@ files, each with their own function.
|
||||
binary - file contains binary data
|
||||
cpumask - file contains a cpumask type
|
||||
|
||||
[1] rw for RESOURCE_IO (I/O port) regions only
|
||||
.. [1] rw for RESOURCE_IO (I/O port) regions only
|
||||
|
||||
The read only files are informational, writes to them will be ignored, with
|
||||
the exception of the 'rom' file. Writable files can be used to perform
|
||||
@ -67,11 +74,11 @@ don't support mmapping of certain resources, so be sure to check the return
|
||||
value from any attempted mmap. The most notable of these are I/O port
|
||||
resources, which also provide read/write access.
|
||||
|
||||
The 'enable' file provides a counter that indicates how many times the device
|
||||
The 'enable' file provides a counter that indicates how many times the device
|
||||
has been enabled. If the 'enable' file currently returns '4', and a '1' is
|
||||
echoed into it, it will then return '5'. Echoing a '0' into it will decrease
|
||||
the count. Even when it returns to 0, though, some of the initialisation
|
||||
may not be reversed.
|
||||
may not be reversed.
|
||||
|
||||
The 'rom' file is special in that it provides read-only access to the device's
|
||||
ROM file, if available. It's disabled by default, however, so applications
|
||||
@ -93,7 +100,7 @@ Accessing legacy resources through sysfs
|
||||
|
||||
Legacy I/O port and ISA memory resources are also provided in sysfs if the
|
||||
underlying platform supports them. They're located in the PCI class hierarchy,
|
||||
e.g.
|
||||
e.g.::
|
||||
|
||||
/sys/class/pci_bus/0000:17/
|
||||
|-- bridge -> ../../../devices/pci0000:17
|
@ -1,5 +1,8 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=============
|
||||
Sysfs tagging
|
||||
-------------
|
||||
=============
|
||||
|
||||
(Taken almost verbatim from Eric Biederman's netns tagging patch
|
||||
commit msg)
|
||||
@ -18,25 +21,28 @@ in the directories and applications only see a limited set of
|
||||
the network devices.
|
||||
|
||||
Each sysfs directory entry may be tagged with a namespace via the
|
||||
void *ns member of its kernfs_node. If a directory entry is tagged,
|
||||
then kernfs_node->flags will have a flag between KOBJ_NS_TYPE_NONE
|
||||
``void *ns`` member of its ``kernfs_node``. If a directory entry is tagged,
|
||||
then ``kernfs_node->flags`` will have a flag between KOBJ_NS_TYPE_NONE
|
||||
and KOBJ_NS_TYPES, and ns will point to the namespace to which it
|
||||
belongs.
|
||||
|
||||
Each sysfs superblock's kernfs_super_info contains an array void
|
||||
*ns[KOBJ_NS_TYPES]. When a task in a tagging namespace
|
||||
Each sysfs superblock's kernfs_super_info contains an array
|
||||
``void *ns[KOBJ_NS_TYPES]``. When a task in a tagging namespace
|
||||
kobj_nstype first mounts sysfs, a new superblock is created. It
|
||||
will be differentiated from other sysfs mounts by having its
|
||||
s_fs_info->ns[kobj_nstype] set to the new namespace. Note that
|
||||
``s_fs_info->ns[kobj_nstype]`` set to the new namespace. Note that
|
||||
through bind mounting and mounts propagation, a task can easily view
|
||||
the contents of other namespaces' sysfs mounts. Therefore, when a
|
||||
namespace exits, it will call kobj_ns_exit() to invalidate any
|
||||
kernfs_node->ns pointers pointing to it.
|
||||
|
||||
Users of this interface:
|
||||
- define a type in the kobj_ns_type enumeration.
|
||||
- call kobj_ns_type_register() with its kobj_ns_type_operations which has
|
||||
|
||||
- define a type in the ``kobj_ns_type`` enumeration.
|
||||
- call kobj_ns_type_register() with its ``kobj_ns_type_operations`` which has
|
||||
|
||||
- current_ns() which returns current's namespace
|
||||
- netlink_ns() which returns a socket's namespace
|
||||
- initial_ns() which returns the initial namespace
|
||||
|
||||
- call kobj_ns_exit() when an individual tag is no longer valid
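
The tagging scheme described above can be modelled in a few lines of userspace C. This is a toy model and not kernel code: every `toy_` name is hypothetical, and the matching rule (a tagged entry is shown only on a mount whose tag for that namespace type is the same pointer) is an assumption drawn from the description, not a copy of the kernfs implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's namespace types (hypothetical names). */
enum toy_ns_type { TOY_NS_TYPE_NONE, TOY_NS_TYPE_NET, TOY_NS_TYPES };

struct toy_node {
	enum toy_ns_type type;	/* TOY_NS_TYPE_NONE means untagged */
	const void *ns;		/* namespace this entry belongs to */
};

struct toy_super {
	const void *ns[TOY_NS_TYPES];	/* one tag per namespace type */
};

/* An untagged entry is always visible; a tagged one must match the mount. */
bool toy_visible(const struct toy_super *sb, const struct toy_node *node)
{
	if (node->type == TOY_NS_TYPE_NONE)
		return true;
	return node->ns != NULL && sb->ns[node->type] == node->ns;
}

/* Model of tag invalidation on namespace exit: clear matching pointers. */
void toy_ns_exit(struct toy_node *nodes, size_t n, const void *ns)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].ns == ns)
			nodes[i].ns = NULL;
}
```

In this model, invalidating a tag makes the entry invisible on every mount, including mounts that still carry the stale namespace pointer, which is the property the text asks kobj_ns_exit() to provide.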
|
@ -1,8 +1,11 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
==========================
|
||||
XFS Delayed Logging Design
|
||||
--------------------------
|
||||
==========================
|
||||
|
||||
Introduction to Re-logging in XFS
|
||||
---------------------------------
|
||||
=================================
|
||||
|
||||
XFS logging is a combination of logical and physical logging. Some objects,
|
||||
such as inodes and dquots, are logged in logical format where the details
|
||||
@ -25,7 +28,7 @@ changes in the new transaction that is written to the log.
|
||||
That is, if we have a sequence of changes A through to F, and the object was
|
||||
written to disk after change D, we would see in the log the following series
|
||||
of transactions, their contents and the log sequence number (LSN) of the
|
||||
transaction:
|
||||
transaction::
|
||||
|
||||
Transaction Contents LSN
|
||||
A A X
|
||||
@ -85,7 +88,7 @@ IO permanently. Hence the XFS journalling subsystem can be considered to be IO
|
||||
bound.
|
||||
|
||||
Delayed Logging: Concepts
|
||||
-------------------------
|
||||
=========================
|
||||
|
||||
The key thing to note about the asynchronous logging combined with the
|
||||
relogging technique XFS uses is that we can be relogging changed objects
|
||||
@ -154,9 +157,10 @@ The fundamental requirements for delayed logging in XFS are simple:
|
||||
6. No performance regressions for synchronous transaction workloads.
|
||||
|
||||
Delayed Logging: Design
|
||||
-----------------------
|
||||
=======================
|
||||
|
||||
Storing Changes
|
||||
---------------
|
||||
|
||||
The problem with accumulating changes at a logical level (i.e. just using the
|
||||
existing log item dirty region tracking) is that when it comes to writing the
|
||||
@ -194,30 +198,30 @@ asynchronous transactions to the log. The differences between the existing
|
||||
formatting method and the delayed logging formatting can be seen in the
|
||||
diagram below.
|
||||
|
||||
Current format log vector:
|
||||
Current format log vector::
|
||||
|
||||
Object +---------------------------------------------+
|
||||
Vector 1 +----+
|
||||
Vector 2 +----+
|
||||
Vector 3 +----------+
|
||||
Object +---------------------------------------------+
|
||||
Vector 1 +----+
|
||||
Vector 2 +----+
|
||||
Vector 3 +----------+
|
||||
|
||||
After formatting:
|
||||
After formatting::
|
||||
|
||||
Log Buffer +-V1-+-V2-+----V3----+
|
||||
Log Buffer +-V1-+-V2-+----V3----+
|
||||
|
||||
Delayed logging vector:
|
||||
Delayed logging vector::
|
||||
|
||||
Object +---------------------------------------------+
|
||||
Vector 1 +----+
|
||||
Vector 2 +----+
|
||||
Vector 3 +----------+
|
||||
Object +---------------------------------------------+
|
||||
Vector 1 +----+
|
||||
Vector 2 +----+
|
||||
Vector 3 +----------+
|
||||
|
||||
After formatting:
|
||||
After formatting::
|
||||
|
||||
Memory Buffer +-V1-+-V2-+----V3----+
|
||||
Vector 1 +----+
|
||||
Vector 2 +----+
|
||||
Vector 3 +----------+
|
||||
Memory Buffer +-V1-+-V2-+----V3----+
|
||||
Vector 1 +----+
|
||||
Vector 2 +----+
|
||||
Vector 3 +----------+
|
||||
|
||||
The memory buffer and associated vector need to be passed as a single object,
|
||||
but still need to be associated with the parent object so if the object is
|
||||
@ -242,6 +246,7 @@ relogged in memory.
|
||||
|
||||
|
||||
Tracking Changes
|
||||
----------------
|
||||
|
||||
Now that we can record transactional changes in memory in a form that allows
|
||||
them to be used without limitations, we need to be able to track and accumulate
|
||||
@ -278,6 +283,7 @@ done for convenience/sanity of the developers.
|
||||
|
||||
|
||||
Delayed Logging: Checkpoints
|
||||
----------------------------
|
||||
|
||||
When we have a log synchronisation event, commonly known as a "log force",
|
||||
all the items in the CIL must be written into the log via the log buffers.
|
||||
@ -341,7 +347,7 @@ Hence log vectors need to be able to be chained together to allow them to be
|
||||
detached from the log items. That is, when the CIL is flushed the memory
|
||||
buffer and log vector attached to each log item needs to be attached to the
|
||||
checkpoint context so that the log item can be released. In diagrammatic form,
|
||||
the CIL would look like this before the flush:
|
||||
the CIL would look like this before the flush::
|
||||
|
||||
CIL Head
|
||||
|
|
||||
@ -362,7 +368,7 @@ the CIL would look like this before the flush:
|
||||
-> vector array
|
||||
|
||||
And after the flush the CIL head is empty, and the checkpoint context log
|
||||
vector list would look like:
|
||||
vector list would look like::
|
||||
|
||||
Checkpoint Context
|
||||
|
|
||||
@ -411,6 +417,7 @@ compare" situation that can be done after a working and reviewed implementation
|
||||
is in the dev tree....
|
||||
|
||||
Delayed Logging: Checkpoint Sequencing
|
||||
--------------------------------------
|
||||
|
||||
One of the key aspects of the XFS transaction subsystem is that it tags
|
||||
committed transactions with the log sequence number of the transaction commit.
|
||||
@ -474,6 +481,7 @@ force the log at the LSN of that transaction) and so the higher level code
|
||||
behaves the same regardless of whether delayed logging is being used or not.
|
||||
|
||||
Delayed Logging: Checkpoint Log Space Accounting
|
||||
------------------------------------------------
|
||||
|
||||
The big issue for a checkpoint transaction is the log space reservation for the
|
||||
transaction. We don't know how big a checkpoint transaction is going to be
|
||||
@ -491,7 +499,7 @@ the size of the transaction and the number of regions being logged (the number
|
||||
of log vectors in the transaction).
|
||||
|
||||
An example of the differences would be logging directory changes versus logging
|
||||
inode changes. If you modify lots of inode cores (e.g. chmod -R g+w *), then
|
||||
inode changes. If you modify lots of inode cores (e.g. ``chmod -R g+w *``), then
|
||||
there are lots of transactions that only contain an inode core and an inode log
|
||||
format structure. That is, two vectors totaling roughly 150 bytes. If we modify
|
||||
10,000 inodes, we have about 1.5MB of metadata to write in 20,000 vectors. Each
|
||||
@ -565,6 +573,7 @@ which is once every 30s.
|
||||
|
||||
|
||||
Delayed Logging: Log Item Pinning
|
||||
---------------------------------
|
||||
|
||||
Currently log items are pinned during transaction commit while the items are
|
||||
still locked. This happens just after the items are formatted, though it could
|
||||
@ -605,6 +614,7 @@ object, we have a race with CIL being flushed between the check and the pin
|
||||
lock to guarantee that we pin the items correctly.
|
||||
|
||||
Delayed Logging: Concurrent Scalability
|
||||
---------------------------------------
|
||||
|
||||
A fundamental requirement for the CIL is that accesses through transaction
|
||||
commits must scale to many concurrent commits. The current transaction commit
|
||||
@ -683,8 +693,9 @@ woken by the wrong event.
|
||||
|
||||
|
||||
Lifecycle Changes
|
||||
-----------------
|
||||
|
||||
The existing log item life cycle is as follows:
|
||||
The existing log item life cycle is as follows::
|
||||
|
||||
1. Transaction allocate
|
||||
2. Transaction reserve
|
||||
@ -729,7 +740,7 @@ at the same time. If the log item is in the AIL or between steps 6 and 7
|
||||
and steps 1-6 are re-entered, then the item is relogged. Only when steps 8-9
|
||||
are entered and completed is the object considered clean.
|
||||
|
||||
With delayed logging, there are new steps inserted into the life cycle:
|
||||
With delayed logging, there are new steps inserted into the life cycle::
|
||||
|
||||
1. Transaction allocate
|
||||
2. Transaction reserve
|
@ -1,8 +1,11 @@
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
============================
|
||||
XFS Self Describing Metadata
|
||||
----------------------------
|
||||
============================
|
||||
|
||||
Introduction
|
||||
------------
|
||||
============
|
||||
|
||||
The largest scalability problem facing XFS is not one of algorithmic
|
||||
scalability, but of verification of the filesystem structure. Scalability of the
|
||||
@ -34,7 +37,7 @@ required for basic forensic analysis of the filesystem structure.
|
||||
|
||||
|
||||
Self Describing Metadata
|
||||
------------------------
|
||||
========================
|
||||
|
||||
One of the problems with the current metadata format is that apart from the
|
||||
magic number in the metadata block, we have no other way of identifying what it
|
||||
@ -142,7 +145,7 @@ modification occurred between the corruption being written and when it was
|
||||
detected.
|
||||
|
||||
Runtime Validation
|
||||
------------------
|
||||
==================
|
||||
|
||||
Validation of self-describing metadata takes place at runtime in two places:
|
||||
|
||||
@ -183,18 +186,18 @@ error occurs during this process, the buffer is again marked with a EFSCORRUPTED
|
||||
error for the higher layers to catch.
|
||||
|
||||
Structures
|
||||
----------
|
||||
==========
|
||||
|
||||
A typical on-disk structure needs to contain the following information:
|
||||
A typical on-disk structure needs to contain the following information::
|
||||
|
||||
struct xfs_ondisk_hdr {
|
||||
__be32 magic; /* magic number */
|
||||
__be32 crc; /* CRC, not logged */
|
||||
uuid_t uuid; /* filesystem identifier */
|
||||
__be64 owner; /* parent object */
|
||||
__be64 blkno; /* location on disk */
|
||||
__be64 lsn; /* last modification in log, not logged */
|
||||
};
|
||||
struct xfs_ondisk_hdr {
|
||||
__be32 magic; /* magic number */
|
||||
__be32 crc; /* CRC, not logged */
|
||||
uuid_t uuid; /* filesystem identifier */
|
||||
__be64 owner; /* parent object */
|
||||
__be64 blkno; /* location on disk */
|
||||
__be64 lsn; /* last modification in log, not logged */
|
||||
};
|
||||
|
||||
Depending on the metadata, this information may be part of a header structure
|
||||
separate to the metadata contents, or may be distributed through an existing
|
||||
@ -214,24 +217,24 @@ level of information is generally provided. For example:
|
||||
well. Hence the additional metadata headers change the overall format
|
||||
of the metadata.
|
||||
|
||||
A typical buffer read verifier is structured as follows:
|
||||
A typical buffer read verifier is structured as follows::
|
||||
|
||||
#define XFS_FOO_CRC_OFF offsetof(struct xfs_ondisk_hdr, crc)
|
||||
#define XFS_FOO_CRC_OFF offsetof(struct xfs_ondisk_hdr, crc)
|
||||
|
||||
static void
|
||||
xfs_foo_read_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
static void
|
||||
xfs_foo_read_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
|
||||
if ((xfs_sb_version_hascrc(&mp->m_sb) &&
|
||||
!xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
|
||||
XFS_FOO_CRC_OFF)) ||
|
||||
!xfs_foo_verify(bp)) {
|
||||
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
|
||||
xfs_buf_ioerror(bp, EFSCORRUPTED);
|
||||
}
|
||||
}
|
||||
if ((xfs_sb_version_hascrc(&mp->m_sb) &&
|
||||
!xfs_verify_cksum(bp->b_addr, BBTOB(bp->b_length),
|
||||
XFS_FOO_CRC_OFF)) ||
|
||||
!xfs_foo_verify(bp)) {
|
||||
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
|
||||
xfs_buf_ioerror(bp, EFSCORRUPTED);
|
||||
}
|
||||
}
|
||||
|
||||
The code ensures that the CRC is only checked if the filesystem has CRCs enabled
|
||||
by checking the superblock for the feature bit, and then if the CRC verifies OK
|
||||
@ -239,83 +242,83 @@ by checking the superblock of the feature bit, and then if the CRC verifies OK
|
||||
|
||||
The verifier function will take a couple of different forms, depending on
|
||||
whether the magic number can be used to determine the format of the block. In
|
||||
the case it can't, the code is structured as follows:
|
||||
the case it can't, the code is structured as follows::
|
||||
|
||||
static bool
|
||||
xfs_foo_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
struct xfs_ondisk_hdr *hdr = bp->b_addr;
|
||||
static bool
|
||||
xfs_foo_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
struct xfs_ondisk_hdr *hdr = bp->b_addr;
|
||||
|
||||
if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
|
||||
return false;
|
||||
if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
|
||||
return false;
|
||||
|
||||
if (!xfs_sb_version_hascrc(&mp->m_sb)) {
|
||||
if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
|
||||
return false;
|
||||
if (bp->b_bn != be64_to_cpu(hdr->blkno))
|
||||
return false;
|
||||
if (hdr->owner == 0)
|
||||
return false;
|
||||
}
|
||||
if (!xfs_sb_version_hascrc(&mp->m_sb)) {
|
||||
if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
|
||||
return false;
|
||||
if (bp->b_bn != be64_to_cpu(hdr->blkno))
|
||||
return false;
|
||||
if (hdr->owner == 0)
|
||||
return false;
|
||||
}
|
||||
|
||||
/* object specific verification checks here */
|
||||
/* object specific verification checks here */
|
||||
|
||||
return true;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
If there are different magic numbers for the different formats, the verifier
|
||||
will look like:
|
||||
will look like::
|
||||
|
||||
static bool
|
||||
xfs_foo_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
struct xfs_ondisk_hdr *hdr = bp->b_addr;
|
||||
static bool
|
||||
xfs_foo_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
struct xfs_ondisk_hdr *hdr = bp->b_addr;
|
||||
|
||||
if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
|
||||
if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
|
||||
return false;
|
||||
if (bp->b_bn != be64_to_cpu(hdr->blkno))
|
||||
return false;
|
||||
if (hdr->owner == 0)
|
||||
return false;
|
||||
} else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
|
||||
return false;
|
||||
if (hdr->magic == cpu_to_be32(XFS_FOO_CRC_MAGIC)) {
|
||||
if (!uuid_equal(&hdr->uuid, &mp->m_sb.sb_uuid))
|
||||
return false;
|
||||
if (bp->b_bn != be64_to_cpu(hdr->blkno))
|
||||
return false;
|
||||
if (hdr->owner == 0)
|
||||
return false;
|
||||
} else if (hdr->magic != cpu_to_be32(XFS_FOO_MAGIC))
|
||||
return false;
|
||||
|
||||
/* object specific verification checks here */
|
||||
/* object specific verification checks here */
|
||||
|
||||
return true;
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
Write verifiers are very similar to the read verifiers; they just do things in
|
||||
the opposite order to the read verifiers. A typical write verifier:
|
||||
the opposite order to the read verifiers. A typical write verifier::
|
||||
|
||||
static void
|
||||
xfs_foo_write_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
struct xfs_buf_log_item *bip = bp->b_fspriv;
|
||||
static void
|
||||
xfs_foo_write_verify(
|
||||
struct xfs_buf *bp)
|
||||
{
|
||||
struct xfs_mount *mp = bp->b_mount;
|
||||
struct xfs_buf_log_item *bip = bp->b_fspriv;
|
||||
|
||||
if (!xfs_foo_verify(bp)) {
|
||||
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
|
||||
xfs_buf_ioerror(bp, EFSCORRUPTED);
|
||||
return;
|
||||
}
|
||||
if (!xfs_foo_verify(bp)) {
|
||||
XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bp->b_addr);
|
||||
xfs_buf_ioerror(bp, EFSCORRUPTED);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!xfs_sb_version_hascrc(&mp->m_sb))
|
||||
return;
|
||||
if (!xfs_sb_version_hascrc(&mp->m_sb))
|
||||
return;
|
||||
|
||||
|
||||
if (bip) {
|
||||
struct xfs_ondisk_hdr *hdr = bp->b_addr;
|
||||
hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
|
||||
}
|
||||
xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
|
||||
}
|
||||
if (bip) {
|
||||
struct xfs_ondisk_hdr *hdr = bp->b_addr;
|
||||
hdr->lsn = cpu_to_be64(bip->bli_item.li_lsn);
|
||||
}
|
||||
xfs_update_cksum(bp->b_addr, BBTOB(bp->b_length), XFS_FOO_CRC_OFF);
|
||||
}
|
||||
|
||||
This will verify the internal structure of the metadata before we go any
|
||||
further, detecting corruptions that have occurred as the metadata has been
|
||||
@ -324,7 +327,7 @@ update the LSN field (when it was last modified) and calculate the CRC on the
|
||||
metadata. Once this is done, we can issue the IO.
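
The pairing of "calculate the CRC on write, check it on read" can be tried out in userspace. The sketch below is an illustration under stated assumptions, not the kernel implementation: all `toy_` names are hypothetical, the checksum is a plain bitwise CRC32C, the CRC field is assumed to be treated as zero while checksumming, and the CRC is stored in host byte order for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the header shown earlier (hypothetical layout). */
struct toy_hdr {
	uint32_t magic;
	uint32_t crc;
	uint8_t  uuid[16];
	uint64_t owner;
	uint64_t blkno;
	uint64_t lsn;
};

#define TOY_CRC_OFF offsetof(struct toy_hdr, crc)

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}

/* Checksum the buffer as if the 4-byte CRC field held zero (assumption). */
static uint32_t toy_cksum(const uint8_t *buf, size_t len, size_t crc_off)
{
	static const uint8_t zero[sizeof(uint32_t)];
	uint32_t crc;

	crc = crc32c(0, buf, crc_off);
	crc = crc32c(crc, zero, sizeof(zero));
	return crc32c(crc, buf + crc_off + sizeof(zero),
		      len - crc_off - sizeof(zero));
}

/* Write side: compute the checksum and store it in the CRC field. */
void toy_update_cksum(uint8_t *buf, size_t len, size_t crc_off)
{
	uint32_t crc = toy_cksum(buf, len, crc_off);

	memcpy(buf + crc_off, &crc, sizeof(crc));
}

/* Read side: recompute and compare; returns nonzero when the CRC matches. */
int toy_verify_cksum(const uint8_t *buf, size_t len, size_t crc_off)
{
	uint32_t stored;

	memcpy(&stored, buf + crc_off, sizeof(stored));
	return stored == toy_cksum(buf, len, crc_off);
}
```

Because the CRC field is excluded from its own checksum, the write side can fill in the LSN first and the CRC last, and the read side can verify the block without modifying it.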
|
||||
|
||||
Inodes and Dquots
|
||||
-----------------
|
||||
=================
|
||||
|
||||
Inodes and dquots are special snowflakes. They have per-object CRC and
|
||||
self-identifiers, but they are packed so that there are multiple objects per
|
||||
@ -347,4 +350,3 @@ XXX: inode unlinked list modification doesn't recalculate the inode CRC! None of
|
||||
the unlinked list modifications check or update CRCs, neither during unlink nor
|
||||
log recovery. So, it's gone unnoticed until now. This won't matter immediately -
|
||||
repair will probably complain about it - but it needs to be fixed.
|
||||
|
@ -9,7 +9,7 @@ Configfs is a filesystem-based manager of kernel objects. IIO uses some
|
||||
objects that could be easily configured using configfs (e.g.: devices,
|
||||
triggers).
|
||||
|
||||
See Documentation/filesystems/configfs/configfs.txt for more information
|
||||
See Documentation/filesystems/configfs.rst for more information
|
||||
about how configfs works.
|
||||
|
||||
2. Usage
|
||||
|
@ -24,7 +24,7 @@ Linux provides a number of functions for gadgets to use.
|
||||
Creating a gadget means deciding what configurations there will be
|
||||
and which functions each configuration will provide.
|
||||
|
||||
Configfs (please see `Documentation/filesystems/configfs/*`) lends itself nicely
|
||||
Configfs (please see `Documentation/filesystems/configfs.rst`) lends itself nicely
|
||||
for the purpose of telling the kernel about the above mentioned decision.
|
||||
This document is about how to do it.
|
||||
|
||||
@ -354,7 +354,7 @@ the directories in general can be named at will. A group can have
|
||||
a number of its default sub-groups created automatically.
|
||||
|
||||
For more information on configfs please see
|
||||
`Documentation/filesystems/configfs/*`.
|
||||
`Documentation/filesystems/configfs.rst`.
|
||||
|
||||
The concepts described above translate to USB gadgets like this:
|
||||
|
||||
|
14
MAINTAINERS
@ -3731,7 +3731,7 @@ CACHEFILES: FS-CACHE BACKEND FOR CACHING ON MOUNTED FILESYSTEMS
|
||||
M: David Howells <dhowells@redhat.com>
|
||||
L: linux-cachefs@redhat.com (moderated for non-subscribers)
|
||||
S: Supported
|
||||
F: Documentation/filesystems/caching/cachefiles.txt
|
||||
F: Documentation/filesystems/caching/cachefiles.rst
|
||||
F: fs/cachefiles/
|
||||
|
||||
CADENCE MIPI-CSI2 BRIDGES
|
||||
@ -4203,7 +4203,7 @@ M: coda@cs.cmu.edu
|
||||
L: codalist@coda.cs.cmu.edu
|
||||
S: Maintained
|
||||
W: http://www.coda.cs.cmu.edu/
|
||||
F: Documentation/filesystems/coda.txt
|
||||
F: Documentation/filesystems/coda.rst
|
||||
F: fs/coda/
|
||||
F: include/linux/coda*.h
|
||||
F: include/uapi/linux/coda*.h
|
||||
@ -4996,7 +4996,7 @@ M: Jan Kara <jack@suse.cz>
|
||||
R: Amir Goldstein <amir73il@gmail.com>
|
||||
L: linux-fsdevel@vger.kernel.org
|
||||
S: Maintained
|
||||
F: Documentation/filesystems/dnotify.txt
|
||||
F: Documentation/filesystems/dnotify.rst
|
||||
F: fs/notify/dnotify/
|
||||
F: include/linux/dnotify.h
|
||||
|
||||
@ -5010,7 +5010,7 @@ W: http://www.win.tue.nl/~aeb/partitions/partition_types-1.html
|
||||
DISKQUOTA
|
||||
M: Jan Kara <jack@suse.com>
|
||||
S: Maintained
|
||||
F: Documentation/filesystems/quota.txt
|
||||
F: Documentation/filesystems/quota.rst
|
||||
F: fs/quota/
|
||||
F: include/linux/quota*.h
|
||||
F: include/uapi/linux/quota*.h
|
||||
@ -15882,7 +15882,7 @@ M: Jeremy Kerr <jk@ozlabs.org>
|
||||
L: linuxppc-dev@lists.ozlabs.org
|
||||
S: Supported
|
||||
W: http://www.ibm.com/developerworks/power/cell/
|
||||
F: Documentation/filesystems/spufs.txt
|
||||
F: Documentation/filesystems/spufs/spufs.rst
|
||||
F: arch/powerpc/platforms/cell/spufs/
|
||||
|
||||
SQUASHFS FILE SYSTEM
|
||||
@ -18533,8 +18533,8 @@ W: http://xfs.org/
|
||||
T: git git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
|
||||
F: Documentation/ABI/testing/sysfs-fs-xfs
|
||||
F: Documentation/admin-guide/xfs.rst
|
||||
F: Documentation/filesystems/xfs-delayed-logging-design.txt
|
||||
F: Documentation/filesystems/xfs-self-describing-metadata.txt
|
||||
F: Documentation/filesystems/xfs-delayed-logging-design.rst
|
||||
F: Documentation/filesystems/xfs-self-describing-metadata.rst
|
||||
F: fs/xfs/
|
||||
F: include/uapi/linux/dqblk_xfs.h
|
||||
F: include/uapi/linux/fsmap.h
|
||||
|
@ -8,7 +8,7 @@ config CACHEFILES
|
||||
filesystems - primarily networking filesystems - thus allowing fast
|
||||
local disk to enhance the speed of slower devices.
|
||||
|
||||
See Documentation/filesystems/caching/cachefiles.txt for more
|
||||
See Documentation/filesystems/caching/cachefiles.rst for more
|
||||
information.
|
||||
|
||||
config CACHEFILES_DEBUG
|
||||
@ -36,5 +36,5 @@ config CACHEFILES_HISTOGRAM
|
||||
bouncing between CPUs. On the other hand, the histogram may be
|
||||
useful for debugging purposes. Saying 'N' here is recommended.
|
||||
|
||||
See Documentation/filesystems/caching/cachefiles.txt for more
|
||||
See Documentation/filesystems/caching/cachefiles.rst for more
|
||||
information.
|
||||
|
@ -15,7 +15,7 @@ config CODA_FS
|
||||
*client*. You will need user level code as well, both for the
|
||||
client and server. Servers are currently user level, i.e. they need
|
||||
no kernel support. Please read
|
||||
<file:Documentation/filesystems/coda.txt> and check out the Coda
|
||||
<file:Documentation/filesystems/coda.rst> and check out the Coda
|
||||
home page <http://www.coda.cs.cmu.edu/>.
|
||||
|
||||
To compile the coda client support as a module, choose M here: the
|
||||
|
@ -9,7 +9,7 @@
|
||||
*
|
||||
* configfs Copyright (C) 2005 Oracle. All rights reserved.
|
||||
*
|
||||
* Please see Documentation/filesystems/configfs/configfs.txt for more
|
||||
* Please see Documentation/filesystems/configfs.rst for more
|
||||
* information.
|
||||
*/
|
||||
|
||||
|
@ -9,7 +9,7 @@
|
||||
*
|
||||
* configfs Copyright (C) 2005 Oracle. All rights reserved.
|
||||
*
|
||||
* Please see the file Documentation/filesystems/configfs/configfs.txt for
|
||||
* Please see the file Documentation/filesystems/configfs.rst for
|
||||
* critical information about using the config_item interface.
|
||||
*/
|
||||
|
||||
|
@ -8,7 +8,7 @@ config FSCACHE
|
||||
Different sorts of caches can be plugged in, depending on the
|
||||
resources available.
|
||||
|
||||
See Documentation/filesystems/caching/fscache.txt for more information.
|
||||
See Documentation/filesystems/caching/fscache.rst for more information.
|
||||
|
||||
config FSCACHE_STATS
|
||||
bool "Gather statistical information on local caching"
|
||||
@ -25,7 +25,7 @@ config FSCACHE_STATS
|
||||
between CPUs. On the other hand, the stats are very useful for
|
||||
debugging purposes. Saying 'Y' here is recommended.
|
||||
|
||||
See Documentation/filesystems/caching/fscache.txt for more information.
|
||||
See Documentation/filesystems/caching/fscache.rst for more information.
|
||||
|
||||
config FSCACHE_HISTOGRAM
|
||||
bool "Gather latency information on local caching"
|
||||
@ -42,7 +42,7 @@ config FSCACHE_HISTOGRAM
|
||||
bouncing between CPUs. On the other hand, the histogram may be
|
||||
useful for debugging purposes. Saying 'N' here is recommended.
|
||||
|
||||
See Documentation/filesystems/caching/fscache.txt for more information.
|
||||
See Documentation/filesystems/caching/fscache.rst for more information.
|
||||
|
||||
config FSCACHE_DEBUG
|
||||
bool "Debug FS-Cache"
|
||||
@ -52,7 +52,7 @@ config FSCACHE_DEBUG
|
||||
management module. If this is set, the debugging output may be
|
||||
enabled by setting bits in /sys/module/fscache/parameters/debug.
|
||||
|
||||
See Documentation/filesystems/caching/fscache.txt for more information.
|
||||
See Documentation/filesystems/caching/fscache.rst for more information.
|
||||
|
||||
config FSCACHE_OBJECT_LIST
|
||||
bool "Maintain global object list for debugging purposes"
|
||||
|
@ -172,7 +172,7 @@ struct fscache_cache *fscache_select_cache_for_object(
|
||||
*
|
||||
* Initialise a record of a cache and fill in the name.
|
||||
*
|
||||
* See Documentation/filesystems/caching/backend-api.txt for a complete
|
||||
* See Documentation/filesystems/caching/backend-api.rst for a complete
|
||||
* description.
|
||||
*/
|
||||
void fscache_init_cache(struct fscache_cache *cache,
|
||||
@ -207,7 +207,7 @@ EXPORT_SYMBOL(fscache_init_cache);
|
||||
*
|
||||
* Add a cache to the system, making it available for netfs's to use.
|
||||
*
|
||||
* See Documentation/filesystems/caching/backend-api.txt for a complete
|
||||
* See Documentation/filesystems/caching/backend-api.rst for a complete
|
||||
* description.
|
||||
*/
|
||||
int fscache_add_cache(struct fscache_cache *cache,
|
||||
@ -307,7 +307,7 @@ EXPORT_SYMBOL(fscache_add_cache);
|
||||
* Note that an I/O error occurred in a cache and that it should no longer be
|
||||
* used for anything. This also reports the error into the kernel log.
|
||||
*
|
||||
* See Documentation/filesystems/caching/backend-api.txt for a complete
|
||||
* See Documentation/filesystems/caching/backend-api.rst for a complete
|
||||
* description.
|
||||
*/
|
||||
void fscache_io_error(struct fscache_cache *cache)
|
||||
@ -355,7 +355,7 @@ static void fscache_withdraw_all_objects(struct fscache_cache *cache,
|
||||
* Withdraw a cache from service, unbinding all its cache objects from the
|
||||
* netfs cookies they're currently representing.
|
||||
*
|
||||
* See Documentation/filesystems/caching/backend-api.txt for a complete
|
||||
* See Documentation/filesystems/caching/backend-api.rst for a complete
|
||||
* description.
|
||||
*/
|
||||
void fscache_withdraw_cache(struct fscache_cache *cache)
|
||||
|
@ -4,7 +4,7 @@
|
||||
* Copyright (C) 2004-2007 Red Hat, Inc. All Rights Reserved.
|
||||
* Written by David Howells (dhowells@redhat.com)
|
||||
*
|
||||
* See Documentation/filesystems/caching/netfs-api.txt for more information on
|
||||
* See Documentation/filesystems/caching/netfs-api.rst for more information on
|
||||
* the netfs API.
|
||||
*/
|
||||
|
||||
|
@ -4,7 +4,7 @@
|
||||
* Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
|
||||
* Written by David Howells (dhowells@redhat.com)
|
||||
*
|
||||
* See Documentation/filesystems/caching/object.txt for a description of the
|
||||
* See Documentation/filesystems/caching/object.rst for a description of the
|
||||
* object state machine and the in-kernel representations.
|
||||
*/
|
||||
|
||||
@ -295,7 +295,7 @@ static void fscache_object_work_func(struct work_struct *work)
|
||||
*
|
||||
* Initialise a cache object description to its basic values.
|
||||
*
|
||||
* See Documentation/filesystems/caching/backend-api.txt for a complete
|
||||
* See Documentation/filesystems/caching/backend-api.rst for a complete
|
||||
* description.
|
||||
*/
|
||||
void fscache_object_init(struct fscache_object *object,
|
||||
|
@ -4,7 +4,7 @@
|
||||
* Copyright (C) 2008 Red Hat, Inc. All Rights Reserved.
|
||||
* Written by David Howells (dhowells@redhat.com)
|
||||
*
|
||||
* See Documentation/filesystems/caching/operations.txt
|
||||
* See Documentation/filesystems/caching/operations.rst
|
||||
*/
|
||||
|
||||
#define FSCACHE_DEBUG_LEVEL OPERATION
|
||||
|
@ -61,7 +61,7 @@
|
||||
*
|
||||
* Initial implementation of mandatory locks. SunOS turned out to be
|
||||
* a rotten model, so I implemented the "obvious" semantics.
|
||||
* See 'Documentation/filesystems/mandatory-locking.txt' for details.
|
||||
* See 'Documentation/filesystems/mandatory-locking.rst' for details.
|
||||
* Andy Walker (andy@lysaker.kvaerner.no), April 06, 1996.
|
||||
*
|
||||
* Don't allow mandatory locks on mmap()'ed files. Added simple functions to
|
||||
|
@ -13,7 +13,7 @@
|
||||
*
|
||||
* configfs Copyright (C) 2005 Oracle. All rights reserved.
|
||||
*
|
||||
* Please read Documentation/filesystems/configfs/configfs.txt before using
|
||||
* Please read Documentation/filesystems/configfs.rst before using
|
||||
* the configfs interface, ESPECIALLY the parts about reference counts and
|
||||
* item destructors.
|
||||
*/
|
||||
|
@@ -85,7 +85,7 @@ struct p_log {
  * Superblock creation fills in ->root whereas reconfiguration begins with this
  * already set.
  *
- * See Documentation/filesystems/mount_api.txt
+ * See Documentation/filesystems/mount_api.rst
  */
 struct fs_context {
 	const struct fs_context_operations *ops;
@@ -6,7 +6,7 @@
  *
  * NOTE!!! See:
  *
- * Documentation/filesystems/caching/backend-api.txt
+ * Documentation/filesystems/caching/backend-api.rst
  *
  * for a description of the cache backend interface declared here.
  */
@@ -454,7 +454,7 @@ static inline void fscache_object_lookup_error(struct fscache_object *object)
  * Set the maximum size an object is permitted to reach, implying the highest
  * byte that may be written. Intended to be called by the attr_changed() op.
  *
- * See Documentation/filesystems/caching/backend-api.txt for a complete
+ * See Documentation/filesystems/caching/backend-api.rst for a complete
  * description.
  */
 static inline
@@ -6,7 +6,7 @@
  *
  * NOTE!!! See:
  *
- * Documentation/filesystems/caching/netfs-api.txt
+ * Documentation/filesystems/caching/netfs-api.rst
  *
  * for a description of the network filesystem interface declared here.
  */
@@ -233,7 +233,7 @@ extern void __fscache_enable_cookie(struct fscache_cookie *, const void *, loff_
  *
  * Register a filesystem as desiring caching services if they're available.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -253,7 +253,7 @@ int fscache_register_netfs(struct fscache_netfs *netfs)
  * Indicate that a filesystem no longer desires caching services for the
  * moment.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -270,7 +270,7 @@ void fscache_unregister_netfs(struct fscache_netfs *netfs)
  * Acquire a specific cache referral tag that can be used to select a specific
  * cache in which to cache an index.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -288,7 +288,7 @@ struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name)
  *
  * Release a reference to a cache referral tag previously looked up.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -315,7 +315,7 @@ void fscache_release_cache_tag(struct fscache_cache_tag *tag)
  * that can be used to locate files. This is done by requesting a cookie for
  * each index in the path to the file.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -351,7 +351,7 @@ struct fscache_cookie *fscache_acquire_cookie(
  * provided to update the auxiliary data in the cache before the object is
  * disconnected.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -394,7 +394,7 @@ int fscache_check_consistency(struct fscache_cookie *cookie,
  * cookie. The auxiliary data on the cookie will be updated first if @aux_data
  * is set.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -410,7 +410,7 @@ void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data)
  *
  * Permit data-storage cache objects to be pinned in the cache.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -425,7 +425,7 @@ int fscache_pin_cookie(struct fscache_cookie *cookie)
  *
  * Permit data-storage cache objects to be unpinned from the cache.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -441,7 +441,7 @@ void fscache_unpin_cookie(struct fscache_cookie *cookie)
  * changed. This includes the data size. These attributes will be obtained
  * through the get_attr() cookie definition op.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -463,7 +463,7 @@ int fscache_attr_changed(struct fscache_cookie *cookie)
  *
  * This can be called with spinlocks held.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -479,7 +479,7 @@ void fscache_invalidate(struct fscache_cookie *cookie)
  *
  * Wait for the invalidation of an object to complete.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -498,7 +498,7 @@ void fscache_wait_on_invalidate(struct fscache_cookie *cookie)
  * cookie so that a write to that object within the space can always be
  * honoured.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -533,7 +533,7 @@ int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size)
  * Else, if the page is unbacked, -ENODATA is returned and a block may have
  * been allocated in the cache.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -582,7 +582,7 @@ int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
  * regard to different pages, the return values are prioritised in that order.
  * Any pages submitted for reading are removed from the pages list.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -617,7 +617,7 @@ int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
  * Else, a block will be allocated if one wasn't already, and 0 will be
  * returned
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -667,7 +667,7 @@ void fscache_readpages_cancel(struct fscache_cookie *cookie,
  * be cleared at the completion of the write to indicate the success or failure
  * of the operation. Note that the completion may happen before the return.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -693,7 +693,7 @@ int fscache_write_page(struct fscache_cookie *cookie,
  * Note that this cannot cancel any outstanding I/O operations between this
  * page and the cache.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -711,7 +711,7 @@ void fscache_uncache_page(struct fscache_cookie *cookie,
  *
  * Ask the cache if a page is being written to the cache.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -731,7 +731,7 @@ bool fscache_check_page_write(struct fscache_cookie *cookie,
  * Ask the cache to wake us up when a page is no longer being written to the
  * cache.
  *
- * See Documentation/filesystems/caching/netfs-api.txt for a complete
+ * See Documentation/filesystems/caching/netfs-api.rst for a complete
  * description.
  */
 static inline
@@ -77,7 +77,7 @@
  * state. This is called immediately after commit_creds().
  *
  * Security hooks for mount using fs_context.
- * [See also Documentation/filesystems/mount_api.txt]
+ * [See also Documentation/filesystems/mount_api.rst]
  *
  * @fs_context_dup:
  * Allocate and attach a security structure to sc->security. This pointer