Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal
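
The all-or-nothing contract described for copy_from_iter_full() can be illustrated with a small userspace model. This is only a sketch: `struct toy_iter` and both `toy_*` functions are hypothetical stand-ins, not the kernel's `struct iov_iter` or its implementation, and the model only covers a source that is too short, not a fault in the middle of a copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iov_iter: a single flat buffer. */
struct toy_iter {
	const char *data;	/* next byte to hand out */
	size_t count;		/* bytes remaining */
};

/* Models copy_from_iter(): copies what it can and advances by that much. */
static size_t toy_copy_from_iter(void *dst, size_t len, struct toy_iter *it)
{
	size_t n = len < it->count ? len : it->count;

	memcpy(dst, it->data, n);
	it->data += n;
	it->count -= n;
	return n;
}

/* Models copy_from_iter_full(): all or nothing, no advance on failure. */
static bool toy_copy_from_iter_full(void *dst, size_t len, struct toy_iter *it)
{
	if (len > it->count)
		return false;	/* iterator left untouched */
	memcpy(dst, it->data, len);
	it->data += len;
	it->count -= len;
	return true;
}
```

With the first form a caller has to compare the return value against the requested length and is left with a partially advanced iterator on a short copy; with the second a single boolean test is enough and the iterator is unchanged on failure.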

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
This commit is contained in:
Linus Torvalds 2016-12-16 10:24:44 -08:00
commit 9a19a6db37
73 changed files with 264 additions and 9721 deletions

View File

@ -87,8 +87,6 @@ jfs.txt
- info and mount options for the JFS filesystem.
locks.txt
- info on file locking implementations, flock() vs. fcntl(), etc.
logfs.txt
- info on the LogFS flash filesystem.
mandatory-locking.txt
- info on the Linux implementation of Sys V mandatory file locking.
ncpfs.txt

View File

@ -1,241 +0,0 @@
The LogFS Flash Filesystem
==========================
Specification
=============
Superblocks
-----------
Two superblocks exist at the beginning and end of the filesystem.
Each superblock is 256 Bytes large, with another 3840 Bytes reserved
for future purposes, making a total of 4096 Bytes.
Superblock locations may differ for MTD and block devices. On MTD the
first non-bad block contains a superblock in the first 4096 Bytes and
the last non-bad block contains a superblock in the last 4096 Bytes.
On block devices, the first 4096 Bytes of the device contain the first
superblock and the last aligned 4096 Byte-block contains the second
superblock.
For the most part, the superblocks can be considered read-only. They
are written only to correct errors detected within the superblocks,
move the journal and change the filesystem parameters through tunefs.
As a result, the superblock does not contain any fields that require
constant updates, like the amount of free space, etc.
Segments
--------
The space in the device is split up into equal-sized segments.
Segments are the primary write unit of LogFS. Within each segment,
writes happen from front (low addresses) to back (high addresses). If
only a partial segment has been written, the segment number, the
current position within and optionally a write buffer are stored in
the journal.
Segments are erased as a whole. Therefore Garbage Collection may be
required to completely free a segment before doing so.
Journal
--------
The journal contains all global information about the filesystem that
is subject to frequent change. At mount time, it has to be scanned
for the most recent commit entry, which contains a list of pointers to
all currently valid entries.
Object Store
------------
All space except for the superblocks and journal is part of the object
store. Each segment contains a segment header and a number of
objects, each consisting of the object header and the payload.
Objects are either inodes, directory entries (dentries), file data
blocks or indirect blocks.
Levels
------
Garbage collection (GC) may fail if all data is written
indiscriminately. One requirement of GC is that data is separated
roughly according to the distance between the tree root and the data.
Effectively that means all file data is on level 0, indirect blocks
are on levels 1, 2, 3, 4 or 5 for 1x, 2x, 3x, 4x or 5x indirect blocks,
respectively. Inode file data is on level 6 for the inodes and 7-11
for indirect blocks.
Each segment contains objects of a single level only. As a result,
each level requires its own separate segment to be open for writing.
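
The level numbering above amounts to a small piece of arithmetic. A minimal sketch, assuming a hypothetical helper `toy_level()` that is not taken from the LogFS sources:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper following the numbering in the text:
 * indirection is 0 for a leaf block and 1..5 for 1x..5x indirect blocks.
 * Regular files use levels 0..5, the inode file uses levels 6..11.
 */
static int toy_level(bool in_inode_file, int indirection)
{
	assert(indirection >= 0 && indirection <= 5);
	return (in_inode_file ? 6 : 0) + indirection;
}
```

Since each segment holds a single level, this numbering implies up to twelve segments open for writing at once, before any further split by vim.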
Inode File
----------
All inodes are stored in a special file, the inode file. The single
exception is the inode file's inode (master inode), which for obvious
reasons is stored in the journal instead. Instead of data blocks, the
leaf nodes of the inode files are inodes.
Aliases
-------
Writes in LogFS are done by means of a wandering tree. A naïve
implementation would require that for each write of a block, all
parent blocks are written as well, since the block pointers have
changed. Such an implementation would not be very efficient.
In LogFS, the block pointer changes are cached in the journal by means
of alias entries. Each alias consists of its logical address - inode
number, block index, level and child number (index into block) - and
the changed data. Any 8-byte word can be changed in this manner.
Currently aliases are used for block pointers, file size, file used
bytes and the height of an inode's indirect tree.
Segment Aliases
---------------
Related to regular aliases, these are used to handle bad blocks.
Initially, bad blocks are handled by moving the affected segment
content to a spare segment and noting this move in the journal with a
segment alias, a simple (to, from) tuple. GC will later empty this
segment and the alias can be removed again. This is used on MTD only.
Vim
---
By cleverly predicting the life time of data, it is possible to
separate long-living data from short-living data and thereby reduce
the GC overhead later. Each type of distinct life expectancy (vim) can
have a separate segment open for writing. Each (level, vim) tuple can
be open just once. If an open segment with unknown vim is encountered
at mount time, it is closed and ignored henceforth.
Indirect Tree
-------------
Inodes in LogFS are similar to FFS-style filesystems with direct and
indirect block pointers. One difference is that LogFS uses a single
indirect pointer that can be either a 1x, 2x, etc. indirect pointer.
A height field in the inode defines the height of the indirect tree
and thereby the indirection of the pointer.
Another difference is the addressing of indirect blocks. In LogFS,
the first 16 pointers in the first indirect block are left empty,
corresponding to the 16 direct pointers in the inode. In ext2 (maybe
others as well) the first pointer in the first indirect block
corresponds to logical block 12, skipping the 12 direct pointers.
So where ext2 is using arithmetic to better utilize space, LogFS keeps
arithmetic simple and uses compression to save space.
Compression
-----------
Both file data and metadata can be compressed. Compression for file
data can be enabled with chattr +c and disabled with chattr -c. Doing
so has no effect on existing data, but new data will be stored
accordingly. New inodes will inherit the compression flag of the
parent directory.
Metadata is always compressed. However, the space accounting ignores
this and charges for the uncompressed size. Failing to do so could
result in GC failures when, after moving some data, indirect blocks
compress worse than previously. Even on a 100% full medium, GC may
not consume any extra space, so the compression gains are lost space
to the user.
However, they are not lost space to the filesystem internals. By
cheating the user for those bytes, the filesystem gained some slack
space and GC will run less often and faster.
Garbage Collection and Wear Leveling
------------------------------------
Garbage collection is invoked whenever the number of free segments
falls below a threshold. The best (known) candidate is picked based
on the least amount of valid data contained in the segment. All
remaining valid data is copied elsewhere, thereby invalidating it.
The GC code also checks for aliases and writes them back if their
number gets too large.
Wear leveling is done by occasionally picking a suboptimal segment for
garbage collection. If a stale segment's erase count is significantly
lower than the active segments' erase counts, it will be picked. Wear
leveling is rate limited, so it will never monopolize the device for
more than one segment worth at a time.
Values for "occasionally" and "significantly lower" are compile-time
constants.
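
The two selection policies can be sketched as follows. Everything here is an assumption for illustration: `struct toy_segment`, the `toy_*` functions and the `wl_delta` threshold are hypothetical, and the wear-leveling check simply compares the least-erased against the most-erased segment rather than modelling "stale" and "active" segments separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment bookkeeping, not the LogFS layout. */
struct toy_segment {
	uint32_t valid_bytes;	/* live data still held by the segment */
	uint32_t erase_count;	/* number of erases so far */
};

/* Regular GC: pick the segment with the least valid data (n must be > 0). */
static size_t toy_pick_gc_victim(const struct toy_segment *seg, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (seg[i].valid_bytes < seg[best].valid_bytes)
			best = i;
	return best;
}

/*
 * Wear leveling: return the least-erased segment if it lags the
 * most-erased one by more than wl_delta, otherwise return n (no pick).
 */
static size_t toy_pick_wl_victim(const struct toy_segment *seg, size_t n,
				 uint32_t wl_delta)
{
	size_t lo = 0, hi = 0;

	for (size_t i = 1; i < n; i++) {
		if (seg[i].erase_count < seg[lo].erase_count)
			lo = i;
		if (seg[i].erase_count > seg[hi].erase_count)
			hi = i;
	}
	if (seg[hi].erase_count - seg[lo].erase_count > wl_delta)
		return lo;
	return n;
}
```

Here `wl_delta` plays the role of the "significantly lower" constant; how often the wear-leveling pick is attempted would correspond to "occasionally".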
Hashed directories
------------------
To satisfy efficient lookup(), directory entries are hashed and
located based on the hash. In order to both support large directories
and not be overly inefficient for small directories, several hash
tables of increasing size are used. For each table, the hash value
modulo the table size gives the table index.
Table sizes are chosen to limit the number of indirect blocks with a
fully populated table to 0, 1, 2 or 3 respectively. So the first
table contains 16 entries, the second 512-16, etc.
The last table is special in several ways. First its size depends on
the effective 32bit limit on telldir/seekdir cookies. Since logfs
uses the upper half of the address space for indirect blocks, the size
is limited to 2^31. Secondly the table contains hash buckets with 16
entries each.
Using single-entry buckets would result in birthday "attacks". At
just 2^16 used entries, hash collisions would be likely (P >= 0.5).
My math skills are insufficient to do the combinatorics for the 17x
collisions necessary to overflow a bucket, but testing showed that in
10,000 runs the lowest directory fill before a bucket overflow was
188,057,130 entries with an average of 315,149,915 entries. So for
directory sizes of up to a million, bucket overflows should be
virtually impossible under normal circumstances.
With carefully chosen filenames, it is obviously possible to cause an
overflow with just 21 entries (4 higher tables + 16 entries + 1). So
there may be a security concern if a malicious user has write access
to a directory.
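
The birthday estimate for single-entry buckets can be checked numerically. The sketch below assumes uniformly distributed hashes over 2^31 slots (the last table's size limit mentioned above); `toy_collision_probability()` is a hypothetical helper, not part of LogFS.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Probability that at least two of n uniformly hashed entries land in
 * the same one of `slots` single-entry buckets:
 *   1 - prod_{i=0}^{n-1} (1 - i / slots)
 */
static double toy_collision_probability(uint64_t n, double slots)
{
	double p_unique = 1.0;

	for (uint64_t i = 0; i < n; i++)
		p_unique *= 1.0 - (double)i / slots;
	return 1.0 - p_unique;
}
```

For n = 2^16 and 2^31 slots the exponent n^2 / (2 * slots) is about 1, so the result is close to 1 - 1/e, roughly 0.63, which agrees with the P >= 0.5 claim.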
Open For Discussion
===================
Device Address Space
--------------------
A device address space is used for caching. Both block devices and
MTD provide functions to either read a single page or write a segment.
Partial segments may be written for data integrity, but where possible
complete segments are written for performance on simple block device
flash media.
Meta Inodes
-----------
Inodes are stored in the inode file, which is just a regular file for
most purposes. At umount time, however, the inode file needs to
remain open until all dirty inodes are written. So
generic_shutdown_super() may not close this inode, but shouldn't
complain about remaining inodes due to the inode file either. Same
goes for mapping inode of the device address space.
Currently logfs uses a hack that essentially copies part of fs/inode.c
code over. A general solution would be preferred.
Indirect block mapping
----------------------
With compression, the block device (or mapping inode) cannot be used
to cache indirect blocks. Some other place is required. Currently
logfs uses the top half of each inode's address space. The low 8TB
(on 32bit) are filled with file data, the high 8TB are used for
indirect blocks.
One problem is that 16TB files created on 64bit systems actually have
data in the top 8TB. But files >16TB would cause problems anyway, so
only the limit has changed.
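
The 8TB split follows from the page-index arithmetic. A minimal sketch, assuming 4096-byte pages and a 32-bit page index (both assumptions, as are all `TOY_*` and `toy_*` names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT		12			/* assumed 4096-byte pages */
#define TOY_INDIRECT_BIT	(UINT32_C(1) << 31)	/* top half of a 32-bit index */

/* True if a page index falls into the half used for indirect blocks. */
static bool toy_is_indirect_index(uint32_t index)
{
	return (index & TOY_INDIRECT_BIT) != 0;
}

/* Byte offset at which the indirect-block half begins: 2^31 * 2^12 = 2^43. */
static uint64_t toy_indirect_start_bytes(void)
{
	return (uint64_t)TOY_INDIRECT_BIT << TOY_PAGE_SHIFT;
}
```

Under these assumptions the whole index space covers 16TB, and the boundary at 2^43 bytes leaves 8TB for file data below it and 8TB for indirect blocks above it.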

View File

@ -7564,14 +7564,6 @@ S: Maintained
F: Documentation/ldm.txt
F: block/partitions/ldm.*
LogFS
M: Joern Engel <joern@logfs.org>
M: Prasad Joshi <prasadjoshi.linux@gmail.com>
L: logfs@logfs.org
W: logfs.org
S: Maintained
F: fs/logfs/
LSILOGIC MPT FUSION DRIVERS (FC/SAS/SPI)
M: Sathya Prakash <sathya.prakash@broadcom.com>
M: Chaitra P B <chaitra.basappa@broadcom.com>

View File

@ -181,7 +181,7 @@ static inline ssize_t vhci_get_user(struct vhci_data *data,
if (!skb)
return -ENOMEM;
if (copy_from_iter(skb_put(skb, len), len, from) != len) {
if (!copy_from_iter_full(skb_put(skb, len), len, from)) {
kfree_skb(skb);
return -EFAULT;
}

View File

@ -2523,7 +2523,7 @@ static void amdgpu_debugfs_remove_files(struct amdgpu_device *adev)
static ssize_t amdgpu_debugfs_regs_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
bool pm_pg_lock, use_bank;
@ -2599,7 +2599,7 @@ static ssize_t amdgpu_debugfs_regs_read(struct file *f, char __user *buf,
static ssize_t amdgpu_debugfs_regs_write(struct file *f, const char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
bool pm_pg_lock, use_bank;
@ -2673,7 +2673,7 @@ static ssize_t amdgpu_debugfs_regs_write(struct file *f, const char __user *buf,
static ssize_t amdgpu_debugfs_regs_pcie_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -2700,7 +2700,7 @@ static ssize_t amdgpu_debugfs_regs_pcie_read(struct file *f, char __user *buf,
static ssize_t amdgpu_debugfs_regs_pcie_write(struct file *f, const char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -2728,7 +2728,7 @@ static ssize_t amdgpu_debugfs_regs_pcie_write(struct file *f, const char __user
static ssize_t amdgpu_debugfs_regs_didt_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -2755,7 +2755,7 @@ static ssize_t amdgpu_debugfs_regs_didt_read(struct file *f, char __user *buf,
static ssize_t amdgpu_debugfs_regs_didt_write(struct file *f, const char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -2783,7 +2783,7 @@ static ssize_t amdgpu_debugfs_regs_didt_write(struct file *f, const char __user
static ssize_t amdgpu_debugfs_regs_smc_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -2810,7 +2810,7 @@ static ssize_t amdgpu_debugfs_regs_smc_read(struct file *f, char __user *buf,
static ssize_t amdgpu_debugfs_regs_smc_write(struct file *f, const char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -2838,7 +2838,7 @@ static ssize_t amdgpu_debugfs_regs_smc_write(struct file *f, const char __user *
static ssize_t amdgpu_debugfs_gca_config_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
uint32_t *config, no_regs = 0;
@ -2908,7 +2908,7 @@ static ssize_t amdgpu_debugfs_gca_config_read(struct file *f, char __user *buf,
static ssize_t amdgpu_debugfs_sensor_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
int idx, r;
int32_t value;

View File

@ -280,7 +280,7 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
static ssize_t amdgpu_debugfs_ring_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_ring *ring = (struct amdgpu_ring*)f->f_inode->i_private;
struct amdgpu_ring *ring = file_inode(f)->i_private;
int r, i;
uint32_t value, result, early[3];

View File

@ -1511,7 +1511,7 @@ static const struct drm_info_list amdgpu_ttm_debugfs_list[] = {
static ssize_t amdgpu_ttm_vram_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;
@ -1555,7 +1555,7 @@ static const struct file_operations amdgpu_ttm_vram_fops = {
static ssize_t amdgpu_ttm_gtt_read(struct file *f, char __user *buf,
size_t size, loff_t *pos)
{
struct amdgpu_device *adev = f->f_inode->i_private;
struct amdgpu_device *adev = file_inode(f)->i_private;
ssize_t result = 0;
int r;

View File

@ -679,7 +679,6 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
int depth;
bool zerocopy = false;
size_t linear;
ssize_t n;
if (q->flags & IFF_VNET_HDR) {
vnet_hdr_len = q->vnet_hdr_sz;
@ -690,8 +689,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
len -= vnet_hdr_len;
err = -EFAULT;
n = copy_from_iter(&vnet_hdr, sizeof(vnet_hdr), from);
if (n != sizeof(vnet_hdr))
if (!copy_from_iter_full(&vnet_hdr, sizeof(vnet_hdr), from))
goto err;
iov_iter_advance(from, vnet_hdr_len - sizeof(vnet_hdr));
if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&

View File

@ -1156,7 +1156,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
bool zerocopy = false;
int err;
u32 rxhash;
ssize_t n;
if (!(tun->dev->flags & IFF_UP))
return -EIO;
@ -1166,8 +1165,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
return -EINVAL;
len -= sizeof(pi);
n = copy_from_iter(&pi, sizeof(pi), from);
if (n != sizeof(pi))
if (!copy_from_iter_full(&pi, sizeof(pi), from))
return -EFAULT;
}
@ -1176,8 +1174,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
return -EINVAL;
len -= tun->vnet_hdr_sz;
n = copy_from_iter(&gso, sizeof(gso), from);
if (n != sizeof(gso))
if (!copy_from_iter_full(&gso, sizeof(gso), from))
return -EFAULT;
if ((gso.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&

View File

@ -1092,7 +1092,7 @@ static ssize_t gb_camera_debugfs_read(struct file *file, char __user *buf,
size_t len, loff_t *offset)
{
const struct gb_camera_debugfs_entry *op = file->private_data;
struct gb_camera *gcam = file->f_inode->i_private;
struct gb_camera *gcam = file_inode(file)->i_private;
struct gb_camera_debugfs_buffer *buffer;
ssize_t ret;
@ -1114,7 +1114,7 @@ static ssize_t gb_camera_debugfs_write(struct file *file,
loff_t *offset)
{
const struct gb_camera_debugfs_entry *op = file->private_data;
struct gb_camera *gcam = file->f_inode->i_private;
struct gb_camera *gcam = file_inode(file)->i_private;
ssize_t ret;
char *kbuf;

View File

@ -1249,7 +1249,7 @@ static int apb_log_poll(void *data)
static ssize_t apb_log_read(struct file *f, char __user *buf,
size_t count, loff_t *ppos)
{
struct es2_ap_dev *es2 = f->f_inode->i_private;
struct es2_ap_dev *es2 = file_inode(f)->i_private;
ssize_t ret;
size_t copied;
char *tmp_buf;
@ -1303,7 +1303,7 @@ static void usb_log_disable(struct es2_ap_dev *es2)
static ssize_t apb_log_enable_read(struct file *f, char __user *buf,
size_t count, loff_t *ppos)
{
struct es2_ap_dev *es2 = f->f_inode->i_private;
struct es2_ap_dev *es2 = file_inode(f)->i_private;
int enable = !IS_ERR_OR_NULL(es2->apb_log_task);
char tmp_buf[3];
@ -1316,7 +1316,7 @@ static ssize_t apb_log_enable_write(struct file *f, const char __user *buf,
{
int enable;
ssize_t retval;
struct es2_ap_dev *es2 = f->f_inode->i_private;
struct es2_ap_dev *es2 = file_inode(f)->i_private;
retval = kstrtoint_from_user(buf, count, 10, &enable);
if (retval)

View File

@ -757,7 +757,7 @@ static int gb_svc_version_request(struct gb_operation *op)
static ssize_t pwr_debugfs_voltage_read(struct file *file, char __user *buf,
size_t len, loff_t *offset)
{
struct svc_debugfs_pwrmon_rail *pwrmon_rails = file->f_inode->i_private;
struct svc_debugfs_pwrmon_rail *pwrmon_rails = file_inode(file)->i_private;
struct gb_svc *svc = pwrmon_rails->svc;
int ret, desc;
u32 value;
@ -780,7 +780,7 @@ static ssize_t pwr_debugfs_voltage_read(struct file *file, char __user *buf,
static ssize_t pwr_debugfs_current_read(struct file *file, char __user *buf,
size_t len, loff_t *offset)
{
struct svc_debugfs_pwrmon_rail *pwrmon_rails = file->f_inode->i_private;
struct svc_debugfs_pwrmon_rail *pwrmon_rails = file_inode(file)->i_private;
struct gb_svc *svc = pwrmon_rails->svc;
int ret, desc;
u32 value;
@ -803,7 +803,7 @@ static ssize_t pwr_debugfs_current_read(struct file *file, char __user *buf,
static ssize_t pwr_debugfs_power_read(struct file *file, char __user *buf,
size_t len, loff_t *offset)
{
struct svc_debugfs_pwrmon_rail *pwrmon_rails = file->f_inode->i_private;
struct svc_debugfs_pwrmon_rail *pwrmon_rails = file_inode(file)->i_private;
struct gb_svc *svc = pwrmon_rails->svc;
int ret, desc;
u32 value;

View File

@ -921,7 +921,7 @@ EXPORT_SYMBOL_GPL(gb_timesync_schedule_asynchronous);
static ssize_t gb_timesync_ping_read(struct file *file, char __user *ubuf,
size_t len, loff_t *offset, bool ktime)
{
struct gb_timesync_svc *timesync_svc = file->f_inode->i_private;
struct gb_timesync_svc *timesync_svc = file_inode(file)->i_private;
char *buf;
ssize_t ret = 0;

View File

@ -57,9 +57,6 @@ static void ll_release(struct dentry *de)
LASSERT(de);
lld = ll_d2d(de);
if (!lld) /* NFS copies the de->d_op methods (bug 4655) */
return;
if (lld->lld_it) {
ll_intent_release(lld->lld_it);
kfree(lld->lld_it);
@ -126,30 +123,13 @@ static int ll_ddelete(const struct dentry *de)
return 0;
}
int ll_d_init(struct dentry *de)
static int ll_d_init(struct dentry *de)
{
CDEBUG(D_DENTRY, "ldd on dentry %pd (%p) parent %p inode %p refc %d\n",
de, de, de->d_parent, d_inode(de), d_count(de));
if (!de->d_fsdata) {
struct ll_dentry_data *lld;
lld = kzalloc(sizeof(*lld), GFP_NOFS);
if (likely(lld)) {
spin_lock(&de->d_lock);
if (likely(!de->d_fsdata)) {
de->d_fsdata = lld;
__d_lustre_invalidate(de);
} else {
kfree(lld);
}
spin_unlock(&de->d_lock);
} else {
return -ENOMEM;
}
}
LASSERT(de->d_op == &ll_d_ops);
struct ll_dentry_data *lld = kzalloc(sizeof(*lld), GFP_KERNEL);
if (unlikely(!lld))
return -ENOMEM;
lld->lld_invalid = 1;
de->d_fsdata = lld;
return 0;
}
@ -300,6 +280,7 @@ static int ll_revalidate_nd(struct dentry *dentry, unsigned int flags)
}
const struct dentry_operations ll_d_ops = {
.d_init = ll_d_init,
.d_revalidate = ll_revalidate_nd,
.d_release = ll_release,
.d_delete = ll_ddelete,

View File

@ -769,7 +769,6 @@ int ll_hsm_release(struct inode *inode);
/* llite/dcache.c */
int ll_d_init(struct dentry *de);
extern const struct dentry_operations ll_d_ops;
void ll_intent_drop_lock(struct lookup_intent *);
void ll_intent_release(struct lookup_intent *);
@ -1148,7 +1147,7 @@ dentry_may_statahead(struct inode *dir, struct dentry *dentry)
* 'lld_sa_generation == lli->lli_sa_generation'.
*/
ldd = ll_d2d(dentry);
if (ldd && ldd->lld_sa_generation == lli->lli_sa_generation)
if (ldd->lld_sa_generation == lli->lli_sa_generation)
return false;
return true;
@ -1267,17 +1266,7 @@ static inline void ll_set_lock_data(struct obd_export *exp, struct inode *inode,
static inline int d_lustre_invalid(const struct dentry *dentry)
{
struct ll_dentry_data *lld = ll_d2d(dentry);
return !lld || lld->lld_invalid;
}
static inline void __d_lustre_invalidate(struct dentry *dentry)
{
struct ll_dentry_data *lld = ll_d2d(dentry);
if (lld)
lld->lld_invalid = 1;
return ll_d2d(dentry)->lld_invalid;
}
/*
@ -1293,7 +1282,7 @@ static inline void d_lustre_invalidate(struct dentry *dentry, int nested)
spin_lock_nested(&dentry->d_lock,
nested ? DENTRY_D_LOCK_NESTED : DENTRY_D_LOCK_NORMAL);
__d_lustre_invalidate(dentry);
ll_d2d(dentry)->lld_invalid = 1;
/*
* We should be careful about dentries created by d_obtain_alias().
* These dentries are not put in the dentry tree, instead they are

View File

@ -169,22 +169,12 @@ ll_iget_for_nfs(struct super_block *sb, struct lu_fid *fid, struct lu_fid *paren
/* N.B. d_obtain_alias() drops inode ref on error */
result = d_obtain_alias(inode);
if (!IS_ERR(result)) {
int rc;
rc = ll_d_init(result);
if (rc < 0) {
dput(result);
result = ERR_PTR(rc);
} else {
struct ll_dentry_data *ldd = ll_d2d(result);
/*
* Need to signal to the ll_intent_file_open that
* we came from NFS and so opencache needs to be
* enabled for this one
*/
ldd->lld_nfs_dentry = 1;
}
/*
* Need to signal to the ll_intent_file_open that
* we came from NFS and so opencache needs to be
* enabled for this one
*/
ll_d2d(result)->lld_nfs_dentry = 1;
}
return result;

View File

@ -432,17 +432,9 @@ static struct dentry *ll_find_alias(struct inode *inode, struct dentry *dentry)
*/
struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
{
struct dentry *new;
int rc;
if (inode) {
new = ll_find_alias(inode, de);
struct dentry *new = ll_find_alias(inode, de);
if (new) {
rc = ll_d_init(new);
if (rc < 0) {
dput(new);
return ERR_PTR(rc);
}
d_move(new, de);
iput(inode);
CDEBUG(D_DENTRY,
@ -451,9 +443,6 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
return new;
}
}
rc = ll_d_init(de);
if (rc < 0)
return ERR_PTR(rc);
d_add(de, inode);
CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
de, d_inode(de), d_count(de), de->d_flags);

View File

@ -1519,9 +1519,7 @@ static int revalidate_statahead_dentry(struct inode *dir,
* dentry_may_statahead().
*/
ldd = ll_d2d(*dentryp);
/* ldd can be NULL if llite lookup failed. */
if (ldd)
ldd->lld_sa_generation = lli->lli_sa_generation;
ldd->lld_sa_generation = lli->lli_sa_generation;
sa_put(sai, entry);
return rc;
}

View File

@ -143,7 +143,7 @@ static ssize_t target_core_item_dbroot_store(struct config_item *item,
pr_err("db_root: cannot open: %s\n", db_root_stage);
return -EINVAL;
}
if (!S_ISDIR(fp->f_inode->i_mode)) {
if (!S_ISDIR(file_inode(fp)->i_mode)) {
filp_close(fp, 0);
mutex_unlock(&g_tf_lock);
pr_err("db_root: not a directory: %s\n", db_root_stage);

View File

@ -949,7 +949,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
goto error_mutex;
}
if (!io_data->read &&
copy_from_iter(data, data_len, &io_data->data) != data_len) {
!copy_from_iter_full(data, data_len, &io_data->data)) {
ret = -EFAULT;
goto error_mutex;
}

View File

@ -667,7 +667,7 @@ ep_write_iter(struct kiocb *iocb, struct iov_iter *from)
return -ENOMEM;
}
if (unlikely(copy_from_iter(buf, len, from) != len)) {
if (unlikely(!copy_from_iter_full(buf, len, from))) {
value = -EFAULT;
goto out;
}

View File

@ -922,8 +922,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
*/
iov_iter_init(&out_iter, WRITE, vq->iov, out, out_size);
ret = copy_from_iter(req, req_size, &out_iter);
if (unlikely(ret != req_size)) {
if (unlikely(!copy_from_iter_full(req, req_size, &out_iter))) {
vq_err(vq, "Faulted on copy_from_iter\n");
vhost_scsi_send_bad_target(vs, vq, head, out);
continue;

View File

@ -1863,8 +1863,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
i, count);
return -EINVAL;
}
if (unlikely(copy_from_iter(&desc, sizeof(desc), &from) !=
sizeof(desc))) {
if (unlikely(!copy_from_iter_full(&desc, sizeof(desc), &from))) {
vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc);
return -EINVAL;

View File

@ -234,7 +234,6 @@ source "fs/efs/Kconfig"
source "fs/jffs2/Kconfig"
# UBIFS File system configuration
source "fs/ubifs/Kconfig"
source "fs/logfs/Kconfig"
source "fs/cramfs/Kconfig"
source "fs/squashfs/Kconfig"
source "fs/freevxfs/Kconfig"

View File

@ -97,7 +97,6 @@ obj-$(CONFIG_NTFS_FS) += ntfs/
obj-$(CONFIG_UFS_FS) += ufs/
obj-$(CONFIG_EFS_FS) += efs/
obj-$(CONFIG_JFFS2_FS) += jffs2/
obj-$(CONFIG_LOGFS) += logfs/
obj-$(CONFIG_UBIFS_FS) += ubifs/
obj-$(CONFIG_AFFS_FS) += affs/
obj-$(CONFIG_ROMFS_FS) += romfs/

View File

@ -277,10 +277,10 @@ static void put_aio_ring_file(struct kioctx *ctx)
struct address_space *i_mapping;
if (aio_ring_file) {
truncate_setsize(aio_ring_file->f_inode, 0);
truncate_setsize(file_inode(aio_ring_file), 0);
/* Prevent further access to the kioctx from migratepages */
i_mapping = aio_ring_file->f_inode->i_mapping;
i_mapping = aio_ring_file->f_mapping;
spin_lock(&i_mapping->private_lock);
i_mapping->private_data = NULL;
ctx->aio_ring_file = NULL;
@ -483,7 +483,7 @@ static int aio_setup_ring(struct kioctx *ctx)
for (i = 0; i < nr_pages; i++) {
struct page *page;
page = find_or_create_page(file->f_inode->i_mapping,
page = find_or_create_page(file->f_mapping,
i, GFP_HIGHUSER | __GFP_ZERO);
if (!page)
break;

View File

@ -94,7 +94,7 @@ static int autofs4_show_options(struct seq_file *m, struct dentry *root)
seq_printf(m, ",indirect");
#ifdef CONFIG_CHECKPOINT_RESTORE
if (sbi->pipe)
seq_printf(m, ",pipe_ino=%ld", sbi->pipe->f_inode->i_ino);
seq_printf(m, ",pipe_ino=%ld", file_inode(sbi->pipe)->i_ino);
else
seq_printf(m, ",pipe_ino=-1");
#endif

View File

@ -32,40 +32,19 @@ const struct dentry_operations ceph_dentry_ops;
/*
* Initialize ceph dentry state.
*/
int ceph_init_dentry(struct dentry *dentry)
static int ceph_d_init(struct dentry *dentry)
{
struct ceph_dentry_info *di;
if (dentry->d_fsdata)
return 0;
di = kmem_cache_zalloc(ceph_dentry_cachep, GFP_KERNEL);
if (!di)
return -ENOMEM; /* oh well */
spin_lock(&dentry->d_lock);
if (dentry->d_fsdata) {
/* lost a race */
kmem_cache_free(ceph_dentry_cachep, di);
goto out_unlock;
}
if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_NOSNAP)
d_set_d_op(dentry, &ceph_dentry_ops);
else if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_SNAPDIR)
d_set_d_op(dentry, &ceph_snapdir_dentry_ops);
else
d_set_d_op(dentry, &ceph_snap_dentry_ops);
di->dentry = dentry;
di->lease_session = NULL;
di->time = jiffies;
/* avoid reordering d_fsdata setup so that the check above is safe */
smp_mb();
dentry->d_fsdata = di;
ceph_dentry_lru_add(dentry);
out_unlock:
spin_unlock(&dentry->d_lock);
return 0;
}
@ -737,10 +716,6 @@ static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
if (dentry->d_name.len > NAME_MAX)
return ERR_PTR(-ENAMETOOLONG);
err = ceph_init_dentry(dentry);
if (err < 0)
return ERR_PTR(err);
/* can we conclude ENOENT locally? */
if (d_really_is_negative(dentry)) {
struct ceph_inode_info *ci = ceph_inode(dir);
@ -1323,16 +1298,6 @@ static void ceph_d_release(struct dentry *dentry)
kmem_cache_free(ceph_dentry_cachep, di);
}
static int ceph_snapdir_d_revalidate(struct dentry *dentry,
unsigned int flags)
{
/*
* Eventually, we'll want to revalidate snapped metadata
* too... probably...
*/
return 1;
}
/*
* When the VFS prunes a dentry from the cache, we need to clear the
* complete flag on the parent directory.
@ -1351,6 +1316,9 @@ static void ceph_d_prune(struct dentry *dentry)
if (d_unhashed(dentry))
return;
if (ceph_snap(d_inode(dentry->d_parent)) == CEPH_SNAPDIR)
return;
/*
* we hold d_lock, so d_parent is stable, and d_fsdata is never
* cleared until d_release
@ -1521,14 +1489,5 @@ const struct dentry_operations ceph_dentry_ops = {
.d_revalidate = ceph_d_revalidate,
.d_release = ceph_d_release,
.d_prune = ceph_d_prune,
};
const struct dentry_operations ceph_snapdir_dentry_ops = {
.d_revalidate = ceph_snapdir_d_revalidate,
.d_release = ceph_d_release,
};
const struct dentry_operations ceph_snap_dentry_ops = {
.d_release = ceph_d_release,
.d_prune = ceph_d_prune,
.d_init = ceph_d_init,
};

View File

@ -62,7 +62,6 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
{
struct ceph_mds_client *mdsc = ceph_sb_to_client(sb)->mdsc;
struct inode *inode;
struct dentry *dentry;
struct ceph_vino vino;
int err;
@ -94,16 +93,7 @@ static struct dentry *__fh_to_dentry(struct super_block *sb, u64 ino)
return ERR_PTR(-ESTALE);
}
dentry = d_obtain_alias(inode);
if (IS_ERR(dentry))
return dentry;
err = ceph_init_dentry(dentry);
if (err < 0) {
dput(dentry);
return ERR_PTR(err);
}
dout("__fh_to_dentry %llx %p dentry %p\n", ino, inode, dentry);
return dentry;
return d_obtain_alias(inode);
}
/*
@ -131,7 +121,6 @@ static struct dentry *__get_parent(struct super_block *sb,
struct ceph_mds_client *mdsc = ceph_sb_to_client(sb)->mdsc;
struct ceph_mds_request *req;
struct inode *inode;
struct dentry *dentry;
int mask;
int err;
@ -164,18 +153,7 @@ static struct dentry *__get_parent(struct super_block *sb,
if (!inode)
return ERR_PTR(-ENOENT);
dentry = d_obtain_alias(inode);
if (IS_ERR(dentry))
return dentry;
err = ceph_init_dentry(dentry);
if (err < 0) {
dput(dentry);
return ERR_PTR(err);
}
dout("__get_parent ino %llx parent %p ino %llx.%llx\n",
child ? ceph_ino(d_inode(child)) : ino,
dentry, ceph_vinop(inode));
return dentry;
return d_obtain_alias(inode);
}
static struct dentry *ceph_get_parent(struct dentry *child)


@ -351,10 +351,6 @@ int ceph_atomic_open(struct inode *dir, struct dentry *dentry,
if (dentry->d_name.len > NAME_MAX)
return -ENAMETOOLONG;
err = ceph_init_dentry(dentry);
if (err < 0)
return err;
if (flags & O_CREAT) {
err = ceph_pre_init_acls(dir, &mode, &acls);
if (err < 0)


@ -1023,16 +1023,17 @@ static void update_dentry_lease(struct dentry *dentry,
long unsigned half_ttl = from_time + (duration * HZ / 2) / 1000;
struct inode *dir;
/* only track leases on regular dentries */
if (dentry->d_op != &ceph_dentry_ops)
return;
spin_lock(&dentry->d_lock);
dout("update_dentry_lease %p duration %lu ms ttl %lu\n",
dentry, duration, ttl);
/* make lease_rdcache_gen match directory */
dir = d_inode(dentry->d_parent);
/* only track leases on regular dentries */
if (ceph_snap(dir) != CEPH_NOSNAP)
goto out_unlock;
di->lease_shared_gen = ceph_inode(dir)->i_shared_gen;
if (duration == 0)
@ -1202,12 +1203,7 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req,
err = -ENOMEM;
goto done;
}
err = ceph_init_dentry(dn);
if (err < 0) {
dput(dn);
dput(parent);
goto done;
}
err = 0;
} else if (d_really_is_positive(dn) &&
(ceph_ino(d_inode(dn)) != vino.ino ||
ceph_snap(d_inode(dn)) != vino.snap)) {
@ -1561,12 +1557,6 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
err = -ENOMEM;
goto out;
}
ret = ceph_init_dentry(dn);
if (ret < 0) {
dput(dn);
err = ret;
goto out;
}
} else if (d_really_is_positive(dn) &&
(ceph_ino(d_inode(dn)) != vino.ino ||
ceph_snap(d_inode(dn)) != vino.snap)) {


@ -795,7 +795,6 @@ static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
root = ERR_PTR(-ENOMEM);
goto out;
}
ceph_init_dentry(root);
dout("open_root_inode success, root dentry is %p\n", root);
} else {
root = ERR_PTR(err);
@ -879,6 +878,7 @@ static int ceph_set_super(struct super_block *s, void *data)
fsc->sb = s;
s->s_op = &ceph_super_ops;
s->s_d_op = &ceph_dentry_ops;
s->s_export_op = &ceph_export_ops;
s->s_time_gran = 1000; /* 1000 ns == 1 us */


@ -934,8 +934,7 @@ extern const struct file_operations ceph_dir_fops;
extern const struct file_operations ceph_snapdir_fops;
extern const struct inode_operations ceph_dir_iops;
extern const struct inode_operations ceph_snapdir_iops;
extern const struct dentry_operations ceph_dentry_ops, ceph_snap_dentry_ops,
ceph_snapdir_dentry_ops;
extern const struct dentry_operations ceph_dentry_ops;
extern loff_t ceph_make_fpos(unsigned high, unsigned off, bool hash_order);
extern int ceph_handle_notrace_create(struct inode *dir, struct dentry *dentry);
@ -951,13 +950,6 @@ extern void ceph_invalidate_dentry_lease(struct dentry *dentry);
extern unsigned ceph_dentry_hash(struct inode *dir, struct dentry *dn);
extern void ceph_readdir_cache_release(struct ceph_readdir_cache_control *ctl);
/*
* our d_ops vary depending on whether the inode is live,
* snapshotted (read-only), or a virtual ".snap" directory.
*/
int ceph_init_dentry(struct dentry *dentry);
/* ioctl.c */
extern long ceph_ioctl(struct file *file, unsigned int cmd, unsigned long arg);


@ -253,9 +253,9 @@ COMPAT_SYSCALL_DEFINE2(fstatfs, unsigned int, fd, struct compat_statfs __user *,
static int put_compat_statfs64(struct compat_statfs64 __user *ubuf, struct kstatfs *kbuf)
{
if (sizeof ubuf->f_blocks == 4) {
if ((kbuf->f_blocks | kbuf->f_bfree | kbuf->f_bavail |
kbuf->f_bsize | kbuf->f_frsize) & 0xffffffff00000000ULL)
if (sizeof(ubuf->f_bsize) == 4) {
if ((kbuf->f_type | kbuf->f_bsize | kbuf->f_namelen |
kbuf->f_frsize | kbuf->f_flags) & 0xffffffff00000000ULL)
return -EOVERFLOW;
/* f_files and f_ffree may be -1; it's okay
* to stuff that into 32 bits */


@ -52,7 +52,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
arg |= O_NONBLOCK;
/* Pipe packetized mode is controlled by O_DIRECT flag */
if (!S_ISFIFO(filp->f_inode->i_mode) && (arg & O_DIRECT)) {
if (!S_ISFIFO(inode->i_mode) && (arg & O_DIRECT)) {
if (!filp->f_mapping || !filp->f_mapping->a_ops ||
!filp->f_mapping->a_ops->direct_IO)
return -EINVAL;


@ -1,17 +0,0 @@
config LOGFS
tristate "LogFS file system"
depends on MTD || (!MTD && BLOCK)
select ZLIB_INFLATE
select ZLIB_DEFLATE
select CRC32
select BTREE
help
Flash filesystem aimed to scale efficiently to large devices.
In comparison to JFFS2 it offers significantly faster mount
times and potentially less RAM usage, although the latter has
not been measured yet.
In its current state it is still very experimental and should
not be used for anything other than testing purposes.
If unsure, say N.


@ -1,13 +0,0 @@
obj-$(CONFIG_LOGFS) += logfs.o
logfs-y += compr.o
logfs-y += dir.o
logfs-y += file.o
logfs-y += gc.o
logfs-y += inode.o
logfs-y += journal.o
logfs-y += readwrite.o
logfs-y += segment.o
logfs-y += super.o
logfs-$(CONFIG_BLOCK) += dev_bdev.o
logfs-$(CONFIG_MTD) += dev_mtd.o


@ -1,95 +0,0 @@
/*
* fs/logfs/compr.c - compression routines
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/vmalloc.h>
#include <linux/zlib.h>
#define COMPR_LEVEL 3
static DEFINE_MUTEX(compr_mutex);
static struct z_stream_s stream;
int logfs_compress(void *in, void *out, size_t inlen, size_t outlen)
{
int err, ret;
ret = -EIO;
mutex_lock(&compr_mutex);
err = zlib_deflateInit(&stream, COMPR_LEVEL);
if (err != Z_OK)
goto error;
stream.next_in = in;
stream.avail_in = inlen;
stream.total_in = 0;
stream.next_out = out;
stream.avail_out = outlen;
stream.total_out = 0;
err = zlib_deflate(&stream, Z_FINISH);
if (err != Z_STREAM_END)
goto error;
err = zlib_deflateEnd(&stream);
if (err != Z_OK)
goto error;
if (stream.total_out >= stream.total_in)
goto error;
ret = stream.total_out;
error:
mutex_unlock(&compr_mutex);
return ret;
}
int logfs_uncompress(void *in, void *out, size_t inlen, size_t outlen)
{
int err, ret;
ret = -EIO;
mutex_lock(&compr_mutex);
err = zlib_inflateInit(&stream);
if (err != Z_OK)
goto error;
stream.next_in = in;
stream.avail_in = inlen;
stream.total_in = 0;
stream.next_out = out;
stream.avail_out = outlen;
stream.total_out = 0;
err = zlib_inflate(&stream, Z_FINISH);
if (err != Z_STREAM_END)
goto error;
err = zlib_inflateEnd(&stream);
if (err != Z_OK)
goto error;
ret = 0;
error:
mutex_unlock(&compr_mutex);
return ret;
}
int __init logfs_compr_init(void)
{
size_t size = max(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL),
zlib_inflate_workspacesize());
stream.workspace = vmalloc(size);
if (!stream.workspace)
return -ENOMEM;
return 0;
}
void logfs_compr_exit(void)
{
vfree(stream.workspace);
}


@ -1,290 +0,0 @@
/*
* fs/logfs/dev_bdev.c - Device access methods for block devices
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/buffer_head.h>
#include <linux/gfp.h>
#include <linux/prefetch.h>
#define PAGE_OFS(ofs) ((ofs) & (PAGE_SIZE-1))
static int sync_request(struct page *page, struct block_device *bdev, int op)
{
struct bio bio;
struct bio_vec bio_vec;
bio_init(&bio, &bio_vec, 1);
bio.bi_bdev = bdev;
bio_add_page(&bio, page, PAGE_SIZE, 0);
bio.bi_iter.bi_sector = page->index * (PAGE_SIZE >> 9);
bio_set_op_attrs(&bio, op, 0);
return submit_bio_wait(&bio);
}
static int bdev_readpage(void *_sb, struct page *page)
{
struct super_block *sb = _sb;
struct block_device *bdev = logfs_super(sb)->s_bdev;
int err;
err = sync_request(page, bdev, READ);
if (err) {
ClearPageUptodate(page);
SetPageError(page);
} else {
SetPageUptodate(page);
ClearPageError(page);
}
unlock_page(page);
return err;
}
static DECLARE_WAIT_QUEUE_HEAD(wq);
static void writeseg_end_io(struct bio *bio)
{
struct bio_vec *bvec;
int i;
struct super_block *sb = bio->bi_private;
struct logfs_super *super = logfs_super(sb);
BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */
bio_for_each_segment_all(bvec, bio, i) {
end_page_writeback(bvec->bv_page);
put_page(bvec->bv_page);
}
bio_put(bio);
if (atomic_dec_and_test(&super->s_pending_writes))
wake_up(&wq);
}
static int __bdev_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
size_t nr_pages)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
struct bio *bio = NULL;
struct page *page;
unsigned int max_pages;
int i, ret;
max_pages = min_t(size_t, nr_pages, BIO_MAX_PAGES);
for (i = 0; i < nr_pages; i++) {
if (!bio) {
bio = bio_alloc(GFP_NOFS, max_pages);
BUG_ON(!bio);
bio->bi_bdev = super->s_bdev;
bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = writeseg_end_io;
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
}
page = find_lock_page(mapping, index + i);
BUG_ON(!page);
ret = bio_add_page(bio, page, PAGE_SIZE, 0);
BUG_ON(PageWriteback(page));
set_page_writeback(page);
unlock_page(page);
if (!ret) {
/* Block layer cannot split bios :( */
ofs += bio->bi_iter.bi_size;
atomic_inc(&super->s_pending_writes);
submit_bio(bio);
bio = NULL;
}
}
if (bio) {
atomic_inc(&super->s_pending_writes);
submit_bio(bio);
}
return 0;
}
static void bdev_writeseg(struct super_block *sb, u64 ofs, size_t len)
{
struct logfs_super *super = logfs_super(sb);
int head;
BUG_ON(super->s_flags & LOGFS_SB_FLAG_RO);
if (len == 0) {
/* This can happen when the object fit perfectly into a
* segment, the segment gets written per sync and subsequently
* closed.
*/
return;
}
head = ofs & (PAGE_SIZE - 1);
if (head) {
ofs -= head;
len += head;
}
len = PAGE_ALIGN(len);
__bdev_writeseg(sb, ofs, ofs >> PAGE_SHIFT, len >> PAGE_SHIFT);
}
static void erase_end_io(struct bio *bio)
{
struct super_block *sb = bio->bi_private;
struct logfs_super *super = logfs_super(sb);
BUG_ON(bio->bi_error); /* FIXME: Retry io or write elsewhere */
bio_put(bio);
if (atomic_dec_and_test(&super->s_pending_writes))
wake_up(&wq);
}
static int do_erase(struct super_block *sb, u64 ofs, pgoff_t index,
size_t nr_pages)
{
struct logfs_super *super = logfs_super(sb);
struct bio *bio = NULL;
unsigned int max_pages;
int i, ret;
max_pages = min_t(size_t, nr_pages, BIO_MAX_PAGES);
for (i = 0; i < nr_pages; i++) {
if (!bio) {
bio = bio_alloc(GFP_NOFS, max_pages);
BUG_ON(!bio);
bio->bi_bdev = super->s_bdev;
bio->bi_iter.bi_sector = ofs >> 9;
bio->bi_private = sb;
bio->bi_end_io = erase_end_io;
bio_set_op_attrs(bio, REQ_OP_WRITE, 0);
}
ret = bio_add_page(bio, super->s_erase_page, PAGE_SIZE, 0);
if (!ret) {
/* Block layer cannot split bios :( */
ofs += bio->bi_iter.bi_size;
atomic_inc(&super->s_pending_writes);
submit_bio(bio);
}
}
if (bio) {
atomic_inc(&super->s_pending_writes);
submit_bio(bio);
}
return 0;
}
static int bdev_erase(struct super_block *sb, loff_t to, size_t len,
int ensure_write)
{
struct logfs_super *super = logfs_super(sb);
BUG_ON(to & (PAGE_SIZE - 1));
BUG_ON(len & (PAGE_SIZE - 1));
if (super->s_flags & LOGFS_SB_FLAG_RO)
return -EROFS;
if (ensure_write) {
/*
* Object store doesn't care whether erases happen or not.
* But for the journal they are required. Otherwise a scan
* can find an old commit entry and assume it is the current
* one, travelling back in time.
*/
do_erase(sb, to, to >> PAGE_SHIFT, len >> PAGE_SHIFT);
}
return 0;
}
static void bdev_sync(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
wait_event(wq, atomic_read(&super->s_pending_writes) == 0);
}
static struct page *bdev_find_first_sb(struct super_block *sb, u64 *ofs)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
filler_t *filler = bdev_readpage;
*ofs = 0;
return read_cache_page(mapping, 0, filler, sb);
}
static struct page *bdev_find_last_sb(struct super_block *sb, u64 *ofs)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
filler_t *filler = bdev_readpage;
u64 pos = (super->s_bdev->bd_inode->i_size & ~0xfffULL) - 0x1000;
pgoff_t index = pos >> PAGE_SHIFT;
*ofs = pos;
return read_cache_page(mapping, index, filler, sb);
}
static int bdev_write_sb(struct super_block *sb, struct page *page)
{
struct block_device *bdev = logfs_super(sb)->s_bdev;
/* Nothing special to do for block devices. */
return sync_request(page, bdev, WRITE);
}
static void bdev_put_device(struct logfs_super *s)
{
blkdev_put(s->s_bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
}
static int bdev_can_write_buf(struct super_block *sb, u64 ofs)
{
return 0;
}
static const struct logfs_device_ops bd_devops = {
.find_first_sb = bdev_find_first_sb,
.find_last_sb = bdev_find_last_sb,
.write_sb = bdev_write_sb,
.readpage = bdev_readpage,
.writeseg = bdev_writeseg,
.erase = bdev_erase,
.can_write_buf = bdev_can_write_buf,
.sync = bdev_sync,
.put_device = bdev_put_device,
};
int logfs_get_sb_bdev(struct logfs_super *p, struct file_system_type *type,
const char *devname)
{
struct block_device *bdev;
bdev = blkdev_get_by_path(devname, FMODE_READ|FMODE_WRITE|FMODE_EXCL,
type);
if (IS_ERR(bdev))
return PTR_ERR(bdev);
if (MAJOR(bdev->bd_dev) == MTD_BLOCK_MAJOR) {
int mtdnr = MINOR(bdev->bd_dev);
blkdev_put(bdev, FMODE_READ|FMODE_WRITE|FMODE_EXCL);
return logfs_get_sb_mtd(p, mtdnr);
}
p->s_bdev = bdev;
p->s_mtd = NULL;
p->s_devops = &bd_devops;
return 0;
}


@ -1,274 +0,0 @@
/*
* fs/logfs/dev_mtd.c - Device access methods for MTD
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/completion.h>
#include <linux/mount.h>
#include <linux/sched.h>
#include <linux/slab.h>
#define PAGE_OFS(ofs) ((ofs) & (PAGE_SIZE-1))
static int logfs_mtd_read(struct super_block *sb, loff_t ofs, size_t len,
void *buf)
{
struct mtd_info *mtd = logfs_super(sb)->s_mtd;
size_t retlen;
int ret;
ret = mtd_read(mtd, ofs, len, &retlen, buf);
BUG_ON(ret == -EINVAL);
if (ret)
return ret;
/* Not sure if we should loop instead. */
if (retlen != len)
return -EIO;
return 0;
}
static int loffs_mtd_write(struct super_block *sb, loff_t ofs, size_t len,
void *buf)
{
struct logfs_super *super = logfs_super(sb);
struct mtd_info *mtd = super->s_mtd;
size_t retlen;
loff_t page_start, page_end;
int ret;
if (super->s_flags & LOGFS_SB_FLAG_RO)
return -EROFS;
BUG_ON((ofs >= mtd->size) || (len > mtd->size - ofs));
BUG_ON(ofs != (ofs >> super->s_writeshift) << super->s_writeshift);
BUG_ON(len > PAGE_SIZE);
page_start = ofs & PAGE_MASK;
page_end = PAGE_ALIGN(ofs + len) - 1;
ret = mtd_write(mtd, ofs, len, &retlen, buf);
if (ret || (retlen != len))
return -EIO;
return 0;
}
/*
* For as long as I can remember (since about 2001) mtd->erase has been an
* asynchronous interface lacking the first driver to actually use the
* asynchronous properties. So just to prevent the first implementor of such
* a thing from breaking logfs in 2350, we do the usual pointless dance to
* declare a completion variable and wait for completion before returning
* from logfs_mtd_erase(). What an exercise in futility!
*/
static void logfs_erase_callback(struct erase_info *ei)
{
complete((struct completion *)ei->priv);
}
static int logfs_mtd_erase_mapping(struct super_block *sb, loff_t ofs,
size_t len)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
struct page *page;
pgoff_t index = ofs >> PAGE_SHIFT;
for (index = ofs >> PAGE_SHIFT; index < (ofs + len) >> PAGE_SHIFT; index++) {
page = find_get_page(mapping, index);
if (!page)
continue;
memset(page_address(page), 0xFF, PAGE_SIZE);
put_page(page);
}
return 0;
}
static int logfs_mtd_erase(struct super_block *sb, loff_t ofs, size_t len,
int ensure_write)
{
struct mtd_info *mtd = logfs_super(sb)->s_mtd;
struct erase_info ei;
DECLARE_COMPLETION_ONSTACK(complete);
int ret;
BUG_ON(len % mtd->erasesize);
if (logfs_super(sb)->s_flags & LOGFS_SB_FLAG_RO)
return -EROFS;
memset(&ei, 0, sizeof(ei));
ei.mtd = mtd;
ei.addr = ofs;
ei.len = len;
ei.callback = logfs_erase_callback;
ei.priv = (long)&complete;
ret = mtd_erase(mtd, &ei);
if (ret)
return -EIO;
wait_for_completion(&complete);
if (ei.state != MTD_ERASE_DONE)
return -EIO;
return logfs_mtd_erase_mapping(sb, ofs, len);
}
static void logfs_mtd_sync(struct super_block *sb)
{
struct mtd_info *mtd = logfs_super(sb)->s_mtd;
mtd_sync(mtd);
}
static int logfs_mtd_readpage(void *_sb, struct page *page)
{
struct super_block *sb = _sb;
int err;
err = logfs_mtd_read(sb, page->index << PAGE_SHIFT, PAGE_SIZE,
page_address(page));
if (err == -EUCLEAN || err == -EBADMSG) {
/* -EBADMSG happens regularly on power failures */
err = 0;
/* FIXME: force GC this segment */
}
if (err) {
ClearPageUptodate(page);
SetPageError(page);
} else {
SetPageUptodate(page);
ClearPageError(page);
}
unlock_page(page);
return err;
}
static struct page *logfs_mtd_find_first_sb(struct super_block *sb, u64 *ofs)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
filler_t *filler = logfs_mtd_readpage;
struct mtd_info *mtd = super->s_mtd;
*ofs = 0;
while (mtd_block_isbad(mtd, *ofs)) {
*ofs += mtd->erasesize;
if (*ofs >= mtd->size)
return NULL;
}
BUG_ON(*ofs & ~PAGE_MASK);
return read_cache_page(mapping, *ofs >> PAGE_SHIFT, filler, sb);
}
static struct page *logfs_mtd_find_last_sb(struct super_block *sb, u64 *ofs)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
filler_t *filler = logfs_mtd_readpage;
struct mtd_info *mtd = super->s_mtd;
*ofs = mtd->size - mtd->erasesize;
while (mtd_block_isbad(mtd, *ofs)) {
*ofs -= mtd->erasesize;
if (*ofs <= 0)
return NULL;
}
*ofs = *ofs + mtd->erasesize - 0x1000;
BUG_ON(*ofs & ~PAGE_MASK);
return read_cache_page(mapping, *ofs >> PAGE_SHIFT, filler, sb);
}
static int __logfs_mtd_writeseg(struct super_block *sb, u64 ofs, pgoff_t index,
size_t nr_pages)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
struct page *page;
int i, err;
for (i = 0; i < nr_pages; i++) {
page = find_lock_page(mapping, index + i);
BUG_ON(!page);
err = loffs_mtd_write(sb, page->index << PAGE_SHIFT, PAGE_SIZE,
page_address(page));
unlock_page(page);
put_page(page);
if (err)
return err;
}
return 0;
}
static void logfs_mtd_writeseg(struct super_block *sb, u64 ofs, size_t len)
{
struct logfs_super *super = logfs_super(sb);
int head;
if (super->s_flags & LOGFS_SB_FLAG_RO)
return;
if (len == 0) {
/* This can happen when the object fit perfectly into a
* segment, the segment gets written per sync and subsequently
* closed.
*/
return;
}
head = ofs & (PAGE_SIZE - 1);
if (head) {
ofs -= head;
len += head;
}
len = PAGE_ALIGN(len);
__logfs_mtd_writeseg(sb, ofs, ofs >> PAGE_SHIFT, len >> PAGE_SHIFT);
}
static void logfs_mtd_put_device(struct logfs_super *s)
{
put_mtd_device(s->s_mtd);
}
static int logfs_mtd_can_write_buf(struct super_block *sb, u64 ofs)
{
struct logfs_super *super = logfs_super(sb);
void *buf;
int err;
buf = kmalloc(super->s_writesize, GFP_KERNEL);
if (!buf)
return -ENOMEM;
err = logfs_mtd_read(sb, ofs, super->s_writesize, buf);
if (err)
goto out;
if (memchr_inv(buf, 0xff, super->s_writesize))
err = -EIO;
kfree(buf);
out:
return err;
}
static const struct logfs_device_ops mtd_devops = {
.find_first_sb = logfs_mtd_find_first_sb,
.find_last_sb = logfs_mtd_find_last_sb,
.readpage = logfs_mtd_readpage,
.writeseg = logfs_mtd_writeseg,
.erase = logfs_mtd_erase,
.can_write_buf = logfs_mtd_can_write_buf,
.sync = logfs_mtd_sync,
.put_device = logfs_mtd_put_device,
};
int logfs_get_sb_mtd(struct logfs_super *s, int mtdnr)
{
struct mtd_info *mtd = get_mtd_device(NULL, mtdnr);
if (IS_ERR(mtd))
return PTR_ERR(mtd);
s->s_bdev = NULL;
s->s_mtd = mtd;
s->s_devops = &mtd_devops;
return 0;
}


@ -1,801 +0,0 @@
/*
* fs/logfs/dir.c - directory-related code
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/slab.h>
/*
* Atomic dir operations
*
* Directory operations are by default not atomic. Dentries and Inodes are
* created/removed/altered in separate operations. Therefore we need to do
* a small amount of journaling.
*
* Create, link, mkdir, mknod and symlink all share the same function to do
* the work: __logfs_create. This function works in two atomic steps:
* 1. allocate inode (remember in journal)
* 2. allocate dentry (clear journal)
*
* We can only get interrupted between the two steps, at which point the
* inode we just created is simply stored in the anchor. On next mount, if
* we were interrupted, we delete the inode. From a user's point of view
* the operation never happened.
*
* Unlink and rmdir also share the same function: unlink. Again, this
* function works in two atomic steps
* 1. remove dentry (remember inode in journal)
* 2. unlink inode (clear journal)
*
* And again, on the next mount, if we were interrupted, we delete the inode.
* From a user's point of view the operation succeeded.
*
* Rename is the real pain to deal with, harder than all the other methods
* combined. Depending on the circumstances we can run into three cases.
* A "target rename" where the target dentry already existed, a "local
* rename" where both parent directories are identical or a "cross-directory
* rename" in the remaining case.
*
* Local rename is atomic, as the old dentry is simply rewritten with a new
* name.
*
* Cross-directory rename works in two steps, similar to __logfs_create and
* logfs_unlink:
* 1. Write new dentry (remember old dentry in journal)
* 2. Remove old dentry (clear journal)
*
* Here we remember a dentry instead of an inode. On next mount, if we were
* interrupted, we delete the dentry. From a user's point of view, the
* operation succeeded.
*
* Target rename works in three atomic steps:
* 1. Attach old inode to new dentry (remember old dentry and new inode)
* 2. Remove old dentry (still remember the new inode)
* 3. Remove victim inode
*
* Here we remember both an inode and a dentry. If we get interrupted
* between steps 1 and 2, we delete both the dentry and the inode. If
* we get interrupted between steps 2 and 3, we delete just the inode.
* In either case, the remaining objects are deleted on next mount. From
* a user's point of view, the operation succeeded.
*/
static int write_dir(struct inode *dir, struct logfs_disk_dentry *dd,
loff_t pos)
{
return logfs_inode_write(dir, dd, sizeof(*dd), pos, WF_LOCK, NULL);
}
static int write_inode(struct inode *inode)
{
return __logfs_write_inode(inode, NULL, WF_LOCK);
}
static s64 dir_seek_data(struct inode *inode, s64 pos)
{
s64 new_pos = logfs_seek_data(inode, pos);
return max(pos, new_pos - 1);
}
static int beyond_eof(struct inode *inode, loff_t bix)
{
loff_t pos = bix << inode->i_sb->s_blocksize_bits;
return pos >= i_size_read(inode);
}
/*
* Prime value was chosen to be roughly 256 + 26. r5 hash uses 11,
* so short names (len <= 9) don't even occupy the complete 32bit name
* space. A prime >256 ensures short names quickly spread the 32bit
* name space. Add about 26 for the estimated amount of information
* of each character and pick a prime nearby, preferably a bit-sparse
* one.
*/
static u32 logfs_hash_32(const char *s, int len, u32 seed)
{
u32 hash = seed;
int i;
for (i = 0; i < len; i++)
hash = hash * 293 + s[i];
return hash;
}
/*
* We have to satisfy several conflicting requirements here. Small
* directories should stay fairly compact and not require too many
* indirect blocks. The number of possible locations for a given hash
* should be small to make lookup() fast. And we should try hard not
* to overflow the 32bit name space or nfs and 32bit host systems will
* be unhappy.
*
* So we use the following scheme. First we reduce the hash to 0..15
* and try a direct block. If that is occupied we reduce the hash to
* 16..255 and try an indirect block. Same for 2x and 3x indirect
* blocks. Lastly we reduce the hash to 0x800_0000 .. 0xffff_ffff,
* but use buckets containing 16 entries instead of a single one.
*
* Using 16 entries should allow for a reasonable amount of hash
* collisions, so the 32bit name space can be packed fairly tight
* before overflowing. Oh and currently we don't overflow but return
* an error.
*
* How likely are collisions? Doing the appropriate math is beyond me
* and the Bronstein textbook. But running a test program to brute
* force collisions for a couple of days showed that on average the
* first collision occurs after 598M entries, with 290M being the
* smallest result. Obviously 21 entries could already cause a
* collision if all entries are carefully chosen.
*/
static pgoff_t hash_index(u32 hash, int round)
{
u32 i0_blocks = I0_BLOCKS;
u32 i1_blocks = I1_BLOCKS;
u32 i2_blocks = I2_BLOCKS;
u32 i3_blocks = I3_BLOCKS;
switch (round) {
case 0:
return hash % i0_blocks;
case 1:
return i0_blocks + hash % (i1_blocks - i0_blocks);
case 2:
return i1_blocks + hash % (i2_blocks - i1_blocks);
case 3:
return i2_blocks + hash % (i3_blocks - i2_blocks);
case 4 ... 19:
return i3_blocks + 16 * (hash % (((1<<31) - i3_blocks) / 16))
+ round - 4;
}
BUG();
}
static struct page *logfs_get_dd_page(struct inode *dir, struct dentry *dentry)
{
const struct qstr *name = &dentry->d_name;
struct page *page;
struct logfs_disk_dentry *dd;
u32 hash = logfs_hash_32(name->name, name->len, 0);
pgoff_t index;
int round;
if (name->len > LOGFS_MAX_NAMELEN)
return ERR_PTR(-ENAMETOOLONG);
for (round = 0; round < 20; round++) {
index = hash_index(hash, round);
if (beyond_eof(dir, index))
return NULL;
if (!logfs_exist_block(dir, index))
continue;
page = read_cache_page(dir->i_mapping, index,
(filler_t *)logfs_readpage, NULL);
if (IS_ERR(page))
return page;
dd = kmap_atomic(page);
BUG_ON(dd->namelen == 0);
if (name->len != be16_to_cpu(dd->namelen) ||
memcmp(name->name, dd->name, name->len)) {
kunmap_atomic(dd);
put_page(page);
continue;
}
kunmap_atomic(dd);
return page;
}
return NULL;
}
static int logfs_remove_inode(struct inode *inode)
{
int ret;
drop_nlink(inode);
ret = write_inode(inode);
LOGFS_BUG_ON(ret, inode->i_sb);
return ret;
}
static void abort_transaction(struct inode *inode, struct logfs_transaction *ta)
{
if (logfs_inode(inode)->li_block)
logfs_inode(inode)->li_block->ta = NULL;
kfree(ta);
}
static int logfs_unlink(struct inode *dir, struct dentry *dentry)
{
struct logfs_super *super = logfs_super(dir->i_sb);
struct inode *inode = d_inode(dentry);
struct logfs_transaction *ta;
struct page *page;
pgoff_t index;
int ret;
ta = kzalloc(sizeof(*ta), GFP_KERNEL);
if (!ta)
return -ENOMEM;
ta->state = UNLINK_1;
ta->ino = inode->i_ino;
inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
page = logfs_get_dd_page(dir, dentry);
if (!page) {
kfree(ta);
return -ENOENT;
}
if (IS_ERR(page)) {
kfree(ta);
return PTR_ERR(page);
}
index = page->index;
put_page(page);
mutex_lock(&super->s_dirop_mutex);
logfs_add_transaction(dir, ta);
ret = logfs_delete(dir, index, NULL);
if (!ret)
ret = write_inode(dir);
if (ret) {
abort_transaction(dir, ta);
printk(KERN_ERR"LOGFS: unable to delete inode\n");
goto out;
}
ta->state = UNLINK_2;
logfs_add_transaction(inode, ta);
ret = logfs_remove_inode(inode);
out:
mutex_unlock(&super->s_dirop_mutex);
return ret;
}
static inline int logfs_empty_dir(struct inode *dir)
{
u64 data;
data = logfs_seek_data(dir, 0) << dir->i_sb->s_blocksize_bits;
return data >= i_size_read(dir);
}
static int logfs_rmdir(struct inode *dir, struct dentry *dentry)
{
struct inode *inode = d_inode(dentry);
if (!logfs_empty_dir(inode))
return -ENOTEMPTY;
return logfs_unlink(dir, dentry);
}
/* FIXME: readdir currently has its own dir_walk code. I don't see a good
* way to combine the two copies */
static int logfs_readdir(struct file *file, struct dir_context *ctx)
{
struct inode *dir = file_inode(file);
loff_t pos;
struct page *page;
struct logfs_disk_dentry *dd;
if (ctx->pos < 0)
return -EINVAL;
if (!dir_emit_dots(file, ctx))
return 0;
pos = ctx->pos - 2;
BUG_ON(pos < 0);
for (;; pos++, ctx->pos++) {
bool full;
if (beyond_eof(dir, pos))
break;
if (!logfs_exist_block(dir, pos)) {
/* deleted dentry */
pos = dir_seek_data(dir, pos);
continue;
}
page = read_cache_page(dir->i_mapping, pos,
(filler_t *)logfs_readpage, NULL);
if (IS_ERR(page))
return PTR_ERR(page);
dd = kmap(page);
BUG_ON(dd->namelen == 0);
full = !dir_emit(ctx, (char *)dd->name,
be16_to_cpu(dd->namelen),
be64_to_cpu(dd->ino), dd->type);
kunmap(page);
put_page(page);
if (full)
break;
}
return 0;
}
static void logfs_set_name(struct logfs_disk_dentry *dd, const struct qstr *name)
{
dd->namelen = cpu_to_be16(name->len);
memcpy(dd->name, name->name, name->len);
}
static struct dentry *logfs_lookup(struct inode *dir, struct dentry *dentry,
unsigned int flags)
{
struct page *page;
struct logfs_disk_dentry *dd;
pgoff_t index;
u64 ino = 0;
struct inode *inode;
page = logfs_get_dd_page(dir, dentry);
if (IS_ERR(page))
return ERR_CAST(page);
if (!page) {
d_add(dentry, NULL);
return NULL;
}
index = page->index;
dd = kmap_atomic(page);
ino = be64_to_cpu(dd->ino);
kunmap_atomic(dd);
put_page(page);
inode = logfs_iget(dir->i_sb, ino);
if (IS_ERR(inode))
printk(KERN_ERR"LogFS: Cannot read inode #%llx for dentry (%lx, %lx)\n",
ino, dir->i_ino, index);
return d_splice_alias(inode, dentry);
}
static void grow_dir(struct inode *dir, loff_t index)
{
index = (index + 1) << dir->i_sb->s_blocksize_bits;
if (i_size_read(dir) < index)
i_size_write(dir, index);
}
static int logfs_write_dir(struct inode *dir, struct dentry *dentry,
struct inode *inode)
{
struct page *page;
struct logfs_disk_dentry *dd;
u32 hash = logfs_hash_32(dentry->d_name.name, dentry->d_name.len, 0);
pgoff_t index;
int round, err;
for (round = 0; round < 20; round++) {
index = hash_index(hash, round);
if (logfs_exist_block(dir, index))
continue;
page = find_or_create_page(dir->i_mapping, index, GFP_KERNEL);
if (!page)
return -ENOMEM;
dd = kmap_atomic(page);
memset(dd, 0, sizeof(*dd));
dd->ino = cpu_to_be64(inode->i_ino);
dd->type = logfs_type(inode);
logfs_set_name(dd, &dentry->d_name);
kunmap_atomic(dd);
err = logfs_write_buf(dir, page, WF_LOCK);
unlock_page(page);
put_page(page);
if (!err)
grow_dir(dir, index);
return err;
}
/* FIXME: Is there a better return value? In most cases neither
* the filesystem nor the directory are full. But we have had
* too many collisions for this particular hash and no fallback.
*/
return -ENOSPC;
}
static int __logfs_create(struct inode *dir, struct dentry *dentry,
struct inode *inode, const char *dest, long destlen)
{
struct logfs_super *super = logfs_super(dir->i_sb);
struct logfs_inode *li = logfs_inode(inode);
struct logfs_transaction *ta;
int ret;
ta = kzalloc(sizeof(*ta), GFP_KERNEL);
if (!ta) {
drop_nlink(inode);
iput(inode);
return -ENOMEM;
}
ta->state = CREATE_1;
ta->ino = inode->i_ino;
mutex_lock(&super->s_dirop_mutex);
logfs_add_transaction(inode, ta);
if (dest) {
/* symlink */
ret = logfs_inode_write(inode, dest, destlen, 0, WF_LOCK, NULL);
if (!ret)
ret = write_inode(inode);
} else {
/* creat/mkdir/mknod */
ret = write_inode(inode);
}
if (ret) {
abort_transaction(inode, ta);
li->li_flags |= LOGFS_IF_STILLBORN;
/* FIXME: truncate symlink */
drop_nlink(inode);
iput(inode);
goto out;
}
ta->state = CREATE_2;
logfs_add_transaction(dir, ta);
ret = logfs_write_dir(dir, dentry, inode);
/* sync directory */
if (!ret)
ret = write_inode(dir);
if (ret) {
logfs_del_transaction(dir, ta);
ta->state = CREATE_2;
logfs_add_transaction(inode, ta);
logfs_remove_inode(inode);
iput(inode);
goto out;
}
d_instantiate(dentry, inode);
out:
mutex_unlock(&super->s_dirop_mutex);
return ret;
}
static int logfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
struct inode *inode;
/*
* FIXME: why do we have to fill in S_IFDIR, while the mode is
* correct for mknod, creat, etc.? Smells like the vfs *should*
* do it for us but for some reason fails to do so.
*/
inode = logfs_new_inode(dir, S_IFDIR | mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &logfs_dir_iops;
inode->i_fop = &logfs_dir_fops;
return __logfs_create(dir, dentry, inode, NULL, 0);
}
static int logfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
bool excl)
{
struct inode *inode;
inode = logfs_new_inode(dir, mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &logfs_reg_iops;
inode->i_fop = &logfs_reg_fops;
inode->i_mapping->a_ops = &logfs_reg_aops;
return __logfs_create(dir, dentry, inode, NULL, 0);
}
static int logfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
dev_t rdev)
{
struct inode *inode;
if (dentry->d_name.len > LOGFS_MAX_NAMELEN)
return -ENAMETOOLONG;
inode = logfs_new_inode(dir, mode);
if (IS_ERR(inode))
return PTR_ERR(inode);
init_special_inode(inode, mode, rdev);
return __logfs_create(dir, dentry, inode, NULL, 0);
}
static int logfs_symlink(struct inode *dir, struct dentry *dentry,
const char *target)
{
struct inode *inode;
size_t destlen = strlen(target) + 1;
if (destlen > dir->i_sb->s_blocksize)
return -ENAMETOOLONG;
inode = logfs_new_inode(dir, S_IFLNK | 0777);
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &page_symlink_inode_operations;
inode_nohighmem(inode);
inode->i_mapping->a_ops = &logfs_reg_aops;
return __logfs_create(dir, dentry, inode, target, destlen);
}
static int logfs_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{
struct inode *inode = d_inode(old_dentry);
inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
ihold(inode);
inc_nlink(inode);
mark_inode_dirty_sync(inode);
return __logfs_create(dir, dentry, inode, NULL, 0);
}
static int logfs_get_dd(struct inode *dir, struct dentry *dentry,
struct logfs_disk_dentry *dd, loff_t *pos)
{
struct page *page;
void *map;
page = logfs_get_dd_page(dir, dentry);
if (IS_ERR(page))
return PTR_ERR(page);
*pos = page->index;
map = kmap_atomic(page);
memcpy(dd, map, sizeof(*dd));
kunmap_atomic(map);
put_page(page);
return 0;
}
static int logfs_delete_dd(struct inode *dir, loff_t pos)
{
/*
* Getting called with pos somewhere beyond eof is either a goofup
* within this file or means someone maliciously edited the
* (crc-protected) journal.
*/
BUG_ON(beyond_eof(dir, pos));
dir->i_ctime = dir->i_mtime = current_time(dir);
log_dir(" Delete dentry (%lx, %llx)\n", dir->i_ino, pos);
return logfs_delete(dir, pos, NULL);
}
/*
* Cross-directory rename, target does not exist. Just a little nasty.
* Create a new dentry in the target dir, then remove the old dentry,
* all the while taking care to remember our operation in the journal.
*/
static int logfs_rename_cross(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
struct logfs_super *super = logfs_super(old_dir->i_sb);
struct logfs_disk_dentry dd;
struct logfs_transaction *ta;
loff_t pos;
int err;
/* 1. locate source dd */
err = logfs_get_dd(old_dir, old_dentry, &dd, &pos);
if (err)
return err;
ta = kzalloc(sizeof(*ta), GFP_KERNEL);
if (!ta)
return -ENOMEM;
ta->state = CROSS_RENAME_1;
ta->dir = old_dir->i_ino;
ta->pos = pos;
/* 2. write target dd */
mutex_lock(&super->s_dirop_mutex);
logfs_add_transaction(new_dir, ta);
err = logfs_write_dir(new_dir, new_dentry, d_inode(old_dentry));
if (!err)
err = write_inode(new_dir);
if (err) {
super->s_rename_dir = 0;
super->s_rename_pos = 0;
abort_transaction(new_dir, ta);
goto out;
}
/* 3. remove source dd */
ta->state = CROSS_RENAME_2;
logfs_add_transaction(old_dir, ta);
err = logfs_delete_dd(old_dir, pos);
if (!err)
err = write_inode(old_dir);
LOGFS_BUG_ON(err, old_dir->i_sb);
out:
mutex_unlock(&super->s_dirop_mutex);
return err;
}
static int logfs_replace_inode(struct inode *dir, struct dentry *dentry,
struct logfs_disk_dentry *dd, struct inode *inode)
{
loff_t pos;
int err;
err = logfs_get_dd(dir, dentry, dd, &pos);
if (err)
return err;
dd->ino = cpu_to_be64(inode->i_ino);
dd->type = logfs_type(inode);
err = write_dir(dir, dd, pos);
if (err)
return err;
log_dir("Replace dentry (%lx, %llx) %s -> %llx\n", dir->i_ino, pos,
dd->name, be64_to_cpu(dd->ino));
return write_inode(dir);
}
/* Target dentry exists - the worst case. We need to attach the source
* inode to the target dentry, then remove the orphaned target inode and
* source dentry.
*/
static int logfs_rename_target(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
struct logfs_super *super = logfs_super(old_dir->i_sb);
struct inode *old_inode = d_inode(old_dentry);
struct inode *new_inode = d_inode(new_dentry);
int isdir = S_ISDIR(old_inode->i_mode);
struct logfs_disk_dentry dd;
struct logfs_transaction *ta;
loff_t pos;
int err;
BUG_ON(isdir != S_ISDIR(new_inode->i_mode));
if (isdir) {
if (!logfs_empty_dir(new_inode))
return -ENOTEMPTY;
}
/* 1. locate source dd */
err = logfs_get_dd(old_dir, old_dentry, &dd, &pos);
if (err)
return err;
ta = kzalloc(sizeof(*ta), GFP_KERNEL);
if (!ta)
return -ENOMEM;
ta->state = TARGET_RENAME_1;
ta->dir = old_dir->i_ino;
ta->pos = pos;
ta->ino = new_inode->i_ino;
/* 2. attach source inode to target dd */
mutex_lock(&super->s_dirop_mutex);
logfs_add_transaction(new_dir, ta);
err = logfs_replace_inode(new_dir, new_dentry, &dd, old_inode);
if (err) {
super->s_rename_dir = 0;
super->s_rename_pos = 0;
super->s_victim_ino = 0;
abort_transaction(new_dir, ta);
goto out;
}
/* 3. remove source dd */
ta->state = TARGET_RENAME_2;
logfs_add_transaction(old_dir, ta);
err = logfs_delete_dd(old_dir, pos);
if (!err)
err = write_inode(old_dir);
LOGFS_BUG_ON(err, old_dir->i_sb);
/* 4. remove target inode */
ta->state = TARGET_RENAME_3;
logfs_add_transaction(new_inode, ta);
err = logfs_remove_inode(new_inode);
out:
mutex_unlock(&super->s_dirop_mutex);
return err;
}
static int logfs_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
if (flags & ~RENAME_NOREPLACE)
return -EINVAL;
if (d_really_is_positive(new_dentry))
return logfs_rename_target(old_dir, old_dentry,
new_dir, new_dentry);
return logfs_rename_cross(old_dir, old_dentry, new_dir, new_dentry);
}
/* No locking done here, as this is called before .get_sb() returns. */
int logfs_replay_journal(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct inode *inode;
u64 ino, pos;
int err;
if (super->s_victim_ino) {
/* delete victim inode */
ino = super->s_victim_ino;
printk(KERN_INFO"LogFS: delete unmapped inode #%llx\n", ino);
inode = logfs_iget(sb, ino);
if (IS_ERR(inode))
goto fail;
LOGFS_BUG_ON(i_size_read(inode) > 0, sb);
super->s_victim_ino = 0;
err = logfs_remove_inode(inode);
iput(inode);
if (err) {
super->s_victim_ino = ino;
goto fail;
}
}
if (super->s_rename_dir) {
/* delete old dd from rename */
ino = super->s_rename_dir;
pos = super->s_rename_pos;
printk(KERN_INFO"LogFS: delete unbacked dentry (%llx, %llx)\n",
ino, pos);
inode = logfs_iget(sb, ino);
if (IS_ERR(inode))
goto fail;
super->s_rename_dir = 0;
super->s_rename_pos = 0;
err = logfs_delete_dd(inode, pos);
iput(inode);
if (err) {
super->s_rename_dir = ino;
super->s_rename_pos = pos;
goto fail;
}
}
return 0;
fail:
LOGFS_BUG(sb);
return -EIO;
}
const struct inode_operations logfs_dir_iops = {
.create = logfs_create,
.link = logfs_link,
.lookup = logfs_lookup,
.mkdir = logfs_mkdir,
.mknod = logfs_mknod,
.rename = logfs_rename,
.rmdir = logfs_rmdir,
.symlink = logfs_symlink,
.unlink = logfs_unlink,
};
const struct file_operations logfs_dir_fops = {
.fsync = logfs_fsync,
.unlocked_ioctl = logfs_ioctl,
.iterate_shared = logfs_readdir,
.read = generic_read_dir,
.llseek = generic_file_llseek,
};

@ -1,285 +0,0 @@
/*
* fs/logfs/file.c - prepare_write, commit_write and friends
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/sched.h>
#include <linux/writeback.h>
static int logfs_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
{
struct inode *inode = mapping->host;
struct page *page;
pgoff_t index = pos >> PAGE_SHIFT;
page = grab_cache_page_write_begin(mapping, index, flags);
if (!page)
return -ENOMEM;
*pagep = page;
if ((len == PAGE_SIZE) || PageUptodate(page))
return 0;
if ((pos & PAGE_MASK) >= i_size_read(inode)) {
unsigned start = pos & (PAGE_SIZE - 1);
unsigned end = start + len;
/* Reading beyond i_size is simple: memset to zero */
zero_user_segments(page, 0, start, end, PAGE_SIZE);
return 0;
}
return logfs_readpage_nolock(page);
}
static int logfs_write_end(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied, struct page *page,
void *fsdata)
{
struct inode *inode = mapping->host;
pgoff_t index = page->index;
unsigned start = pos & (PAGE_SIZE - 1);
unsigned end = start + copied;
int ret = 0;
BUG_ON(PAGE_SIZE != inode->i_sb->s_blocksize);
BUG_ON(page->index > I3_BLOCKS);
if (copied < len) {
/*
* Short write of a non-initialized page. Just tell userspace
* to retry the entire page.
*/
if (!PageUptodate(page)) {
copied = 0;
goto out;
}
}
if (copied == 0)
goto out; /* FIXME: do we need to update inode? */
if (i_size_read(inode) < (index << PAGE_SHIFT) + end) {
i_size_write(inode, (index << PAGE_SHIFT) + end);
mark_inode_dirty_sync(inode);
}
SetPageUptodate(page);
if (!PageDirty(page)) {
if (!get_page_reserve(inode, page))
__set_page_dirty_nobuffers(page);
else
ret = logfs_write_buf(inode, page, WF_LOCK);
}
out:
unlock_page(page);
put_page(page);
return ret ? ret : copied;
}
int logfs_readpage(struct file *file, struct page *page)
{
int ret;
ret = logfs_readpage_nolock(page);
unlock_page(page);
return ret;
}
/* Clear the page's dirty flag in the radix tree. */
/* TODO: mucking with PageWriteback is silly. Add a generic function to clear
* the dirty bit from the radix tree for filesystems that don't have to wait
* for page writeback to finish (i.e. any compressing filesystem).
*/
static void clear_radix_tree_dirty(struct page *page)
{
BUG_ON(PagePrivate(page) || page->private);
set_page_writeback(page);
end_page_writeback(page);
}
static int __logfs_writepage(struct page *page)
{
struct inode *inode = page->mapping->host;
int err;
err = logfs_write_buf(inode, page, WF_LOCK);
if (err)
set_page_dirty(page);
else
clear_radix_tree_dirty(page);
unlock_page(page);
return err;
}
static int logfs_writepage(struct page *page, struct writeback_control *wbc)
{
struct inode *inode = page->mapping->host;
loff_t i_size = i_size_read(inode);
pgoff_t end_index = i_size >> PAGE_SHIFT;
unsigned offset;
u64 bix;
level_t level;
log_file("logfs_writepage(%lx, %lx, %p)\n", inode->i_ino, page->index,
page);
logfs_unpack_index(page->index, &bix, &level);
/* Indirect blocks are never truncated */
if (level != 0)
return __logfs_writepage(page);
/*
* TODO: everything below is a near-verbatim copy of nobh_writepage().
* The relevant bits should be factored out after logfs is merged.
*/
/* Is the page fully inside i_size? */
if (bix < end_index)
return __logfs_writepage(page);
/* Is the page fully outside i_size? (truncate in progress) */
offset = i_size & (PAGE_SIZE-1);
if (bix > end_index || offset == 0) {
unlock_page(page);
return 0; /* don't care */
}
/*
* The page straddles i_size. It must be zeroed out on each and every
* writepage invocation because it may be mmapped. "A file is mapped
* in multiples of the page size. For a file that is not a multiple of
* the page size, the remaining memory is zeroed when mapped, and
* writes to that region are not written out to the file."
*/
zero_user_segment(page, offset, PAGE_SIZE);
return __logfs_writepage(page);
}
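
The three cases that logfs_writepage() distinguishes for level-0 pages come down to a little index arithmetic. The sketch below is a userspace illustration only: `classify_page`, the enum and the 4 KiB page size are assumptions, not part of the removed source.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE (1u << SKETCH_PAGE_SHIFT)

enum page_pos { PAGE_INSIDE, PAGE_OUTSIDE, PAGE_STRADDLES };

/*
 * Classify data page `bix` against a file of `i_size` bytes, following the
 * three cases in logfs_writepage(). For PAGE_STRADDLES, *zero_from receives
 * the in-page offset from which the tail of the page must be zeroed.
 */
enum page_pos classify_page(uint64_t bix, uint64_t i_size,
			    unsigned int *zero_from)
{
	uint64_t end_index = i_size >> SKETCH_PAGE_SHIFT;
	unsigned int offset = (unsigned int)(i_size & (SKETCH_PAGE_SIZE - 1));

	assert(zero_from != 0);
	if (bix < end_index)
		return PAGE_INSIDE;	/* fully inside i_size */
	if (bix > end_index || offset == 0)
		return PAGE_OUTSIDE;	/* fully outside i_size */
	*zero_from = offset;		/* page straddles i_size */
	return PAGE_STRADDLES;
}
```

For a 10000-byte file, page 2 straddles the end of the file and would be zeroed from offset 1808 up to the end of the page.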
static void logfs_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
struct logfs_block *block = logfs_block(page);
if (block->reserved_bytes) {
struct super_block *sb = page->mapping->host->i_sb;
struct logfs_super *super = logfs_super(sb);
super->s_dirty_pages -= block->reserved_bytes;
block->ops->free_block(sb, block);
BUG_ON(bitmap_weight(block->alias_map, LOGFS_BLOCK_FACTOR));
} else
move_page_to_btree(page);
BUG_ON(PagePrivate(page) || page->private);
}
static int logfs_releasepage(struct page *page, gfp_t only_xfs_uses_this)
{
return 0; /* None of these are easy to release */
}
long logfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
struct inode *inode = file_inode(file);
struct logfs_inode *li = logfs_inode(inode);
unsigned int oldflags, flags;
int err;
switch (cmd) {
case FS_IOC_GETFLAGS:
flags = li->li_flags & LOGFS_FL_USER_VISIBLE;
return put_user(flags, (int __user *)arg);
case FS_IOC_SETFLAGS:
if (IS_RDONLY(inode))
return -EROFS;
if (!inode_owner_or_capable(inode))
return -EACCES;
err = get_user(flags, (int __user *)arg);
if (err)
return err;
inode_lock(inode);
oldflags = li->li_flags;
flags &= LOGFS_FL_USER_MODIFIABLE;
flags |= oldflags & ~LOGFS_FL_USER_MODIFIABLE;
li->li_flags = flags;
inode_unlock(inode);
inode->i_ctime = current_time(inode);
mark_inode_dirty_sync(inode);
return 0;
default:
return -ENOTTY;
}
}
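
The FS_IOC_SETFLAGS branch merges the user-supplied flags with the existing ones so that only the user-modifiable bits can change. A minimal sketch of that masking, with a made-up mask value (the real LOGFS_FL_USER_MODIFIABLE is not shown in this hunk) and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask; NOT the real LOGFS_FL_USER_MODIFIABLE value. */
#define SKETCH_FL_USER_MODIFIABLE 0x0000000fu

/* Take modifiable bits from `requested`, all other bits from `oldflags`. */
uint32_t merge_flags(uint32_t oldflags, uint32_t requested)
{
	uint32_t flags = requested & SKETCH_FL_USER_MODIFIABLE;

	flags |= oldflags & ~SKETCH_FL_USER_MODIFIABLE;
	/* Bits outside the mask must come through unchanged. */
	assert((flags & ~SKETCH_FL_USER_MODIFIABLE) ==
	       (oldflags & ~SKETCH_FL_USER_MODIFIABLE));
	return flags;
}
```

Under this assumed mask, a request of 0xac against old flags 0xf3 would yield 0xfc: the low nibble comes from the request, the rest is preserved.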
int logfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
struct super_block *sb = file->f_mapping->host->i_sb;
struct inode *inode = file->f_mapping->host;
int ret;
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret)
return ret;
inode_lock(inode);
logfs_get_wblocks(sb, NULL, WF_LOCK);
logfs_write_anchor(sb);
logfs_put_wblocks(sb, NULL, WF_LOCK);
inode_unlock(inode);
return 0;
}
static int logfs_setattr(struct dentry *dentry, struct iattr *attr)
{
struct inode *inode = d_inode(dentry);
int err = 0;
err = setattr_prepare(dentry, attr);
if (err)
return err;
if (attr->ia_valid & ATTR_SIZE) {
err = logfs_truncate(inode, attr->ia_size);
if (err)
return err;
}
setattr_copy(inode, attr);
mark_inode_dirty(inode);
return 0;
}
const struct inode_operations logfs_reg_iops = {
.setattr = logfs_setattr,
};
const struct file_operations logfs_reg_fops = {
.read_iter = generic_file_read_iter,
.write_iter = generic_file_write_iter,
.fsync = logfs_fsync,
.unlocked_ioctl = logfs_ioctl,
.llseek = generic_file_llseek,
.mmap = generic_file_readonly_mmap,
.open = generic_file_open,
};
const struct address_space_operations logfs_reg_aops = {
.invalidatepage = logfs_invalidatepage,
.readpage = logfs_readpage,
.releasepage = logfs_releasepage,
.set_page_dirty = __set_page_dirty_nobuffers,
.writepage = logfs_writepage,
.writepages = generic_writepages,
.write_begin = logfs_write_begin,
.write_end = logfs_write_end,
};

@ -1,732 +0,0 @@
/*
* fs/logfs/gc.c - garbage collection code
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/sched.h>
#include <linux/slab.h>
/*
* Wear leveling needs to kick in when the difference between low erase
* counts and high erase counts gets too big. A good value for "too big"
* may be somewhat below 10% of maximum erase count for the device.
* Why not 397, to pick a nice round number with no specific meaning? :)
*
* WL_RATELIMIT is the minimum time between two wear level events. A huge
* number of segments may fulfil the requirements for wear leveling at the
* same time. If that happens we don't want to cause a latency from hell,
* but just gently pick one segment every so often and minimize overhead.
*/
#define WL_DELTA 397
#define WL_RATELIMIT 100
#define MAX_OBJ_ALIASES 2600
#define SCAN_RATIO 512 /* number of scanned segments per gc'd segment */
#define LIST_SIZE 64 /* base size of candidate lists */
#define SCAN_ROUNDS 128 /* maximum number of complete medium scans */
#define SCAN_ROUNDS_HIGH 4 /* maximum number of higher-level scans */
static int no_free_segments(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
return super->s_free_list.count;
}
/* journal has distance -1, top-most ifile layer distance 0 */
static u8 root_distance(struct super_block *sb, gc_level_t __gc_level)
{
struct logfs_super *super = logfs_super(sb);
u8 gc_level = (__force u8)__gc_level;
switch (gc_level) {
case 0: /* fall through */
case 1: /* fall through */
case 2: /* fall through */
case 3:
/* file data or indirect blocks */
return super->s_ifile_levels + super->s_iblock_levels - gc_level;
case 6: /* fall through */
case 7: /* fall through */
case 8: /* fall through */
case 9:
/* inode file data or indirect blocks */
return super->s_ifile_levels - (gc_level - 6);
default:
printk(KERN_ERR"LOGFS: segment of unknown level %x found\n",
gc_level);
WARN_ON(1);
return super->s_ifile_levels + super->s_iblock_levels;
}
}
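
The mapping in root_distance() can be tried out with plain integers. This is a sketch under assumptions: `root_distance_sketch` is a hypothetical name, the superblock fields are passed as parameters, and the level counts used in any example are invented rather than taken from a real filesystem image.

```c
#include <assert.h>

/*
 * Mirrors the switch in root_distance(): GC levels 0-3 are file data or
 * indirect blocks, levels 6-9 belong to the inode file. Unknown levels get
 * the largest distance, as in the default branch.
 */
int root_distance_sketch(int ifile_levels, int iblock_levels, int gc_level)
{
	assert(ifile_levels >= 0 && iblock_levels >= 0);

	if (gc_level >= 0 && gc_level <= 3)
		return ifile_levels + iblock_levels - gc_level;
	if (gc_level >= 6 && gc_level <= 9)
		return ifile_levels - (gc_level - 6);
	return ifile_levels + iblock_levels;
}
```

With an assumed 3 inode-file levels and 4 indirect-block levels, level 0 maps to distance 7 and level 9 to distance 0.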
static int segment_is_reserved(struct super_block *sb, u32 segno)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area;
void *reserved;
int i;
/* Some segments are reserved. Just pretend they were all valid */
reserved = btree_lookup32(&super->s_reserved_segments, segno);
if (reserved)
return 1;
/* Currently open segments */
for_each_area(i) {
area = super->s_area[i];
if (area->a_is_open && area->a_segno == segno)
return 1;
}
return 0;
}
static void logfs_mark_segment_bad(struct super_block *sb, u32 segno)
{
BUG();
}
/*
* Returns the bytes consumed by valid objects in this segment. Object headers
* are counted, the segment header is not.
*/
static u32 logfs_valid_bytes(struct super_block *sb, u32 segno, u32 *ec,
gc_level_t *gc_level)
{
struct logfs_segment_entry se;
u32 ec_level;
logfs_get_segment_entry(sb, segno, &se);
if (se.ec_level == cpu_to_be32(BADSEG) ||
se.valid == cpu_to_be32(RESERVED))
return RESERVED;
ec_level = be32_to_cpu(se.ec_level);
*ec = ec_level >> 4;
*gc_level = GC_LEVEL(ec_level & 0xf);
return be32_to_cpu(se.valid);
}
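
logfs_valid_bytes() splits the segment entry's ec_level word into an erase count (upper 28 bits) and a GC level (low nibble). The sketch below shows that layout in userspace. `pack_ec_level` is a hypothetical inverse added for illustration, and the big-endian conversion done by be32_to_cpu() is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inverse of the unpacking done in logfs_valid_bytes(). */
uint32_t pack_ec_level(uint32_t ec, uint8_t gc_level)
{
	assert(ec < (1u << 28));	/* erase count lives in the upper 28 bits */
	assert(gc_level <= 0xf);	/* level lives in the low nibble */
	return (ec << 4) | gc_level;
}

/* Same shifts and masks as in logfs_valid_bytes(), on a host-endian word. */
void unpack_ec_level(uint32_t ec_level, uint32_t *ec, uint8_t *gc_level)
{
	*ec = ec_level >> 4;
	*gc_level = (uint8_t)(ec_level & 0xf);
}
```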
static void logfs_cleanse_block(struct super_block *sb, u64 ofs, u64 ino,
u64 bix, gc_level_t gc_level)
{
struct inode *inode;
int err, cookie;
inode = logfs_safe_iget(sb, ino, &cookie);
err = logfs_rewrite_block(inode, bix, ofs, gc_level, 0);
BUG_ON(err);
logfs_safe_iput(inode, cookie);
}
static u32 logfs_gc_segment(struct super_block *sb, u32 segno)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_segment_header sh;
struct logfs_object_header oh;
u64 ofs, ino, bix;
u32 seg_ofs, logical_segno, cleaned = 0;
int err, len, valid;
gc_level_t gc_level;
LOGFS_BUG_ON(segment_is_reserved(sb, segno), sb);
btree_insert32(&super->s_reserved_segments, segno, (void *)1, GFP_NOFS);
err = wbuf_read(sb, dev_ofs(sb, segno, 0), sizeof(sh), &sh);
BUG_ON(err);
gc_level = GC_LEVEL(sh.level);
logical_segno = be32_to_cpu(sh.segno);
if (sh.crc != logfs_crc32(&sh, sizeof(sh), 4)) {
logfs_mark_segment_bad(sb, segno);
cleaned = -1;
goto out;
}
for (seg_ofs = LOGFS_SEGMENT_HEADERSIZE;
seg_ofs + sizeof(oh) < super->s_segsize; ) {
ofs = dev_ofs(sb, logical_segno, seg_ofs);
err = wbuf_read(sb, dev_ofs(sb, segno, seg_ofs), sizeof(oh),
&oh);
BUG_ON(err);
if (!memchr_inv(&oh, 0xff, sizeof(oh)))
break;
if (oh.crc != logfs_crc32(&oh, sizeof(oh) - 4, 4)) {
logfs_mark_segment_bad(sb, segno);
cleaned = super->s_segsize - 1;
goto out;
}
ino = be64_to_cpu(oh.ino);
bix = be64_to_cpu(oh.bix);
len = sizeof(oh) + be16_to_cpu(oh.len);
valid = logfs_is_valid_block(sb, ofs, ino, bix, gc_level);
if (valid == 1) {
logfs_cleanse_block(sb, ofs, ino, bix, gc_level);
cleaned += len;
} else if (valid == 2) {
/* Will be invalid upon journal commit */
cleaned += len;
}
seg_ofs += len;
}
out:
btree_remove32(&super->s_reserved_segments, segno);
return cleaned;
}
static struct gc_candidate *add_list(struct gc_candidate *cand,
struct candidate_list *list)
{
struct rb_node **p = &list->rb_tree.rb_node;
struct rb_node *parent = NULL;
struct gc_candidate *cur;
int comp;
cand->list = list;
while (*p) {
parent = *p;
cur = rb_entry(parent, struct gc_candidate, rb_node);
if (list->sort_by_ec)
comp = cand->erase_count < cur->erase_count;
else
comp = cand->valid < cur->valid;
if (comp)
p = &parent->rb_left;
else
p = &parent->rb_right;
}
rb_link_node(&cand->rb_node, parent, p);
rb_insert_color(&cand->rb_node, &list->rb_tree);
if (list->count <= list->maxcount) {
list->count++;
return NULL;
}
cand = rb_entry(rb_last(&list->rb_tree), struct gc_candidate, rb_node);
rb_erase(&cand->rb_node, &list->rb_tree);
cand->list = NULL;
return cand;
}
static void remove_from_list(struct gc_candidate *cand)
{
struct candidate_list *list = cand->list;
rb_erase(&cand->rb_node, &list->rb_tree);
list->count--;
}
static void free_candidate(struct super_block *sb, struct gc_candidate *cand)
{
struct logfs_super *super = logfs_super(sb);
btree_remove32(&super->s_cand_tree, cand->segno);
kfree(cand);
}
u32 get_best_cand(struct super_block *sb, struct candidate_list *list, u32 *ec)
{
struct gc_candidate *cand;
u32 segno;
BUG_ON(list->count == 0);
cand = rb_entry(rb_first(&list->rb_tree), struct gc_candidate, rb_node);
remove_from_list(cand);
segno = cand->segno;
if (ec)
*ec = cand->erase_count;
free_candidate(sb, cand);
return segno;
}
/*
* We have several lists to manage segments with. The reserve_list is used to
* deal with bad blocks. We try to keep the best (lowest ec) segments on this
* list.
* The free_list contains free segments for normal usage. It usually gets the
* second pick after the reserve_list. But when the free_list is running short
* it is more important to keep the free_list full than to keep a reserve.
*
* Segments that are not free are put onto a per-level low_list. If we have
* to run garbage collection, we pick a candidate from there. All segments on
* those lists should have at least some free space so GC will make progress.
*
* And last we have the ec_list, which is used to pick segments for wear
* leveling.
*
* If all appropriate lists are full, we simply free the candidate and forget
* about that segment for a while. We have better candidates for each purpose.
*/
static void __add_candidate(struct super_block *sb, struct gc_candidate *cand)
{
struct logfs_super *super = logfs_super(sb);
u32 full = super->s_segsize - LOGFS_SEGMENT_RESERVE;
if (cand->valid == 0) {
/* 100% free segments */
log_gc_noisy("add reserve segment %x (ec %x) at %llx\n",
cand->segno, cand->erase_count,
dev_ofs(sb, cand->segno, 0));
cand = add_list(cand, &super->s_reserve_list);
if (cand) {
log_gc_noisy("add free segment %x (ec %x) at %llx\n",
cand->segno, cand->erase_count,
dev_ofs(sb, cand->segno, 0));
cand = add_list(cand, &super->s_free_list);
}
} else {
/* good candidates for Garbage Collection */
if (cand->valid < full)
cand = add_list(cand, &super->s_low_list[cand->dist]);
/* good candidates for wear leveling,
* segments that were recently written get ignored */
if (cand)
cand = add_list(cand, &super->s_ec_list);
}
if (cand)
free_candidate(sb, cand);
}
static int add_candidate(struct super_block *sb, u32 segno, u32 valid, u32 ec,
u8 dist)
{
struct logfs_super *super = logfs_super(sb);
struct gc_candidate *cand;
cand = kmalloc(sizeof(*cand), GFP_NOFS);
if (!cand)
return -ENOMEM;
cand->segno = segno;
cand->valid = valid;
cand->erase_count = ec;
cand->dist = dist;
btree_insert32(&super->s_cand_tree, segno, cand, GFP_NOFS);
__add_candidate(sb, cand);
return 0;
}
static void remove_segment_from_lists(struct super_block *sb, u32 segno)
{
struct logfs_super *super = logfs_super(sb);
struct gc_candidate *cand;
cand = btree_lookup32(&super->s_cand_tree, segno);
if (cand) {
remove_from_list(cand);
free_candidate(sb, cand);
}
}
static void scan_segment(struct super_block *sb, u32 segno)
{
u32 valid, ec = 0;
gc_level_t gc_level = 0;
u8 dist;
if (segment_is_reserved(sb, segno))
return;
remove_segment_from_lists(sb, segno);
valid = logfs_valid_bytes(sb, segno, &ec, &gc_level);
if (valid == RESERVED)
return;
dist = root_distance(sb, gc_level);
add_candidate(sb, segno, valid, ec, dist);
}
static struct gc_candidate *first_in_list(struct candidate_list *list)
{
if (list->count == 0)
return NULL;
return rb_entry(rb_first(&list->rb_tree), struct gc_candidate, rb_node);
}
/*
* Find the best segment for garbage collection. Main criterion is
* the segment requiring the least effort to clean. Secondary
* criterion is to GC on the lowest level available.
*
* So we search the least effort segment on the lowest level first,
* then move up and pick another segment iff it requires significantly
* less effort. Hence the LOGFS_MAX_OBJECTSIZE in the comparison.
*/
static struct gc_candidate *get_candidate(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int i, max_dist;
struct gc_candidate *cand = NULL, *this;
max_dist = min(no_free_segments(sb), LOGFS_NO_AREAS - 1);
for (i = max_dist; i >= 0; i--) {
this = first_in_list(&super->s_low_list[i]);
if (!this)
continue;
if (!cand)
cand = this;
if (this->valid + LOGFS_MAX_OBJECTSIZE <= cand->valid)
cand = this;
}
return cand;
}
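
The selection rule in get_candidate() can be sketched without the rbtree lists. Assumptions in this illustration: each per-distance list is reduced to the valid-byte count of its first entry, `NO_CANDIDATE` marks an empty list, and `SKETCH_MAX_OBJECTSIZE` is a made-up stand-in for LOGFS_MAX_OBJECTSIZE.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_OBJECTSIZE 4096u	/* hypothetical threshold */
#define NO_CANDIDATE UINT32_MAX		/* marks an empty list */

/*
 * best_valid[i] is the valid-byte count of the cheapest segment at distance
 * i. Walk from max_dist down to 0, as get_candidate() does, and switch to a
 * later list only if its segment is cheaper by at least one object size.
 * Returns the chosen distance, or -1 if every list is empty.
 */
int pick_distance(const uint32_t best_valid[], int max_dist)
{
	int chosen = -1;

	assert(max_dist >= 0);
	for (int i = max_dist; i >= 0; i--) {
		if (best_valid[i] == NO_CANDIDATE)
			continue;
		if (chosen < 0)
			chosen = i;
		if ((uint64_t)best_valid[i] + SKETCH_MAX_OBJECTSIZE <=
		    best_valid[chosen])
			chosen = i;
	}
	return chosen;
}
```

The threshold keeps the choice from flipping between lists over small differences: a segment with 10000 valid bytes does not displace one with 9000, and would not even if it were slightly cheaper.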
static int __logfs_gc_once(struct super_block *sb, struct gc_candidate *cand)
{
struct logfs_super *super = logfs_super(sb);
gc_level_t gc_level;
u32 cleaned, valid, segno, ec;
u8 dist;
if (!cand) {
log_gc("GC attempted, but no candidate found\n");
return 0;
}
segno = cand->segno;
dist = cand->dist;
valid = logfs_valid_bytes(sb, segno, &ec, &gc_level);
free_candidate(sb, cand);
log_gc("GC segment #%02x at %llx, %x required, %x free, %x valid, %llx free\n",
segno, (u64)segno << super->s_segshift,
dist, no_free_segments(sb), valid,
super->s_free_bytes);
cleaned = logfs_gc_segment(sb, segno);
log_gc("GC segment #%02x complete - now %x valid\n", segno,
valid - cleaned);
BUG_ON(cleaned != valid);
return 1;
}
static int logfs_gc_once(struct super_block *sb)
{
struct gc_candidate *cand;
cand = get_candidate(sb);
if (cand)
remove_from_list(cand);
return __logfs_gc_once(sb, cand);
}
/* returns 1 if a wrap occurs, 0 otherwise */
static int logfs_scan_some(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
u32 segno;
int i, ret = 0;
segno = super->s_sweeper;
for (i = SCAN_RATIO; i > 0; i--) {
segno++;
if (segno >= super->s_no_segs) {
segno = 0;
ret = 1;
/* Break out of the loop. We want to read a single
* block from the segment size on next invocation if
* SCAN_RATIO is set to match block size
*/
break;
}
scan_segment(sb, segno);
}
super->s_sweeper = segno;
return ret;
}
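
The sweeper arithmetic in logfs_scan_some() is easy to get wrong by one, so here is a userspace sketch of just the index handling. `sweep_some` is a hypothetical name, the actual scan_segment() call is replaced by a counter, and the ratio is a parameter instead of the SCAN_RATIO constant.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance *sweeper by up to `ratio` segments. Returns 1 if the sweep wrapped
 * around to segment 0 (ending the pass early), 0 otherwise. *scanned counts
 * the segments that would have been handed to scan_segment().
 */
int sweep_some(uint32_t *sweeper, uint32_t no_segs, int ratio, int *scanned)
{
	uint32_t segno = *sweeper;
	int wrapped = 0;

	assert(no_segs > 0 && ratio > 0);
	*scanned = 0;
	for (int i = ratio; i > 0; i--) {
		segno++;
		if (segno >= no_segs) {
			segno = 0;
			wrapped = 1;
			break;	/* stop at the wrap, as the original does */
		}
		(*scanned)++;
	}
	*sweeper = segno;
	return wrapped;
}
```

With 10 segments and a ratio of 4, three calls starting from sweeper 0 would scan 4, 4 and 1 segments, and the third call reports the wrap.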
/*
* In principle, this function should loop forever, looking for GC candidates
* and moving data. LogFS is designed in such a way that this loop is
* guaranteed to terminate.
*
* Limiting the loop to some iterations serves purely to catch cases when
* these guarantees have failed. An actual endless loop is an obvious bug
* and should be reported as such.
*/
static void __logfs_gc_pass(struct super_block *sb, int target)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_block *block;
int round, progress, last_progress = 0;
/*
* Doing too many changes to the segfile at once would result
* in a large number of aliases. Write the journal before
* things get out of hand.
*/
if (super->s_shadow_tree.no_shadowed_segments >= MAX_OBJ_ALIASES)
logfs_write_anchor(sb);
if (no_free_segments(sb) >= target &&
super->s_no_object_aliases < MAX_OBJ_ALIASES)
return;
log_gc("__logfs_gc_pass(%x)\n", target);
for (round = 0; round < SCAN_ROUNDS; ) {
if (no_free_segments(sb) >= target)
goto write_alias;
/* Sync in-memory state with on-medium state in case they
* diverged */
logfs_write_anchor(sb);
round += logfs_scan_some(sb);
if (no_free_segments(sb) >= target)
goto write_alias;
progress = logfs_gc_once(sb);
if (progress)
last_progress = round;
else if (round - last_progress > 2)
break;
continue;
/*
* The goto logic is nasty, I just don't know a better way to
* code it. GC is supposed to ensure two things:
* 1. Enough free segments are available.
* 2. The number of aliases is bounded.
* When 1. is achieved, we take a look at 2. and write back
* some alias-containing blocks, if necessary. However, after
* each such write we need to go back to 1., as writes can
* consume free segments.
*/
write_alias:
if (super->s_no_object_aliases < MAX_OBJ_ALIASES)
return;
if (list_empty(&super->s_object_alias)) {
/* All aliases are still in btree */
return;
}
log_gc("Write back one alias\n");
block = list_entry(super->s_object_alias.next,
struct logfs_block, alias_list);
block->ops->write_block(block);
/*
* To round off the nasty goto logic, we reset round here. It
* is a safety-net for GC not making any progress and limited
* to something reasonably small. If it were incremented for every
* single alias, the loop could terminate rather quickly.
*/
round = 0;
}
LOGFS_BUG(sb);
}
static int wl_ratelimit(struct super_block *sb, u64 *next_event)
{
struct logfs_super *super = logfs_super(sb);
if (*next_event < super->s_gec) {
*next_event = super->s_gec + WL_RATELIMIT;
return 0;
}
return 1;
}
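
Note the return convention of wl_ratelimit(): 1 means "rate-limited, skip this pass", while 0 means "go ahead" and also schedules the next allowed event. A userspace sketch of the same logic, assuming the counter passed as `gec` only ever increases (the name `wl_ratelimited` is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_WL_RATELIMIT 100	/* same value as WL_RATELIMIT above */

/*
 * Returns 0 if a wear-leveling event may run now, and in that case pushes
 * *next_event SKETCH_WL_RATELIMIT counts into the future. Returns 1 while
 * the caller is still inside the rate-limit window.
 */
int wl_ratelimited(uint64_t gec, uint64_t *next_event)
{
	assert(next_event != 0);
	if (*next_event < gec) {
		*next_event = gec + SKETCH_WL_RATELIMIT;
		return 0;
	}
	return 1;
}
```

Starting from a next_event of 0, a call at count 5 would proceed and set the next event to 105; calls at counts up to and including 105 are then skipped.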
static void logfs_wl_pass(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct gc_candidate *wl_cand, *free_cand;
if (wl_ratelimit(sb, &super->s_wl_gec_ostore))
return;
wl_cand = first_in_list(&super->s_ec_list);
if (!wl_cand)
return;
free_cand = first_in_list(&super->s_free_list);
if (!free_cand)
return;
if (wl_cand->erase_count < free_cand->erase_count + WL_DELTA) {
remove_from_list(wl_cand);
__logfs_gc_once(sb, wl_cand);
}
}
/*
* The journal needs wear leveling as well. But moving the journal is an
* expensive operation so we try to avoid it as much as possible. And if we
* have to do it, we move the whole journal, not individual segments.
*
* Ratelimiting is not strictly necessary here, it mainly serves to avoid the
* calculations. First we check whether moving the journal would be a
* significant improvement. That means that a) the current journal segments
* have more wear than the future journal segments and b) the current journal
* segments have more wear than normal ostore segments.
* Rationale for b) is that we don't have to move the journal if it is aging
* less than the ostore, even if the reserve segments age even less (they are
* excluded from wear leveling, after all).
* Next we check that the superblocks have less wear than the journal. Since
* moving the journal requires writing the superblocks, we have to protect the
* superblocks even more than the journal.
*
* Also we double the acceptable wear difference, compared to ostore wear
* leveling. Journal data is read and rewritten rapidly, comparatively. So
* soft errors have much less time to accumulate and we allow the journal to
* be a bit worse than the ostore.
*/
static void logfs_journal_wl_pass(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct gc_candidate *cand;
u32 min_journal_ec = -1, max_reserve_ec = 0;
int i;
if (wl_ratelimit(sb, &super->s_wl_gec_journal))
return;
if (super->s_reserve_list.count < super->s_no_journal_segs) {
/* Reserve is not full enough to move complete journal */
return;
}
journal_for_each(i)
if (super->s_journal_seg[i])
min_journal_ec = min(min_journal_ec,
super->s_journal_ec[i]);
cand = rb_entry(rb_first(&super->s_free_list.rb_tree),
struct gc_candidate, rb_node);
max_reserve_ec = cand->erase_count;
for (i = 0; i < 2; i++) {
struct logfs_segment_entry se;
u32 segno = seg_no(sb, super->s_sb_ofs[i]);
u32 ec;
logfs_get_segment_entry(sb, segno, &se);
ec = be32_to_cpu(se.ec_level) >> 4;
max_reserve_ec = max(max_reserve_ec, ec);
}
if (min_journal_ec > max_reserve_ec + 2 * WL_DELTA) {
do_logfs_journal_wl_pass(sb);
}
}
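
The final decision in logfs_journal_wl_pass() compares the least-worn journal segment against the most-worn comparison segment, with twice the ostore threshold. A minimal userspace sketch of that comparison, where `journal_should_move` and its array parameter are hypothetical and the erase counts are supplied by the caller:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_WL_DELTA 397u	/* same value as WL_DELTA above */

/*
 * `journal_ec` holds the erase counts of the journal segments in use;
 * `max_reserve_ec` stands in for the highest erase count among the segments
 * the journal is compared against. Returns 1 if the journal should move.
 */
int journal_should_move(const uint32_t *journal_ec, size_t n,
			uint32_t max_reserve_ec)
{
	uint32_t min_journal_ec = UINT32_MAX;	/* mirrors the (u32)-1 start */

	assert(journal_ec != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (journal_ec[i] < min_journal_ec)
			min_journal_ec = journal_ec[i];

	/* Twice the ostore threshold: the journal may be a bit more worn. */
	return min_journal_ec > max_reserve_ec + 2 * SKETCH_WL_DELTA;
}
```

With journal erase counts of 1000 and 1200, the move would trigger against a comparison count of 100 (1000 > 894) but not against 300 (1000 <= 1094).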
void logfs_gc_pass(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
//BUG_ON(mutex_trylock(&logfs_super(sb)->s_w_mutex));
/* Write the journal before free space gets saturated with dirty
* objects.
*/
if (super->s_dirty_used_bytes + super->s_dirty_free_bytes
+ LOGFS_MAX_OBJECTSIZE >= super->s_free_bytes)
logfs_write_anchor(sb);
__logfs_gc_pass(sb, super->s_total_levels);
logfs_wl_pass(sb);
logfs_journal_wl_pass(sb);
}
static int check_area(struct super_block *sb, int i)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area = super->s_area[i];
gc_level_t gc_level;
u32 cleaned, valid, ec;
u32 segno = area->a_segno;
u64 ofs = dev_ofs(sb, area->a_segno, area->a_written_bytes);
if (!area->a_is_open)
return 0;
if (super->s_devops->can_write_buf(sb, ofs) == 0)
return 0;
printk(KERN_INFO"LogFS: Possibly incomplete write at %llx\n", ofs);
/*
* The device cannot write back the write buffer. Most likely the
* wbuf was already written out and the system crashed at some point
* before the journal commit happened. In that case we wouldn't have
* to do anything. But if the crash happened before the wbuf was
* written out correctly, we must GC this segment. So assume the
* worst and always do the GC run.
*/
area->a_is_open = 0;
valid = logfs_valid_bytes(sb, segno, &ec, &gc_level);
cleaned = logfs_gc_segment(sb, segno);
if (cleaned != valid)
return -EIO;
return 0;
}
int logfs_check_areas(struct super_block *sb)
{
int i, err;
for_each_area(i) {
err = check_area(sb, i);
if (err)
return err;
}
return 0;
}
static void logfs_init_candlist(struct candidate_list *list, int maxcount,
int sort_by_ec)
{
list->count = 0;
list->maxcount = maxcount;
list->sort_by_ec = sort_by_ec;
list->rb_tree = RB_ROOT;
}
int logfs_init_gc(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int i;
btree_init_mempool32(&super->s_cand_tree, super->s_btree_pool);
logfs_init_candlist(&super->s_free_list, LIST_SIZE + SCAN_RATIO, 1);
logfs_init_candlist(&super->s_reserve_list,
super->s_bad_seg_reserve, 1);
for_each_area(i)
logfs_init_candlist(&super->s_low_list[i], LIST_SIZE, 0);
logfs_init_candlist(&super->s_ec_list, LIST_SIZE, 1);
return 0;
}
static void logfs_cleanup_list(struct super_block *sb,
struct candidate_list *list)
{
struct gc_candidate *cand;
while (list->count) {
cand = rb_entry(list->rb_tree.rb_node, struct gc_candidate,
rb_node);
remove_from_list(cand);
free_candidate(sb, cand);
}
BUG_ON(list->rb_tree.rb_node);
}
void logfs_cleanup_gc(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int i;
if (!super->s_free_list.count)
return;
/*
* FIXME: The btree may still contain a single empty node. So we
* call the grim visitor to clean up that mess. Btree code should
* do it for us, really.
*/
btree_grim_visitor32(&super->s_cand_tree, 0, NULL);
logfs_cleanup_list(sb, &super->s_free_list);
logfs_cleanup_list(sb, &super->s_reserve_list);
for_each_area(i)
logfs_cleanup_list(sb, &super->s_low_list[i]);
logfs_cleanup_list(sb, &super->s_ec_list);
}

@ -1,428 +0,0 @@
/*
* fs/logfs/inode.c - inode handling code
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/slab.h>
#include <linux/writeback.h>
#include <linux/backing-dev.h>
/*
* How soon to reuse old inode numbers? LogFS doesn't store deleted inodes
* on the medium. It therefore also lacks a method to store the previous
* generation number for deleted inodes. Instead a single generation number
* is stored which will be used for new inodes. Being just a 32bit counter,
* this can obviously wrap relatively quickly. So we only reuse inodes if we
* know that a fair number of inodes can be created before we have to increment
* the generation again - effectively adding some bits to the counter.
* But being too aggressive here means we keep a very large and very sparse
* inode file, wasting space on indirect blocks.
* So what is a good value? Beats me. 64k seems moderately bad on both
* fronts, so let's use that for now...
*
* NFS sucks, as everyone already knows.
*/
#define INOS_PER_WRAP (0x10000)
/*
* Logfs' requirement to read inodes for garbage collection makes life a bit
* harder. GC may have to read inodes that are in I_FREEING state, when they
* are being written out - and waiting for GC to make progress, naturally.
*
* So we cannot just call iget() or some variant of it, but first have to check
* whether the inode in question might be in I_FREEING state. Therefore we
* maintain our own per-sb list of "almost deleted" inodes and check against
* that list first. Normally this should be at most 1-2 entries long.
*
* Also, inodes have logfs-specific reference counting on top of what the vfs
* does. When .destroy_inode is called, normally the reference count will drop
* to zero and the inode gets deleted. But if GC accessed the inode, its
* refcount will remain nonzero and final deletion will have to wait.
*
* As a result we have two sets of functions to get/put inodes:
* logfs_safe_iget/logfs_safe_iput - safe to call from GC context
* logfs_iget/iput - normal version
*/
static struct kmem_cache *logfs_inode_cache;
static DEFINE_SPINLOCK(logfs_inode_lock);
static void logfs_inode_setops(struct inode *inode)
{
switch (inode->i_mode & S_IFMT) {
case S_IFDIR:
inode->i_op = &logfs_dir_iops;
inode->i_fop = &logfs_dir_fops;
inode->i_mapping->a_ops = &logfs_reg_aops;
break;
case S_IFREG:
inode->i_op = &logfs_reg_iops;
inode->i_fop = &logfs_reg_fops;
inode->i_mapping->a_ops = &logfs_reg_aops;
break;
case S_IFLNK:
inode->i_op = &page_symlink_inode_operations;
inode_nohighmem(inode);
inode->i_mapping->a_ops = &logfs_reg_aops;
break;
case S_IFSOCK: /* fall through */
case S_IFBLK: /* fall through */
case S_IFCHR: /* fall through */
case S_IFIFO:
init_special_inode(inode, inode->i_mode, inode->i_rdev);
break;
default:
BUG();
}
}
static struct inode *__logfs_iget(struct super_block *sb, ino_t ino)
{
struct inode *inode = iget_locked(sb, ino);
int err;
if (!inode)
return ERR_PTR(-ENOMEM);
if (!(inode->i_state & I_NEW))
return inode;
err = logfs_read_inode(inode);
if (err || inode->i_nlink == 0) {
/* inode->i_nlink == 0 can be true when called from
* block validator */
/* set i_nlink to 0 to prevent caching */
clear_nlink(inode);
logfs_inode(inode)->li_flags |= LOGFS_IF_ZOMBIE;
iget_failed(inode);
if (!err)
err = -ENOENT;
return ERR_PTR(err);
}
logfs_inode_setops(inode);
unlock_new_inode(inode);
return inode;
}
struct inode *logfs_iget(struct super_block *sb, ino_t ino)
{
BUG_ON(ino == LOGFS_INO_MASTER);
BUG_ON(ino == LOGFS_INO_SEGFILE);
return __logfs_iget(sb, ino);
}
/*
* is_cached is set to 1 if we hand out a cached inode, 0 otherwise.
* This allows logfs_safe_iput() to do the right thing later.
*/
struct inode *logfs_safe_iget(struct super_block *sb, ino_t ino, int *is_cached)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_inode *li;
if (ino == LOGFS_INO_MASTER)
return super->s_master_inode;
if (ino == LOGFS_INO_SEGFILE)
return super->s_segfile_inode;
spin_lock(&logfs_inode_lock);
list_for_each_entry(li, &super->s_freeing_list, li_freeing_list)
if (li->vfs_inode.i_ino == ino) {
li->li_refcount++;
spin_unlock(&logfs_inode_lock);
*is_cached = 1;
return &li->vfs_inode;
}
spin_unlock(&logfs_inode_lock);
*is_cached = 0;
return __logfs_iget(sb, ino);
}
static void logfs_i_callback(struct rcu_head *head)
{
struct inode *inode = container_of(head, struct inode, i_rcu);
kmem_cache_free(logfs_inode_cache, logfs_inode(inode));
}
static void __logfs_destroy_inode(struct inode *inode)
{
struct logfs_inode *li = logfs_inode(inode);
BUG_ON(li->li_block);
list_del(&li->li_freeing_list);
call_rcu(&inode->i_rcu, logfs_i_callback);
}
static void __logfs_destroy_meta_inode(struct inode *inode)
{
struct logfs_inode *li = logfs_inode(inode);
BUG_ON(li->li_block);
call_rcu(&inode->i_rcu, logfs_i_callback);
}
static void logfs_destroy_inode(struct inode *inode)
{
struct logfs_inode *li = logfs_inode(inode);
if (inode->i_ino < LOGFS_RESERVED_INOS) {
/*
* The reserved inodes are never destroyed unless we are in the
* unmount path.
*/
__logfs_destroy_meta_inode(inode);
return;
}
BUG_ON(list_empty(&li->li_freeing_list));
spin_lock(&logfs_inode_lock);
li->li_refcount--;
if (li->li_refcount == 0)
__logfs_destroy_inode(inode);
spin_unlock(&logfs_inode_lock);
}
void logfs_safe_iput(struct inode *inode, int is_cached)
{
if (inode->i_ino == LOGFS_INO_MASTER)
return;
if (inode->i_ino == LOGFS_INO_SEGFILE)
return;
if (is_cached) {
logfs_destroy_inode(inode);
return;
}
iput(inode);
}
static void logfs_init_inode(struct super_block *sb, struct inode *inode)
{
struct logfs_inode *li = logfs_inode(inode);
int i;
li->li_flags = 0;
li->li_height = 0;
li->li_used_bytes = 0;
li->li_block = NULL;
i_uid_write(inode, 0);
i_gid_write(inode, 0);
inode->i_size = 0;
inode->i_blocks = 0;
inode->i_ctime = current_time(inode);
inode->i_mtime = current_time(inode);
li->li_refcount = 1;
INIT_LIST_HEAD(&li->li_freeing_list);
for (i = 0; i < LOGFS_EMBEDDED_FIELDS; i++)
li->li_data[i] = 0;
return;
}
static struct inode *logfs_alloc_inode(struct super_block *sb)
{
struct logfs_inode *li;
li = kmem_cache_alloc(logfs_inode_cache, GFP_NOFS);
if (!li)
return NULL;
logfs_init_inode(sb, &li->vfs_inode);
return &li->vfs_inode;
}
/*
* In logfs inodes are written to an inode file. The inode file, like any
* other file, is managed with an inode. The inode file's inode, aka master
* inode, requires special handling in several respects. First, it cannot be
* written to the inode file, so it is stored in the journal instead.
*
* Secondly, this inode cannot be written back and destroyed before all other
* inodes have been written. The ordering is important. Linux' VFS is happily
* unaware of the ordering constraint and would ordinarily destroy the master
* inode at umount time while other inodes are still in use and dirty. Not
* good.
*
* So logfs makes sure the master inode is not written until all other inodes
* have been destroyed. Sadly, this method has another side-effect. The VFS
* will notice one remaining inode and print a frightening warning message.
* Worse, it is impossible to judge whether such a warning was caused by the
* master inode alone or whether other inodes have leaked as well.
*
* Our attempt at solving this is logfs_new_meta_inode() below. Its
* purpose is to create a new inode that will not trigger the warning if such
* an inode is still in use. An ugly hack, no doubt. Suggestions for
* improvement are welcome.
*
* AV: that's what ->put_super() is for...
*/
struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino)
{
struct inode *inode;
inode = new_inode(sb);
if (!inode)
return ERR_PTR(-ENOMEM);
inode->i_mode = S_IFREG;
inode->i_ino = ino;
inode->i_data.a_ops = &logfs_reg_aops;
mapping_set_gfp_mask(&inode->i_data, GFP_NOFS);
return inode;
}
struct inode *logfs_read_meta_inode(struct super_block *sb, u64 ino)
{
struct inode *inode;
int err;
inode = logfs_new_meta_inode(sb, ino);
if (IS_ERR(inode))
return inode;
err = logfs_read_inode(inode);
if (err) {
iput(inode);
return ERR_PTR(err);
}
logfs_inode_setops(inode);
return inode;
}
static int logfs_write_inode(struct inode *inode, struct writeback_control *wbc)
{
int ret;
long flags = WF_LOCK;
/* Can only happen if creat() failed. Safe to skip. */
if (logfs_inode(inode)->li_flags & LOGFS_IF_STILLBORN)
return 0;
ret = __logfs_write_inode(inode, NULL, flags);
LOGFS_BUG_ON(ret, inode->i_sb);
return ret;
}
/* called with inode->i_lock held */
static int logfs_drop_inode(struct inode *inode)
{
struct logfs_super *super = logfs_super(inode->i_sb);
struct logfs_inode *li = logfs_inode(inode);
spin_lock(&logfs_inode_lock);
list_move(&li->li_freeing_list, &super->s_freeing_list);
spin_unlock(&logfs_inode_lock);
return generic_drop_inode(inode);
}
static void logfs_set_ino_generation(struct super_block *sb,
struct inode *inode)
{
struct logfs_super *super = logfs_super(sb);
u64 ino;
mutex_lock(&super->s_journal_mutex);
ino = logfs_seek_hole(super->s_master_inode, super->s_last_ino + 1);
super->s_last_ino = ino;
super->s_inos_till_wrap--;
if (super->s_inos_till_wrap < 0) {
super->s_last_ino = LOGFS_RESERVED_INOS;
super->s_generation++;
super->s_inos_till_wrap = INOS_PER_WRAP;
}
inode->i_ino = ino;
inode->i_generation = super->s_generation;
mutex_unlock(&super->s_journal_mutex);
}
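
The inode-number reuse policy described near INOS_PER_WRAP can be shown with a small userspace sketch. Everything here is an assumption for illustration: the struct, the EXAMPLE_ constants, and the "last + 1" stand-in for logfs_seek_hole(), which in the real code searches the inode file for the next unused number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values; the real constants are defined in the LogFS headers. */
#define EXAMPLE_RESERVED_INOS 16u
#define EXAMPLE_INOS_PER_WRAP 0x10000

struct example_ino_alloc {
	uint64_t last_ino;        /* last inode number handed out */
	uint32_t generation;      /* generation stamped on new inodes */
	int32_t  inos_till_wrap;  /* allocations left before wrapping */
};

/*
 * Hand out the next inode number. Once the budget is used up, restart the
 * search just above the reserved range and bump the generation, so a reused
 * number is paired with a newer generation.
 */
static uint64_t example_alloc_ino(struct example_ino_alloc *a,
				  uint32_t *generation_out)
{
	uint64_t ino;

	assert(a && generation_out);
	ino = a->last_ino + 1; /* stand-in for logfs_seek_hole() */
	a->last_ino = ino;
	a->inos_till_wrap--;
	if (a->inos_till_wrap < 0) {
		a->last_ino = EXAMPLE_RESERVED_INOS;
		a->generation++;
		a->inos_till_wrap = EXAMPLE_INOS_PER_WRAP;
	}
	*generation_out = a->generation;
	return ino;
}
```

As in the function above, the allocation that triggers the wrap still returns the number found before the wrap, but already carries the incremented generation.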
struct inode *logfs_new_inode(struct inode *dir, umode_t mode)
{
struct super_block *sb = dir->i_sb;
struct inode *inode;
inode = new_inode(sb);
if (!inode)
return ERR_PTR(-ENOMEM);
logfs_init_inode(sb, inode);
/* inherit parent flags */
logfs_inode(inode)->li_flags |=
logfs_inode(dir)->li_flags & LOGFS_FL_INHERITED;
inode->i_mode = mode;
logfs_set_ino_generation(sb, inode);
inode_init_owner(inode, dir, mode);
logfs_inode_setops(inode);
insert_inode_hash(inode);
return inode;
}
static void logfs_init_once(void *_li)
{
struct logfs_inode *li = _li;
int i;
li->li_flags = 0;
li->li_used_bytes = 0;
li->li_refcount = 1;
for (i = 0; i < LOGFS_EMBEDDED_FIELDS; i++)
li->li_data[i] = 0;
inode_init_once(&li->vfs_inode);
}
static int logfs_sync_fs(struct super_block *sb, int wait)
{
logfs_get_wblocks(sb, NULL, WF_LOCK);
logfs_write_anchor(sb);
logfs_put_wblocks(sb, NULL, WF_LOCK);
return 0;
}
static void logfs_put_super(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
/* kill the meta-inodes */
iput(super->s_segfile_inode);
iput(super->s_master_inode);
iput(super->s_mapping_inode);
}
const struct super_operations logfs_super_operations = {
.alloc_inode = logfs_alloc_inode,
.destroy_inode = logfs_destroy_inode,
.evict_inode = logfs_evict_inode,
.drop_inode = logfs_drop_inode,
.put_super = logfs_put_super,
.write_inode = logfs_write_inode,
.statfs = logfs_statfs,
.sync_fs = logfs_sync_fs,
};
int logfs_init_inode_cache(void)
{
logfs_inode_cache = kmem_cache_create("logfs_inode_cache",
sizeof(struct logfs_inode), 0,
SLAB_RECLAIM_ACCOUNT|SLAB_ACCOUNT,
logfs_init_once);
if (!logfs_inode_cache)
return -ENOMEM;
return 0;
}
void logfs_destroy_inode_cache(void)
{
/*
* Make sure all delayed rcu free inodes are flushed before we
* destroy cache.
*/
rcu_barrier();
kmem_cache_destroy(logfs_inode_cache);
}

@ -1,894 +0,0 @@
/*
* fs/logfs/journal.c - journal handling code
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*/
#include "logfs.h"
#include <linux/slab.h>
static void logfs_calc_free(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
u64 reserve, no_segs = super->s_no_segs;
s64 free;
int i;
/* superblock segments */
no_segs -= 2;
super->s_no_journal_segs = 0;
/* journal */
journal_for_each(i)
if (super->s_journal_seg[i]) {
no_segs--;
super->s_no_journal_segs++;
}
/* open segments plus one extra per level for GC */
no_segs -= 2 * super->s_total_levels;
free = no_segs * (super->s_segsize - LOGFS_SEGMENT_RESERVE);
free -= super->s_used_bytes;
/* just a bit extra */
free -= super->s_total_levels * 4096;
/* Bad blocks are 'paid' for with speed reserve - the filesystem
* simply gets slower as bad blocks accumulate. Once the bad blocks
* exceed the speed reserve, the filesystem gets smaller.
*/
reserve = super->s_bad_segments + super->s_bad_seg_reserve;
reserve *= super->s_segsize - LOGFS_SEGMENT_RESERVE;
reserve = max(reserve, super->s_speed_reserve);
free -= reserve;
if (free < 0)
free = 0;
super->s_free_bytes = free;
}
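
The arithmetic in logfs_calc_free() is easier to follow with concrete numbers. The sketch below restates it as a pure userspace function; the struct, its field names, and the figures used in the usage note are assumptions for illustration and do not come from a real LogFS image. Unlike the original, the segment count is kept signed here so that a very small device cannot wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened view of the superblock fields used above. */
struct example_geometry {
	uint64_t no_segs;          /* total segments on the device */
	uint64_t segsize;          /* bytes per segment */
	uint64_t segment_reserve;  /* per-segment overhead (LOGFS_SEGMENT_RESERVE) */
	uint64_t used_bytes;
	uint64_t speed_reserve;
	uint64_t bad_segments;
	uint64_t bad_seg_reserve;
	unsigned journal_segs;     /* journal slots actually in use */
	unsigned total_levels;
};

static uint64_t example_calc_free(const struct example_geometry *g)
{
	int64_t usable, segs, free_bytes;
	uint64_t reserve;

	assert(g && g->segsize > g->segment_reserve);
	usable = (int64_t)(g->segsize - g->segment_reserve);

	segs = (int64_t)g->no_segs;
	segs -= 2;                             /* superblock segments */
	segs -= g->journal_segs;               /* journal */
	segs -= 2 * (int64_t)g->total_levels;  /* open segments plus one per level for GC */

	free_bytes = segs * usable;
	free_bytes -= (int64_t)g->used_bytes;
	free_bytes -= (int64_t)g->total_levels * 4096; /* just a bit extra */

	/* Bad segments are charged against the speed reserve first. */
	reserve = (g->bad_segments + g->bad_seg_reserve) * (uint64_t)usable;
	if (reserve < g->speed_reserve)
		reserve = g->speed_reserve;
	free_bytes -= (int64_t)reserve;

	return free_bytes < 0 ? 0 : (uint64_t)free_bytes;
}
```

With 100 segments of 131072 bytes, 4096 bytes of per-segment overhead, 4 journal segments, 6 levels, 1000000 used bytes, a speed reserve of 200000, 1 bad segment and a bad-segment reserve of 3, this yields 8879552 free bytes: 82 usable segments of 126976 bytes each, minus used bytes, minus 24576 bytes of slack, minus a 507904-byte reserve.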
static void reserve_sb_and_journal(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct btree_head32 *head = &super->s_reserved_segments;
int i, err;
err = btree_insert32(head, seg_no(sb, super->s_sb_ofs[0]), (void *)1,
GFP_KERNEL);
BUG_ON(err);
err = btree_insert32(head, seg_no(sb, super->s_sb_ofs[1]), (void *)1,
GFP_KERNEL);
BUG_ON(err);
journal_for_each(i) {
if (!super->s_journal_seg[i])
continue;
err = btree_insert32(head, super->s_journal_seg[i], (void *)1,
GFP_KERNEL);
BUG_ON(err);
}
}
static void read_dynsb(struct super_block *sb,
struct logfs_je_dynsb *dynsb)
{
struct logfs_super *super = logfs_super(sb);
super->s_gec = be64_to_cpu(dynsb->ds_gec);
super->s_sweeper = be64_to_cpu(dynsb->ds_sweeper);
super->s_victim_ino = be64_to_cpu(dynsb->ds_victim_ino);
super->s_rename_dir = be64_to_cpu(dynsb->ds_rename_dir);
super->s_rename_pos = be64_to_cpu(dynsb->ds_rename_pos);
super->s_used_bytes = be64_to_cpu(dynsb->ds_used_bytes);
super->s_generation = be32_to_cpu(dynsb->ds_generation);
}
static void read_anchor(struct super_block *sb,
struct logfs_je_anchor *da)
{
struct logfs_super *super = logfs_super(sb);
struct inode *inode = super->s_master_inode;
struct logfs_inode *li = logfs_inode(inode);
int i;
super->s_last_ino = be64_to_cpu(da->da_last_ino);
li->li_flags = 0;
li->li_height = da->da_height;
i_size_write(inode, be64_to_cpu(da->da_size));
li->li_used_bytes = be64_to_cpu(da->da_used_bytes);
for (i = 0; i < LOGFS_EMBEDDED_FIELDS; i++)
li->li_data[i] = be64_to_cpu(da->da_data[i]);
}
static void read_erasecount(struct super_block *sb,
struct logfs_je_journal_ec *ec)
{
struct logfs_super *super = logfs_super(sb);
int i;
journal_for_each(i)
super->s_journal_ec[i] = be32_to_cpu(ec->ec[i]);
}
static int read_area(struct super_block *sb, struct logfs_je_area *a)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area;
u64 ofs;
u32 writemask = ~(super->s_writesize - 1);
if (a->gc_level >= LOGFS_NO_AREAS)
return -EIO;
if (a->vim != VIM_DEFAULT)
return -EIO; /* TODO: close area and continue */
/* Index s_area only after gc_level has been validated */
area = super->s_area[a->gc_level];
area->a_used_bytes = be32_to_cpu(a->used_bytes);
area->a_written_bytes = area->a_used_bytes & writemask;
area->a_segno = be32_to_cpu(a->segno);
if (area->a_segno)
area->a_is_open = 1;
ofs = dev_ofs(sb, area->a_segno, area->a_written_bytes);
if (super->s_writesize > 1)
return logfs_buf_recover(area, ofs, a + 1, super->s_writesize);
else
return logfs_buf_recover(area, ofs, NULL, 0);
}
static void *unpack(void *from, void *to)
{
struct logfs_journal_header *jh = from;
void *data = from + sizeof(struct logfs_journal_header);
int err;
size_t inlen, outlen;
inlen = be16_to_cpu(jh->h_len);
outlen = be16_to_cpu(jh->h_datalen);
if (jh->h_compr == COMPR_NONE)
memcpy(to, data, inlen);
else {
err = logfs_uncompress(data, to, inlen, outlen);
BUG_ON(err);
}
return to;
}
static int __read_je_header(struct super_block *sb, u64 ofs,
struct logfs_journal_header *jh)
{
struct logfs_super *super = logfs_super(sb);
size_t bufsize = max_t(size_t, sb->s_blocksize, super->s_writesize)
+ MAX_JOURNAL_HEADER;
u16 type, len, datalen;
int err;
/* read header only */
err = wbuf_read(sb, ofs, sizeof(*jh), jh);
if (err)
return err;
type = be16_to_cpu(jh->h_type);
len = be16_to_cpu(jh->h_len);
datalen = be16_to_cpu(jh->h_datalen);
if (len > sb->s_blocksize)
return -EIO;
if ((type < JE_FIRST) || (type > JE_LAST))
return -EIO;
if (datalen > bufsize)
return -EIO;
return 0;
}
static int __read_je_payload(struct super_block *sb, u64 ofs,
struct logfs_journal_header *jh)
{
u16 len;
int err;
len = be16_to_cpu(jh->h_len);
err = wbuf_read(sb, ofs + sizeof(*jh), len, jh + 1);
if (err)
return err;
if (jh->h_crc != logfs_crc32(jh, len + sizeof(*jh), 4)) {
/* Old code was confused. It forgot about the header length
* and stopped calculating the crc 16 bytes before the end
* of data - ick!
* FIXME: Remove this hack once the old code is fixed.
*/
if (jh->h_crc == logfs_crc32(jh, len, 4))
WARN_ON_ONCE(1);
else
return -EIO;
}
return 0;
}
/*
* jh needs to be large enough to hold the complete entry, not just the header
*/
static int __read_je(struct super_block *sb, u64 ofs,
struct logfs_journal_header *jh)
{
int err;
err = __read_je_header(sb, ofs, jh);
if (err)
return err;
return __read_je_payload(sb, ofs, jh);
}
static int read_je(struct super_block *sb, u64 ofs)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_journal_header *jh = super->s_compressed_je;
void *scratch = super->s_je;
u16 type, datalen;
int err;
err = __read_je(sb, ofs, jh);
if (err)
return err;
type = be16_to_cpu(jh->h_type);
datalen = be16_to_cpu(jh->h_datalen);
switch (type) {
case JE_DYNSB:
read_dynsb(sb, unpack(jh, scratch));
break;
case JE_ANCHOR:
read_anchor(sb, unpack(jh, scratch));
break;
case JE_ERASECOUNT:
read_erasecount(sb, unpack(jh, scratch));
break;
case JE_AREA:
err = read_area(sb, unpack(jh, scratch));
break;
case JE_OBJ_ALIAS:
err = logfs_load_object_aliases(sb, unpack(jh, scratch),
datalen);
break;
default:
WARN_ON_ONCE(1);
return -EIO;
}
return err;
}
static int logfs_read_segment(struct super_block *sb, u32 segno)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_journal_header *jh = super->s_compressed_je;
u64 ofs, seg_ofs = dev_ofs(sb, segno, 0);
u32 h_ofs, last_ofs = 0;
u16 len, datalen, last_len = 0;
int i, err;
/* search for most recent commit */
for (h_ofs = 0; h_ofs < super->s_segsize; h_ofs += sizeof(*jh)) {
ofs = seg_ofs + h_ofs;
err = __read_je_header(sb, ofs, jh);
if (err)
continue;
if (jh->h_type != cpu_to_be16(JE_COMMIT))
continue;
err = __read_je_payload(sb, ofs, jh);
if (err)
continue;
len = be16_to_cpu(jh->h_len);
datalen = be16_to_cpu(jh->h_datalen);
if ((datalen > sizeof(super->s_je_array)) ||
(datalen % sizeof(__be64)))
continue;
last_ofs = h_ofs;
last_len = datalen;
h_ofs += ALIGN(len, sizeof(*jh)) - sizeof(*jh);
}
/* read commit */
if (last_ofs == 0)
return -ENOENT;
ofs = seg_ofs + last_ofs;
log_journal("Read commit from %llx\n", ofs);
err = __read_je(sb, ofs, jh);
BUG_ON(err); /* We should have caught it in the scan loop already */
if (err)
return err;
/* uncompress */
unpack(jh, super->s_je_array);
super->s_no_je = last_len / sizeof(__be64);
/* iterate over array */
for (i = 0; i < super->s_no_je; i++) {
err = read_je(sb, be64_to_cpu(super->s_je_array[i]));
if (err)
return err;
}
super->s_journal_area->a_segno = segno;
return 0;
}
static u64 read_gec(struct super_block *sb, u32 segno)
{
struct logfs_segment_header sh;
__be32 crc;
int err;
if (!segno)
return 0;
err = wbuf_read(sb, dev_ofs(sb, segno, 0), sizeof(sh), &sh);
if (err)
return 0;
crc = logfs_crc32(&sh, sizeof(sh), 4);
if (crc != sh.crc) {
WARN_ON(sh.gec != cpu_to_be64(0xffffffffffffffffull));
/* Most likely it was just erased */
return 0;
}
return be64_to_cpu(sh.gec);
}
static int logfs_read_journal(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
u64 gec[LOGFS_JOURNAL_SEGS], max;
u32 segno;
int i, max_i;
max = 0;
max_i = -1;
journal_for_each(i) {
segno = super->s_journal_seg[i];
gec[i] = read_gec(sb, super->s_journal_seg[i]);
if (gec[i] > max) {
max = gec[i];
max_i = i;
}
}
if (max_i == -1)
return -EIO;
/* FIXME: Try older segments in case of error */
return logfs_read_segment(sb, super->s_journal_seg[max_i]);
}
/*
* First search the current segment (outer loop), then pick the next segment
* in the array, skipping any zero entries (inner loop).
*/
static void journal_get_free_segment(struct logfs_area *area)
{
struct logfs_super *super = logfs_super(area->a_sb);
int i;
journal_for_each(i) {
if (area->a_segno != super->s_journal_seg[i])
continue;
do {
i++;
if (i == LOGFS_JOURNAL_SEGS)
i = 0;
} while (!super->s_journal_seg[i]);
area->a_segno = super->s_journal_seg[i];
area->a_erase_count = ++(super->s_journal_ec[i]);
log_journal("Journal now at %x (ec %x)\n", area->a_segno,
area->a_erase_count);
return;
}
BUG();
}
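
The slot rotation in journal_get_free_segment() can be tried out on a plain array. This is a sketch under assumptions: the function name and the -1 return for "not found" are invented for the example (the kernel code calls BUG() instead), and the guard against a zero segment number is an addition so the inner loop always terminates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Find the slot holding the current journal segment, then return the index
 * of the next slot with a non-zero segment number, wrapping around at the
 * end of the array. Returns -1 if the current segment is not in the array.
 */
static int example_next_journal_slot(const uint32_t *journal_seg, int n_slots,
				     uint32_t current_segno)
{
	int i;

	assert(journal_seg && n_slots > 0);
	if (!current_segno)
		return -1; /* 0 marks an unused slot, never a valid position */

	for (i = 0; i < n_slots; i++) {
		if (journal_seg[i] != current_segno)
			continue;
		do {
			i++;
			if (i == n_slots)
				i = 0;
		} while (!journal_seg[i]);
		return i;
	}
	return -1;
}
```

With slots {10, 0, 0, 20}, the journal moves from segment 10 to slot 3 and from segment 20 back to slot 0. With a single populated slot the search wraps around to the same slot again.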
static void journal_get_erase_count(struct logfs_area *area)
{
/* erase count is stored globally and incremented in
* journal_get_free_segment() - nothing to do here */
}
static int journal_erase_segment(struct logfs_area *area)
{
struct super_block *sb = area->a_sb;
union {
struct logfs_segment_header sh;
unsigned char c[ALIGN(sizeof(struct logfs_segment_header), 16)];
} u;
u64 ofs;
int err;
err = logfs_erase_segment(sb, area->a_segno, 1);
if (err)
return err;
memset(&u, 0, sizeof(u));
u.sh.pad = 0;
u.sh.type = SEG_JOURNAL;
u.sh.level = 0;
u.sh.segno = cpu_to_be32(area->a_segno);
u.sh.ec = cpu_to_be32(area->a_erase_count);
u.sh.gec = cpu_to_be64(logfs_super(sb)->s_gec);
u.sh.crc = logfs_crc32(&u.sh, sizeof(u.sh), 4);
/* This causes a bug in segment.c. Not yet. */
//logfs_set_segment_erased(sb, area->a_segno, area->a_erase_count, 0);
ofs = dev_ofs(sb, area->a_segno, 0);
area->a_used_bytes = sizeof(u);
logfs_buf_write(area, ofs, &u, sizeof(u));
return 0;
}
static size_t __logfs_write_header(struct logfs_super *super,
struct logfs_journal_header *jh, size_t len, size_t datalen,
u16 type, u8 compr)
{
jh->h_len = cpu_to_be16(len);
jh->h_type = cpu_to_be16(type);
jh->h_datalen = cpu_to_be16(datalen);
jh->h_compr = compr;
jh->h_pad[0] = 'H';
jh->h_pad[1] = 'E';
jh->h_pad[2] = 'A';
jh->h_pad[3] = 'D';
jh->h_pad[4] = 'R';
jh->h_crc = logfs_crc32(jh, len + sizeof(*jh), 4);
return ALIGN(len, 16) + sizeof(*jh);
}
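
The value returned by __logfs_write_header() is the payload length rounded up to 16 bytes plus the header size. A small sketch makes the rounding concrete. The struct layout below is an assumption reconstructed from the field names used above (the real definition is in the LogFS headers), and EXAMPLE_ALIGN is a userspace stand-in for the kernel's ALIGN() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a, where a is a power of two. */
#define EXAMPLE_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Assumed layout: 4 + 2 + 2 + 2 + 1 + 5 = 16 bytes, no implicit padding. */
struct example_journal_header {
	uint32_t h_crc;
	uint16_t h_len;
	uint16_t h_datalen;
	uint16_t h_type;
	uint8_t  h_compr;
	uint8_t  h_pad[5];
};

static_assert(sizeof(struct example_journal_header) == 16,
	      "example header is expected to be 16 bytes");

/* Bytes one journal entry occupies: padded payload plus header. */
static size_t example_journal_entry_size(size_t payload_len)
{
	return EXAMPLE_ALIGN(payload_len, 16) +
	       sizeof(struct example_journal_header);
}
```

An empty payload takes 16 bytes, payloads of 1 to 16 bytes take 32, and a 17-byte payload takes 48.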
static size_t logfs_write_header(struct logfs_super *super,
struct logfs_journal_header *jh, size_t datalen, u16 type)
{
size_t len = datalen;
return __logfs_write_header(super, jh, len, datalen, type, COMPR_NONE);
}
static inline size_t logfs_journal_erasecount_size(struct logfs_super *super)
{
return LOGFS_JOURNAL_SEGS * sizeof(__be32);
}
static void *logfs_write_erasecount(struct super_block *sb, void *_ec,
u16 *type, size_t *len)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_je_journal_ec *ec = _ec;
int i;
journal_for_each(i)
ec->ec[i] = cpu_to_be32(super->s_journal_ec[i]);
*type = JE_ERASECOUNT;
*len = logfs_journal_erasecount_size(super);
return ec;
}
static void account_shadow(void *_shadow, unsigned long _sb, u64 ignore,
size_t ignore2)
{
struct logfs_shadow *shadow = _shadow;
struct super_block *sb = (void *)_sb;
struct logfs_super *super = logfs_super(sb);
/* consume new space */
super->s_free_bytes -= shadow->new_len;
super->s_used_bytes += shadow->new_len;
super->s_dirty_used_bytes -= shadow->new_len;
/* free up old space */
super->s_free_bytes += shadow->old_len;
super->s_used_bytes -= shadow->old_len;
super->s_dirty_free_bytes -= shadow->old_len;
logfs_set_segment_used(sb, shadow->old_ofs, -shadow->old_len);
logfs_set_segment_used(sb, shadow->new_ofs, shadow->new_len);
log_journal("account_shadow(%llx, %llx, %x) %llx->%llx %x->%x\n",
shadow->ino, shadow->bix, shadow->gc_level,
shadow->old_ofs, shadow->new_ofs,
shadow->old_len, shadow->new_len);
mempool_free(shadow, super->s_shadow_pool);
}
static void account_shadows(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct inode *inode = super->s_master_inode;
struct logfs_inode *li = logfs_inode(inode);
struct shadow_tree *tree = &super->s_shadow_tree;
btree_grim_visitor64(&tree->new, (unsigned long)sb, account_shadow);
btree_grim_visitor64(&tree->old, (unsigned long)sb, account_shadow);
btree_grim_visitor32(&tree->segment_map, 0, NULL);
tree->no_shadowed_segments = 0;
if (li->li_block) {
/*
* We never actually use the structure when it is attached to the
* master inode. But it is easier to always free it here than
* to have checks in several places elsewhere when allocating
* it.
*/
li->li_block->ops->free_block(sb, li->li_block);
}
BUG_ON((s64)li->li_used_bytes < 0);
}
static void *__logfs_write_anchor(struct super_block *sb, void *_da,
u16 *type, size_t *len)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_je_anchor *da = _da;
struct inode *inode = super->s_master_inode;
struct logfs_inode *li = logfs_inode(inode);
int i;
da->da_height = li->li_height;
da->da_last_ino = cpu_to_be64(super->s_last_ino);
da->da_size = cpu_to_be64(i_size_read(inode));
da->da_used_bytes = cpu_to_be64(li->li_used_bytes);
for (i = 0; i < LOGFS_EMBEDDED_FIELDS; i++)
da->da_data[i] = cpu_to_be64(li->li_data[i]);
*type = JE_ANCHOR;
*len = sizeof(*da);
return da;
}
static void *logfs_write_dynsb(struct super_block *sb, void *_dynsb,
u16 *type, size_t *len)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_je_dynsb *dynsb = _dynsb;
dynsb->ds_gec = cpu_to_be64(super->s_gec);
dynsb->ds_sweeper = cpu_to_be64(super->s_sweeper);
dynsb->ds_victim_ino = cpu_to_be64(super->s_victim_ino);
dynsb->ds_rename_dir = cpu_to_be64(super->s_rename_dir);
dynsb->ds_rename_pos = cpu_to_be64(super->s_rename_pos);
dynsb->ds_used_bytes = cpu_to_be64(super->s_used_bytes);
dynsb->ds_generation = cpu_to_be32(super->s_generation);
*type = JE_DYNSB;
*len = sizeof(*dynsb);
return dynsb;
}
static void write_wbuf(struct super_block *sb, struct logfs_area *area,
void *wbuf)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
u64 ofs;
pgoff_t index;
int page_ofs;
struct page *page;
ofs = dev_ofs(sb, area->a_segno,
area->a_used_bytes & ~(super->s_writesize - 1));
index = ofs >> PAGE_SHIFT;
page_ofs = ofs & (PAGE_SIZE - 1);
page = find_or_create_page(mapping, index, GFP_NOFS);
BUG_ON(!page);
memcpy(wbuf, page_address(page) + page_ofs, super->s_writesize);
unlock_page(page);
}
static void *logfs_write_area(struct super_block *sb, void *_a,
u16 *type, size_t *len)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area = super->s_area[super->s_sum_index];
struct logfs_je_area *a = _a;
a->vim = VIM_DEFAULT;
a->gc_level = super->s_sum_index;
a->used_bytes = cpu_to_be32(area->a_used_bytes);
a->segno = cpu_to_be32(area->a_segno);
if (super->s_writesize > 1)
write_wbuf(sb, area, a + 1);
*type = JE_AREA;
*len = sizeof(*a) + super->s_writesize;
return a;
}
static void *logfs_write_commit(struct super_block *sb, void *h,
u16 *type, size_t *len)
{
struct logfs_super *super = logfs_super(sb);
*type = JE_COMMIT;
*len = super->s_no_je * sizeof(__be64);
return super->s_je_array;
}
static size_t __logfs_write_je(struct super_block *sb, void *buf, u16 type,
size_t len)
{
struct logfs_super *super = logfs_super(sb);
void *header = super->s_compressed_je;
void *data = header + sizeof(struct logfs_journal_header);
ssize_t compr_len, pad_len;
u8 compr = COMPR_ZLIB;
if (len == 0)
return logfs_write_header(super, header, 0, type);
compr_len = logfs_compress(buf, data, len, sb->s_blocksize);
if (compr_len < 0 || type == JE_ANCHOR) {
memcpy(data, buf, len);
compr_len = len;
compr = COMPR_NONE;
}
pad_len = ALIGN(compr_len, 16);
memset(data + compr_len, 0, pad_len - compr_len);
return __logfs_write_header(super, header, compr_len, len, type, compr);
}
static s64 logfs_get_free_bytes(struct logfs_area *area, size_t *bytes,
int must_pad)
{
u32 writesize = logfs_super(area->a_sb)->s_writesize;
s32 ofs;
int ret;
ret = logfs_open_area(area, *bytes);
if (ret)
return -EAGAIN;
ofs = area->a_used_bytes;
area->a_used_bytes += *bytes;
if (must_pad) {
area->a_used_bytes = ALIGN(area->a_used_bytes, writesize);
*bytes = area->a_used_bytes - ofs;
}
return dev_ofs(area->a_sb, area->a_segno, ofs);
}
static int logfs_write_je_buf(struct super_block *sb, void *buf, u16 type,
size_t buf_len)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area = super->s_journal_area;
struct logfs_journal_header *jh = super->s_compressed_je;
size_t len;
int must_pad = 0;
s64 ofs;
len = __logfs_write_je(sb, buf, type, buf_len);
if (jh->h_type == cpu_to_be16(JE_COMMIT))
must_pad = 1;
ofs = logfs_get_free_bytes(area, &len, must_pad);
if (ofs < 0)
return ofs;
logfs_buf_write(area, ofs, super->s_compressed_je, len);
BUG_ON(super->s_no_je >= MAX_JOURNAL_ENTRIES);
super->s_je_array[super->s_no_je++] = cpu_to_be64(ofs);
return 0;
}
static int logfs_write_je(struct super_block *sb,
void* (*write)(struct super_block *sb, void *scratch,
u16 *type, size_t *len))
{
void *buf;
size_t len;
u16 type;
buf = write(sb, logfs_super(sb)->s_je, &type, &len);
return logfs_write_je_buf(sb, buf, type, len);
}
int write_alias_journal(struct super_block *sb, u64 ino, u64 bix,
level_t level, int child_no, __be64 val)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_obj_alias *oa = super->s_je;
int err = 0, fill = super->s_je_fill;
log_aliases("logfs_write_obj_aliases #%x(%llx, %llx, %x, %x) %llx\n",
fill, ino, bix, level, child_no, be64_to_cpu(val));
oa[fill].ino = cpu_to_be64(ino);
oa[fill].bix = cpu_to_be64(bix);
oa[fill].val = val;
oa[fill].level = (__force u8)level;
oa[fill].child_no = cpu_to_be16(child_no);
fill++;
if (fill >= sb->s_blocksize / sizeof(*oa)) {
err = logfs_write_je_buf(sb, oa, JE_OBJ_ALIAS, sb->s_blocksize);
fill = 0;
}
super->s_je_fill = fill;
return err;
}
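/*
 * Illustrative sketch (not from the original source): the batching pattern
 * used by write_alias_journal() above. Entries accumulate in a block-sized
 * buffer and the buffer is handed off once it is full. The 32-byte
 * struct sketch_alias layout, the 4096-byte block size and the flush
 * counter are assumptions made for the example; the real code writes the
 * buffer with logfs_write_je_buf() and leaves a final partial batch to
 * logfs_write_obj_aliases().
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCKSIZE 4096

struct sketch_alias {		/* hypothetical 32-byte entry */
	uint64_t ino;
	uint64_t bix;
	uint64_t val;
	uint64_t misc;
};

struct sketch_batch {
	struct sketch_alias buf[SKETCH_BLOCKSIZE / sizeof(struct sketch_alias)];
	size_t fill;		/* index of the next free entry */
	unsigned flushes;	/* number of full buffers handed off */
};

/* Append one entry; count a flush and restart at index 0 when full. */
static void sketch_add_alias(struct sketch_batch *b, uint64_t ino,
			     uint64_t bix, uint64_t val)
{
	const size_t cap = sizeof(b->buf) / sizeof(b->buf[0]);

	b->buf[b->fill].ino = ino;
	b->buf[b->fill].bix = bix;
	b->buf[b->fill].val = val;
	b->buf[b->fill].misc = 0;
	b->fill++;
	if (b->fill >= cap) {
		b->flushes++;	/* the real code writes the block here */
		b->fill = 0;
	}
}
```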
static int logfs_write_obj_aliases(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int err;
log_journal("logfs_write_obj_aliases: %d aliases to write\n",
super->s_no_object_aliases);
super->s_je_fill = 0;
err = logfs_write_obj_aliases_pagecache(sb);
if (err)
return err;
if (super->s_je_fill)
err = logfs_write_je_buf(sb, super->s_je, JE_OBJ_ALIAS,
super->s_je_fill
* sizeof(struct logfs_obj_alias));
return err;
}
/*
* Write all journal entries. The goto logic ensures that all journal entries
* are written whenever a new segment is used. It is ugly and potentially a
* bit wasteful, but robustness is more important. With this we can *always*
* erase all journal segments except the one containing the most recent commit.
*/
void logfs_write_anchor(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area = super->s_journal_area;
int i, err;
if (!(super->s_flags & LOGFS_SB_FLAG_DIRTY))
return;
super->s_flags &= ~LOGFS_SB_FLAG_DIRTY;
BUG_ON(super->s_flags & LOGFS_SB_FLAG_SHUTDOWN);
mutex_lock(&super->s_journal_mutex);
/* Do this first or suffer corruption */
logfs_sync_segments(sb);
account_shadows(sb);
again:
super->s_no_je = 0;
for_each_area(i) {
if (!super->s_area[i]->a_is_open)
continue;
super->s_sum_index = i;
err = logfs_write_je(sb, logfs_write_area);
if (err)
goto again;
}
err = logfs_write_obj_aliases(sb);
if (err)
goto again;
err = logfs_write_je(sb, logfs_write_erasecount);
if (err)
goto again;
err = logfs_write_je(sb, __logfs_write_anchor);
if (err)
goto again;
err = logfs_write_je(sb, logfs_write_dynsb);
if (err)
goto again;
/*
* Order is imperative. First we sync all writes, including the
* non-committed journal writes. Then we write the final commit and
* sync the current journal segment.
* There is a theoretical bug here. Syncing the journal segment will
* write a number of journal entries and the final commit. All these
* are written in a single operation. If the device layer writes the
* data back-to-front, the commit will precede the other journal
* entries, leaving a race window.
* Two fixes are possible. Preferred is to fix the device layer to
* ensure writes happen front-to-back. Alternatively we can insert
* another logfs_sync_area() super->s_devops->sync() combo before
* writing the commit.
*/
/*
* On another subject, super->s_devops->sync is usually not necessary.
* Unless called from sys_sync or friends, a barrier would suffice.
*/
super->s_devops->sync(sb);
err = logfs_write_je(sb, logfs_write_commit);
if (err)
goto again;
log_journal("Write commit to %llx\n",
be64_to_cpu(super->s_je_array[super->s_no_je - 1]));
logfs_sync_area(area);
BUG_ON(area->a_used_bytes != area->a_written_bytes);
super->s_devops->sync(sb);
mutex_unlock(&super->s_journal_mutex);
return;
}
void do_logfs_journal_wl_pass(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_area *area = super->s_journal_area;
struct btree_head32 *head = &super->s_reserved_segments;
u32 segno, ec;
int i, err;
log_journal("Journal requires wear-leveling.\n");
/* Drop old segments */
journal_for_each(i)
if (super->s_journal_seg[i]) {
btree_remove32(head, super->s_journal_seg[i]);
logfs_set_segment_unreserved(sb,
super->s_journal_seg[i],
super->s_journal_ec[i]);
super->s_journal_seg[i] = 0;
super->s_journal_ec[i] = 0;
}
/* Get new segments */
for (i = 0; i < super->s_no_journal_segs; i++) {
segno = get_best_cand(sb, &super->s_reserve_list, &ec);
super->s_journal_seg[i] = segno;
super->s_journal_ec[i] = ec;
logfs_set_segment_reserved(sb, segno);
err = btree_insert32(head, segno, (void *)1, GFP_NOFS);
BUG_ON(err); /* mempool should prevent this */
err = logfs_erase_segment(sb, segno, 1);
BUG_ON(err); /* FIXME: remount-ro would be nicer */
}
/* Manually move journal_area */
freeseg(sb, area->a_segno);
area->a_segno = super->s_journal_seg[0];
area->a_is_open = 0;
area->a_used_bytes = 0;
/* Write journal */
logfs_write_anchor(sb);
/* Write superblocks */
err = logfs_write_sb(sb);
BUG_ON(err);
}
static const struct logfs_area_ops journal_area_ops = {
.get_free_segment = journal_get_free_segment,
.get_erase_count = journal_get_erase_count,
.erase_segment = journal_erase_segment,
};
int logfs_init_journal(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
size_t bufsize = max_t(size_t, sb->s_blocksize, super->s_writesize)
+ MAX_JOURNAL_HEADER;
int ret = -ENOMEM;
mutex_init(&super->s_journal_mutex);
btree_init_mempool32(&super->s_reserved_segments, super->s_btree_pool);
super->s_je = kzalloc(bufsize, GFP_KERNEL);
if (!super->s_je)
return ret;
super->s_compressed_je = kzalloc(bufsize, GFP_KERNEL);
if (!super->s_compressed_je)
return ret;
super->s_master_inode = logfs_new_meta_inode(sb, LOGFS_INO_MASTER);
if (IS_ERR(super->s_master_inode))
return PTR_ERR(super->s_master_inode);
ret = logfs_read_journal(sb);
if (ret)
return -EIO;
reserve_sb_and_journal(sb);
logfs_calc_free(sb);
super->s_journal_area->a_ops = &journal_area_ops;
return 0;
}
void logfs_cleanup_journal(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
btree_grim_visitor32(&super->s_reserved_segments, 0, NULL);
kfree(super->s_compressed_je);
kfree(super->s_je);
}

@ -1,733 +0,0 @@
/*
* fs/logfs/logfs.h
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*
* Private header for logfs.
*/
#ifndef FS_LOGFS_LOGFS_H
#define FS_LOGFS_LOGFS_H
#include <linux/types.h>
#include <linux/btree.h>
#include <linux/crc32.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/mempool.h>
#include <linux/pagemap.h>
#include <linux/mtd/mtd.h>
#include "logfs_abi.h"
#define LOGFS_DEBUG_SUPER (0x0001)
#define LOGFS_DEBUG_SEGMENT (0x0002)
#define LOGFS_DEBUG_JOURNAL (0x0004)
#define LOGFS_DEBUG_DIR (0x0008)
#define LOGFS_DEBUG_FILE (0x0010)
#define LOGFS_DEBUG_INODE (0x0020)
#define LOGFS_DEBUG_READWRITE (0x0040)
#define LOGFS_DEBUG_GC (0x0080)
#define LOGFS_DEBUG_GC_NOISY (0x0100)
#define LOGFS_DEBUG_ALIASES (0x0200)
#define LOGFS_DEBUG_BLOCKMOVE (0x0400)
#define LOGFS_DEBUG_ALL (0xffffffff)
#define LOGFS_DEBUG (0x01)
/*
* To enable specific log messages, simply define LOGFS_DEBUG to match any
* or all of the above.
*/
#ifndef LOGFS_DEBUG
#define LOGFS_DEBUG (0)
#endif
#define log_cond(cond, fmt, arg...) do { \
if (cond) \
printk(KERN_DEBUG fmt, ##arg); \
} while (0)
#define log_super(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_SUPER, fmt, ##arg)
#define log_segment(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_SEGMENT, fmt, ##arg)
#define log_journal(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_JOURNAL, fmt, ##arg)
#define log_dir(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_DIR, fmt, ##arg)
#define log_file(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_FILE, fmt, ##arg)
#define log_inode(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_INODE, fmt, ##arg)
#define log_readwrite(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_READWRITE, fmt, ##arg)
#define log_gc(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_GC, fmt, ##arg)
#define log_gc_noisy(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_GC_NOISY, fmt, ##arg)
#define log_aliases(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_ALIASES, fmt, ##arg)
#define log_blockmove(fmt, arg...) \
log_cond(LOGFS_DEBUG & LOGFS_DEBUG_BLOCKMOVE, fmt, ##arg)
#define PG_pre_locked PG_owner_priv_1
#define PagePreLocked(page) test_bit(PG_pre_locked, &(page)->flags)
#define SetPagePreLocked(page) set_bit(PG_pre_locked, &(page)->flags)
#define ClearPagePreLocked(page) clear_bit(PG_pre_locked, &(page)->flags)
/* FIXME: This should really be somewhere in the 64bit area. */
#define LOGFS_LINK_MAX (1<<30)
/* Read-only filesystem */
#define LOGFS_SB_FLAG_RO 0x0001
#define LOGFS_SB_FLAG_DIRTY 0x0002
#define LOGFS_SB_FLAG_OBJ_ALIAS 0x0004
#define LOGFS_SB_FLAG_SHUTDOWN 0x0008
/* Write Control Flags */
#define WF_LOCK 0x01 /* take write lock */
#define WF_WRITE 0x02 /* write block */
#define WF_DELETE 0x04 /* delete old block */
typedef u8 __bitwise level_t;
typedef u8 __bitwise gc_level_t;
#define LEVEL(level) ((__force level_t)(level))
#define GC_LEVEL(gc_level) ((__force gc_level_t)(gc_level))
#define SUBLEVEL(level) ( (void)((level) == LEVEL(1)), \
(__force level_t)((__force u8)(level) - 1) )
/**
* struct logfs_area - area management information
*
* @a_sb: the superblock this area belongs to
* @a_is_open: 1 if the area is currently open, else 0
* @a_segno: segment number of area
* @a_written_bytes: number of bytes already written back
* @a_used_bytes: number of used bytes
* @a_ops: area operations (either journal or ostore)
* @a_erase_count: erase count
* @a_level: GC level
*/
struct logfs_area { /* a segment open for writing */
struct super_block *a_sb;
int a_is_open;
u32 a_segno;
u32 a_written_bytes;
u32 a_used_bytes;
const struct logfs_area_ops *a_ops;
u32 a_erase_count;
gc_level_t a_level;
};
/**
* struct logfs_area_ops - area operations
*
* @get_free_segment: fill area->a_segno with the number of a free segment
* @get_erase_count: fill area->a_erase_count (needs area->a_segno)
* @erase_segment: erase and setup segment
*/
struct logfs_area_ops {
void (*get_free_segment)(struct logfs_area *area);
void (*get_erase_count)(struct logfs_area *area);
int (*erase_segment)(struct logfs_area *area);
};
struct logfs_super; /* forward */
/**
* struct logfs_device_ops - device access operations
*
* @readpage: read one page (mm page)
* @writeseg: write one segment. may be a partial segment
* @erase: erase part of the device
* @can_write_buf: decide whether wbuf can be written to ofs
*/
struct logfs_device_ops {
struct page *(*find_first_sb)(struct super_block *sb, u64 *ofs);
struct page *(*find_last_sb)(struct super_block *sb, u64 *ofs);
int (*write_sb)(struct super_block *sb, struct page *page);
int (*readpage)(void *_sb, struct page *page);
void (*writeseg)(struct super_block *sb, u64 ofs, size_t len);
int (*erase)(struct super_block *sb, loff_t ofs, size_t len,
int ensure_write);
int (*can_write_buf)(struct super_block *sb, u64 ofs);
void (*sync)(struct super_block *sb);
void (*put_device)(struct logfs_super *s);
};
/**
* struct candidate_list - list of similar candidates
*/
struct candidate_list {
struct rb_root rb_tree;
int count;
int maxcount;
int sort_by_ec;
};
/**
* struct gc_candidate - "candidate" segment to be garbage collected next
*
* @list: list (either free or low)
* @segno: segment number
* @valid: number of valid bytes
* @erase_count: erase count of segment
* @dist: distance from tree root
*
* Candidates can be on two lists. The free list contains electees rather
* than candidates - segments that no longer contain any valid data. The
* low list contains candidates to be picked for GC. It should be kept
* short. It is not required to always pick a perfect candidate. In the
* worst case GC will have to move more data than absolutely necessary.
*/
struct gc_candidate {
struct rb_node rb_node;
struct candidate_list *list;
u32 segno;
u32 valid;
u32 erase_count;
u8 dist;
};
/**
* struct logfs_journal_entry - temporary structure used during journal scan
*
* @used:
* @version: normalized version
* @len: length
* @offset: offset
*/
struct logfs_journal_entry {
int used;
s16 version;
u16 len;
u16 datalen;
u64 offset;
};
enum transaction_state {
CREATE_1 = 1,
CREATE_2,
UNLINK_1,
UNLINK_2,
CROSS_RENAME_1,
CROSS_RENAME_2,
TARGET_RENAME_1,
TARGET_RENAME_2,
TARGET_RENAME_3
};
/**
* struct logfs_transaction - essential fields to support atomic dirops
*
* @ino: target inode
* @dir: inode of directory containing dentry
* @pos: pos of dentry in directory
*/
struct logfs_transaction {
enum transaction_state state;
u64 ino;
u64 dir;
u64 pos;
};
/**
* struct logfs_shadow - old block in the shadow of a not-yet-committed new one
* @old_ofs: offset of old block on medium
* @new_ofs: offset of new block on medium
* @ino: inode number
* @bix: block index
* @old_len: size of old block, including header
* @new_len: size of new block, including header
* @gc_level: GC level of the block
*/
struct logfs_shadow {
u64 old_ofs;
u64 new_ofs;
u64 ino;
u64 bix;
int old_len;
int new_len;
gc_level_t gc_level;
};
/**
* struct shadow_tree
* @new: shadows where old_ofs==0, indexed by new_ofs
* @old: shadows where old_ofs!=0, indexed by old_ofs
* @segment_map: bitfield of segments containing shadows
* @no_shadowed_segments: number of segments containing shadows
*/
struct shadow_tree {
struct btree_head64 new;
struct btree_head64 old;
struct btree_head32 segment_map;
int no_shadowed_segments;
};
struct object_alias_item {
struct list_head list;
__be64 val;
int child_no;
};
/**
* struct logfs_block - contains any block state
* @type: indirect block or inode
* @full: number of fully populated children
* @partial: number of partially populated children
*
* Most blocks are directly represented by page cache pages. But when a block
* becomes dirty, is part of a transaction, contains aliases or is otherwise
* special, a struct logfs_block is allocated to track the additional state.
* Inodes are very similar to indirect blocks, so they can also get one of
* these structures added when appropriate.
*/
#define BLOCK_INDIRECT 1 /* Indirect block */
#define BLOCK_INODE 2 /* Inode */
struct logfs_block_ops;
struct logfs_block {
struct list_head alias_list;
struct list_head item_list;
struct super_block *sb;
u64 ino;
u64 bix;
level_t level;
struct page *page;
struct inode *inode;
struct logfs_transaction *ta;
unsigned long alias_map[LOGFS_BLOCK_FACTOR / BITS_PER_LONG];
const struct logfs_block_ops *ops;
int full;
int partial;
int reserved_bytes;
};
typedef int write_alias_t(struct super_block *sb, u64 ino, u64 bix,
level_t level, int child_no, __be64 val);
struct logfs_block_ops {
void (*write_block)(struct logfs_block *block);
void (*free_block)(struct super_block *sb, struct logfs_block*block);
int (*write_alias)(struct super_block *sb,
struct logfs_block *block,
write_alias_t *write_one_alias);
};
#define MAX_JOURNAL_ENTRIES 256
struct logfs_super {
struct mtd_info *s_mtd; /* underlying device */
struct block_device *s_bdev; /* underlying device */
const struct logfs_device_ops *s_devops;/* device access */
struct inode *s_master_inode; /* inode file */
struct inode *s_segfile_inode; /* segment file */
struct inode *s_mapping_inode; /* device mapping */
atomic_t s_pending_writes; /* outstanding bios */
long s_flags;
mempool_t *s_btree_pool; /* for btree nodes */
mempool_t *s_alias_pool; /* aliases in segment.c */
u64 s_feature_incompat;
u64 s_feature_ro_compat;
u64 s_feature_compat;
u64 s_feature_flags;
u64 s_sb_ofs[2];
struct page *s_erase_page; /* for dev_bdev.c */
/* alias.c fields */
struct btree_head32 s_segment_alias; /* remapped segments */
int s_no_object_aliases;
struct list_head s_object_alias; /* remapped objects */
struct btree_head128 s_object_alias_tree; /* remapped objects */
struct mutex s_object_alias_mutex;
/* dir.c fields */
struct mutex s_dirop_mutex; /* for creat/unlink/rename */
u64 s_victim_ino; /* used for atomic dir-ops */
u64 s_rename_dir; /* source directory ino */
u64 s_rename_pos; /* position of source dd */
/* gc.c fields */
long s_segsize; /* size of a segment */
int s_segshift; /* log2 of segment size */
long s_segmask; /* (1 << s_segshift) - 1 */
long s_no_segs; /* segments on device */
long s_no_journal_segs; /* segments used for journal */
long s_no_blocks; /* blocks per segment */
long s_writesize; /* minimum write size */
int s_writeshift; /* log2 of write size */
u64 s_size; /* filesystem size */
struct logfs_area *s_area[LOGFS_NO_AREAS]; /* open segment array */
u64 s_gec; /* global erase count */
u64 s_wl_gec_ostore; /* time of last wl event */
u64 s_wl_gec_journal; /* time of last wl event */
u64 s_sweeper; /* current sweeper pos */
u8 s_ifile_levels; /* max level of ifile */
u8 s_iblock_levels; /* max level of regular files */
u8 s_data_levels; /* # of segments to leaf block*/
u8 s_total_levels; /* sum of above three */
struct btree_head32 s_cand_tree; /* all candidates */
struct candidate_list s_free_list; /* 100% free segments */
struct candidate_list s_reserve_list; /* Bad segment reserve */
struct candidate_list s_low_list[LOGFS_NO_AREAS];/* good candidates */
struct candidate_list s_ec_list; /* wear level candidates */
struct btree_head32 s_reserved_segments;/* sb, journal, bad, etc. */
/* inode.c fields */
u64 s_last_ino; /* highest ino used */
long s_inos_till_wrap;
u32 s_generation; /* i_generation for new files */
struct list_head s_freeing_list; /* inodes being freed */
/* journal.c fields */
struct mutex s_journal_mutex;
void *s_je; /* journal entry to compress */
void *s_compressed_je; /* block to write to journal */
u32 s_journal_seg[LOGFS_JOURNAL_SEGS]; /* journal segments */
u32 s_journal_ec[LOGFS_JOURNAL_SEGS]; /* journal erasecounts */
u64 s_last_version;
struct logfs_area *s_journal_area; /* open journal segment */
__be64 s_je_array[MAX_JOURNAL_ENTRIES];
int s_no_je;
int s_sum_index; /* for the 12 summaries */
struct shadow_tree s_shadow_tree;
int s_je_fill; /* index of current je */
/* readwrite.c fields */
struct mutex s_write_mutex;
int s_lock_count;
mempool_t *s_block_pool; /* struct logfs_block pool */
mempool_t *s_shadow_pool; /* struct logfs_shadow pool */
struct list_head s_writeback_list; /* writeback pages */
/*
* Space accounting:
* - s_used_bytes specifies space used to store valid data objects.
* - s_dirty_used_bytes is space used to store non-committed data
* objects. Those objects have already been written themselves,
* but they don't become valid until all indirect blocks up to the
* journal have been written as well.
* - s_dirty_free_bytes is space used to store the old copy of a
* replaced object, as long as the replacement is non-committed.
* In other words, it is the amount of space freed when all dirty
* blocks are written back.
* - s_free_bytes is the amount of free space available for any
* purpose.
* - s_root_reserve is the amount of free space available only to
* the root user. Non-privileged users can no longer write once
* this watermark has been reached.
* - s_speed_reserve is space which remains unused to speed up
* garbage collection performance.
* - s_dirty_pages is the space reserved for currently dirty pages.
* It is a pessimistic estimate, so some/most will get freed on
* page writeback.
*
* s_used_bytes + s_free_bytes + s_speed_reserve = total usable size
*/
u64 s_free_bytes;
u64 s_used_bytes;
u64 s_dirty_free_bytes;
u64 s_dirty_used_bytes;
u64 s_root_reserve;
u64 s_speed_reserve;
u64 s_dirty_pages;
/* Bad block handling:
* - s_bad_seg_reserve is a number of segments usually kept
* free. When encountering bad blocks, the affected segment's data
* is _temporarily_ moved to a reserved segment.
* - s_bad_segments is the number of known bad segments.
*/
u32 s_bad_seg_reserve;
u32 s_bad_segments;
};
/**
* struct logfs_inode - in-memory inode
*
* @vfs_inode: struct inode
* @li_data: data pointers
* @li_used_bytes: number of used bytes
* @li_freeing_list: used to track inodes currently being freed
* @li_flags: inode flags
* @li_refcount: number of internal (GC-induced) references
*/
struct logfs_inode {
struct inode vfs_inode;
u64 li_data[LOGFS_EMBEDDED_FIELDS];
u64 li_used_bytes;
struct list_head li_freeing_list;
struct logfs_block *li_block;
u32 li_flags;
u8 li_height;
int li_refcount;
};
#define journal_for_each(__i) for (__i = 0; __i < LOGFS_JOURNAL_SEGS; __i++)
#define for_each_area(__i) for (__i = 0; __i < LOGFS_NO_AREAS; __i++)
#define for_each_area_down(__i) for (__i = LOGFS_NO_AREAS - 1; __i >= 0; __i--)
/* compr.c */
int logfs_compress(void *in, void *out, size_t inlen, size_t outlen);
int logfs_uncompress(void *in, void *out, size_t inlen, size_t outlen);
int __init logfs_compr_init(void);
void logfs_compr_exit(void);
/* dev_bdev.c */
#ifdef CONFIG_BLOCK
int logfs_get_sb_bdev(struct logfs_super *s,
struct file_system_type *type,
const char *devname);
#else
static inline int logfs_get_sb_bdev(struct logfs_super *s,
struct file_system_type *type,
const char *devname)
{
return -ENODEV;
}
#endif
/* dev_mtd.c */
#if IS_ENABLED(CONFIG_MTD)
int logfs_get_sb_mtd(struct logfs_super *s, int mtdnr);
#else
static inline int logfs_get_sb_mtd(struct logfs_super *s, int mtdnr)
{
return -ENODEV;
}
#endif
/* dir.c */
extern const struct inode_operations logfs_dir_iops;
extern const struct file_operations logfs_dir_fops;
int logfs_replay_journal(struct super_block *sb);
/* file.c */
extern const struct inode_operations logfs_reg_iops;
extern const struct file_operations logfs_reg_fops;
extern const struct address_space_operations logfs_reg_aops;
int logfs_readpage(struct file *file, struct page *page);
long logfs_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
int logfs_fsync(struct file *file, loff_t start, loff_t end, int datasync);
/* gc.c */
u32 get_best_cand(struct super_block *sb, struct candidate_list *list, u32 *ec);
void logfs_gc_pass(struct super_block *sb);
int logfs_check_areas(struct super_block *sb);
int logfs_init_gc(struct super_block *sb);
void logfs_cleanup_gc(struct super_block *sb);
/* inode.c */
extern const struct super_operations logfs_super_operations;
struct inode *logfs_iget(struct super_block *sb, ino_t ino);
struct inode *logfs_safe_iget(struct super_block *sb, ino_t ino, int *cookie);
void logfs_safe_iput(struct inode *inode, int cookie);
struct inode *logfs_new_inode(struct inode *dir, umode_t mode);
struct inode *logfs_new_meta_inode(struct super_block *sb, u64 ino);
struct inode *logfs_read_meta_inode(struct super_block *sb, u64 ino);
int logfs_init_inode_cache(void);
void logfs_destroy_inode_cache(void);
void logfs_set_blocks(struct inode *inode, u64 no);
/* these logically belong into inode.c but actually reside in readwrite.c */
int logfs_read_inode(struct inode *inode);
int __logfs_write_inode(struct inode *inode, struct page *, long flags);
void logfs_evict_inode(struct inode *inode);
/* journal.c */
void logfs_write_anchor(struct super_block *sb);
int logfs_init_journal(struct super_block *sb);
void logfs_cleanup_journal(struct super_block *sb);
int write_alias_journal(struct super_block *sb, u64 ino, u64 bix,
level_t level, int child_no, __be64 val);
void do_logfs_journal_wl_pass(struct super_block *sb);
/* readwrite.c */
pgoff_t logfs_pack_index(u64 bix, level_t level);
void logfs_unpack_index(pgoff_t index, u64 *bix, level_t *level);
int logfs_inode_write(struct inode *inode, const void *buf, size_t count,
loff_t bix, long flags, struct shadow_tree *shadow_tree);
int logfs_readpage_nolock(struct page *page);
int logfs_write_buf(struct inode *inode, struct page *page, long flags);
int logfs_delete(struct inode *inode, pgoff_t index,
struct shadow_tree *shadow_tree);
int logfs_rewrite_block(struct inode *inode, u64 bix, u64 ofs,
gc_level_t gc_level, long flags);
int logfs_is_valid_block(struct super_block *sb, u64 ofs, u64 ino, u64 bix,
gc_level_t gc_level);
int logfs_truncate(struct inode *inode, u64 size);
u64 logfs_seek_hole(struct inode *inode, u64 bix);
u64 logfs_seek_data(struct inode *inode, u64 bix);
int logfs_open_segfile(struct super_block *sb);
int logfs_init_rw(struct super_block *sb);
void logfs_cleanup_rw(struct super_block *sb);
void logfs_add_transaction(struct inode *inode, struct logfs_transaction *ta);
void logfs_del_transaction(struct inode *inode, struct logfs_transaction *ta);
void logfs_write_block(struct logfs_block *block, long flags);
int logfs_write_obj_aliases_pagecache(struct super_block *sb);
void logfs_get_segment_entry(struct super_block *sb, u32 segno,
struct logfs_segment_entry *se);
void logfs_set_segment_used(struct super_block *sb, u64 ofs, int increment);
void logfs_set_segment_erased(struct super_block *sb, u32 segno, u32 ec,
gc_level_t gc_level);
void logfs_set_segment_reserved(struct super_block *sb, u32 segno);
void logfs_set_segment_unreserved(struct super_block *sb, u32 segno, u32 ec);
struct logfs_block *__alloc_block(struct super_block *sb,
u64 ino, u64 bix, level_t level);
void __free_block(struct super_block *sb, struct logfs_block *block);
void btree_write_block(struct logfs_block *block);
void initialize_block_counters(struct page *page, struct logfs_block *block,
__be64 *array, int page_is_empty);
int logfs_exist_block(struct inode *inode, u64 bix);
int get_page_reserve(struct inode *inode, struct page *page);
void logfs_get_wblocks(struct super_block *sb, struct page *page, int lock);
void logfs_put_wblocks(struct super_block *sb, struct page *page, int lock);
extern const struct logfs_block_ops indirect_block_ops;
/* segment.c */
int logfs_erase_segment(struct super_block *sb, u32 ofs, int ensure_erase);
int wbuf_read(struct super_block *sb, u64 ofs, size_t len, void *buf);
int logfs_segment_read(struct inode *inode, struct page *page, u64 ofs, u64 bix,
level_t level);
int logfs_segment_write(struct inode *inode, struct page *page,
struct logfs_shadow *shadow);
int logfs_segment_delete(struct inode *inode, struct logfs_shadow *shadow);
int logfs_load_object_aliases(struct super_block *sb,
struct logfs_obj_alias *oa, int count);
void move_page_to_btree(struct page *page);
int logfs_init_mapping(struct super_block *sb);
void logfs_sync_area(struct logfs_area *area);
void logfs_sync_segments(struct super_block *sb);
void freeseg(struct super_block *sb, u32 segno);
void free_areas(struct super_block *sb);
/* area handling */
int logfs_init_areas(struct super_block *sb);
void logfs_cleanup_areas(struct super_block *sb);
int logfs_open_area(struct logfs_area *area, size_t bytes);
int __logfs_buf_write(struct logfs_area *area, u64 ofs, void *buf, size_t len,
int use_filler);
static inline int logfs_buf_write(struct logfs_area *area, u64 ofs,
void *buf, size_t len)
{
return __logfs_buf_write(area, ofs, buf, len, 0);
}
static inline int logfs_buf_recover(struct logfs_area *area, u64 ofs,
void *buf, size_t len)
{
return __logfs_buf_write(area, ofs, buf, len, 1);
}
/* super.c */
struct page *emergency_read_begin(struct address_space *mapping, pgoff_t index);
void emergency_read_end(struct page *page);
void logfs_crash_dump(struct super_block *sb);
int logfs_statfs(struct dentry *dentry, struct kstatfs *stats);
int logfs_check_ds(struct logfs_disk_super *ds);
int logfs_write_sb(struct super_block *sb);
static inline struct logfs_super *logfs_super(struct super_block *sb)
{
return sb->s_fs_info;
}
static inline struct logfs_inode *logfs_inode(struct inode *inode)
{
return container_of(inode, struct logfs_inode, vfs_inode);
}
static inline void logfs_set_ro(struct super_block *sb)
{
logfs_super(sb)->s_flags |= LOGFS_SB_FLAG_RO;
}
#define LOGFS_BUG(sb) do { \
struct super_block *__sb = sb; \
logfs_crash_dump(__sb); \
logfs_super(__sb)->s_flags |= LOGFS_SB_FLAG_RO; \
BUG(); \
} while (0)
#define LOGFS_BUG_ON(condition, sb) \
do { if (unlikely(condition)) LOGFS_BUG((sb)); } while (0)
static inline __be32 logfs_crc32(void *data, size_t len, size_t skip)
{
return cpu_to_be32(crc32(~0, data+skip, len-skip));
}
static inline u8 logfs_type(struct inode *inode)
{
return (inode->i_mode >> 12) & 15;
}
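/*
 * Illustrative sketch (not from the original source): what the shift and
 * mask in logfs_type() above extract. The octal constants are the Linux
 * encodings of S_IFDIR, S_IFREG and S_IFLNK, repeated here under
 * hypothetical SKETCH_ names so the example needs no system headers. The
 * resulting nibbles coincide with the DT_DIR, DT_REG and DT_LNK values
 * from <dirent.h>.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_S_IFDIR 0040000u	/* directory */
#define SKETCH_S_IFREG 0100000u	/* regular file */
#define SKETCH_S_IFLNK 0120000u	/* symbolic link */

/* Keep only the four file-type bits of a mode value. */
static uint8_t sketch_type(unsigned int mode)
{
	return (uint8_t)((mode >> 12) & 15);
}
```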
static inline pgoff_t logfs_index(struct super_block *sb, u64 pos)
{
return pos >> sb->s_blocksize_bits;
}
static inline u64 dev_ofs(struct super_block *sb, u32 segno, u32 ofs)
{
return ((u64)segno << logfs_super(sb)->s_segshift) + ofs;
}
static inline u32 seg_no(struct super_block *sb, u64 ofs)
{
return ofs >> logfs_super(sb)->s_segshift;
}
static inline u32 seg_ofs(struct super_block *sb, u64 ofs)
{
return ofs & logfs_super(sb)->s_segmask;
}
static inline u64 seg_align(struct super_block *sb, u64 ofs)
{
return ofs & ~logfs_super(sb)->s_segmask;
}
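/*
 * Illustrative sketch (not from the original source): the segment address
 * arithmetic of dev_ofs(), seg_no(), seg_ofs() and seg_align() above, with
 * the superblock replaced by a hypothetical struct sketch_geom. A device
 * offset is the segment number in the high bits and the in-segment offset
 * in the low segshift bits.
 */

```c
#include <assert.h>
#include <stdint.h>

struct sketch_geom {
	int segshift;		/* log2 of the segment size */
	uint64_t segmask;	/* (1 << segshift) - 1 */
};

static uint64_t sketch_dev_ofs(const struct sketch_geom *g, uint32_t segno,
			       uint32_t ofs)
{
	return ((uint64_t)segno << g->segshift) + ofs;
}

static uint32_t sketch_seg_no(const struct sketch_geom *g, uint64_t ofs)
{
	return (uint32_t)(ofs >> g->segshift);
}

static uint32_t sketch_seg_ofs(const struct sketch_geom *g, uint64_t ofs)
{
	return (uint32_t)(ofs & g->segmask);
}

/* Device offset of the first byte of the segment containing ofs. */
static uint64_t sketch_seg_align(const struct sketch_geom *g, uint64_t ofs)
{
	return ofs & ~g->segmask;
}
```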
static inline struct logfs_block *logfs_block(struct page *page)
{
return (void *)page->private;
}
static inline level_t shrink_level(gc_level_t __level)
{
u8 level = (__force u8)__level;
if (level >= LOGFS_MAX_LEVELS)
level -= LOGFS_MAX_LEVELS;
return (__force level_t)level;
}
static inline gc_level_t expand_level(u64 ino, level_t __level)
{
u8 level = (__force u8)__level;
if (ino == LOGFS_INO_MASTER) {
/* ifile has separate areas */
level += LOGFS_MAX_LEVELS;
}
return (__force gc_level_t)level;
}
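/*
 * Illustrative sketch (not from the original source): how shrink_level()
 * and expand_level() above map between per-file levels (0-5) and GC
 * levels (0-11). The sparse annotations are dropped, and
 * SKETCH_INO_MASTER is a placeholder value, not necessarily the real
 * LOGFS_INO_MASTER.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_INDIRECT 5
#define SKETCH_MAX_LEVELS (SKETCH_MAX_INDIRECT + 1)
#define SKETCH_INO_MASTER 1ull	/* placeholder inode number */

/* Blocks of the inode file are shifted into the upper six GC levels. */
static uint8_t sketch_expand_level(uint64_t ino, uint8_t level)
{
	if (ino == SKETCH_INO_MASTER)
		level += SKETCH_MAX_LEVELS;
	return level;
}

/* Fold a GC level back to the level within its own file. */
static uint8_t sketch_shrink_level(uint8_t gc_level)
{
	if (gc_level >= SKETCH_MAX_LEVELS)
		gc_level -= SKETCH_MAX_LEVELS;
	return gc_level;
}
```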
static inline int logfs_block_shift(struct super_block *sb, level_t level)
{
level = shrink_level((__force gc_level_t)level);
return (__force int)level * (sb->s_blocksize_bits - 3);
}
static inline u64 logfs_block_mask(struct super_block *sb, level_t level)
{
return ~0ull << logfs_block_shift(sb, level);
}
static inline struct logfs_area *get_area(struct super_block *sb,
gc_level_t gc_level)
{
return logfs_super(sb)->s_area[(__force u8)gc_level];
}
static inline void logfs_mempool_destroy(mempool_t *pool)
{
if (pool)
mempool_destroy(pool);
}
#endif

@ -1,629 +0,0 @@
/*
* fs/logfs/logfs_abi.h
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*
* Public header for logfs.
*/
#ifndef FS_LOGFS_LOGFS_ABI_H
#define FS_LOGFS_LOGFS_ABI_H
/* For out-of-kernel compiles */
#ifndef BUILD_BUG_ON
#define BUILD_BUG_ON(condition) /**/
#endif
#define SIZE_CHECK(type, size) \
static inline void check_##type(void) \
{ \
BUILD_BUG_ON(sizeof(struct type) != (size)); \
}
/*
* Throughout the logfs code, we're constantly dealing with blocks at
* various positions or offsets. To remove confusion, we strictly
* distinguish between a "position" - the logical position within a
* file and an "offset" - the physical location within the device.
*
* Any usage of the term offset for a logical location or position for
* a physical one is a bug and should get fixed.
*/
/*
* Blocks are allocated in one of several segments depending on their
* level. The following levels are used:
* 0 - regular data block
* 1 - i1 indirect blocks
* 2 - i2 indirect blocks
* 3 - i3 indirect blocks
* 4 - i4 indirect blocks
* 5 - i5 indirect blocks
* 6 - ifile data blocks
* 7 - ifile i1 indirect blocks
* 8 - ifile i2 indirect blocks
* 9 - ifile i3 indirect blocks
* 10 - ifile i4 indirect blocks
* 11 - ifile i5 indirect blocks
* Potential levels to be used in the future:
* 12 - gc recycled blocks, long-lived data
* 13 - replacement blocks, short-lived data
*
* Levels 1-11 are necessary for robust gc operations and help separate
* short-lived metadata from longer-lived file data. In the future,
* file data should get separated into several segments based on simple
* heuristics. Old data recycled during gc operation is expected to be
* long-lived. New data is of uncertain life expectancy. New data
* used to replace older blocks in existing files is expected to be
* short-lived.
*/
/* Magic numbers. 64bit for superblock, 32bit for statfs f_type */
#define LOGFS_MAGIC 0x7a3a8e5cb9d5bf67ull
#define LOGFS_MAGIC_U32 0xc97e8168u
/*
* Various blocksize related macros. Blocksize is currently fixed at 4KiB.
* Sooner or later that should become configurable and the macros replaced
* by something superblock-dependent. Pointers in indirect blocks are and
* will remain 64bit.
*
* LOGFS_BLOCKSIZE - self-explaining
* LOGFS_BLOCK_FACTOR - number of pointers per indirect block
* LOGFS_BLOCK_BITS - log2 of LOGFS_BLOCK_FACTOR, used for shifts
*/
#define LOGFS_BLOCKSIZE (4096ull)
#define LOGFS_BLOCK_FACTOR (LOGFS_BLOCKSIZE / sizeof(u64))
#define LOGFS_BLOCK_BITS (9)
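/*
 * Illustrative sketch (not from the original source): checking that the
 * three macros above agree with each other. With 4 KiB blocks and 64bit
 * pointers an indirect block holds 512 pointers, and log2(512) is 9. The
 * SKETCH_ names and sketch_log2() are hypothetical.
 */

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLOCKSIZE 4096ull
#define SKETCH_BLOCK_FACTOR (SKETCH_BLOCKSIZE / sizeof(uint64_t))

/* Integer log2; exact for powers of two. */
static unsigned sketch_log2(uint64_t v)
{
	unsigned bits = 0;

	while (v > 1) {
		v >>= 1;
		bits++;
	}
	return bits;
}
```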
/*
* Number of blocks at various levels of indirection. There are 16 direct
* block pointers plus a single indirect pointer.
*/
#define I0_BLOCKS (16)
#define I1_BLOCKS LOGFS_BLOCK_FACTOR
#define I2_BLOCKS (LOGFS_BLOCK_FACTOR * I1_BLOCKS)
#define I3_BLOCKS (LOGFS_BLOCK_FACTOR * I2_BLOCKS)
#define I4_BLOCKS (LOGFS_BLOCK_FACTOR * I3_BLOCKS)
#define I5_BLOCKS (LOGFS_BLOCK_FACTOR * I4_BLOCKS)
#define INDIRECT_INDEX I0_BLOCKS
#define LOGFS_EMBEDDED_FIELDS (I0_BLOCKS + 1)
/*
* Sizes at which files require another level of indirection. Files smaller
* than LOGFS_EMBEDDED_SIZE can be completely stored in the inode itself,
* similar to ext2 fast symlinks.
*
* Data at a position smaller than LOGFS_I0_SIZE is accessed through the
* direct pointers, else through the 1x indirect pointer and so forth.
*/
#define LOGFS_EMBEDDED_SIZE (LOGFS_EMBEDDED_FIELDS * sizeof(u64))
#define LOGFS_I0_SIZE (I0_BLOCKS * LOGFS_BLOCKSIZE)
#define LOGFS_I1_SIZE (I1_BLOCKS * LOGFS_BLOCKSIZE)
#define LOGFS_I2_SIZE (I2_BLOCKS * LOGFS_BLOCKSIZE)
#define LOGFS_I3_SIZE (I3_BLOCKS * LOGFS_BLOCKSIZE)
#define LOGFS_I4_SIZE (I4_BLOCKS * LOGFS_BLOCKSIZE)
#define LOGFS_I5_SIZE (I5_BLOCKS * LOGFS_BLOCKSIZE)
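The thresholds above can be sketched in userspace C. This is only an illustration of the arithmetic described in the comment: the `demo_` and `DEMO_` names are hypothetical and not part of LogFS, and the constants simply mirror the 4KiB block size and 64bit pointers used by the macros.

```c
#include <stdint.h>

/* Hypothetical userspace mirror of the macros above. */
#define DEMO_BLOCKSIZE    4096ull
#define DEMO_BLOCK_FACTOR (DEMO_BLOCKSIZE / sizeof(uint64_t)) /* 512 pointers */
#define DEMO_I0_BLOCKS    16ull

/*
 * Return the indirection level needed to reach byte position pos:
 * 0 for the direct pointers, 1 for the 1x indirect pointer, and so on,
 * capped at 5 levels. Each step compares pos against the next
 * LOGFS_I<n>_SIZE threshold.
 */
static int demo_levels_for_pos(uint64_t pos)
{
	uint64_t limit = DEMO_I0_BLOCKS * DEMO_BLOCKSIZE; /* LOGFS_I0_SIZE */
	uint64_t blocks = DEMO_BLOCK_FACTOR;              /* I1_BLOCKS */
	int level = 0;

	while (pos >= limit && level < 5) {
		level++;
		limit = blocks * DEMO_BLOCKSIZE; /* LOGFS_I<level>_SIZE */
		blocks *= DEMO_BLOCK_FACTOR;
	}
	return level;
}
```

With these values the direct pointers cover the first 64KiB and the 1x indirect range ends at 2MiB.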
/*
* Each indirect block pointer must have this flag set, if all block pointers
* behind it are set, i.e. there is no hole hidden in the shadow of this
* indirect block pointer.
*/
#define LOGFS_FULLY_POPULATED (1ULL << 63)
#define pure_ofs(ofs) ((ofs) & ~LOGFS_FULLY_POPULATED)
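A minimal sketch of how the top bit can be used as a flag on a 64bit block pointer, assuming plain host-order integers. The `demo_` helpers are hypothetical; only the bit position is taken from the definition above.

```c
#include <stdint.h>

#define DEMO_FULLY_POPULATED (1ULL << 63)

/* Strip the flag to get the plain device offset (cf. pure_ofs()). */
static uint64_t demo_pure_ofs(uint64_t ptr)
{
	return ptr & ~DEMO_FULLY_POPULATED;
}

/* Non-zero if no hole hides behind this indirect block pointer. */
static int demo_is_fully_populated(uint64_t ptr)
{
	return (ptr & DEMO_FULLY_POPULATED) != 0;
}
```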
/*
* LogFS needs to separate data into levels. Each level is defined as the
* maximal possible distance from the master inode (inode of the inode file).
* Data blocks reside on level 0, 1x indirect block on level 1, etc.
* Inodes reside on level 6, indirect blocks for the inode file on levels 7-11.
* This effort is necessary to guarantee that garbage collection always makes
* progress.
*
* LOGFS_MAX_INDIRECT is the maximal indirection through indirect blocks,
* LOGFS_MAX_LEVELS is one more for the actual data level of a file. It is
* the maximal number of levels for one file.
* LOGFS_NO_AREAS is twice that, as the inode file and regular files are
* effectively stacked on top of each other.
*/
#define LOGFS_MAX_INDIRECT (5)
#define LOGFS_MAX_LEVELS (LOGFS_MAX_INDIRECT + 1)
#define LOGFS_NO_AREAS (2 * LOGFS_MAX_LEVELS)
/* Maximum size of filenames */
#define LOGFS_MAX_NAMELEN (255)
/* Number of segments in the primary journal. */
#define LOGFS_JOURNAL_SEGS (16)
/* Maximum number of free/erased/etc. segments in journal entries */
#define MAX_CACHED_SEGS (64)
/*
* LOGFS_OBJECT_HEADERSIZE is the size of a single header in the object store,
* LOGFS_MAX_OBJECTSIZE the size of the largest possible object, including
* its header,
* LOGFS_SEGMENT_RESERVE is the amount of space reserved for each segment for
* its segment header and the padded space at the end when no further objects
* fit.
*/
#define LOGFS_OBJECT_HEADERSIZE (0x1c)
#define LOGFS_SEGMENT_HEADERSIZE (0x18)
#define LOGFS_MAX_OBJECTSIZE (LOGFS_OBJECT_HEADERSIZE + LOGFS_BLOCKSIZE)
#define LOGFS_SEGMENT_RESERVE \
(LOGFS_SEGMENT_HEADERSIZE + LOGFS_MAX_OBJECTSIZE - 1)
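As a worked example of the reserve arithmetic, the sketch below recomputes the constants in userspace. The `DEMO_` names and the 128KiB segment size in the usage note are assumptions for illustration only.

```c
/* Hypothetical userspace mirror of the macros above. */
#define DEMO_OBJECT_HEADERSIZE  0x1c
#define DEMO_SEGMENT_HEADERSIZE 0x18
#define DEMO_BLOCKSIZE          4096
#define DEMO_MAX_OBJECTSIZE     (DEMO_OBJECT_HEADERSIZE + DEMO_BLOCKSIZE)
#define DEMO_SEGMENT_RESERVE \
	(DEMO_SEGMENT_HEADERSIZE + DEMO_MAX_OBJECTSIZE - 1)

/* Bytes of a segment left for objects after subtracting the reserve. */
static unsigned long demo_usable_bytes(unsigned long segsize)
{
	return segsize - DEMO_SEGMENT_RESERVE;
}
```

The reserve works out to 24 + 28 + 4096 - 1 = 4147 bytes, so an assumed 128KiB segment would leave 126925 bytes.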
/*
* Segment types:
* SEG_SUPER - Superblock segment
* SEG_JOURNAL - Journal segment
* SEG_OSTORE - Object store segment
*/
enum {
SEG_SUPER = 0x01,
SEG_JOURNAL = 0x02,
SEG_OSTORE = 0x03,
};
/**
* struct logfs_segment_header - per-segment header in the ostore
*
* @crc: crc32 of header (there is no data)
* @pad: unused, must be 0
* @type: segment type, see above
* @level: GC level for all objects in this segment
* @segno: segment number
* @ec: erase count for this segment
* @gec: global erase count at time of writing
*/
struct logfs_segment_header {
__be32 crc;
__be16 pad;
__u8 type;
__u8 level;
__be32 segno;
__be32 ec;
__be64 gec;
};
SIZE_CHECK(logfs_segment_header, LOGFS_SEGMENT_HEADERSIZE);
#define LOGFS_FEATURES_INCOMPAT (0ull)
#define LOGFS_FEATURES_RO_COMPAT (0ull)
#define LOGFS_FEATURES_COMPAT (0ull)
/**
* struct logfs_disk_super - on-medium superblock
*
* @ds_magic: magic number, must equal LOGFS_MAGIC
* @ds_crc: crc32 of structure starting with the next field
* @ds_ifile_levels: maximum number of levels for ifile
* @ds_iblock_levels: maximum number of levels for regular files
* @ds_data_levels: number of separate levels for data
* @pad0: reserved, must be 0
* @ds_feature_incompat: incompatible filesystem features
* @ds_feature_ro_compat: read-only compatible filesystem features
* @ds_feature_compat: compatible filesystem features
* @ds_feature_flags: flags
* @ds_segment_shift: log2 of segment size
* @ds_block_shift: log2 of block size
* @ds_write_shift: log2 of write size
* @pad1: reserved, must be 0
* @ds_journal_seg: segments used by primary journal
* @ds_root_reserve: bytes reserved for the superuser
* @ds_speed_reserve: bytes reserved to speed up GC
* @ds_bad_seg_reserve: number of segments reserved to handle bad blocks
* @pad2: reserved, must be 0
* @pad3: reserved, must be 0
*
* Contains only read-only fields. Read-write fields like the amount of used
* space are tracked in the dynamic superblock, which is stored in the journal.
*/
struct logfs_disk_super {
struct logfs_segment_header ds_sh;
__be64 ds_magic;
__be32 ds_crc;
__u8 ds_ifile_levels;
__u8 ds_iblock_levels;
__u8 ds_data_levels;
__u8 ds_segment_shift;
__u8 ds_block_shift;
__u8 ds_write_shift;
__u8 pad0[6];
__be64 ds_filesystem_size;
__be32 ds_segment_size;
__be32 ds_bad_seg_reserve;
__be64 ds_feature_incompat;
__be64 ds_feature_ro_compat;
__be64 ds_feature_compat;
__be64 ds_feature_flags;
__be64 ds_root_reserve;
__be64 ds_speed_reserve;
__be32 ds_journal_seg[LOGFS_JOURNAL_SEGS];
__be64 ds_super_ofs[2];
__be64 pad3[8];
};
SIZE_CHECK(logfs_disk_super, 256);
/*
* Object types:
* OBJ_BLOCK - Data or indirect block
* OBJ_INODE - Inode
* OBJ_DENTRY - Dentry
*/
enum {
OBJ_BLOCK = 0x04,
OBJ_INODE = 0x05,
OBJ_DENTRY = 0x06,
};
/**
* struct logfs_object_header - per-object header in the ostore
*
* @crc: crc32 of header, excluding data_crc
* @len: length of data
* @type: object type, see above
* @compr: compression type
* @ino: inode number
* @bix: block index
* @data_crc: crc32 of payload
*/
struct logfs_object_header {
__be32 crc;
__be16 len;
__u8 type;
__u8 compr;
__be64 ino;
__be64 bix;
__be32 data_crc;
} __attribute__((packed));
SIZE_CHECK(logfs_object_header, LOGFS_OBJECT_HEADERSIZE);
/*
* Reserved inode numbers:
* LOGFS_INO_MASTER - master inode (for inode file)
* LOGFS_INO_ROOT - root directory
* LOGFS_INO_SEGFILE - per-segment used bytes and erase count
*/
enum {
LOGFS_INO_MAPPING = 0x00,
LOGFS_INO_MASTER = 0x01,
LOGFS_INO_ROOT = 0x02,
LOGFS_INO_SEGFILE = 0x03,
LOGFS_RESERVED_INOS = 0x10,
};
/*
* Inode flags. High bits should never be written to the medium. They are
* reserved for in-memory usage.
* Low bits should either remain in sync with the corresponding FS_*_FL or
* reuse slots that obviously don't make sense for logfs.
*
* LOGFS_IF_DIRTY Inode must be written back
* LOGFS_IF_ZOMBIE Inode has been deleted
* LOGFS_IF_STILLBORN -ENOSPC happened when creating inode
*/
#define LOGFS_IF_COMPRESSED 0x00000004 /* == FS_COMPR_FL */
#define LOGFS_IF_DIRTY 0x20000000
#define LOGFS_IF_ZOMBIE 0x40000000
#define LOGFS_IF_STILLBORN 0x80000000
/* Flags available to chattr */
#define LOGFS_FL_USER_VISIBLE (LOGFS_IF_COMPRESSED)
#define LOGFS_FL_USER_MODIFIABLE (LOGFS_IF_COMPRESSED)
/* Flags inherited from parent directory on file/directory creation */
#define LOGFS_FL_INHERITED (LOGFS_IF_COMPRESSED)
/**
* struct logfs_disk_inode - on-medium inode
*
* @di_mode: file mode
* @di_height: height of the indirect block tree
* @di_pad: reserved, must be 0
* @di_flags: inode flags, see above
* @di_uid: user id
* @di_gid: group id
* @di_ctime: change time
* @di_mtime: modify time
* @di_atime: access time
* @di_refcount: reference count (aka nlink or link count)
* @di_generation: inode generation, for nfs
* @di_used_bytes: number of bytes used
* @di_size: file size
* @di_data: data pointers
*/
struct logfs_disk_inode {
__be16 di_mode;
__u8 di_height;
__u8 di_pad;
__be32 di_flags;
__be32 di_uid;
__be32 di_gid;
__be64 di_ctime;
__be64 di_mtime;
__be64 di_atime;
__be32 di_refcount;
__be32 di_generation;
__be64 di_used_bytes;
__be64 di_size;
__be64 di_data[LOGFS_EMBEDDED_FIELDS];
};
SIZE_CHECK(logfs_disk_inode, 200);
#define INODE_POINTER_OFS \
(offsetof(struct logfs_disk_inode, di_data) / sizeof(__be64))
#define INODE_USED_OFS \
(offsetof(struct logfs_disk_inode, di_used_bytes) / sizeof(__be64))
#define INODE_SIZE_OFS \
(offsetof(struct logfs_disk_inode, di_size) / sizeof(__be64))
#define INODE_HEIGHT_OFS (0)
/**
* struct logfs_disk_dentry - on-medium dentry structure
*
* @ino: inode number
* @namelen: length of file name
* @type: file type, identical to bits 12..15 of mode
* @name: file name
*/
/* FIXME: add 6 bytes of padding to remove the __packed */
struct logfs_disk_dentry {
__be64 ino;
__be16 namelen;
__u8 type;
__u8 name[LOGFS_MAX_NAMELEN];
} __attribute__((packed));
SIZE_CHECK(logfs_disk_dentry, 266);
#define RESERVED 0xffffffff
#define BADSEG 0xffffffff
/**
* struct logfs_segment_entry - segment file entry
*
* @ec_level: erase count and level
* @valid: number of valid bytes
*
* Segment file contains one entry for every segment. ec_level contains the
* erasecount in the upper 28 bits and the level in the lower 4 bits. An
* ec_level of BADSEG (-1) identifies bad segments. valid contains the number
* of valid bytes or RESERVED (-1 again) if the segment is used for either the
* superblock or the journal, or when the segment is bad.
*/
struct logfs_segment_entry {
__be32 ec_level;
__be32 valid;
};
SIZE_CHECK(logfs_segment_entry, 8);
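The ec_level layout described above can be illustrated with a small pack/unpack sketch. It works on host-order values, so on-medium data would first need a big-endian conversion; the `demo_` helper names are hypothetical.

```c
#include <stdint.h>

/* Pack erase count (upper 28 bits) and level (lower 4 bits). */
static uint32_t demo_pack_ec_level(uint32_t ec, uint32_t level)
{
	return (ec << 4) | (level & 0xf);
}

/* Extract the erase count from a host-order ec_level value. */
static uint32_t demo_ec(uint32_t ec_level)
{
	return ec_level >> 4;
}

/* Extract the level from a host-order ec_level value. */
static uint32_t demo_level(uint32_t ec_level)
{
	return ec_level & 0xf;
}
```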
/**
* struct logfs_journal_header - header for journal entries (JEs)
*
* @h_crc: crc32 of journal entry
* @h_len: length of compressed journal entry,
* not including header
* @h_datalen: length of uncompressed data
* @h_type: JE type
* @h_compr: compression type
* @h_pad: reserved
*/
struct logfs_journal_header {
__be32 h_crc;
__be16 h_len;
__be16 h_datalen;
__be16 h_type;
__u8 h_compr;
__u8 h_pad[5];
};
SIZE_CHECK(logfs_journal_header, 16);
/*
* Life expectancy of data.
* VIM_DEFAULT - default vim
* VIM_SEGFILE - for segment file only - very short-living
* VIM_GC - GC'd data - likely long-living
*/
enum logfs_vim {
VIM_DEFAULT = 0,
VIM_SEGFILE = 1,
};
/**
* struct logfs_je_area - wbuf header
*
* @segno: segment number of area
* @used_bytes: number of bytes already used
* @gc_level: GC level
* @vim: life expectancy of data
*
* "Areas" are segments currently being used for writing. There is at least
* one area per GC level. Several may be used to separate long-living from
* short-living data. If an area with unknown vim is encountered, it can
* simply be closed.
* The write buffer immediately follows this header.
*/
struct logfs_je_area {
__be32 segno;
__be32 used_bytes;
__u8 gc_level;
__u8 vim;
} __attribute__((packed));
SIZE_CHECK(logfs_je_area, 10);
#define MAX_JOURNAL_HEADER \
(sizeof(struct logfs_journal_header) + sizeof(struct logfs_je_area))
/**
* struct logfs_je_dynsb - dynamic superblock
*
* @ds_gec: global erase count
* @ds_sweeper: current position of GC "sweeper"
* @ds_rename_dir: source directory ino (see dir.c documentation)
* @ds_rename_pos: position of source dd (see dir.c documentation)
* @ds_victim_ino: victims of incomplete dir operation (see dir.c)
* @ds_victim_parent: parent inode of victim (see dir.c)
* @ds_used_bytes: number of used bytes
*/
struct logfs_je_dynsb {
__be64 ds_gec;
__be64 ds_sweeper;
__be64 ds_rename_dir;
__be64 ds_rename_pos;
__be64 ds_victim_ino;
__be64 ds_victim_parent; /* XXX */
__be64 ds_used_bytes;
__be32 ds_generation;
__be32 pad;
};
SIZE_CHECK(logfs_je_dynsb, 64);
/**
* struct logfs_je_anchor - anchor of filesystem tree, aka master inode
*
* @da_size: size of inode file
* @da_last_ino: last created inode
* @da_used_bytes: number of bytes used
* @da_data: data pointers
*/
struct logfs_je_anchor {
__be64 da_size;
__be64 da_last_ino;
__be64 da_used_bytes;
u8 da_height;
u8 pad[7];
__be64 da_data[LOGFS_EMBEDDED_FIELDS];
};
SIZE_CHECK(logfs_je_anchor, 168);
/**
* struct logfs_je_spillout - spillout entry (from 1st to 2nd journal)
*
* @so_segment: segments used for 2nd journal
*
* Length of the array is given by h_len field in the header.
*/
struct logfs_je_spillout {
__be64 so_segment[0];
};
SIZE_CHECK(logfs_je_spillout, 0);
/**
* struct logfs_je_journal_ec - erase counts for all journal segments
*
* @ec: erase count
*
* Length of the array is given by h_len field in the header.
*/
struct logfs_je_journal_ec {
__be32 ec[0];
};
SIZE_CHECK(logfs_je_journal_ec, 0);
/**
* struct logfs_je_free_segments - list of free segments with erase count
*/
struct logfs_je_free_segments {
__be32 segno;
__be32 ec;
};
SIZE_CHECK(logfs_je_free_segments, 8);
/**
* struct logfs_seg_alias - list of segment aliases
*/
struct logfs_seg_alias {
__be32 old_segno;
__be32 new_segno;
};
SIZE_CHECK(logfs_seg_alias, 8);
/**
* struct logfs_obj_alias - list of object aliases
*/
struct logfs_obj_alias {
__be64 ino;
__be64 bix;
__be64 val;
u8 level;
u8 pad[5];
__be16 child_no;
};
SIZE_CHECK(logfs_obj_alias, 32);
/**
* Compression types.
*
* COMPR_NONE - uncompressed
* COMPR_ZLIB - compressed with zlib
*/
enum {
COMPR_NONE = 0,
COMPR_ZLIB = 1,
};
/*
* Journal entries come in groups of 16. First group contains unique
* entries, next groups contain one entry per level
*
* JE_FIRST - smallest possible journal entry number
*
* JEG_BASE - base group, containing unique entries
* JE_COMMIT - commit entry, validates all previous entries
* JE_DYNSB - dynamic superblock, anything that ought to be in the
* superblock but cannot because it is read-write data
* JE_ANCHOR - anchor aka master inode aka inode file's inode
* JE_ERASECOUNT - erasecounts for all journal segments
* JE_SPILLOUT - unused
* JE_OBJ_ALIAS - object aliases
* JE_AREA - area description
*
* JE_LAST - largest possible journal entry number
*/
enum {
JE_FIRST = 0x01,
JEG_BASE = 0x00,
JE_COMMIT = 0x02,
JE_DYNSB = 0x03,
JE_ANCHOR = 0x04,
JE_ERASECOUNT = 0x05,
JE_SPILLOUT = 0x06,
JE_OBJ_ALIAS = 0x0d,
JE_AREA = 0x0e,
JE_LAST = 0x0e,
};
#endif

File diff suppressed because it is too large


@ -1,961 +0,0 @@
/*
* fs/logfs/segment.c - Handling the Object Store
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*
* Object store or ostore makes up the complete device with the exception of
* the superblock and journal areas. Apart from its own metadata it stores
* three kinds of objects: inodes, dentries and blocks, both data and indirect.
*/
#include "logfs.h"
#include <linux/slab.h>
static int logfs_mark_segment_bad(struct super_block *sb, u32 segno)
{
struct logfs_super *super = logfs_super(sb);
struct btree_head32 *head = &super->s_reserved_segments;
int err;
err = btree_insert32(head, segno, (void *)1, GFP_NOFS);
if (err)
return err;
logfs_super(sb)->s_bad_segments++;
/* FIXME: write to journal */
return 0;
}
int logfs_erase_segment(struct super_block *sb, u32 segno, int ensure_erase)
{
struct logfs_super *super = logfs_super(sb);
super->s_gec++;
return super->s_devops->erase(sb, (u64)segno << super->s_segshift,
super->s_segsize, ensure_erase);
}
static s64 logfs_get_free_bytes(struct logfs_area *area, size_t bytes)
{
s32 ofs;
logfs_open_area(area, bytes);
ofs = area->a_used_bytes;
area->a_used_bytes += bytes;
BUG_ON(area->a_used_bytes >= logfs_super(area->a_sb)->s_segsize);
return dev_ofs(area->a_sb, area->a_segno, ofs);
}
static struct page *get_mapping_page(struct super_block *sb, pgoff_t index,
int use_filler)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
filler_t *filler = super->s_devops->readpage;
struct page *page;
BUG_ON(mapping_gfp_constraint(mapping, __GFP_FS));
if (use_filler)
page = read_cache_page(mapping, index, filler, sb);
else {
page = find_or_create_page(mapping, index, GFP_NOFS);
if (page)
unlock_page(page);
}
return page;
}
int __logfs_buf_write(struct logfs_area *area, u64 ofs, void *buf, size_t len,
int use_filler)
{
pgoff_t index = ofs >> PAGE_SHIFT;
struct page *page;
long offset = ofs & (PAGE_SIZE-1);
long copylen;
/* Only logfs_wbuf_recover may use len==0 */
BUG_ON(!len && !use_filler);
do {
copylen = min((ulong)len, PAGE_SIZE - offset);
page = get_mapping_page(area->a_sb, index, use_filler);
if (IS_ERR(page))
return PTR_ERR(page);
BUG_ON(!page); /* FIXME: reserve a pool */
SetPageUptodate(page);
memcpy(page_address(page) + offset, buf, copylen);
if (!PagePrivate(page)) {
SetPagePrivate(page);
get_page(page);
}
put_page(page);
buf += copylen;
len -= copylen;
offset = 0;
index++;
} while (len);
return 0;
}
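The page-wise chunking used by the write loop above can be sketched in userspace as follows. This is an illustration only: `demo_pages_touched` and `DEMO_PAGE_SIZE` are hypothetical, a 4KiB page is assumed, and unlike the do/while loop above a zero-length request touches no page here.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL

/*
 * Count how many pages a write of len bytes starting at byte offset ofs
 * touches. The first chunk runs to the end of its page, every later
 * chunk starts at a page boundary.
 */
static unsigned long demo_pages_touched(unsigned long long ofs, size_t len)
{
	unsigned long offset = ofs & (DEMO_PAGE_SIZE - 1);
	unsigned long pages = 0;

	while (len) {
		size_t copylen = DEMO_PAGE_SIZE - offset;

		if (copylen > len)
			copylen = len;
		len -= copylen;
		offset = 0;
		pages++;
	}
	return pages;
}
```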
static void pad_partial_page(struct logfs_area *area)
{
struct super_block *sb = area->a_sb;
struct page *page;
u64 ofs = dev_ofs(sb, area->a_segno, area->a_used_bytes);
pgoff_t index = ofs >> PAGE_SHIFT;
long offset = ofs & (PAGE_SIZE-1);
u32 len = PAGE_SIZE - offset;
if (len % PAGE_SIZE) {
page = get_mapping_page(sb, index, 0);
BUG_ON(!page); /* FIXME: reserve a pool */
memset(page_address(page) + offset, 0xff, len);
if (!PagePrivate(page)) {
SetPagePrivate(page);
get_page(page);
}
put_page(page);
}
}
static void pad_full_pages(struct logfs_area *area)
{
struct super_block *sb = area->a_sb;
struct logfs_super *super = logfs_super(sb);
u64 ofs = dev_ofs(sb, area->a_segno, area->a_used_bytes);
u32 len = super->s_segsize - area->a_used_bytes;
pgoff_t index = PAGE_ALIGN(ofs) >> PAGE_SHIFT;
pgoff_t no_indizes = len >> PAGE_SHIFT;
struct page *page;
while (no_indizes) {
page = get_mapping_page(sb, index, 0);
BUG_ON(!page); /* FIXME: reserve a pool */
SetPageUptodate(page);
memset(page_address(page), 0xff, PAGE_SIZE);
if (!PagePrivate(page)) {
SetPagePrivate(page);
get_page(page);
}
put_page(page);
index++;
no_indizes--;
}
}
/*
* bdev_writeseg will write full pages. Memset the tail to prevent data leaks.
* Also make sure we allocate (and memset) all pages for final writeout.
*/
static void pad_wbuf(struct logfs_area *area, int final)
{
pad_partial_page(area);
if (final)
pad_full_pages(area);
}
/*
* We have to be careful with the alias tree. Since lookup is done by bix,
* it needs to be normalized, so 14, 15, 16, etc. all match when dealing with
* indirect blocks. So always use it through accessor functions.
*/
static void *alias_tree_lookup(struct super_block *sb, u64 ino, u64 bix,
level_t level)
{
struct btree_head128 *head = &logfs_super(sb)->s_object_alias_tree;
pgoff_t index = logfs_pack_index(bix, level);
return btree_lookup128(head, ino, index);
}
static int alias_tree_insert(struct super_block *sb, u64 ino, u64 bix,
level_t level, void *val)
{
struct btree_head128 *head = &logfs_super(sb)->s_object_alias_tree;
pgoff_t index = logfs_pack_index(bix, level);
return btree_insert128(head, ino, index, val, GFP_NOFS);
}
static int btree_write_alias(struct super_block *sb, struct logfs_block *block,
write_alias_t *write_one_alias)
{
struct object_alias_item *item;
int err;
list_for_each_entry(item, &block->item_list, list) {
err = write_alias_journal(sb, block->ino, block->bix,
block->level, item->child_no, item->val);
if (err)
return err;
}
return 0;
}
static const struct logfs_block_ops btree_block_ops = {
.write_block = btree_write_block,
.free_block = __free_block,
.write_alias = btree_write_alias,
};
int logfs_load_object_aliases(struct super_block *sb,
struct logfs_obj_alias *oa, int count)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_block *block;
struct object_alias_item *item;
u64 ino, bix;
level_t level;
int i, err;
super->s_flags |= LOGFS_SB_FLAG_OBJ_ALIAS;
count /= sizeof(*oa);
for (i = 0; i < count; i++) {
item = mempool_alloc(super->s_alias_pool, GFP_NOFS);
if (!item)
return -ENOMEM;
memset(item, 0, sizeof(*item));
super->s_no_object_aliases++;
item->val = oa[i].val;
item->child_no = be16_to_cpu(oa[i].child_no);
ino = be64_to_cpu(oa[i].ino);
bix = be64_to_cpu(oa[i].bix);
level = LEVEL(oa[i].level);
log_aliases("logfs_load_object_aliases(%llx, %llx, %x, %x) %llx\n",
ino, bix, level, item->child_no,
be64_to_cpu(item->val));
block = alias_tree_lookup(sb, ino, bix, level);
if (!block) {
block = __alloc_block(sb, ino, bix, level);
block->ops = &btree_block_ops;
err = alias_tree_insert(sb, ino, bix, level, block);
BUG_ON(err); /* mempool empty */
}
if (test_and_set_bit(item->child_no, block->alias_map)) {
printk(KERN_ERR"LogFS: Alias collision detected\n");
return -EIO;
}
list_move_tail(&block->alias_list, &super->s_object_alias);
list_add(&item->list, &block->item_list);
}
return 0;
}
static void kill_alias(void *_block, unsigned long ignore0,
u64 ignore1, u64 ignore2, size_t ignore3)
{
struct logfs_block *block = _block;
struct super_block *sb = block->sb;
struct logfs_super *super = logfs_super(sb);
struct object_alias_item *item;
while (!list_empty(&block->item_list)) {
item = list_entry(block->item_list.next, typeof(*item), list);
list_del(&item->list);
mempool_free(item, super->s_alias_pool);
}
block->ops->free_block(sb, block);
}
static int obj_type(struct inode *inode, level_t level)
{
if (level == 0) {
if (S_ISDIR(inode->i_mode))
return OBJ_DENTRY;
if (inode->i_ino == LOGFS_INO_MASTER)
return OBJ_INODE;
}
return OBJ_BLOCK;
}
static int obj_len(struct super_block *sb, int obj_type)
{
switch (obj_type) {
case OBJ_DENTRY:
return sizeof(struct logfs_disk_dentry);
case OBJ_INODE:
return sizeof(struct logfs_disk_inode);
case OBJ_BLOCK:
return sb->s_blocksize;
default:
BUG();
}
}
static int __logfs_segment_write(struct inode *inode, void *buf,
struct logfs_shadow *shadow, int type, int len, int compr)
{
struct logfs_area *area;
struct super_block *sb = inode->i_sb;
s64 ofs;
struct logfs_object_header h;
int acc_len;
if (shadow->gc_level == 0)
acc_len = len;
else
acc_len = obj_len(sb, type);
area = get_area(sb, shadow->gc_level);
ofs = logfs_get_free_bytes(area, len + LOGFS_OBJECT_HEADERSIZE);
LOGFS_BUG_ON(ofs <= 0, sb);
/*
* Order is important. logfs_get_free_bytes(), by modifying the
* segment file, may modify the content of the very page we're about
* to write now. Which is fine, as long as the calculated crc and
* written data still match. So do the modifications _before_
* calculating the crc.
*/
h.len = cpu_to_be16(len);
h.type = type;
h.compr = compr;
h.ino = cpu_to_be64(inode->i_ino);
h.bix = cpu_to_be64(shadow->bix);
h.crc = logfs_crc32(&h, sizeof(h) - 4, 4);
h.data_crc = logfs_crc32(buf, len, 0);
logfs_buf_write(area, ofs, &h, sizeof(h));
logfs_buf_write(area, ofs + LOGFS_OBJECT_HEADERSIZE, buf, len);
shadow->new_ofs = ofs;
shadow->new_len = acc_len + LOGFS_OBJECT_HEADERSIZE;
return 0;
}
static s64 logfs_segment_write_compress(struct inode *inode, void *buf,
struct logfs_shadow *shadow, int type, int len)
{
struct super_block *sb = inode->i_sb;
void *compressor_buf = logfs_super(sb)->s_compressed_je;
ssize_t compr_len;
int ret;
mutex_lock(&logfs_super(sb)->s_journal_mutex);
compr_len = logfs_compress(buf, compressor_buf, len, len);
if (compr_len >= 0) {
ret = __logfs_segment_write(inode, compressor_buf, shadow,
type, compr_len, COMPR_ZLIB);
} else {
ret = __logfs_segment_write(inode, buf, shadow, type, len,
COMPR_NONE);
}
mutex_unlock(&logfs_super(sb)->s_journal_mutex);
return ret;
}
/**
* logfs_segment_write - write data block to object store
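* @inode: inode containing data
* @page: page holding the data to write
* @shadow: shadow entry; receives the new offset and length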
*
* Returns an errno or zero.
*/
int logfs_segment_write(struct inode *inode, struct page *page,
struct logfs_shadow *shadow)
{
struct super_block *sb = inode->i_sb;
struct logfs_super *super = logfs_super(sb);
int do_compress, type, len;
int ret;
void *buf;
super->s_flags |= LOGFS_SB_FLAG_DIRTY;
BUG_ON(super->s_flags & LOGFS_SB_FLAG_SHUTDOWN);
do_compress = logfs_inode(inode)->li_flags & LOGFS_IF_COMPRESSED;
if (shadow->gc_level != 0) {
/* temporarily disable compression for indirect blocks */
do_compress = 0;
}
type = obj_type(inode, shrink_level(shadow->gc_level));
len = obj_len(sb, type);
buf = kmap(page);
if (do_compress)
ret = logfs_segment_write_compress(inode, buf, shadow, type,
len);
else
ret = __logfs_segment_write(inode, buf, shadow, type, len,
COMPR_NONE);
kunmap(page);
log_segment("logfs_segment_write(%llx, %llx, %x) %llx->%llx %x->%x\n",
shadow->ino, shadow->bix, shadow->gc_level,
shadow->old_ofs, shadow->new_ofs,
shadow->old_len, shadow->new_len);
/* this BUG_ON did catch a locking bug. useful */
BUG_ON(!(shadow->new_ofs & (super->s_segsize - 1)));
return ret;
}
int wbuf_read(struct super_block *sb, u64 ofs, size_t len, void *buf)
{
pgoff_t index = ofs >> PAGE_SHIFT;
struct page *page;
long offset = ofs & (PAGE_SIZE-1);
long copylen;
while (len) {
copylen = min((ulong)len, PAGE_SIZE - offset);
page = get_mapping_page(sb, index, 1);
if (IS_ERR(page))
return PTR_ERR(page);
memcpy(buf, page_address(page) + offset, copylen);
put_page(page);
buf += copylen;
len -= copylen;
offset = 0;
index++;
}
return 0;
}
/*
* The "position" of indirect blocks is ambiguous. It can be the position
* of any data block somewhere behind this indirect block. So we need to
* normalize the positions through logfs_block_mask() before comparing.
*/
static int check_pos(struct super_block *sb, u64 pos1, u64 pos2, level_t level)
{
return (pos1 & logfs_block_mask(sb, level)) !=
(pos2 & logfs_block_mask(sb, level));
}
#if 0
static int read_seg_header(struct super_block *sb, u64 ofs,
struct logfs_segment_header *sh)
{
__be32 crc;
int err;
err = wbuf_read(sb, ofs, sizeof(*sh), sh);
if (err)
return err;
crc = logfs_crc32(sh, sizeof(*sh), 4);
if (crc != sh->crc) {
printk(KERN_ERR"LOGFS: header crc error at %llx: expected %x, "
"got %x\n", ofs, be32_to_cpu(sh->crc),
be32_to_cpu(crc));
return -EIO;
}
return 0;
}
#endif
static int read_obj_header(struct super_block *sb, u64 ofs,
struct logfs_object_header *oh)
{
__be32 crc;
int err;
err = wbuf_read(sb, ofs, sizeof(*oh), oh);
if (err)
return err;
crc = logfs_crc32(oh, sizeof(*oh) - 4, 4);
if (crc != oh->crc) {
printk(KERN_ERR"LOGFS: header crc error at %llx: expected %x, "
"got %x\n", ofs, be32_to_cpu(oh->crc),
be32_to_cpu(crc));
return -EIO;
}
return 0;
}
static void move_btree_to_page(struct inode *inode, struct page *page,
__be64 *data)
{
struct super_block *sb = inode->i_sb;
struct logfs_super *super = logfs_super(sb);
struct btree_head128 *head = &super->s_object_alias_tree;
struct logfs_block *block;
struct object_alias_item *item, *next;
if (!(super->s_flags & LOGFS_SB_FLAG_OBJ_ALIAS))
return;
block = btree_remove128(head, inode->i_ino, page->index);
if (!block)
return;
log_blockmove("move_btree_to_page(%llx, %llx, %x)\n",
block->ino, block->bix, block->level);
list_for_each_entry_safe(item, next, &block->item_list, list) {
data[item->child_no] = item->val;
list_del(&item->list);
mempool_free(item, super->s_alias_pool);
}
block->page = page;
if (!PagePrivate(page)) {
SetPagePrivate(page);
get_page(page);
set_page_private(page, (unsigned long) block);
}
block->ops = &indirect_block_ops;
initialize_block_counters(page, block, data, 0);
}
/*
* This silences a false, yet annoying gcc warning. I hate it when my editor
* jumps into bitops.h each time I recompile this file.
* TODO: Complain to gcc folks about this and upgrade compiler.
*/
static unsigned long fnb(const unsigned long *addr,
unsigned long size, unsigned long offset)
{
return find_next_bit(addr, size, offset);
}
void move_page_to_btree(struct page *page)
{
struct logfs_block *block = logfs_block(page);
struct super_block *sb = block->sb;
struct logfs_super *super = logfs_super(sb);
struct object_alias_item *item;
unsigned long pos;
__be64 *child;
int err;
if (super->s_flags & LOGFS_SB_FLAG_SHUTDOWN) {
block->ops->free_block(sb, block);
return;
}
log_blockmove("move_page_to_btree(%llx, %llx, %x)\n",
block->ino, block->bix, block->level);
super->s_flags |= LOGFS_SB_FLAG_OBJ_ALIAS;
for (pos = 0; ; pos++) {
pos = fnb(block->alias_map, LOGFS_BLOCK_FACTOR, pos);
if (pos >= LOGFS_BLOCK_FACTOR)
break;
item = mempool_alloc(super->s_alias_pool, GFP_NOFS);
BUG_ON(!item); /* mempool empty */
memset(item, 0, sizeof(*item));
child = kmap_atomic(page);
item->val = child[pos];
kunmap_atomic(child);
item->child_no = pos;
list_add(&item->list, &block->item_list);
}
block->page = NULL;
if (PagePrivate(page)) {
ClearPagePrivate(page);
put_page(page);
set_page_private(page, 0);
}
block->ops = &btree_block_ops;
err = alias_tree_insert(block->sb, block->ino, block->bix, block->level,
block);
BUG_ON(err); /* mempool empty */
ClearPageUptodate(page);
}
static int __logfs_segment_read(struct inode *inode, void *buf,
u64 ofs, u64 bix, level_t level)
{
struct super_block *sb = inode->i_sb;
void *compressor_buf = logfs_super(sb)->s_compressed_je;
struct logfs_object_header oh;
__be32 crc;
u16 len;
int err, block_len;
block_len = obj_len(sb, obj_type(inode, level));
err = read_obj_header(sb, ofs, &oh);
if (err)
goto out_err;
err = -EIO;
if (be64_to_cpu(oh.ino) != inode->i_ino
|| check_pos(sb, be64_to_cpu(oh.bix), bix, level)) {
printk(KERN_ERR"LOGFS: (ino, bix) don't match at %llx: "
"expected (%lx, %llx), got (%llx, %llx)\n",
ofs, inode->i_ino, bix,
be64_to_cpu(oh.ino), be64_to_cpu(oh.bix));
goto out_err;
}
len = be16_to_cpu(oh.len);
switch (oh.compr) {
case COMPR_NONE:
err = wbuf_read(sb, ofs + LOGFS_OBJECT_HEADERSIZE, len, buf);
if (err)
goto out_err;
crc = logfs_crc32(buf, len, 0);
if (crc != oh.data_crc) {
printk(KERN_ERR"LOGFS: uncompressed data crc error at "
"%llx: expected %x, got %x\n", ofs,
be32_to_cpu(oh.data_crc),
be32_to_cpu(crc));
goto out_err;
}
break;
case COMPR_ZLIB:
mutex_lock(&logfs_super(sb)->s_journal_mutex);
err = wbuf_read(sb, ofs + LOGFS_OBJECT_HEADERSIZE, len,
compressor_buf);
if (err) {
mutex_unlock(&logfs_super(sb)->s_journal_mutex);
goto out_err;
}
crc = logfs_crc32(compressor_buf, len, 0);
if (crc != oh.data_crc) {
printk(KERN_ERR"LOGFS: compressed data crc error at "
"%llx: expected %x, got %x\n", ofs,
be32_to_cpu(oh.data_crc),
be32_to_cpu(crc));
mutex_unlock(&logfs_super(sb)->s_journal_mutex);
goto out_err;
}
err = logfs_uncompress(compressor_buf, buf, len, block_len);
mutex_unlock(&logfs_super(sb)->s_journal_mutex);
if (err) {
printk(KERN_ERR"LOGFS: uncompress error at %llx\n", ofs);
goto out_err;
}
break;
default:
LOGFS_BUG(sb);
err = -EIO;
goto out_err;
}
return 0;
out_err:
logfs_set_ro(sb);
printk(KERN_ERR"LOGFS: device is read-only now\n");
LOGFS_BUG(sb);
return err;
}
/**
* logfs_segment_read - read data block from object store
* @inode: inode containing data
* @page: page to read the data into
* @ofs: physical data offset
* @bix: block index
* @level: block level
*
* Returns 0 on success or a negative errno.
*/
int logfs_segment_read(struct inode *inode, struct page *page,
u64 ofs, u64 bix, level_t level)
{
int err;
void *buf;
if (PageUptodate(page))
return 0;
ofs &= ~LOGFS_FULLY_POPULATED;
buf = kmap(page);
err = __logfs_segment_read(inode, buf, ofs, bix, level);
if (!err) {
move_btree_to_page(inode, page, buf);
SetPageUptodate(page);
}
kunmap(page);
log_segment("logfs_segment_read(%lx, %llx, %x) %llx (%d)\n",
inode->i_ino, bix, level, ofs, err);
return err;
}
int logfs_segment_delete(struct inode *inode, struct logfs_shadow *shadow)
{
struct super_block *sb = inode->i_sb;
struct logfs_super *super = logfs_super(sb);
struct logfs_object_header h;
u16 len;
int err;
super->s_flags |= LOGFS_SB_FLAG_DIRTY;
BUG_ON(super->s_flags & LOGFS_SB_FLAG_SHUTDOWN);
BUG_ON(shadow->old_ofs & LOGFS_FULLY_POPULATED);
if (!shadow->old_ofs)
return 0;
log_segment("logfs_segment_delete(%llx, %llx, %x) %llx->%llx %x->%x\n",
shadow->ino, shadow->bix, shadow->gc_level,
shadow->old_ofs, shadow->new_ofs,
shadow->old_len, shadow->new_len);
err = read_obj_header(sb, shadow->old_ofs, &h);
LOGFS_BUG_ON(err, sb);
LOGFS_BUG_ON(be64_to_cpu(h.ino) != inode->i_ino, sb);
LOGFS_BUG_ON(check_pos(sb, shadow->bix, be64_to_cpu(h.bix),
shrink_level(shadow->gc_level)), sb);
if (shadow->gc_level == 0)
len = be16_to_cpu(h.len);
else
len = obj_len(sb, h.type);
shadow->old_len = len + sizeof(h);
return 0;
}
void freeseg(struct super_block *sb, u32 segno)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping = super->s_mapping_inode->i_mapping;
struct page *page;
u64 ofs, start, end;
start = dev_ofs(sb, segno, 0);
end = dev_ofs(sb, segno + 1, 0);
for (ofs = start; ofs < end; ofs += PAGE_SIZE) {
page = find_get_page(mapping, ofs >> PAGE_SHIFT);
if (!page)
continue;
if (PagePrivate(page)) {
ClearPagePrivate(page);
put_page(page);
}
put_page(page);
}
}
int logfs_open_area(struct logfs_area *area, size_t bytes)
{
struct super_block *sb = area->a_sb;
struct logfs_super *super = logfs_super(sb);
int err, closed = 0;
if (area->a_is_open && area->a_used_bytes + bytes <= super->s_segsize)
return 0;
if (area->a_is_open) {
u64 ofs = dev_ofs(sb, area->a_segno, area->a_written_bytes);
u32 len = super->s_segsize - area->a_written_bytes;
log_gc("logfs_close_area(%x)\n", area->a_segno);
pad_wbuf(area, 1);
super->s_devops->writeseg(area->a_sb, ofs, len);
freeseg(sb, area->a_segno);
closed = 1;
}
area->a_used_bytes = 0;
area->a_written_bytes = 0;
again:
area->a_ops->get_free_segment(area);
area->a_ops->get_erase_count(area);
log_gc("logfs_open_area(%x, %x)\n", area->a_segno, area->a_level);
err = area->a_ops->erase_segment(area);
if (err) {
printk(KERN_WARNING "LogFS: Error erasing segment %x\n",
area->a_segno);
logfs_mark_segment_bad(sb, area->a_segno);
goto again;
}
area->a_is_open = 1;
return closed;
}
void logfs_sync_area(struct logfs_area *area)
{
struct super_block *sb = area->a_sb;
struct logfs_super *super = logfs_super(sb);
u64 ofs = dev_ofs(sb, area->a_segno, area->a_written_bytes);
u32 len = (area->a_used_bytes - area->a_written_bytes);
if (super->s_writesize)
len &= ~(super->s_writesize - 1);
if (len == 0)
return;
pad_wbuf(area, 0);
super->s_devops->writeseg(sb, ofs, len);
area->a_written_bytes += len;
}
void logfs_sync_segments(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int i;
for_each_area(i)
logfs_sync_area(super->s_area[i]);
}
/*
* Pick a free segment to be used for this area. Effectively takes a
* candidate from the free list (not really a candidate anymore).
*/
static void ostore_get_free_segment(struct logfs_area *area)
{
struct super_block *sb = area->a_sb;
struct logfs_super *super = logfs_super(sb);
if (super->s_free_list.count == 0) {
printk(KERN_ERR"LOGFS: ran out of free segments\n");
LOGFS_BUG(sb);
}
area->a_segno = get_best_cand(sb, &super->s_free_list, NULL);
}
static void ostore_get_erase_count(struct logfs_area *area)
{
struct logfs_segment_entry se;
u32 ec_level;
logfs_get_segment_entry(area->a_sb, area->a_segno, &se);
BUG_ON(se.ec_level == cpu_to_be32(BADSEG) ||
se.valid == cpu_to_be32(RESERVED));
ec_level = be32_to_cpu(se.ec_level);
area->a_erase_count = (ec_level >> 4) + 1;
}
static int ostore_erase_segment(struct logfs_area *area)
{
struct super_block *sb = area->a_sb;
struct logfs_segment_header sh;
u64 ofs;
int err;
err = logfs_erase_segment(sb, area->a_segno, 0);
if (err)
return err;
sh.pad = 0;
sh.type = SEG_OSTORE;
sh.level = (__force u8)area->a_level;
sh.segno = cpu_to_be32(area->a_segno);
sh.ec = cpu_to_be32(area->a_erase_count);
sh.gec = cpu_to_be64(logfs_super(sb)->s_gec);
sh.crc = logfs_crc32(&sh, sizeof(sh), 4);
logfs_set_segment_erased(sb, area->a_segno, area->a_erase_count,
area->a_level);
ofs = dev_ofs(sb, area->a_segno, 0);
area->a_used_bytes = sizeof(sh);
logfs_buf_write(area, ofs, &sh, sizeof(sh));
return 0;
}
static const struct logfs_area_ops ostore_area_ops = {
.get_free_segment = ostore_get_free_segment,
.get_erase_count = ostore_get_erase_count,
.erase_segment = ostore_erase_segment,
};
static void free_area(struct logfs_area *area)
{
if (area)
freeseg(area->a_sb, area->a_segno);
kfree(area);
}
void free_areas(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int i;
for_each_area(i)
free_area(super->s_area[i]);
free_area(super->s_journal_area);
}
static struct logfs_area *alloc_area(struct super_block *sb)
{
struct logfs_area *area;
area = kzalloc(sizeof(*area), GFP_KERNEL);
if (!area)
return NULL;
area->a_sb = sb;
return area;
}
static void map_invalidatepage(struct page *page, unsigned int o,
unsigned int l)
{
return;
}
static int map_releasepage(struct page *page, gfp_t g)
{
/* Don't release these pages */
return 0;
}
static const struct address_space_operations mapping_aops = {
.invalidatepage = map_invalidatepage,
.releasepage = map_releasepage,
.set_page_dirty = __set_page_dirty_nobuffers,
};
int logfs_init_mapping(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct address_space *mapping;
struct inode *inode;
inode = logfs_new_meta_inode(sb, LOGFS_INO_MAPPING);
if (IS_ERR(inode))
return PTR_ERR(inode);
super->s_mapping_inode = inode;
mapping = inode->i_mapping;
mapping->a_ops = &mapping_aops;
/* Would it be possible to use __GFP_HIGHMEM as well? */
mapping_set_gfp_mask(mapping, GFP_NOFS);
return 0;
}
int logfs_init_areas(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int i = -1;
super->s_alias_pool = mempool_create_kmalloc_pool(600,
sizeof(struct object_alias_item));
if (!super->s_alias_pool)
return -ENOMEM;
super->s_journal_area = alloc_area(sb);
if (!super->s_journal_area)
goto err;
for_each_area(i) {
super->s_area[i] = alloc_area(sb);
if (!super->s_area[i])
goto err;
super->s_area[i]->a_level = GC_LEVEL(i);
super->s_area[i]->a_ops = &ostore_area_ops;
}
btree_init_mempool128(&super->s_object_alias_tree,
super->s_btree_pool);
return 0;
err:
for (i--; i >= 0; i--)
free_area(super->s_area[i]);
free_area(super->s_journal_area);
logfs_mempool_destroy(super->s_alias_pool);
return -ENOMEM;
}
void logfs_cleanup_areas(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
btree_grim_visitor128(&super->s_object_alias_tree, 0, kill_alias);
}
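
The length computation in logfs_sync_area() above flushes only whole write units and leaves the unaligned tail in the write buffer for a later sync. A minimal sketch of that arithmetic, assuming the write size is either zero or a power of two; `syncable_len` is a hypothetical helper name, not something from the removed code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the length computation in logfs_sync_area().
 * Assumption: writesize is 0 (no alignment constraint) or a power of two.
 */
static uint32_t syncable_len(uint32_t used_bytes, uint32_t written_bytes,
			     uint32_t writesize)
{
	uint32_t len = used_bytes - written_bytes;

	assert(writesize == 0 || (writesize & (writesize - 1)) == 0);

	/* Round down to a whole number of write units. */
	if (writesize)
		len &= ~(writesize - 1);
	return len;
}
```

With 1000 used bytes, nothing written yet and a 256-byte write size, this yields 768; the remaining 232 bytes stay buffered, and a second call with written_bytes = 768 yields 0, which is the early-return case in the original.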

@ -1,653 +0,0 @@
/*
* fs/logfs/super.c
*
* As should be obvious for Linux kernel code, license is GPLv2
*
* Copyright (c) 2005-2008 Joern Engel <joern@logfs.org>
*
* Generally contains mount/umount code and also serves as a dump area for
* any functions that don't fit elsewhere and neither justify a file of their
* own.
*/
#include "logfs.h"
#include <linux/bio.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
#include <linux/module.h>
#include <linux/mtd/mtd.h>
#include <linux/statfs.h>
#include <linux/buffer_head.h>
static DEFINE_MUTEX(emergency_mutex);
static struct page *emergency_page;
struct page *emergency_read_begin(struct address_space *mapping, pgoff_t index)
{
filler_t *filler = (filler_t *)mapping->a_ops->readpage;
struct page *page;
int err;
page = read_cache_page(mapping, index, filler, NULL);
if (page)
return page;
/* No more pages available, switch to emergency page */
printk(KERN_INFO"Logfs: Using emergency page\n");
mutex_lock(&emergency_mutex);
err = filler(NULL, emergency_page);
if (err) {
mutex_unlock(&emergency_mutex);
printk(KERN_EMERG"Logfs: Error reading emergency page\n");
return ERR_PTR(err);
}
return emergency_page;
}
void emergency_read_end(struct page *page)
{
if (page == emergency_page)
mutex_unlock(&emergency_mutex);
else
put_page(page);
}
static void dump_segfile(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_segment_entry se;
u32 segno;
for (segno = 0; segno < super->s_no_segs; segno++) {
logfs_get_segment_entry(sb, segno, &se);
printk("%3x: %6x %8x", segno, be32_to_cpu(se.ec_level),
be32_to_cpu(se.valid));
if (++segno < super->s_no_segs) {
logfs_get_segment_entry(sb, segno, &se);
printk(" %6x %8x", be32_to_cpu(se.ec_level),
be32_to_cpu(se.valid));
}
if (++segno < super->s_no_segs) {
logfs_get_segment_entry(sb, segno, &se);
printk(" %6x %8x", be32_to_cpu(se.ec_level),
be32_to_cpu(se.valid));
}
if (++segno < super->s_no_segs) {
logfs_get_segment_entry(sb, segno, &se);
printk(" %6x %8x", be32_to_cpu(se.ec_level),
be32_to_cpu(se.valid));
}
printk("\n");
}
}
/*
* logfs_crash_dump - dump debug information to device
*
* The LogFS superblock only occupies part of a segment. This function will
* write as much debug information as it can gather into the spare space.
*/
void logfs_crash_dump(struct super_block *sb)
{
dump_segfile(sb);
}
/*
* FIXME: There should be a reserve for root, similar to ext2.
*/
int logfs_statfs(struct dentry *dentry, struct kstatfs *stats)
{
struct super_block *sb = dentry->d_sb;
struct logfs_super *super = logfs_super(sb);
stats->f_type = LOGFS_MAGIC_U32;
stats->f_bsize = sb->s_blocksize;
stats->f_blocks = super->s_size >> LOGFS_BLOCK_BITS >> 3;
stats->f_bfree = super->s_free_bytes >> sb->s_blocksize_bits;
stats->f_bavail = super->s_free_bytes >> sb->s_blocksize_bits;
stats->f_files = 0;
stats->f_ffree = 0;
stats->f_namelen = LOGFS_MAX_NAMELEN;
return 0;
}
static int logfs_sb_set(struct super_block *sb, void *_super)
{
struct logfs_super *super = _super;
sb->s_fs_info = super;
sb->s_mtd = super->s_mtd;
sb->s_bdev = super->s_bdev;
#ifdef CONFIG_BLOCK
if (sb->s_bdev)
sb->s_bdi = &bdev_get_queue(sb->s_bdev)->backing_dev_info;
#endif
#ifdef CONFIG_MTD
if (sb->s_mtd)
sb->s_bdi = sb->s_mtd->backing_dev_info;
#endif
return 0;
}
static int logfs_sb_test(struct super_block *sb, void *_super)
{
struct logfs_super *super = _super;
struct mtd_info *mtd = super->s_mtd;
if (mtd && sb->s_mtd == mtd)
return 1;
if (super->s_bdev && sb->s_bdev == super->s_bdev)
return 1;
return 0;
}
static void set_segment_header(struct logfs_segment_header *sh, u8 type,
u8 level, u32 segno, u32 ec)
{
sh->pad = 0;
sh->type = type;
sh->level = level;
sh->segno = cpu_to_be32(segno);
sh->ec = cpu_to_be32(ec);
sh->gec = cpu_to_be64(segno);
sh->crc = logfs_crc32(sh, LOGFS_SEGMENT_HEADERSIZE, 4);
}
static void logfs_write_ds(struct super_block *sb, struct logfs_disk_super *ds,
u32 segno, u32 ec)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_segment_header *sh = &ds->ds_sh;
int i;
memset(ds, 0, sizeof(*ds));
set_segment_header(sh, SEG_SUPER, 0, segno, ec);
ds->ds_ifile_levels = super->s_ifile_levels;
ds->ds_iblock_levels = super->s_iblock_levels;
ds->ds_data_levels = super->s_data_levels; /* XXX: Remove */
ds->ds_segment_shift = super->s_segshift;
ds->ds_block_shift = sb->s_blocksize_bits;
ds->ds_write_shift = super->s_writeshift;
ds->ds_filesystem_size = cpu_to_be64(super->s_size);
ds->ds_segment_size = cpu_to_be32(super->s_segsize);
ds->ds_bad_seg_reserve = cpu_to_be32(super->s_bad_seg_reserve);
ds->ds_feature_incompat = cpu_to_be64(super->s_feature_incompat);
ds->ds_feature_ro_compat= cpu_to_be64(super->s_feature_ro_compat);
ds->ds_feature_compat = cpu_to_be64(super->s_feature_compat);
ds->ds_feature_flags = cpu_to_be64(super->s_feature_flags);
ds->ds_root_reserve = cpu_to_be64(super->s_root_reserve);
ds->ds_speed_reserve = cpu_to_be64(super->s_speed_reserve);
journal_for_each(i)
ds->ds_journal_seg[i] = cpu_to_be32(super->s_journal_seg[i]);
ds->ds_magic = cpu_to_be64(LOGFS_MAGIC);
ds->ds_crc = logfs_crc32(ds, sizeof(*ds),
LOGFS_SEGMENT_HEADERSIZE + 12);
}
static int write_one_sb(struct super_block *sb,
struct page *(*find_sb)(struct super_block *sb, u64 *ofs))
{
struct logfs_super *super = logfs_super(sb);
struct logfs_disk_super *ds;
struct logfs_segment_entry se;
struct page *page;
u64 ofs;
u32 ec, segno;
int err;
page = find_sb(sb, &ofs);
if (!page)
return -EIO;
ds = page_address(page);
segno = seg_no(sb, ofs);
logfs_get_segment_entry(sb, segno, &se);
ec = be32_to_cpu(se.ec_level) >> 4;
ec++;
logfs_set_segment_erased(sb, segno, ec, 0);
logfs_write_ds(sb, ds, segno, ec);
err = super->s_devops->write_sb(sb, page);
put_page(page);
return err;
}
int logfs_write_sb(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
int err;
/* First superblock */
err = write_one_sb(sb, super->s_devops->find_first_sb);
if (err)
return err;
/* Last superblock */
err = write_one_sb(sb, super->s_devops->find_last_sb);
if (err)
return err;
return 0;
}
static int ds_cmp(const void *ds0, const void *ds1)
{
size_t len = sizeof(struct logfs_disk_super);
/* We know the segment headers differ, so ignore them */
len -= LOGFS_SEGMENT_HEADERSIZE;
ds0 += LOGFS_SEGMENT_HEADERSIZE;
ds1 += LOGFS_SEGMENT_HEADERSIZE;
return memcmp(ds0, ds1, len);
}
static int logfs_recover_sb(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct logfs_disk_super _ds0, *ds0 = &_ds0;
struct logfs_disk_super _ds1, *ds1 = &_ds1;
int err, valid0, valid1;
/* read first superblock */
err = wbuf_read(sb, super->s_sb_ofs[0], sizeof(*ds0), ds0);
if (err)
return err;
/* read last superblock */
err = wbuf_read(sb, super->s_sb_ofs[1], sizeof(*ds1), ds1);
if (err)
return err;
valid0 = logfs_check_ds(ds0) == 0;
valid1 = logfs_check_ds(ds1) == 0;
if (!valid0 && valid1) {
printk(KERN_INFO"First superblock is invalid - fixing.\n");
return write_one_sb(sb, super->s_devops->find_first_sb);
}
if (valid0 && !valid1) {
printk(KERN_INFO"Last superblock is invalid - fixing.\n");
return write_one_sb(sb, super->s_devops->find_last_sb);
}
if (valid0 && valid1 && ds_cmp(ds0, ds1)) {
printk(KERN_INFO"Superblocks don't match - fixing.\n");
return logfs_write_sb(sb);
}
/* If neither is valid now, something's wrong. Didn't we properly
* check them before?!? */
BUG_ON(!valid0 && !valid1);
return 0;
}
static int logfs_make_writeable(struct super_block *sb)
{
int err;
err = logfs_open_segfile(sb);
if (err)
return err;
/* Repair any broken superblock copies */
err = logfs_recover_sb(sb);
if (err)
return err;
/* Check areas for trailing unaccounted data */
err = logfs_check_areas(sb);
if (err)
return err;
/* Do one GC pass before any data gets dirtied */
logfs_gc_pass(sb);
/* after all initializations are done, replay the journal
* for rw-mounts, if necessary */
err = logfs_replay_journal(sb);
if (err)
return err;
return 0;
}
static int logfs_get_sb_final(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct inode *rootdir;
int err;
/* root dir */
rootdir = logfs_iget(sb, LOGFS_INO_ROOT);
if (IS_ERR(rootdir))
goto fail;
sb->s_root = d_make_root(rootdir);
if (!sb->s_root)
goto fail;
/* at that point we know that ->put_super() will be called */
super->s_erase_page = alloc_pages(GFP_KERNEL, 0);
if (!super->s_erase_page)
return -ENOMEM;
memset(page_address(super->s_erase_page), 0xFF, PAGE_SIZE);
/* FIXME: check for read-only mounts */
err = logfs_make_writeable(sb);
if (err) {
__free_page(super->s_erase_page);
return err;
}
log_super("LogFS: Finished mounting\n");
return 0;
fail:
iput(super->s_master_inode);
iput(super->s_segfile_inode);
iput(super->s_mapping_inode);
return -EIO;
}
int logfs_check_ds(struct logfs_disk_super *ds)
{
struct logfs_segment_header *sh = &ds->ds_sh;
if (ds->ds_magic != cpu_to_be64(LOGFS_MAGIC))
return -EINVAL;
if (sh->crc != logfs_crc32(sh, LOGFS_SEGMENT_HEADERSIZE, 4))
return -EINVAL;
if (ds->ds_crc != logfs_crc32(ds, sizeof(*ds),
LOGFS_SEGMENT_HEADERSIZE + 12))
return -EINVAL;
return 0;
}
static struct page *find_super_block(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct page *first, *last;
first = super->s_devops->find_first_sb(sb, &super->s_sb_ofs[0]);
if (!first || IS_ERR(first))
return NULL;
last = super->s_devops->find_last_sb(sb, &super->s_sb_ofs[1]);
if (!last || IS_ERR(last)) {
put_page(first);
return NULL;
}
if (!logfs_check_ds(page_address(first))) {
put_page(last);
return first;
}
/* First one didn't work, try the second superblock */
if (!logfs_check_ds(page_address(last))) {
put_page(first);
return last;
}
/* Neither worked, sorry folks */
put_page(first);
put_page(last);
return NULL;
}
static int __logfs_read_sb(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
struct page *page;
struct logfs_disk_super *ds;
int i;
page = find_super_block(sb);
if (!page)
return -EINVAL;
ds = page_address(page);
super->s_size = be64_to_cpu(ds->ds_filesystem_size);
super->s_root_reserve = be64_to_cpu(ds->ds_root_reserve);
super->s_speed_reserve = be64_to_cpu(ds->ds_speed_reserve);
super->s_bad_seg_reserve = be32_to_cpu(ds->ds_bad_seg_reserve);
super->s_segsize = 1 << ds->ds_segment_shift;
super->s_segmask = (1 << ds->ds_segment_shift) - 1;
super->s_segshift = ds->ds_segment_shift;
sb->s_blocksize = 1 << ds->ds_block_shift;
sb->s_blocksize_bits = ds->ds_block_shift;
super->s_writesize = 1 << ds->ds_write_shift;
super->s_writeshift = ds->ds_write_shift;
super->s_no_segs = super->s_size >> super->s_segshift;
super->s_no_blocks = super->s_segsize >> sb->s_blocksize_bits;
super->s_feature_incompat = be64_to_cpu(ds->ds_feature_incompat);
super->s_feature_ro_compat = be64_to_cpu(ds->ds_feature_ro_compat);
super->s_feature_compat = be64_to_cpu(ds->ds_feature_compat);
super->s_feature_flags = be64_to_cpu(ds->ds_feature_flags);
journal_for_each(i)
super->s_journal_seg[i] = be32_to_cpu(ds->ds_journal_seg[i]);
super->s_ifile_levels = ds->ds_ifile_levels;
super->s_iblock_levels = ds->ds_iblock_levels;
super->s_data_levels = ds->ds_data_levels;
super->s_total_levels = super->s_ifile_levels + super->s_iblock_levels
+ super->s_data_levels;
put_page(page);
return 0;
}
static int logfs_read_sb(struct super_block *sb, int read_only)
{
struct logfs_super *super = logfs_super(sb);
int ret;
super->s_btree_pool = mempool_create(32, btree_alloc, btree_free, NULL);
if (!super->s_btree_pool)
return -ENOMEM;
btree_init_mempool64(&super->s_shadow_tree.new, super->s_btree_pool);
btree_init_mempool64(&super->s_shadow_tree.old, super->s_btree_pool);
btree_init_mempool32(&super->s_shadow_tree.segment_map,
super->s_btree_pool);
ret = logfs_init_mapping(sb);
if (ret)
return ret;
ret = __logfs_read_sb(sb);
if (ret)
return ret;
if (super->s_feature_incompat & ~LOGFS_FEATURES_INCOMPAT)
return -EIO;
if ((super->s_feature_ro_compat & ~LOGFS_FEATURES_RO_COMPAT) &&
!read_only)
return -EIO;
ret = logfs_init_rw(sb);
if (ret)
return ret;
ret = logfs_init_areas(sb);
if (ret)
return ret;
ret = logfs_init_gc(sb);
if (ret)
return ret;
ret = logfs_init_journal(sb);
if (ret)
return ret;
return 0;
}
static void logfs_kill_sb(struct super_block *sb)
{
struct logfs_super *super = logfs_super(sb);
log_super("LogFS: Start unmounting\n");
/* Alias entries slow down mount, so evict as many as possible */
sync_filesystem(sb);
logfs_write_anchor(sb);
free_areas(sb);
/*
* From this point on alias entries are simply dropped - and any
* writes to the object store are considered bugs.
*/
log_super("LogFS: Now in shutdown\n");
generic_shutdown_super(sb);
super->s_flags |= LOGFS_SB_FLAG_SHUTDOWN;
BUG_ON(super->s_dirty_used_bytes || super->s_dirty_free_bytes);
logfs_cleanup_gc(sb);
logfs_cleanup_journal(sb);
logfs_cleanup_areas(sb);
logfs_cleanup_rw(sb);
if (super->s_erase_page)
__free_page(super->s_erase_page);
super->s_devops->put_device(super);
logfs_mempool_destroy(super->s_btree_pool);
logfs_mempool_destroy(super->s_alias_pool);
kfree(super);
log_super("LogFS: Finished unmounting\n");
}
static struct dentry *logfs_get_sb_device(struct logfs_super *super,
struct file_system_type *type, int flags)
{
struct super_block *sb;
int err = -ENOMEM;
static int mount_count;
log_super("LogFS: Start mount %x\n", mount_count++);
err = -EINVAL;
sb = sget(type, logfs_sb_test, logfs_sb_set, flags | MS_NOATIME, super);
if (IS_ERR(sb)) {
super->s_devops->put_device(super);
kfree(super);
return ERR_CAST(sb);
}
if (sb->s_root) {
/* Device is already in use */
super->s_devops->put_device(super);
kfree(super);
return dget(sb->s_root);
}
/*
* sb->s_maxbytes is limited to 8TB. On 32bit systems, the page cache
* only covers 16TB and the upper 8TB are used for indirect blocks.
* On 64bit systems we could bump up the limit, but that would make
* the filesystem incompatible with 32bit systems.
*/
sb->s_maxbytes = (1ull << 43) - 1;
sb->s_max_links = LOGFS_LINK_MAX;
sb->s_op = &logfs_super_operations;
err = logfs_read_sb(sb, sb->s_flags & MS_RDONLY);
if (err)
goto err1;
sb->s_flags |= MS_ACTIVE;
err = logfs_get_sb_final(sb);
if (err) {
deactivate_locked_super(sb);
return ERR_PTR(err);
}
return dget(sb->s_root);
err1:
/* no ->s_root, no ->put_super() */
iput(super->s_master_inode);
iput(super->s_segfile_inode);
iput(super->s_mapping_inode);
deactivate_locked_super(sb);
return ERR_PTR(err);
}
static struct dentry *logfs_mount(struct file_system_type *type, int flags,
const char *devname, void *data)
{
ulong mtdnr;
struct logfs_super *super;
int err;
super = kzalloc(sizeof(*super), GFP_KERNEL);
if (!super)
return ERR_PTR(-ENOMEM);
mutex_init(&super->s_dirop_mutex);
mutex_init(&super->s_object_alias_mutex);
INIT_LIST_HEAD(&super->s_freeing_list);
if (!devname)
err = logfs_get_sb_bdev(super, type, devname);
else if (strncmp(devname, "mtd", 3))
err = logfs_get_sb_bdev(super, type, devname);
else {
char *garbage;
mtdnr = simple_strtoul(devname+3, &garbage, 0);
if (*garbage)
err = -EINVAL;
else
err = logfs_get_sb_mtd(super, mtdnr);
}
if (err) {
kfree(super);
return ERR_PTR(err);
}
return logfs_get_sb_device(super, type, flags);
}
static struct file_system_type logfs_fs_type = {
.owner = THIS_MODULE,
.name = "logfs",
.mount = logfs_mount,
.kill_sb = logfs_kill_sb,
.fs_flags = FS_REQUIRES_DEV,
};
MODULE_ALIAS_FS("logfs");
static int __init logfs_init(void)
{
int ret;
emergency_page = alloc_pages(GFP_KERNEL, 0);
if (!emergency_page)
return -ENOMEM;
ret = logfs_compr_init();
if (ret)
goto out1;
ret = logfs_init_inode_cache();
if (ret)
goto out2;
ret = register_filesystem(&logfs_fs_type);
if (!ret)
return 0;
logfs_destroy_inode_cache();
out2:
logfs_compr_exit();
out1:
__free_pages(emergency_page, 0);
return ret;
}
static void __exit logfs_exit(void)
{
unregister_filesystem(&logfs_fs_type);
logfs_destroy_inode_cache();
logfs_compr_exit();
__free_pages(emergency_page, 0);
}
module_init(logfs_init);
module_exit(logfs_exit);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Joern Engel <joern@logfs.org>");
MODULE_DESCRIPTION("scalable flash filesystem");
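
The repair policy in logfs_recover_sb() above reduces to a small decision over three inputs: whether each superblock copy passes logfs_check_ds(), and whether the two copies differ past their segment headers. A sketch of that decision, assuming hypothetical names (`enum sb_repair`, `pick_sb_repair`); the removed code performs the writes directly rather than returning a value:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type; the removed code acts instead of returning this. */
enum sb_repair {
	SB_REPAIR_NONE,		/* both copies valid and identical */
	SB_REPAIR_FIRST,	/* rewrite the first superblock */
	SB_REPAIR_LAST,		/* rewrite the last superblock */
	SB_REPAIR_BOTH,		/* both valid but different: rewrite both */
};

static enum sb_repair pick_sb_repair(bool valid0, bool valid1, bool differ)
{
	/* Mirrors the BUG_ON(): mount cannot get here with no valid copy. */
	assert(valid0 || valid1);

	if (!valid0)
		return SB_REPAIR_FIRST;
	if (!valid1)
		return SB_REPAIR_LAST;
	return differ ? SB_REPAIR_BOTH : SB_REPAIR_NONE;
}
```

The `differ` input corresponds to a non-zero ds_cmp() result and only matters when both copies are valid.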

@ -1725,30 +1725,35 @@ static int pick_link(struct nameidata *nd, struct path *link,
return 1;
}
enum {WALK_FOLLOW = 1, WALK_MORE = 2};
/*
* Do we need to follow links? We _really_ want to be able
* to do this check without having to look at inode->i_op,
* so we keep a cache of "no, this doesn't need follow_link"
* for the common case.
*/
static inline int should_follow_link(struct nameidata *nd, struct path *link,
int follow,
struct inode *inode, unsigned seq)
static inline int step_into(struct nameidata *nd, struct path *path,
int flags, struct inode *inode, unsigned seq)
{
if (likely(!d_is_symlink(link->dentry)))
return 0;
if (!follow)
if (!(flags & WALK_MORE) && nd->depth)
put_link(nd);
if (likely(!d_is_symlink(path->dentry)) ||
!(flags & WALK_FOLLOW || nd->flags & LOOKUP_FOLLOW)) {
/* not a symlink or should not follow */
path_to_nameidata(path, nd);
nd->inode = inode;
nd->seq = seq;
return 0;
}
/* make sure that d_is_symlink above matches inode */
if (nd->flags & LOOKUP_RCU) {
if (read_seqcount_retry(&link->dentry->d_seq, seq))
if (read_seqcount_retry(&path->dentry->d_seq, seq))
return -ECHILD;
}
return pick_link(nd, link, inode, seq);
return pick_link(nd, path, inode, seq);
}
enum {WALK_GET = 1, WALK_PUT = 2};
static int walk_component(struct nameidata *nd, int flags)
{
struct path path;
@ -1762,7 +1767,7 @@ static int walk_component(struct nameidata *nd, int flags)
*/
if (unlikely(nd->last_type != LAST_NORM)) {
err = handle_dots(nd, nd->last_type);
if (flags & WALK_PUT)
if (!(flags & WALK_MORE) && nd->depth)
put_link(nd);
return err;
}
@ -1789,15 +1794,7 @@ static int walk_component(struct nameidata *nd, int flags)
inode = d_backing_inode(path.dentry);
}
if (flags & WALK_PUT)
put_link(nd);
err = should_follow_link(nd, &path, flags & WALK_GET, inode, seq);
if (unlikely(err))
return err;
path_to_nameidata(&path, nd);
nd->inode = inode;
nd->seq = seq;
return 0;
return step_into(nd, &path, flags, inode, seq);
}
/*
@ -2104,9 +2101,10 @@ static int link_path_walk(const char *name, struct nameidata *nd)
if (!name)
return 0;
/* last component of nested symlink */
err = walk_component(nd, WALK_GET | WALK_PUT);
err = walk_component(nd, WALK_FOLLOW);
} else {
err = walk_component(nd, WALK_GET);
/* not the last component */
err = walk_component(nd, WALK_FOLLOW | WALK_MORE);
}
if (err < 0)
return err;
@ -2248,12 +2246,7 @@ static inline int lookup_last(struct nameidata *nd)
nd->flags |= LOOKUP_FOLLOW | LOOKUP_DIRECTORY;
nd->flags &= ~LOOKUP_PARENT;
return walk_component(nd,
nd->flags & LOOKUP_FOLLOW
? nd->depth
? WALK_PUT | WALK_GET
: WALK_GET
: 0);
return walk_component(nd, 0);
}
/* Returns 0 and nd will be valid on success; returns an error otherwise. */
@ -2558,28 +2551,9 @@ int user_path_at_empty(int dfd, const char __user *name, unsigned flags,
}
EXPORT_SYMBOL(user_path_at_empty);
/*
* NB: most callers don't do anything directly with the reference to the
* struct filename, but the nd->last pointer points into the name string
* allocated by getname. So we must hold the reference to it until all
* path-walking is complete.
*/
static inline struct filename *
user_path_parent(int dfd, const char __user *path,
struct path *parent,
struct qstr *last,
int *type,
unsigned int flags)
{
/* only LOOKUP_REVAL is allowed in extra flags */
return filename_parentat(dfd, getname(path), flags & LOOKUP_REVAL,
parent, last, type);
}
/**
* mountpoint_last - look up last component for umount
* @nd: pathwalk nameidata - currently pointing at parent directory of "last"
* @path: pointer to container for result
*
* This is a special lookup_last function just for umount. In this case, we
* need to resolve the path without doing any revalidation.
@ -2592,23 +2566,20 @@ user_path_parent(int dfd, const char __user *path,
*
* Returns:
* -error: if there was an error during lookup. This includes -ENOENT if the
* lookup found a negative dentry. The nd->path reference will also be
* put in this case.
* lookup found a negative dentry.
*
* 0: if we successfully resolved nd->path and found it not to be a
* symlink that needs to be followed. "path" will also be populated.
* The nd->path reference will also be put.
* 0: if we successfully resolved nd->last and found it not to be a
* symlink that needs to be followed.
*
* 1: if we successfully resolved nd->last and found it to be a symlink
* that needs to be followed. "path" will be populated with the path
* to the link, and nd->path will *not* be put.
* that needs to be followed.
*/
static int
mountpoint_last(struct nameidata *nd, struct path *path)
mountpoint_last(struct nameidata *nd)
{
int error = 0;
struct dentry *dentry;
struct dentry *dir = nd->path.dentry;
struct path path;
/* If we're in rcuwalk, drop out of it to handle last component */
if (nd->flags & LOOKUP_RCU) {
@ -2622,37 +2593,28 @@ mountpoint_last(struct nameidata *nd, struct path *path)
error = handle_dots(nd, nd->last_type);
if (error)
return error;
dentry = dget(nd->path.dentry);
path.dentry = dget(nd->path.dentry);
} else {
dentry = d_lookup(dir, &nd->last);
if (!dentry) {
path.dentry = d_lookup(dir, &nd->last);
if (!path.dentry) {
/*
* No cached dentry. Mounted dentries are pinned in the
* cache, so that means that this dentry is probably
* a symlink or the path doesn't actually point
* to a mounted dentry.
*/
dentry = lookup_slow(&nd->last, dir,
path.dentry = lookup_slow(&nd->last, dir,
nd->flags | LOOKUP_NO_REVAL);
if (IS_ERR(dentry))
return PTR_ERR(dentry);
if (IS_ERR(path.dentry))
return PTR_ERR(path.dentry);
}
}
if (d_is_negative(dentry)) {
dput(dentry);
if (d_is_negative(path.dentry)) {
dput(path.dentry);
return -ENOENT;
}
if (nd->depth)
put_link(nd);
path->dentry = dentry;
path->mnt = nd->path.mnt;
error = should_follow_link(nd, path, nd->flags & LOOKUP_FOLLOW,
d_backing_inode(dentry), 0);
if (unlikely(error))
return error;
mntget(path->mnt);
follow_mount(path);
return 0;
path.mnt = nd->path.mnt;
return step_into(nd, &path, 0, d_backing_inode(path.dentry), 0);
}
/**
@ -2672,13 +2634,19 @@ path_mountpoint(struct nameidata *nd, unsigned flags, struct path *path)
if (IS_ERR(s))
return PTR_ERR(s);
while (!(err = link_path_walk(s, nd)) &&
(err = mountpoint_last(nd, path)) > 0) {
(err = mountpoint_last(nd)) > 0) {
s = trailing_symlink(nd);
if (IS_ERR(s)) {
err = PTR_ERR(s);
break;
}
}
if (!err) {
*path = nd->path;
nd->path.mnt = NULL;
nd->path.dentry = NULL;
follow_mount(path);
}
terminate_walk(nd);
return err;
}
@ -3335,18 +3303,11 @@ static int do_last(struct nameidata *nd,
seq = 0; /* out of RCU mode, so the value doesn't matter */
inode = d_backing_inode(path.dentry);
finish_lookup:
if (nd->depth)
put_link(nd);
error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW,
inode, seq);
error = step_into(nd, &path, 0, inode, seq);
if (unlikely(error))
return error;
path_to_nameidata(&path, nd);
nd->inode = inode;
nd->seq = seq;
/* Why this, you ask? _Now_ we might have grown LOOKUP_JUMPED... */
finish_open:
/* Why this, you ask? _Now_ we might have grown LOOKUP_JUMPED... */
error = complete_walk(nd);
if (error)
return error;
@ -3861,8 +3822,8 @@ static long do_rmdir(int dfd, const char __user *pathname)
int type;
unsigned int lookup_flags = 0;
retry:
name = user_path_parent(dfd, pathname,
&path, &last, &type, lookup_flags);
name = filename_parentat(dfd, getname(pathname), lookup_flags,
&path, &last, &type);
if (IS_ERR(name))
return PTR_ERR(name);
@ -3991,8 +3952,8 @@ static long do_unlinkat(int dfd, const char __user *pathname)
struct inode *delegated_inode = NULL;
unsigned int lookup_flags = 0;
retry:
name = user_path_parent(dfd, pathname,
&path, &last, &type, lookup_flags);
name = filename_parentat(dfd, getname(pathname), lookup_flags,
&path, &last, &type);
if (IS_ERR(name))
return PTR_ERR(name);
@ -4491,15 +4452,15 @@ SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
target_flags = 0;
retry:
from = user_path_parent(olddfd, oldname,
&old_path, &old_last, &old_type, lookup_flags);
from = filename_parentat(olddfd, getname(oldname), lookup_flags,
&old_path, &old_last, &old_type);
if (IS_ERR(from)) {
error = PTR_ERR(from);
goto exit;
}
to = user_path_parent(newdfd, newname,
&new_path, &new_last, &new_type, lookup_flags);
to = filename_parentat(newdfd, getname(newname), lookup_flags,
&new_path, &new_last, &new_type);
if (IS_ERR(to)) {
error = PTR_ERR(to);
goto exit1;
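
The namei.c hunks above fold two decisions into step_into(): whether to drop the current link with put_link(), and whether a symlink gets followed. A user-space sketch of just those two predicates, assuming hypothetical helper names (`drops_link`, `follows_symlink`) and an assumed value for LOOKUP_FOLLOW; the WALK_* values are the ones the diff introduces:

```c
#include <stdbool.h>

enum { WALK_FOLLOW = 1, WALK_MORE = 2 };	/* values as in the diff */

/* Assumption: any single-bit value works for this sketch. */
#define LOOKUP_FOLLOW 0x0001

/*
 * put_link() is called when a link is on the stack (depth != 0) and the
 * caller did not say that more components of the current name remain.
 */
static bool drops_link(int walk_flags, unsigned depth)
{
	return !(walk_flags & WALK_MORE) && depth;
}

/*
 * A symlink is followed if the caller forces it (WALK_FOLLOW) or the
 * lookup itself asked for it (LOOKUP_FOLLOW in nd->flags).
 */
static bool follows_symlink(bool is_symlink, int walk_flags, unsigned nd_flags)
{
	return is_symlink &&
	       ((walk_flags & WALK_FOLLOW) || (nd_flags & LOOKUP_FOLLOW));
}
```

This is why lookup_last(), do_last() and mountpoint_last() can all pass 0: the LOOKUP_FOLLOW test and the nd->depth test now live inside step_into() instead of being open-coded at each call site.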

@ -203,7 +203,7 @@ ncp_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
bufsize - (pos % bufsize),
iov_iter_count(from));
if (copy_from_iter(bouncebuffer, to_write, from) != to_write) {
if (!copy_from_iter_full(bouncebuffer, to_write, from)) {
errno = -EFAULT;
break;
}
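
The conversion above relies on the all-or-nothing contract described in the pull message: copy_from_iter_full() either copies everything, advances the iterator and returns true, or fails, returns false and leaves the iterator where it was. A user-space model of that contract, assuming a hypothetical `struct toy_iter` over one flat buffer; the real iov_iter can also fail on a fault partway through a copy, which this sketch does not model:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct iov_iter: a single flat buffer. */
struct toy_iter {
	const char *data;
	size_t count;		/* bytes remaining */
};

/* Old style: copy what is available, advance, return the bytes copied. */
static size_t toy_copy_from_iter(void *dst, size_t bytes, struct toy_iter *i)
{
	size_t n = bytes < i->count ? bytes : i->count;

	memcpy(dst, i->data, n);
	i->data += n;
	i->count -= n;
	return n;
}

/* _full style: all or nothing; on failure the iterator is untouched. */
static bool toy_copy_from_iter_full(void *dst, size_t bytes,
				    struct toy_iter *i)
{
	if (bytes > i->count)
		return false;
	memcpy(dst, i->data, bytes);
	i->data += bytes;
	i->count -= bytes;
	return true;
}
```

With the old style a caller has to compare the return value against the requested size and must remember that the iterator has already moved on a short copy; with the _full style the error path can bail out without any iterator state to undo.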

@ -355,7 +355,6 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
__u64 tag;
} head;
int total = ret = iov_iter_count(iter);
int n;
int downcall_size = sizeof(struct orangefs_downcall_s);
int head_size = sizeof(head);
@ -372,8 +371,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
return -EFAULT;
}
n = copy_from_iter(&head, head_size, iter);
if (n < head_size) {
if (!copy_from_iter_full(&head, head_size, iter)) {
gossip_err("%s: failed to copy head.\n", __func__);
return -EFAULT;
}
@ -407,8 +405,7 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
return ret;
}
n = copy_from_iter(&op->downcall, downcall_size, iter);
if (n != downcall_size) {
if (!copy_from_iter_full(&op->downcall, downcall_size, iter)) {
gossip_err("%s: failed to copy downcall.\n", __func__);
goto Efault;
}
@ -462,10 +459,8 @@ static ssize_t orangefs_devreq_write_iter(struct kiocb *iocb,
goto Enomem;
}
memset(op->downcall.trailer_buf, 0, op->downcall.trailer_size);
n = copy_from_iter(op->downcall.trailer_buf,
op->downcall.trailer_size,
iter);
if (n != op->downcall.trailer_size) {
if (!copy_from_iter_full(op->downcall.trailer_buf,
op->downcall.trailer_size, iter)) {
gossip_err("%s: failed to copy trailer.\n", __func__);
vfree(op->downcall.trailer_buf);
goto Efault;

@ -724,7 +724,7 @@ static int orangefs_lock(struct file *filp, int cmd, struct file_lock *fl)
{
int rc = -EINVAL;
if (ORANGEFS_SB(filp->f_inode->i_sb)->flags & ORANGEFS_OPT_LOCAL_LOCK) {
if (ORANGEFS_SB(file_inode(filp)->i_sb)->flags & ORANGEFS_OPT_LOCAL_LOCK) {
if (cmd == F_GETLK) {
rc = 0;
posix_test_lock(filp, fl);

@ -434,6 +434,7 @@ static ssize_t orangefs_debug_write(struct file *file,
char *debug_string;
struct orangefs_kernel_op_s *new_op = NULL;
struct client_debug_mask c_mask = { NULL, 0, 0 };
char *s;
gossip_debug(GOSSIP_DEBUGFS_DEBUG,
"orangefs_debug_write: %pD\n",
@ -521,8 +522,9 @@ static ssize_t orangefs_debug_write(struct file *file,
}
mutex_lock(&orangefs_debug_lock);
memset(file->f_inode->i_private, 0, ORANGEFS_MAX_DEBUG_STRING_LEN);
sprintf((char *)file->f_inode->i_private, "%s\n", debug_string);
s = file_inode(file)->i_private;
memset(s, 0, ORANGEFS_MAX_DEBUG_STRING_LEN);
sprintf(s, "%s\n", debug_string);
mutex_unlock(&orangefs_debug_lock);
*ppos += count;


@@ -33,7 +33,7 @@ static int ovl_check_fd(const void *data, struct file *f, unsigned int fd)
{
const struct dentry *dentry = data;
-if (f->f_inode == d_inode(dentry))
+if (file_inode(f) == d_inode(dentry))
pr_warn_ratelimited("overlayfs: Warning: Copying up %pD, but open R/O on fd %u which will cease to be coherent [pid=%d %s]\n",
f, fd, current->pid, current->comm);
return 0;


@@ -2818,12 +2818,12 @@ static inline int skb_add_data(struct sk_buff *skb,
if (skb->ip_summed == CHECKSUM_NONE) {
__wsum csum = 0;
-if (csum_and_copy_from_iter(skb_put(skb, copy), copy,
-&csum, from) == copy) {
+if (csum_and_copy_from_iter_full(skb_put(skb, copy), copy,
+&csum, from)) {
skb->csum = csum_block_add(skb->csum, csum, off);
return 0;
}
-} else if (copy_from_iter(skb_put(skb, copy), copy, from) == copy)
+} else if (copy_from_iter_full(skb_put(skb, copy), copy, from))
return 0;
__skb_trim(skb, off);


@@ -89,7 +89,9 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i);
size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i);
size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i);
+bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i);
size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i);
+bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i);
size_t iov_iter_zero(size_t bytes, struct iov_iter *);
unsigned long iov_iter_alignment(const struct iov_iter *i);
unsigned long iov_iter_gap_alignment(const struct iov_iter *i);
@@ -155,6 +157,7 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count)
}
size_t csum_and_copy_to_iter(const void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
+bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i);
int import_iovec(int type, const struct iovec __user * uvector,
unsigned nr_segs, unsigned fast_segs,


@@ -1836,13 +1836,13 @@ static inline int skb_do_copy_data_nocache(struct sock *sk, struct sk_buff *skb,
{
if (skb->ip_summed == CHECKSUM_NONE) {
__wsum csum = 0;
-if (csum_and_copy_from_iter(to, copy, &csum, from) != copy)
+if (!csum_and_copy_from_iter_full(to, copy, &csum, from))
return -EFAULT;
skb->csum = csum_block_add(skb->csum, csum, offset);
} else if (sk->sk_route_caps & NETIF_F_NOCACHE_COPY) {
-if (copy_from_iter_nocache(to, copy, from) != copy)
+if (!copy_from_iter_full_nocache(to, copy, from))
return -EFAULT;
-} else if (copy_from_iter(to, copy, from) != copy)
+} else if (!copy_from_iter_full(to, copy, from))
return -EFAULT;
return 0;


@@ -20,7 +20,7 @@ static __inline__ int udplite_getfrag(void *from, char *to, int offset,
int len, int odd, struct sk_buff *skb)
{
struct msghdr *msg = from;
-return copy_from_iter(to, len, &msg->msg_iter) != len ? -EFAULT : 0;
+return copy_from_iter_full(to, len, &msg->msg_iter) ? 0 : -EFAULT;
}
/* Designate sk as UDP-Lite socket */


@@ -547,8 +547,8 @@ int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark)
exe_file = get_task_exe_file(tsk);
if (!exe_file)
return 0;
-ino = exe_file->f_inode->i_ino;
-dev = exe_file->f_inode->i_sb->s_dev;
+ino = file_inode(exe_file)->i_ino;
+dev = file_inode(exe_file)->i_sb->s_dev;
fput(exe_file);
return audit_mark_compare(mark, ino, dev);
}


@@ -6698,7 +6698,7 @@ static bool perf_addr_filter_match(struct perf_addr_filter *filter,
struct file *file, unsigned long offset,
unsigned long size)
{
-if (filter->inode != file->f_inode)
+if (filter->inode != file_inode(file))
return false;
if (filter->offset > offset + size)


@@ -108,11 +108,7 @@ static ssize_t qstat_read(struct file *file, char __user *user_buf,
/*
* Get the counter ID stored in file->f_inode->i_private
*/
-if (!file->f_inode) {
-WARN_ON_ONCE(1);
-return -EBADF;
-}
-counter = (long)(file->f_inode->i_private);
+counter = (long)file_inode(file)->i_private;
if (counter >= qstat_num)
return -EBADF;
@@ -177,11 +173,7 @@ static ssize_t qstat_write(struct file *file, const char __user *user_buf,
/*
* Get the counter ID stored in file->f_inode->i_private
*/
-if (!file->f_inode) {
-WARN_ON_ONCE(1);
-return -EBADF;
-}
-if ((long)(file->f_inode->i_private) != qstat_reset_cnts)
+if ((long)file_inode(file)->i_private != qstat_reset_cnts)
return count;
for_each_possible_cpu(cpu) {


@@ -733,7 +733,7 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
return -ENOMEM;
buf[len] = '\0';
-if (copy_from_iter(buf, len, from) != len) {
+if (!copy_from_iter_full(buf, len, from)) {
kfree(buf);
return -EFAULT;
}


@@ -569,6 +569,31 @@ size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
}
EXPORT_SYMBOL(copy_from_iter);
+bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i)
+{
+char *to = addr;
+if (unlikely(i->type & ITER_PIPE)) {
+WARN_ON(1);
+return false;
+}
+if (unlikely(i->count < bytes)) \
+return false;
+iterate_all_kinds(i, bytes, v, ({
+if (__copy_from_user((to += v.iov_len) - v.iov_len,
+v.iov_base, v.iov_len))
+return false;
+0;}),
+memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+v.bv_offset, v.bv_len),
+memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+)
+iov_iter_advance(i, bytes);
+return true;
+}
+EXPORT_SYMBOL(copy_from_iter_full);
size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
@@ -588,6 +613,30 @@ size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
}
EXPORT_SYMBOL(copy_from_iter_nocache);
+bool copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i)
+{
+char *to = addr;
+if (unlikely(i->type & ITER_PIPE)) {
+WARN_ON(1);
+return false;
+}
+if (unlikely(i->count < bytes)) \
+return false;
+iterate_all_kinds(i, bytes, v, ({
+if (__copy_from_user_nocache((to += v.iov_len) - v.iov_len,
+v.iov_base, v.iov_len))
+return false;
+0;}),
+memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
+v.bv_offset, v.bv_len),
+memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
+)
+iov_iter_advance(i, bytes);
+return true;
+}
+EXPORT_SYMBOL(copy_from_iter_full_nocache);
size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
struct iov_iter *i)
{
@@ -1009,7 +1058,7 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
}
iterate_and_advance(i, bytes, v, ({
int err = 0;
-next = csum_and_copy_from_user(v.iov_base,
+next = csum_and_copy_from_user(v.iov_base,
(to += v.iov_len) - v.iov_len,
v.iov_len, 0, &err);
if (!err) {
@@ -1038,6 +1087,51 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
}
EXPORT_SYMBOL(csum_and_copy_from_iter);
+bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum,
+struct iov_iter *i)
+{
+char *to = addr;
+__wsum sum, next;
+size_t off = 0;
+sum = *csum;
+if (unlikely(i->type & ITER_PIPE)) {
+WARN_ON(1);
+return false;
+}
+if (unlikely(i->count < bytes))
+return false;
+iterate_all_kinds(i, bytes, v, ({
+int err = 0;
+next = csum_and_copy_from_user(v.iov_base,
+(to += v.iov_len) - v.iov_len,
+v.iov_len, 0, &err);
+if (err)
+return false;
+sum = csum_block_add(sum, next, off);
+off += v.iov_len;
+0;
+}), ({
+char *p = kmap_atomic(v.bv_page);
+next = csum_partial_copy_nocheck(p + v.bv_offset,
+(to += v.bv_len) - v.bv_len,
+v.bv_len, 0);
+kunmap_atomic(p);
+sum = csum_block_add(sum, next, off);
+off += v.bv_len;
+}),({
+next = csum_partial_copy_nocheck(v.iov_base,
+(to += v.iov_len) - v.iov_len,
+v.iov_len, 0);
+sum = csum_block_add(sum, next, off);
+off += v.iov_len;
+})
+)
+*csum = sum;
+iov_iter_advance(i, bytes);
+return true;
+}
+EXPORT_SYMBOL(csum_and_copy_from_iter_full);
size_t csum_and_copy_to_iter(const void *addr, size_t bytes, __wsum *csum,
struct iov_iter *i)
{
@@ -1052,7 +1146,7 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, __wsum *csum,
iterate_and_advance(i, bytes, v, ({
int err = 0;
next = csum_and_copy_to_user((from += v.iov_len) - v.iov_len,
-v.iov_base,
+v.iov_base,
v.iov_len, 0, &err);
if (!err) {
sum = csum_block_add(sum, next, off);
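
The contract shared by the three `_full` primitives added in this file (either copy the whole request and advance the iterator, or return false and leave the iterator where it was) can be modelled outside the kernel. The sketch below is a userspace illustration only, not kernel code: `struct toy_iter`, `toy_readable()`, `toy_copy_from_iter()` and `toy_copy_from_iter_full()` are hypothetical names, the iterator is assumed to be a single flat buffer, and a fault is simulated with a `fault_at` offset.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct iov_iter: one flat buffer. */
struct toy_iter {
	const char *base;	/* start of the source data */
	size_t pos;		/* bytes already consumed */
	size_t count;		/* bytes still available */
	size_t fault_at;	/* offsets >= fault_at simulate a faulting read */
};

/* How many of the requested bytes can be read before the end or a fault. */
static size_t toy_readable(const struct toy_iter *it, size_t bytes)
{
	size_t n = bytes < it->count ? bytes : it->count;
	size_t ok = it->fault_at > it->pos ? it->fault_at - it->pos : 0;

	return n < ok ? n : ok;
}

/* Old-style primitive: copies what it can and advances by that much. */
size_t toy_copy_from_iter(void *dst, size_t bytes, struct toy_iter *it)
{
	size_t n = toy_readable(it, bytes);

	memcpy(dst, it->base + it->pos, n);
	it->pos += n;
	it->count -= n;
	return n;
}

/*
 * "_full" variant: succeeds only if every byte was copied. On failure
 * dst may hold a partial copy, but the iterator is not advanced.
 */
bool toy_copy_from_iter_full(void *dst, size_t bytes, struct toy_iter *it)
{
	size_t n;

	if (it->count < bytes)
		return false;
	n = toy_readable(it, bytes);
	memcpy(dst, it->base + it->pos, n);
	if (n != bytes)
		return false;
	it->pos += bytes;
	it->count -= bytes;
	return true;
}
```

With the old-style call, a caller has to compare the return value against the requested length and is left with a half-advanced iterator on a short copy. With the `_full` form, the caller tests a single boolean and can retry or bail out with the iterator untouched, which is the shape the converted call sites in this merge take.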


@@ -630,7 +630,7 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
goto out;
skb->dev = NULL; /* for paths shared with net_device interfaces */
ATM_SKB(skb)->atm_options = vcc->atm_options;
-if (copy_from_iter(skb_put(skb, size), size, &m->msg_iter) != size) {
+if (!copy_from_iter_full(skb_put(skb, size), size, &m->msg_iter)) {
kfree_skb(skb);
error = -EFAULT;
goto out;

View File

@@ -2127,7 +2127,7 @@ static inline int l2cap_skbuff_fromiovec(struct l2cap_chan *chan,
struct sk_buff **frag;
int sent = 0;
-if (copy_from_iter(skb_put(skb, count), count, &msg->msg_iter) != count)
+if (!copy_from_iter_full(skb_put(skb, count), count, &msg->msg_iter))
return -EFAULT;
sent += count;
@@ -2147,8 +2147,8 @@ static inline int l2cap_skbuff_fromiovec(struct l2cap_chan *chan,
*frag = tmp;
-if (copy_from_iter(skb_put(*frag, count), count,
-&msg->msg_iter) != count)
+if (!copy_from_iter_full(skb_put(*frag, count), count,
+&msg->msg_iter))
return -EFAULT;
sent += count;


@@ -826,11 +826,11 @@ ip_generic_getfrag(void *from, char *to, int offset, int len, int odd, struct sk
struct msghdr *msg = from;
if (skb->ip_summed == CHECKSUM_PARTIAL) {
-if (copy_from_iter(to, len, &msg->msg_iter) != len)
+if (!copy_from_iter_full(to, len, &msg->msg_iter))
return -EFAULT;
} else {
__wsum csum = 0;
-if (csum_and_copy_from_iter(to, len, &csum, &msg->msg_iter) != len)
+if (!csum_and_copy_from_iter_full(to, len, &csum, &msg->msg_iter))
return -EFAULT;
skb->csum = csum_block_add(skb->csum, csum, odd);
}


@@ -609,15 +609,15 @@ int ping_getfrag(void *from, char *to,
fraglen -= sizeof(struct icmphdr);
if (fraglen < 0)
BUG();
-if (csum_and_copy_from_iter(to + sizeof(struct icmphdr),
+if (!csum_and_copy_from_iter_full(to + sizeof(struct icmphdr),
fraglen, &pfh->wcheck,
-&pfh->msg->msg_iter) != fraglen)
+&pfh->msg->msg_iter))
return -EFAULT;
} else if (offset < sizeof(struct icmphdr)) {
BUG();
} else {
-if (csum_and_copy_from_iter(to, fraglen, &pfh->wcheck,
-&pfh->msg->msg_iter) != fraglen)
+if (!csum_and_copy_from_iter_full(to, fraglen, &pfh->wcheck,
+&pfh->msg->msg_iter))
return -EFAULT;
}


@@ -2397,14 +2397,11 @@ static int __packet_snd_vnet_parse(struct virtio_net_hdr *vnet_hdr, size_t len)
static int packet_snd_vnet_parse(struct msghdr *msg, size_t *len,
struct virtio_net_hdr *vnet_hdr)
{
-int n;
if (*len < sizeof(*vnet_hdr))
return -EINVAL;
*len -= sizeof(*vnet_hdr);
-n = copy_from_iter(vnet_hdr, sizeof(*vnet_hdr), &msg->msg_iter);
-if (n != sizeof(*vnet_hdr))
+if (!copy_from_iter_full(vnet_hdr, sizeof(*vnet_hdr), &msg->msg_iter))
return -EFAULT;
return __packet_snd_vnet_parse(vnet_hdr, *len);


@@ -268,7 +268,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
__skb_queue_tail(list, skb);
skb_copy_to_linear_data(skb, mhdr, mhsz);
pktpos = skb->data + mhsz;
-if (copy_from_iter(pktpos, dsz, &m->msg_iter) == dsz)
+if (copy_from_iter_full(pktpos, dsz, &m->msg_iter))
return dsz;
rc = -EFAULT;
goto error;
@@ -299,7 +299,7 @@ int tipc_msg_build(struct tipc_msg *mhdr, struct msghdr *m,
if (drem < pktrem)
pktrem = drem;
-if (copy_from_iter(pktpos, pktrem, &m->msg_iter) != pktrem) {
+if (!copy_from_iter_full(pktpos, pktrem, &m->msg_iter)) {
rc = -EFAULT;
goto error;
}


@@ -1074,7 +1074,7 @@ long keyctl_instantiate_key_common(key_serial_t id,
}
ret = -EFAULT;
-if (copy_from_iter(payload, plen, from) != plen)
+if (!copy_from_iter_full(payload, plen, from))
goto error2;
}


@@ -225,7 +225,7 @@ static int smk_bu_credfile(const struct cred *cred, struct file *file,
{
struct task_smack *tsp = cred->security;
struct smack_known *sskp = tsp->smk_task;
-struct inode *inode = file->f_inode;
+struct inode *inode = file_inode(file);
struct inode_smack *isp = inode->i_security;
char acc[SMK_NUM_ACCESS_TYPE + 1];