Fix argument order. We want to move the item to the end of the list, not
change the position of the head.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
If we hold the EXCL cap, we cannot trust the dir stats from the MDS (num
files, subdirs) and must not incorrectly conclude that the directory is
empty. If we do, we get can bad results from lookup (bad ENOENT) and
bad readdir results.
Signed-off-by: Sage Weil <sage@newdream.net>
This reminded me... you have two pr_debugs in fanotify_should_send_event
which output redundant information. Maybe you intended it like that so
it is selectable how much log spam you want, or if not you may want to
apply this patch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
When an fanotify listener is closing it may cause a deadlock between the
listener and the original task doing an fs operation. If the original task
is waiting for a permissions response it will be holding the srcu lock. The
listener cannot clean up and exit until after that srcu lock is syncronized.
Thus deadlock. The fix introduced here is to stop accepting new permissions
events when a listener is shutting down and to grant permission for all
outstanding events. Thus the original task will eventually release the srcu
lock and the listener can complete shutdown.
Reported-by: Andreas Gruenbacher <agruen@suse.de>
Cc: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>
The interesting 2 list lockstep walking didn't quite work out if the inode
marks only had ignores and the vfsmount list requested events. The code to
shortcut list traversal would not run the inode list since it didn't have real
event requests. This code forces inode list traversal when a vfsmount mark
matches the event type. Maybe we could add an i_fsnotify_ignored_mask field
to struct inode to get the shortcut back, but it doesn't seem worth it to grow
struct inode again.
I bet with the recent changes to lock the way we do now it would actually not
be a major perf hit to just drop i_fsnotify_mark_mask altogether. But that is
for another day.
Signed-off-by: Eric Paris <eparis@redhat.com>
The fsnotify main loop has 2 booleans which tell if a particular mark was
sent to the listeners or if it should be processed in the next pass. The
problem is that the booleans were not reset on each traversal of the loop.
So marks could get skipped even when they were not sent to the notifiers.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
The fanotify code is supposed to get the group from the mark. It accidentally
only used the inode_mark. If the vfsmount_mark was set but not the inode_mark
it would deref the NULL inode_mark. Get the group from the correct place.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
This allows code outside of the mm core to safely manipulate page state
and not worry about the other accounting. Not using these routines means
that some code will lose track of the accounting and we get bugs. This
has happened once already.
Signed-off-by: Michael Rubin <mrubin@google.com>
Signed-off-by: Sage Weil <sage@newdream.net>
When making a request in the virtual snapdir or a snapped portion of the
namespace, we should choose the MDS based on the first nonsnap parent (and
its caps). If that is not the best place, we will get forward hints to
find the right MDS in the cluster. This fixes ESTALE errors when using
the .snap directory and namespace with multiple MDSs.
Signed-off-by: Sage Weil <sage@newdream.net>
When a realm is updated, we need to queue writeback on inodes in that
realm _and_ its children. Otherwise, if the inode gets cowed on the
server, we can get a hang later due to out-of-sync cap/snap state.
Signed-off-by: Sage Weil <sage@newdream.net>
When we snapshot dirty metadata that needs to be written back to the MDS,
include dirty xattr metadata. Make the capsnap reference the encoded
xattr blob so that it will be written back in the FLUSHSNAP op.
Also fix the capsnap creation guard to include dirty auth or file bits,
not just tests specific to dirty file data or file writes in progress
(this fixes auth metadata writeback).
Signed-off-by: Sage Weil <sage@newdream.net>
We should include the xattr metadata blob in the cap update message any
time we are flushing dirty state, NOT just when we are also dropping the
cap. This fixes async xattr writeback.
Also, clean up the code slightly to avoid duplicating the bit test.
Signed-off-by: Sage Weil <sage@newdream.net>
The use of a completion when waiting for session shutdown during umount is
inappropriate, given the complexity of the condition. For multiple MDS's,
this resulted in the umount thread spinning, often preventing the session
close message from being processed in some cases.
Switch to a waitqueue and defined a condition helper. This cleans things
up nicely.
Signed-off-by: Sage Weil <sage@newdream.net>
Make ntlmv2 as an authentication mechanism within ntlmssp
instead of ntlmv1.
Parse type 2 response in ntlmssp negotiation to pluck
AV pairs and use them to calculate ntlmv2 response token.
Also, assign domain name from the sever response in type 2
packet of ntlmssp and use that (netbios) domain name in
calculation of response.
Enable cifs/smb signing using rc4 and md5.
Changed name of the structure mac_key to session_key to reflect
the type of key it holds.
Use kernel crypto_shash_* APIs instead of the equivalent cifs functions.
Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steve French <sfrench@us.ibm.com>
This patch corrects a problem of compilation errors at removal of
UNIUPR_NOLOWER definition and adds include guards to cifs_unicode.h.
Signed-off-by: Igor Druzhinin <jaxbrigs@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix an Oops in the NFSv4 atomic open code
NFS: Fix the selection of security flavours in Kconfig
NFS: fix the return value of nfs_file_fsync()
rpcrdma: Fix SQ size calculation when memreg is FRMR
xprtrdma: Do not truncate iova_start values in frmr registrations.
nfs: Remove redundant NULL check upon kfree()
nfs: Add "lookupcache" to displayed mount options
NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
cifs_find_smb_ses assumes that the vol->password field is a valid
pointer, but that's only the case if a password was passed in via
the options string. It's possible that one won't be if there is
no mount helper on the box.
Reported-by: diabel <gacek-2004@wp.pl>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
nilfs_discard_segment() doesn't wait for completion of discard
requests. This specifies BLKDEV_IFL_WAIT flag when calling
blkdev_issue_discard() in order to fix the sync failure.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
fs: brlock vfsmount_lock
Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.
A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.
The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability is not not be
much improved in common cases yet, due to other locks (ie. dcache_lock) getting
in the way. However path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).
The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
took 6.8s, afterwards took 7.1s, about 5% slower.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: scale files_lock
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).
One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.
However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.
A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.
Testing results:
On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.
Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)
So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.
Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
throughput
2.6.34-rc2 24.5
+patch 24.9
us sys idle IO wait (in %)
2.6.34-rc2 51.25 28.25 17.25 3.25
+patch 53.75 18.5 19 8.75
So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.
Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.
Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
tty: fix fu_list abuse
tty code abuses fu_list, which causes a bug in remount,ro handling.
If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".
Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.
The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.
[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: cleanup files_lock locking
Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.
Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: remove extra lookup in __lookup_hash
Optimize lookup for create operations, where no dentry should often be
common-case. In cases where it is not, such as unlink, the added overhead
is much smaller than the removed.
Also, move comments about __d_lookup racyness to the __d_lookup call site.
d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
- We are interested in how the RCU lookup works here, particularly with
renames. Make that explicit, and point to the document where it is explained
in more detail.
- RCU is pretty standard now, and macros make implementations pretty mindless.
If we want to know about RCU barrier details, we look in RCU code.
- Delete some boring legacy comments because we don't care much about how the
code used to work, more about the interesting parts of how it works now. So
comments about lazy LRU may be interesting, but would better be done in the
LRU or refcount management code.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: fs_struct rwlock to spinlock
struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.
Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: dentry allocation consolidation
There are 2 duplicate copies of code in dentry allocation in path lookup.
Consolidate them into a single function.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs: fix do_lookup false negative
In do_lookup, if we initially find no dentry, we take the directory i_mutex and
re-check the lookup. If we find a dentry there, then we revalidate it if
needed. However if that revalidate asks for the dentry to be invalidated, we
return -ENOENT from do_lookup. What should happen instead is an attempt to
allocate and lookup a new dentry.
This is probably not noticed because it is rare. It is only reached if a
concurrent create races in first (in which case, the dentry probably won't be
invalidated anyway), or if the racy __d_lookup has failed due to a
false-negative (which is very rare).
Fix this by removing code and have it use the normal reval path.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Limit the maximum number of mb_cache entries depending on the number of
hash buckets: if the only limit to the number of cache entries is the
available memory the hash chains can grow very long, taking a long time
to search.
At least partially solves https://bugzilla.lustre.org/show_bug.cgi?id=22771.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
we want the assignment to err done inside the if () to be
visible after it, so (re)declaring err inside if () body
is wrong.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.
Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers. Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.
In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
generic_acl_set didn't update the ctime of the file when its permission was
changed.
Steps to reproduce:
# touch aaa
# stat -c %Z aaa
1275289822
# setfacl -m 'u::x,g::x,o::x' aaa
# stat -c %Z aaa
1275289822 <- unchanged
But, according to the spec of the ctime, vfs must update it.
Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>.
CC: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit 77b8a75f5b introduced a warning at fs/inode.c:692 unlock_new_inode(),
caused by unlock_new_inode() being called on existing inodes as well.
This patch changes setup_inode() to only call unlock_new_inode() for I_NEW
inodes.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
reiserfs_evict_inode calls end_writeback two times hitting
kernel BUG at fs/inode.c:298 becase inode->i_state is I_CLEAR already.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix false warning saying one of two super blocks is broken
nilfs2: fix list corruption after ifile creation failure
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:
arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().
do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.
Further kernel_execve() and sys_execve() need to be changed to match.
This has been test built on x86_64, frv, arm and mips.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap reports:
ERROR: "svc_gss_principal" [fs/nfs/nfs.ko] undefined!
because in fs/nfs/Kconfig, NFS_V4 selects RPCSEC_GSS_KRB5
and/or in fs/nfsd/Kconfig, NFSD_V4 selects RPCSEC_GSS_KRB5.
RPCSEC_GSS_KRB5 does 5 selects, but none of these is enforced/followed
by the fs/nfs[d]/Kconfig configs:
select SUNRPC_GSS
select CRYPTO
select CRYPTO_MD5
select CRYPTO_DES
select CRYPTO_CBC
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
cifs has a lot of complicated functions that have to clean up things on
error, but some of them don't have all of the cleanup code
well-consolidated. Clean up and consolidate error handling in several
functions.
This is in preparation of later patches that will need to put references
to the tcon link container.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Get rid of some nesting and add a label we can goto on error.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
After applying commit b2ac86e1, the following message got appeared
after unclean shutdown:
> NILFS warning: broken superblock. using spare superblock.
This turns out to be a false message due to the change which updates
two super blocks alternately. The secondary super block now can be
selected if it's newer than the primary one.
This kills the false warning by suppressing it if another super block
is not actually broken.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
This commit makes the stack guard page somewhat less visible to user
space. It does this by:
- not showing the guard page in /proc/<pid>/maps
It looks like lvm-tools will actually read /proc/self/maps to figure
out where all its mappings are, and effectively do a specialized
"mlockall()" in user space. By not showing the guard page as part of
the mapping (by just adding PAGE_SIZE to the start for grows-up
pages), lvm-tools ends up not being aware of it.
- by also teaching the _real_ mlock() functionality not to try to lock
the guard page.
That would just expand the mapping down to create a new guard page,
so there really is no point in trying to lock it in place.
It would perhaps be nice to show the guard page specially in
/proc/<pid>/maps (or at least mark grow-down segments some way), but
let's not open ourselves up to more breakage by user space from programs
that depends on the exact deails of the 'maps' file.
Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
source code to see what was going on with the whole new warning.
Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be
Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix parameter name in kernel-doc notation (causes a warning).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark arguments to certain system calls as being const where they should be but
aren't. The list includes:
(*) The filename arguments of various stat syscalls, execve(), various utimes
syscalls and some mount syscalls.
(*) The filename arguments of some syscall helpers relating to the above.
(*) The buffer argument of various write syscalls.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The last user is gone, so we can safely remove this
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
logfs does not need the BKL, so use ->unlocked_ioctl instead
of ->ioctl in file operations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Joern Engel <joern@logfs.org>
[ fixed trivial conflict ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
O2net: Disallow o2net accept connection request from itself.
ocfs2/dlm: remove potential deadlock -V3
ocfs2/dlm: avoid incorrect bit set in refmap on recovery master
Fix the nested PR lock calling issue in ACL
ocfs2: Count more refcount records in file system fragmentation.
ocfs2 fix o2dlm dlm run purgelist (rev 3)
ocfs2/dlm: fix a dead lock
ocfs2: do not overwrite error codes in ocfs2_init_acl
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[NFS] Set CONFIG_KEYS when CONFIG_NFS_USE_KERNEL_DNS is set
AFS: Implement an autocell mount capability [ver #2]
DNS: If the DNS server returns an error, allow that to be cached [ver #2]
NFS: Use kernel DNS resolver [ver #2]
cifs: update README to include details about 'fsc' option
9c867fbe "partitions: fix sometimes unreadable partition strings" coverted
one line within the ibm partition code incorrectly. Fix this to get rid of
a build error.
fs/partitions/ibm.c: In function 'ibm_partition':
[...]
fs/partitions/ibm.c:185: error: too many arguments to function 'strlcat'
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This reverts commit 3bcf3860a4 (and the
accompanying commit c1e5c95402 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).
The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.
Fix up various conflicts due to later fsnotify work.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previous patch relied on DNS_RESOLVER setting CONFIG_KEYS
but needs to be selected in NFS config when using the new
DNS resolver
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
* 'params' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (22 commits)
param: don't deref arg in __same_type() checks
param: update drivers/acpi/debug.c to new scheme
param: use module_param in drivers/message/fusion/mptbase.c
ide: use module_param_named rather than module_param_call
param: update drivers/char/ipmi/ipmi_watchdog.c to new scheme
param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes.
param: lock myri10ge_fw_name against sysfs changes.
param: simple locking for sysfs-writable charp parameters
param: remove unnecessary writable charp
param: add kerneldoc to moduleparam.h
param: locking for kernel parameters
param: make param sections const.
param: use free hook for charp (fix leak of charp parameters)
param: add a free hook to kernel_param_ops.
param: silence .init.text references from param ops
Add param ops struct for hvc_iucv driver.
nfs: update for module_param_named API change
AppArmor: update for module_param_named API change
param: use ops in struct kernel_param, rather than get and set fns directly
param: move the EXPORT_SYMBOL to after the definitions.
...
Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 83ba7b071f ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished. Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false. This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().
This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.
NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued. But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Unify the logic for kupdate and non-kupdate cases. There won't be
starvation because the inodes requeued into b_more_io will later be
spliced _after_ the remaining inodes in b_io, hence won't stand in the way
of other inodes in the next run.
It avoids unnecessary redirty_tail() calls, hence the update of
i_dirtied_when. The timestamp update is undesirable because it could
later delay the inode's periodic writeback, or may exclude the inode from
the data integrity sync operation (which checks timestamp to avoid extra
work and livelock).
===
How the redirty_tail() comes about:
It was a long story.. This redirty_tail() was introduced with
wbc.more_io. The initial patch for more_io actually does not have the
redirty_tail(), and when it's merged, several 100% iowait bug reports
arised:
reiserfs:
http://lkml.org/lkml/2007/10/23/93
jfs:
commit 29a424f283
JFS: clear PAGECACHE_TAG_DIRTY for no-write pages
ext2:
http://www.spinics.net/linux/lists/linux-ext4/msg04762.html
They are all old bugs hidden in various filesystems that become "visible"
with the more_io patch. At the time, the ext2 bug is thought to be
"trivial", so not fixed. Instead the following updated more_io patch with
redirty_tail() is merged:
http://www.spinics.net/linux/lists/linux-ext4/msg04507.html
This will in general prevent 100% on ext2 and possibly other unknown FS bugs.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This was not a bug, since b_io is empty for kupdate writeback. The next
patch will do requeue_io() for non-kupdate writeback, so let's fix it.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Martin Bligh <mbligh@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid delaying writeback for an expire inode with lots of dirty pages, but
no active dirtier at the moment. Previously we only do that for the
kupdate case.
Any filesystem that does delayed allocation or unwritten extent conversion
after IO completion will cause this - for example, XFS.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
that the latter can be avoided when under global dirty background
threshold (which is the normal state for most systems).
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the ability for the root directory of a mounted AFS filesystem to
accept lookups of arbitrary directory names, to interpet the names as the names
of cells, to look the cell names up in the DNS for AFSDB records and to mount
the root.cell volume of the nominated cell on the pseudo-directory created by
lookup.
This facility is requested by passing:
-o autocell
to the mountpoint for which this is desired, usually the /afs mount.
To use this facility, a DNS upcall program is required for AFSDB records. This
can be obtained from:
http://people.redhat.com/~dhowells/afs/dns.afsdb.c
It should be compiled with -lresolv and -lkeyutils and installed as, say:
/usr/sbin/dns.afsdb
Then the following line needs to be added to /sbin/request-key.conf:
create dns_resolver afsdb:* * /usr/sbin/dns.afsdb %k
This can be tested by mounting AFS, say:
insmod dns_resolver.ko
insmod af-rxrpc.ko
insmod kafs.ko rootcell=grand.central.org
mount -t afs "#grand.central.org:root.cell." /afs -o autocell
and doing:
ls /afs/grand.central.org/
which should show:
archive/ cvs/ doc/ local/ project/ service/ software/ user/ www/
if it works.
Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
If the DNS server returns an error, allow that to be cached in the DNS resolver
key in lieu of a value. Userspace passes the desired error number as an option
in the payload:
"#dnserror=<number>"
Userspace must map h_errno from the name resolution routines to an appropriate
Linux error before passing it up. Something like the following mapping is
recommended:
[HOST_NOT_FOUND] = ENODATA,
[TRY_AGAIN] = EAGAIN,
[NO_RECOVERY] = ECONNREFUSED,
[NO_DATA] = ENODATA,
in lieu of Linux errors specifically for representing name service errors. The
filesystem must map these errors appropropriately before passing them to
userspace. AFS is made to map ENODATA and EAGAIN to EDESTADDRREQ for the
return to userspace; ECONNREFUSED is allowed to stand as is.
The error can be seen in /proc/keys as a negative number after the description
of the key. Compare, for example, the following key entries:
2f97238c I--Q-- 1 53s 3f010000 0 0 dns_resol afsdb:grand.centrall.org: -61
338bfbbe I--Q-- 1 59m 3f010000 0 0 dns_resol afsdb:grand.central.org: 37
If the error option is supplied in the payload, the main part of the payload is
discarded. The key should have an expiry time set by userspace.
Signed-off-by: Wang Lei <wang840925@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Use the kernel DNS resolver to translate hostnames to IP addresses. Create a
new config option to choose between the legacy DNS resolver and the new
resolver.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
By the commit af7fa16 2010-08-03 NFS: Fix up the fsync code
close(2) became returning the non-zero value even if it went well.
nfs_file_fsync() should return 0 when "status" is positive.
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
isofs: Fix lseek() to position beyond 4 GB
vfs: remove unused MNT_STRICTATIME
vfs: show unreachable paths in getcwd and proc
vfs: only add " (deleted)" where necessary
vfs: add prepend_path() helper
vfs: __d_path: dont prepend the name of the root dentry
ia64: perfmon: add d_dname method
vfs: add helpers to get root and pwd
cachefiles: use path_get instead of lone dget
fs/sysv/super.c: add support for non-PDP11 v7 filesystems
V7: Adjust sanity checks for some volumes
Add v7 alias
v9fs: fixup for inode_setattr being removed
Manual merge to take Al's version of the fs/sysv/super.c file: it merged
cleanly, but Al had removed an unnecessary header include, so his side
was better.
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: fix checkpatch.pl warnings
Squashfs: fix filename typo
Squashfs: update Kconfig and documentation for LZO
Squashfs: fix block size use in LZO decompressor
Squashfs: Add LZO compression support
squashfs: fix filename in header comment
Squashfs: Make XATTR config name consistent with other file systems
squashfs: fix compiler inline warning
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Fix groups code when num_devices is not divisible by group_width
exofs: Remove useless optimization
exofs: exofs_file_fsync and exofs_file_flush correctness
exofs: Remove superfluous dependency on buffer_head and writeback
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits)
ceph: generalize mon requests, add pool op support
ceph: only queue async writeback on cap revocation if there is dirty data
ceph: do not ignore osd_idle_ttl mount option
ceph: constify dentry_operations
ceph: whitespace cleanup
ceph: add flock/fcntl lock support
ceph: define on-wire types, constants for file locking support
ceph: add CEPH_FEATURE_FLOCK to the supported feature bits
ceph: support v2 reconnect encoding
ceph: support v2 client_caps encoding
ceph: move AES iv definition to shared header
ceph: fix decoding of pool snap info
ceph: make ->sync_fs not wait if wait==0
ceph: warn on missing snap realm
ceph: print useful error message when crush rule not found
ceph: use %pU to print uuid (fsid)
ceph: sync header defs with server code
ceph: clean up header guards
ceph: strip misleading/obsolete version, feature info
ceph: specify supported features in super.h
...
This adds byte order autodetection (of PDP-11 and LE filesystems). No
attempt is made to detect big-endian filesystems -- were there any?
Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Newly mkfs-ed filesystems from Seventh Edition have last modification time
set to zero, but are otherwise perfectly valid.
Also, tighten up other sanity checks to filter out most filesystems with
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
So that the module gets autoloaded when a v7 filesystem is mounted.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can clean up the work queue on this error path. This function is
called from afs_init().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the entire fs/proc directory is conditionally included based on
CONFIG_PROC_FS, it's redundant to check that same variable within that
directory.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If signalfd is used to consume a signal generated by a POSIX interval
timer or POSIX message queue, the ssi_int field does not reflect the data
(sigevent->sigev_value) supplied to timer_create(2) or mq_notify(3). (The
ssi_ptr field, however, is filled in.)
This behavior differs from signalfd's treatment of sigqueue-generated
signals -- see the default case in signalfd_copyinfo. It also gives
results that differ from the case when a signal is handled conventionally
via a sigaction-registered handler.
So, set signalfd_siginfo->ssi_int in the remaining cases (__SI_TIMER,
__SI_MESGQ) where ssi_ptr is set.
akpm: a non-back-compatible change. Merge into -stable to minimise the
number of kernels which are in the field and which miss this feature.
Signed-off-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The cgroup device whitelist code gets confused when trying to grant
permission to a disk partition that is not currently open. Part of
blkdev_open() includes __blkdev_get() on the whole disk.
Basically, the only ways to reliably allow a cgroup access to a partition
on a block device when using the whitelist are to 1) also give it access
to the whole block device or 2) make sure the partition is already open in
a different context.
The patch avoids the cgroup check for the whole disk case when opening a
partition.
Addresses https://bugzilla.redhat.com/show_bug.cgi?id=589662
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Tested-by: Serge E. Hallyn <serue@us.ibm.com>
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Daniel P. Berrange" <berrange@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After 97e7449a7a: "autofs4: fix indirect mount pending expire race" we no
longer assumed that "ino" can be null. The other null checks got removed
but this was one was missed.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use kmalloc() to allocate fdmem if possible.
vmalloc() is used as a fallback solution for fdmem allocation. A new
helper function __free_fdtable() is introduced to reduce the lines of
code.
A potential bug, vfree() a memory allocated by kmalloc(), is fixed.
[akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()]
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The O_* bit numbers are defined in 20+ arch/*, and can silently overlap.
Add a compile time check to ensure the uniqueness as suggested by David
Miller.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Improve the description of fget_light(), which is currently incorrect
about needing a prior refcnt (judging by the way it is actually used).
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is more kernel-ish, saves some space, and also allows us to
expand the ops without breaking all the callers who are happy for the
new members to be NULL.
The few places which defined their own param types are changed to the
new scheme (more which crept in recently fixed in following patches).
Since we're touching them anyway, we change get() and set() to take a
const struct kernel_param (which they really are). This causes some
harmless warnings until we fix them (in following patches).
To reduce churn, module_param_call creates the ops struct so the callers
don't have to change (and casts the functions to reduce warnings).
The modern version which takes an ops struct is called module_param_cb.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ville Syrjala <syrjala@sci.fi>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Alessandro Rubini <rubini@ipvvis.unipv.it>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: linux-kernel@vger.kernel.org
Cc: linux-input@vger.kernel.org
Cc: linux-fbdev-devel@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
isofs supports files larger than 4 GB by using multi-extent files.
However an lseek() to a position beyond 4 GB in such a file will
fail with EINVAL, because s_maxbytes in the isofs superblock is
initialized to 2^32-1, and generic_file_llseek() checks against
that value.
I therefore suggest increasing the value of s_maxbytes to have
full support for large files in isofs. With multi-extent files, file
size is only limited by the maximum size of the file system (8 TB),
so this seems a reasonable value for s_maxbytes.
Signed-off-by: Jan Andres <jandres@gmx.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Commit d0adde574b added MNT_STRICTATIME
but it isn't actually used (MS_STRICTATIME clears MNT_RELATIME and
MNT_NOATIME rather than setting any mount flag).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Prepend "(unreachable)" to path strings if the path is not reachable
from the current root.
Two places updated are
- the return string from getcwd()
- and symlinks under /proc/$PID.
Other uses of d_path() are left unchanged (we know that some old
software crashes if /proc/mounts is changed).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
__d_path() has 4 callers:
d_path()
sys_getcwd()
seq_path_root()
tomoyo_realpath_from_path2()
Of these the only one which needs the " (deleted)" ending is d_path().
sys_getcwd() checks for existence before calling __d_path().
seq_path_root() is used to show the mountpoint path in
/proc/PID/mountinfo, which is always a positive.
And tomoyo doesn't want the deleted ending.
Create a helper "path_with_deleted()" as subsequent patches will need
this in multiple places.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Split off prepend_path() from __d_path(). This new helper takes an
end-of-buffer pointer and buffer-length pointer just like the other
prepend_* functions. Move the " (deleted)" postfix out to __d_path().
This patch doesn't change any functionality but paves the way for the
following patches.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In the old times pseudo-filesystems set the name of theroot dentry to
some prefix like "pipe:" and the name of the child dentry to "[123]"
and relied on a hack in __d_path() to replace the preceding slash with
the root's name to get "pipe:[123]".
Then the d_dname() dentry operation was introduced which solved the
same problem without having to pre-fill the name in each dentry.
Currently the following pseudo filesystems exist in the kernel:
perfmon
mtd
anon_inode
bdev
pipe
socket
Of these only perfmon, anon_inode, pipe and socket create
sub-dentries, all of which have now been switched to using d_dname().
bdev and mtd only create inodes.
This means that now the hack to overwrite the slash can be removed, so
for unreachable paths (e.g. within a detached mount) the path string
won't be polluted with garbage. For these cases a subsequent patch
will add a prefix, indicating that the path is unreachable.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.
get_fs_root()
get_fs_pwd()
get_fs_root_and_pwd()
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dentry references should not be acquired without a corresponding
vfsmount ref.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This adds byte order autodetection (of PDP-11 and LE filesystems). No
attempt is made to detect big-endian filesystems -- were there any?
Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy.
[akpm@linux-foundation.org: coding-style fixes]
[AV: parser.h inclusion was a rudiment of discarded stuff]
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Newly mkfs-ed filesystems from Seventh Edition have last modification
time set to zero, but are otherwise perfectly valid.
Also, tighten up other sanity checks to filter out most filesystems with
different bytesex than we're using.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
So that the module gets autoloaded when a v7 filesystem is mounted.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>