linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-05 04:46:46 +07:00

Author	SHA1	Message	Date
David Chinner	714082bc12	[XFS] Report errors from xfs_reserve_blocks(). xfs_reserve_blocks() can fail in interesting ways. In neither case is it a fatal error, but the result can lead to sub-optimal behaviour. Warn to the syslog if the call fails but otherwise continue. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30784a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:53:51 +10:00
David Chinner	36fbe6e6bd	[XFS] xfs_icsb_counter_disabled() never returns an error. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30782a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:52:43 +10:00
David Chinner	a414047fc9	[XFS] Remove useless whitespace in function prototypes Makes it simpler to annotate function prototypes with __must_check via sed scripts. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30781a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:58 +10:00
David Chinner	3c85c36cc2	[XFS] xfs_quiesce_fs() never returns an error. Mark it void. SGI-PV: 980084 SGI-Modid: xfs-linux-melb:xfs-kern:30780a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:46 +10:00
Christoph Hellwig	b6ddc4e6fe	[XFS] Don't validate symlink target component length This target component validation is not POSIX conformant and it is not done by any other Linux filesystem so remove it from XFS. SGI-PV: 980080 SGI-Modid: xfs-linux-melb:xfs-kern:30776a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:36 +10:00
Harvey Harrison	34a622b2e1	[XFS] replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30775a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:26 +10:00
Harvey Harrison	0225da1f35	[XFS] Replace __inline with inline Remove the remaining uses of __inline in the XFS code base. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30774a Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:15 +10:00
David Chinner	6b1d1a732f	[XFS] Fix lock inversion in forced shutdown. Recent changes to xlog_state_release_iclog() placed the grant_lock inside the icloglock. forced unmount of the log does this the opposite way around, but does not depend on the order for correct working. Fix the inversion by changing the order locks are gained in xfs_log_force_umount(). SGI-PV: 979661 SGI-Modid: xfs-linux-melb:xfs-kern:30773a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:51:04 +10:00
David Chinner	4679b2d36d	[XFS] Reorganise xlog_t for better cacheline isolation of contention To reduce contention on the log in large CPU count, separate out different parts of the xlog_t structure onto different cachelines. Move each lock onto a different cacheline along with all the members that are accessed/modified while that lock is held. Also, move the debugging code into debug code. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30772a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:53 +10:00
David Chinner	eb01c9cd87	[XFS] Remove the xlog_ticket allocator The ticket allocator is just a simple slab implementation internal to the log. It requires the icloglock to be held when manipulating it and this contributes to contention on that lock. Just kill the entire allocator and use a memory zone instead. While there, allow us to gracefully fail allocation with ENOMEM. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30771a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:39 +10:00
David Chinner	114d23aae5	[XFS] Per iclog callback chain lock Rather than use the icloglock for protecting the iclog completion callback chain, use a new per-iclog lock so that walking the callback chain doesn't require holding a global lock. This reduces contention on the icloglock during transaction commit and log I/O completion by reducing the number of times we need to hold the global icloglock during these operations. SGI-PV: 978729 SGI-Modid: xfs-linux-melb:xfs-kern:30770a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:50:22 +10:00
Lachlan McIlroy	2abdb8c881	[XFS] Prevent xfs_bmap_check_leaf_extents() referencing unmapped memory. While investigating the extent corruption bug I ran into this bug in debug only code. xfs_bmap_check_leaf_extents() loops through the leaf blocks of the extent btree checking that every extent is entirely before the next extent. It also compares the last extent in the previous block to the first extent in the current block when the previous block has been released and potentially unmapped. So take a copy of the last extent instead of a pointer. Also move the last extent check out of the loop because we only need to do it once. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30718a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-04-18 11:49:51 +10:00
Christoph Hellwig	433550990e	[XFS] remove most calls to VN_RELE Most VN_RELE calls either directly contain a XFS_ITOV or have the corresponding xfs_inode already in scope. Use the IRELE helper instead of VN_RELE to clarify the code. With a little more work we can kill VN_RELE altogether and define IRELE in terms of iput directly. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30710a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:49:08 +10:00
Lachlan McIlroy	df26cfe849	[XFS] split xfs_ioc_xattr The three subcases of xfs_ioc_xattr don't share any semantics and almost no code, so split it into three separate helpers. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30709a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:44:03 +10:00
Christoph Hellwig	f3dcc13f6f	[XFS] cleanup root inode handling in xfs_fs_fill_super - rename rootvp to root for clarify - remove useless vn_to_inode call - check is_bad_inode before calling d_alloc_root - use iput instead of VN_RELE in the error case SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30708a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:36 +10:00
David Chinner	59a33f9f77	[XFS] Ensure a btree insert returns a valid cursor. When writing into preallocated regions there is a case where XFS can oops or hang doing the unwritten extent conversion on I/O completion. It turns out that the problem is related to the btree cursor being invalid. When we do an insert into the tree, we may need to split blocks in the tree. When we only split at the leaf level (i.e. level 0), everything works just fine. However, if we have a multi-level split in the btreee, the cursor passed to the insert function is no longer valid once the insert is complete. The leaf level split is handled correctly because all the operations at level 0 are done using the original cursor, hence it is updated correctly. However, when we need to update the next level up the tree, we don't use that cursor - we use a cloned cursor that points to the index in the next level up where we need to do the insert. Hence if we need to split a second level, the changes to the tree are reflected in the cloned cursor and not the original cursor. This clone-and-move-up-a-level-on-split behaviour recurses all the way to the top of the tree. The complexity here is that these cloned cursors do not point to the original index that was inserted - they point to the newly allocated block (the right block) and the original cursor pointer to that level may still point to the left block. Hence, without deep examination of the cloned cursor and buffers, we cannot update the original cursor with the new path from the cloned cursor. In these cases the original cursor could be pointing to the wrong block(s) and hence a subsequent modification to the tree using that cursor will lead to corruption of the tree. The crash case occurs when the tree changes height - we insert a new level in the tree, and the cursor does not have a buffer in it's path for that level. Hence any attempt to walk back up the cursor to the root block will result in a null pointer dereference. To make matters even more complex, the BMAP BT is rooted in an inode, so we can have a change of height in the btree without a root split. That is, if the root block in the inode is full when we split a leaf node, we cannot fit the pointer to the new block in the root, so we allocate a new block, migrate all the ptrs out of the inode into the new block and point the inode root block at the newly allocated block. This changes the height of the tree without a root split having occurred and hence invalidates the path in the original cursor. The patch below prevents xfs_bmbt_insert() from returning with an invalid cursor by detecting the cases that invalidate the original cursor and refresh it by do a lookup into the btree for the original index we were inserting at. Note that the INOBT, AGFBNO and AGFCNT btree implementations also have this bug, but the cursor is currently always destroyed or revalidated after an insert for those trees. Hence this patch only address the problem in the BMBT code. SGI-PV: 979339 SGI-Modid: xfs-linux-melb:xfs-kern:30701a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:21 +10:00
David Chinner	75de2a91c9	[XFS] Account for inode cluster alignment in all allocations At ENOSPC, we can get a filesystem shutdown due to a cancelling a dirty transaction in xfs_mkdir or xfs_create. This is due to the initial allocation attempt not taking into account inode alignment and hence we can prepare the AGF freelist for allocation when it's not actually possible to do an allocation. This results in inode allocation returning ENOSPC with a dirty transaction, and hence we shut down the filesystem. Because the first allocation is an exact allocation attempt, we must tell the allocator that the alignment does not affect the allocation attempt. i.e. we will accept any extent alignment as long as the extent starts at the block we want. Unfortunately, this means that if the longest free extent is less than the length + alignment necessary for fallback allocation attempts but is long enough to attempt a non-aligned allocation, we will modify the free list. If we then have the exact allocation fail, all other allocation attempts will also fail due to the alignment constraint being taken into account. Hence the initial attempt needs to set the "alignment slop" field so that alignment, while not required, must be taken into account when determining if there is enough space left in the AG to do the allocation. That means if the exact allocation fails, we will not dirty the freelist if there is not enough space available fo a subsequent allocation to succeed. Hence we get an ENOSPC error back to userspace without shutting down the filesystem. SGI-PV: 978886 SGI-Modid: xfs-linux-melb:xfs-kern:30699a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:42:09 +10:00
Josef 'Jeff' Sipek	535f6b3735	[XFS] Replace custom AIL linked-list code with struct list_head Replace the xfs_ail_entry_t with a struct list_head and clean the surrounding code up. Also fixes a livelock in xfs_trans_first_push_ail() by terminating the loop at the head of the list correctly. SGI-PV: 978682 SGI-Modid: xfs-linux-melb:xfs-kern:30636a Signed-off-by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:57 +10:00
Christoph Hellwig	a45c796867	[XFS] Remove superflous xfs_readsb call in xfs_mountfs. When xfs_mountfs is called by xfs_mount xfs_readsb was called 35 lines above unconditionally, so there is no need to try to read the superblock if it's not present. If any other port doesn't have the superblock read at this point it should just call it directly from it's xfs_mount equivalent. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30603a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:46 +10:00
Niv Sardi	dfa18b1179	[XFS] kill t_sema member of struct xfs_trans It's completely unused so we might aswell kill it. Note that there is another t_sema in struct xlog_ticket, which is used and actually an sv_t despite the name. That one is left untouched by this patch. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30591a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:35 +10:00
Christoph Hellwig	5f90150aba	[XFS] cleanup vnode use in xfs_bmap.c SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30553a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:25 +10:00
Christoph Hellwig	af048193fc	[XFS] cleanup vnode use in xfs_iops.c SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30552a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:14 +10:00
Christoph Hellwig	dcf49cc5cf	[XFS] cleanup vnode use in xfs_lrw.c SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30551a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:41:04 +10:00
Christoph Hellwig	ef1f5e7ad3	[XFS] cleanup vnode use in xfs_lookup SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30550a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:40:55 +10:00
Christoph Hellwig	3937be5ba8	[XFS] cleanup vnode use in xfs_symlink and xfs_rename SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30548a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:40:45 +10:00
Christoph Hellwig	a3da789640	[XFS] cleanup vnode use in xfs_link SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30547a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:40:35 +10:00
Christoph Hellwig	979ebab116	[XFS] cleanup vnode use in xfs_create/mknod/mkdir SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30546a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:40:25 +10:00
Christoph Hellwig	bc4ac74a4e	[XFS] cleanup vnode use in dmapi calls SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30545a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:40:15 +10:00
David Chinner	d234154125	[XFS] Use power-of-2 sized buffers to reduce overhead Now that the ktrace_enter() code is using atomics, the non-power-of-2 buffer sizes - which require modulus operations to get the index - are showing up as using substantial CPU in the profiles. Force the buffer sizes to be rounded up to the nearest power of two and use masking rather than modulus operations to convert the index counter to the buffer index. This reduces ktrace_enter overhead to 8% of a CPU time, and again almost halves the trace intensive test runtime. SGI-PV: 977546 SGI-Modid: xfs-linux-melb:xfs-kern:30538a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:40:04 +10:00
David Chinner	6ee4752ffe	[XFS] Use atomic counters for ktrace buffer indexes ktrace_enter() is consuming vast amounts of CPU time due to the use of a single global lock for protecting buffer index increments. Change it to use per-buffer atomic counters - this reduces ktrace_enter() overhead during a trace intensive test on a 4p machine from 58% of all CPU time to 12% and halves test runtime. SGI-PV: 977546 SGI-Modid: xfs-linux-melb:xfs-kern:30537a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:55 +10:00
David Chinner	44d814ced4	[XFS] Update c/mtime correctly on truncates XFS changes the c/mtime of an inode when truncating it to the same size. The c/mtime is only supposed to change if the size is changed. Not to be confused with ftruncate, where the c/mtime is supposed to be changed even if the size is not changed. The Linux VFS encodes this semantic difference in the flags it sends down to ->setattr, which XFS currently ignores. We need to make XFS pay attention to the VFS flags and hence Do The Right Thing. SGI-PV: 977547 SGI-Modid: xfs-linux-melb:xfs-kern:30536a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:45 +10:00
Christoph Hellwig	24bd861d1c	[XFS] don't encode parent in nfs filehandles unless nessecary As Dave pointed out after the export ops changes we now always encode the parent into the filehandle for regular files, but it's not actually needed when the filesystem is export with no_subtree_check. This one-liner fixes xfs_fs_encode_fh to skip encoding the parent unless nessecary. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30535a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:35 +10:00
Christoph Hellwig	126468b115	[XFS] kill xfs_rwlock/xfs_rwunlock We can just use xfs_ilock/xfs_iunlock instead and get rid of the ugly bhv_vrwlock_t. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30533a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:25 +10:00
Christoph Hellwig	43973964a3	[XFS] kill xfs_get_dir_entry Instead of of xfs_get_dir_entry use a macro to get the xfs_inode from the dentry in the callers and grab the reference manually. Only grab the reference once as it's fine to keep it over the dmapi calls. (And even that reference is actually superflous in Linux but I'll leave that for another patch) SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30531a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:14 +10:00
Christoph Hellwig	a8b3acd57e	[XFS] vnode cleanup in xfs_fs_subr.c Cleanup the unneeded intermediate vnode step in the flushing helpers and go directly from the xfs_inode to the struct address_space. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30530a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:39:03 +10:00
Christoph Hellwig	db0bb7baa1	[XFS] cleanup xfs_vn_mknod - use proper goto based unwinding instead of the current mess of multiple conditionals - rename ip to inode because that's the normal convention for Linux inodes while ip is the convention for xfs_inodes - remove unlikely checks for the default_acl - branches marked unlikely might lead to extreme branch bredictor slowdons if taken and for some workloads a default acl is quite common - properly indent the switch statements - remove xfs_has_fs_struct as nfsd has a fs_struct in any semi-recent kernel SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30529a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:38:53 +10:00
David Chinner	155cc6b784	[XFS] Use atomics for iclog reference counting Now that we update the log tail LSN less frequently on transaction completion, we pass the contention straight to the global log state lock (l_iclog_lock) during transaction completion. We currently have to take this lock to decrement the iclog reference count. there is a reference count on each iclog, so we need to take �he global lock for all refcount changes. When large numbers of processes are all doing small trnasctions, the iclog reference counts will be quite high, and the state change that absolutely requires the l_iclog_lock is the except rather than the norm. Change the reference counting on the iclogs to use atomic_inc/dec so that we can use atomic_dec_and_lock during transaction completion and avoid the need for grabbing the l_iclog_lock for every reference count decrement except the one that matters - the last. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30505a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:38:10 +10:00
David Chinner	b589334c7a	[XFS] Prevent AIL lock contention during transaction completion When hundreds of processors attempt to commit transactions at the same time, they can contend on the AIL lock when updating the tail LSN held in the in-core log structure. At the moment, the tail LSN is only needed when actually writing out an iclog, so it really does not need to be updated on every single transaction completion - only those that result in switching iclogs and flushing them to disk. The result is that we reduce the number of times we need to grab the AIL lock and the log grant lock by up to two orders of magnitude on large processor count machines. The problem has previously been hidden by AIL lock contention walking the AIL list which was recently solved and uncovered this issue. SGI-PV: 975671 SGI-Modid: xfs-linux-melb:xfs-kern:30504a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:38:01 +10:00
David Chinner	3354040897	[XFS] Use xfs_inode_clean() in more places Remove open coded checks for the whether the inode is clean and replace them with an inlined function. SGI-PV: 977461 SGI-Modid: xfs-linux-melb:xfs-kern:30503a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:37:51 +10:00
David Chinner	bad5584332	[XFS] Remove the xfs_icluster structure Remove the xfs_icluster structure and replace with a radix tree lookup. We don't need to keep a list of inodes in each cluster around anymore as we can look them up quickly when we need to. The only time we need to do this now is during inode writeback. Factor the inode cluster writeback code out of xfs_iflush and convert it to use radix_tree_gang_lookup() instead of walking a list of inodes built when we first read in the inodes. This remove 3 pointers from each xfs_inode structure and the xfs_icluster structure per inode cluster. Hence we reduce the cache footprint of the xfs_inodes by between 5-10% depending on cluster sparseness. To be truly efficient we need a radix_tree_gang_lookup_range() call to stop searching once we are past the end of the cluster instead of trying to find a full cluster's worth of inodes. Before (ia64): $ cat /sys/slab/xfs_inode/object_size 536 After: $ cat /sys/slab/xfs_inode/object_size 512 SGI-PV: 977460 SGI-Modid: xfs-linux-melb:xfs-kern:30502a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:37:41 +10:00
David Chinner	a3f74ffb6d	[XFS] Don't block pdflush when writing back inodes When pdflush is writing back inodes, it can get stuck on inode cluster buffers that are currently under I/O. This occurs when we write data to multiple inodes in the same inode cluster at the same time. Effectively, delayed allocation marks the inode dirty during the data writeback. Hence if the inode cluster was flushed during the writeback of the first inode, the writeback of the second inode will block waiting for the inode cluster write to complete before writing it again for the newly dirtied inode. Basically, we want to avoid this from happening so we don't block pdflush and slow down all of writeback. Hence we introduce a non-blocking async inode flush flag that pdflush uses. If this flag is set, we use non-blocking operations (e.g. try locks) whereever we can to avoid blocking or extra I/O being issued. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30501a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:37:32 +10:00
David Chinner	4ae29b4321	[XFS] Factor xfs_itobp() and xfs_inotobp(). The only difference between the functions is one passes an inode for the lookup, the other passes an inode number. However, they don't do the same validity checking or set all the same state on the buffer that is returned yet they should. Factor the functions into a common implementation. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30500a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:37:19 +10:00
Lachlan McIlroy	e9a56b7cda	[XFS] Fix regression due to refcache removal SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30490a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com>	2008-04-18 11:37:06 +10:00
Donald Douwsma	163d3686bb	[XFS] Remove the xfs_refcache Remove the xfs_refcache, it was only needed while we were still building for 2.4 kernels. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30472a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:36:55 +10:00
Lachlan McIlroy	461aa8a225	[XFS] make inode reclaim synchronise with xfs_iflush_done() On a forced shutdown, xfs_finish_reclaim() will skip flushing the inode. If the inode flush lock is not already held and there is an outstanding xfs_iflush_done() then we might free the inode prematurely. By acquiring and releasing the flush lock we will synchronise with xfs_iflush_done(). SGI-PV: 909874 SGI-Modid: xfs-linux-melb:xfs-kern:30468a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com>	2008-04-18 11:34:54 +10:00
Niv Sardi	e12070a5dc	[XFS] actually check error returned by xfs_flush_pages, clean up and bailout if fails. SGI-PV: 973041 SGI-Modid: xfs-linux-melb:xfs-kern:30462a Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:34:47 +10:00
Paul Bolle	424b00e2c0	AFS: Do not describe debug parameters with their value Describe debug parameters with their names (and not their values). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-16 07:43:48 -07:00
Jan Kara	335e92e8a5	vfs: fix possible deadlock in ext2, ext3, ext4 when using xattrs mb_cache_entry_alloc() was allocating cache entries with GFP_KERNEL. But filesystems are calling this function while holding xattr_sem so possible recursion into the fs violates locking ordering of xattr_sem and transaction start / i_mutex for ext2-4. Change mb_cache_entry_alloc() so that filesystems can specify desired gfp mask and use GFP_NOFS from all of them. Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Dave Jones <davej@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-15 19:35:41 -07:00
Alexey Korolev	abe2f41430	JFFS2 Fix of panics caused by wrong condition for hole frag creation in write_begin This fixes a regression introduced in commit `205c109a7a` when switching to write_begin/write_end operations in JFFS2. The page offset is miscalculated, leading to corruption of the fragment lists and subsequently to memory corruption and panics. [ Side note: the bug is a fairly direct result of the naming. Nick was likely misled by the use of "offs", since we tend to use the notion of "offset" not as an absolute position, but as an offset _within_ a page or allocation. Alternatively, a "pgoff_t" is a page index, but not a byte offset - our VM naming can be a bit confusing. So in this case, a VM person would likely have called this a "pos", not an "offs", or perhaps talked about byte offsets rather than page offsets (since it's counted in bytes, not pages). - Linus ] Signed-off-by: Alexey Korolev <akorolev@infradead.org> Signed-off-by: Vasiliy Leonenko <vasiliy.leonenko@mail.ru> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-14 15:43:14 -07:00
J. Bruce Fields	19e729a928	locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs Miklos Szeredi found the bug: "Basically what happens is that on the server nlm_fopen() calls nfsd_open() which returns -EACCES, to which nlm_fopen() returns NLM_LCK_DENIED. "On the client this will turn into a -EAGAIN (nlm_stat_to_errno()), which in will cause fcntl_setlk() to retry forever." So, for example, opening a file on an nfs filesystem, changing permissions to forbid further access, then trying to lock the file, could result in an infinite loop. And Trond Myklebust identified the culprit, from Marc Eshel and I: `7723ec9777` "locks: factor out generic/filesystem switch from setlock code" That commit claimed to just be reshuffling code, but actually introduced a behavioral change by calling the lock method repeatedly as long as it returned -EAGAIN. We assumed this would be safe, since we assumed a lock of type SETLKW would only return with either success or an error other than -EAGAIN. However, nfs does can in fact return -EAGAIN in this situation, and independently of whether that behavior is correct or not, we don't actually need this change, and it seems far safer not to depend on such assumptions about the filesystem's ->lock method. Therefore, revert the problematic part of the original commit. This leaves vfs_lock_file() and its other callers unchanged, while returning fcntl_setlk and fcntl_setlk64 to their former behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Tested-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-14 12:22:14 -07:00
Linus Torvalds	14897e35fd	Merge branch 'docs' of git://git.lwn.net/linux-2.6 * 'docs' of git://git.lwn.net/linux-2.6: Add additional examples in Documentation/spinlocks.txt Move sched-rt-group.txt to scheduler/ Documentation: move rpc-cache.txt to filesystems/ Documentation: move nfsroot.txt to filesystems/ Spell out behavior of atomic_dec_and_lock() in kerneldoc Fix a typo in highres.txt Fixes to the seq_file document Fill out information on patch tags in SubmittingPatches Add the seq_file documentation	2008-04-11 13:24:16 -07:00
J. Bruce Fields	6ded55da6b	Documentation: move nfsroot.txt to filesystems/ Documentation/ is a little large, and filesystems/ seems an obvious place for this file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-04-11 13:18:01 -06:00
Davide Libenzi	0859ab59a8	signalfd: fix for incorrect SI_QUEUE user data reporting Michael Kerrisk found out that signalfd was not reporting back user data pushed using sigqueue: http://groups.google.com/group/linux.kernel/msg/9397cab8551e3123 The following patch makes signalfd report back the ssi_ptr and ssi_int members of the signalfd_siginfo structure. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Acked-by: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-11 08:06:44 -07:00
Davide Libenzi	8d1c98b0b5	eventfd/kaio integration fix Jeff Roberson discovered a race when using kaio eventfd based notifications. When it occurs it can lead tomissed wakeups and hung userspace. This patch fixes the race by moving the notification inside the spinlocked section of kaio. The operation is safe since eventfd spinlock and kaio one are unrelated. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jeff Roberson <jroberson@chesapeake.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-11 08:06:43 -07:00
Roland McGrath	598af051a7	asmlinkage_protect sys_io_getevents Use asmlinkage_protect in sys_io_getevents, because GCC for i386 with CONFIG_FRAME_POINTER=n can decide to clobber an argument word on the stack, i.e. the user struct pt_regs. Here the problem is not a tail call, but just the compiler's use of the stack when it inlines and optimizes the body of the called function. This seems to avoid it. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-10 17:28:26 -07:00
Roland McGrath	54a0151041	asmlinkage_protect replaces prevent_tail_call The prevent_tail_call() macro works around the problem of the compiler clobbering argument words on the stack, which for asmlinkage functions is the caller's (user's) struct pt_regs. The tail/sibling-call optimization is not the only way that the compiler can decide to use stack argument words as scratch space, which we have to prevent. Other optimizations can do it too. Until we have new compiler support to make "asmlinkage" binding on the compiler's own use of the stack argument frame, we have work around all the manifestations of this issue that crop up. More cases seem to be prevented by also keeping the incoming argument variables live at the end of the function. This makes their original stack slots attractive places to leave those variables, so the compiler tends not clobber them for something else. It's still no guarantee, but it handles some observed cases that prevent_tail_call() did not. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-10 17:28:26 -07:00
Linus Torvalds	5d69a029ab	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] Ensure "both" features2 slots are consistent [XFS] Fix superblock features2 field alignment problem [XFS] remove shouting-indirection macros from xfs_sb.h	2008-04-10 13:39:29 -07:00
Linus Torvalds	999646e3f9	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: cfq-iosched: do not leak ioc_data across iosched switches splice: fix infinite loop in generic_file_splice_read()	2008-04-10 13:39:07 -07:00
Roman Zippel	76b0c26af2	HFS+: fix unlink of links Some time ago while attempting to handle invalid link counts, I botched the unlink of links itself, so this patch fixes this now correctly, so that only the link count of nodes that don't point to links is ignored. Thanks to Vlado Plaga <rechner@vlado-do.de> to notify me of this problem. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-10 13:37:51 -07:00
Josef Bacik	16c5f06f15	[GFS2] fix GFP_KERNEL misuses There are several places where GFP_KERNEL allocations happen under a glock, which will result in hangs if we're under memory pressure and go to re-enter the fs in order to flush stuff out. This patch changes the culprits to GFS_NOFS to keep this problem from happening. Thank you, Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-04-10 09:55:26 +01:00
Eric Sandeen	e6957ea484	[XFS] Ensure "both" features2 slots are consistent Since older kernels may look in the sb_bad_features2 slot for flags, rather than zeroing it out on fixup, we should make it equal to the sb_features2 value. Also, if the ATTR2 flag was not found prior to features2 fixup, it was not set in the mount flags, so re-check after the fixup so that the current session will use the feature. Also fix up the comments to reflect these changes. SGI-PV: 980085 SGI-Modid: xfs-linux-melb:xfs-kern:30778a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-10 16:25:26 +10:00
David Chinner	ee1c090825	[XFS] Fix superblock features2 field alignment problem Due to the xfs_dsb_t structure not being 64 bit aligned, the last field of the on-disk superblock can vary in location This causes problems when the filesystem gets moved to a different platform, or there is a 32 bit userspace and 64 bit kernel. This patch detects the defect at mount time, logs a warning such as: XFS: correcting sb_features alignment problem in dmesg and corrects the problem so that everything is OK. it also blacklists the bad field in the superblock so it does not get used for something else later on. SGI-PV: 977636 SGI-Modid: xfs-linux-melb:xfs-kern:30539a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-10 16:25:15 +10:00
Eric Sandeen	6211870992	[XFS] remove shouting-indirection macros from xfs_sb.h Remove macro-to-small-function indirection from xfs_sb.h, and remove some which are completely unused. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30528a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-10 16:24:45 +10:00
Jens Axboe	8191ecd1d1	splice: fix infinite loop in generic_file_splice_read() There's a quirky loop in generic_file_splice_read() that could go on indefinitely, if the file splice returns 0 permanently (and not just as a temporary condition). Get rid of the loop and pass back -EAGAIN correctly from __generic_file_splice_read(), so we handle that condition properly as well. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-04-10 08:24:25 +02:00
Bryan Wu	240ee83118	fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private() NFS needs a NOMMU version mmap function to support uClinux on NOMMU machine http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992 Signed-off-by: Bryan Wu <cooloney@kernel.org> Cc: Mike Frysinger <vapier.adi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-04-08 21:06:56 -04:00
Jeff Layton	66d3aac041	NFS: initialize flags field in nfs_open_context The nfs_open_context struct had a "flags" field added recently, but the allocator isn't initializing it. It also looks like the allocator isn't initializing the mode or list either, but they seem to be overwritten by the caller, so that's less of an issue. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-04-08 21:06:53 -04:00
Linus Torvalds	1be62dc190	Be more careful about marking buffers dirty Mikulas Patocka noted that the optimization where we check if a buffer was already dirty (and we avoid re-dirtying it) was not really SMP-safe. Since the read of the old status was not synchronized with anything, an aggressive CPU re-ordering of memory accesses might have moved that read up to before the data was even written to the buffer, and another CPU that cleaned it again, causing the newly dirty state to never actually hit the disk. Admittedly this would probably never trigger in practice, but it's still wrong. Mikulas sent a patch that fixed the problem, but I dislike the subtlety of the whole optimization, so this is an alternate fix that is more explicit about the particular SMP ordering for the optimization, and separates out the speculative reads of the buffer state into its own conditional (and makes the memory barrier only happen if we are likely to actually hit the optimized case in the first place). I considered removing the optimization entirely, but Andrew argued for it's continued existence. I'm a push-over. Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-04 14:38:17 -07:00
Sven Schnelle	ad16df848d	afs: remove smp_prcessor_id() from debug macro Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-03 15:40:53 -07:00
Hugh Dickins	4cd1350465	splice: use mapping_gfp_mask The loop block driver is careful to mask __GFP_IO\|__GFP_FS out of its mapping_gfp_mask, to avoid hangs under memory pressure. But nowadays it uses splice, usually going through __generic_file_splice_read. That must use mapping_gfp_mask instead of GFP_KERNEL to avoid those hangs. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-03 15:39:49 -07:00
David S. Miller	3bb5da3837	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2008-04-03 14:33:42 -07:00
Robert P. J. Day	865965a66e	efs: update error msg to not refer to deleted read_inode() Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-02 15:28:19 -07:00
Sven Schnelle	a5f37c3252	afs: add missing up_write() on return If afs_cell_alloc() fails, afs_cells_sem doesn't get unlocked, which leads to a deadlock. Unlock it before returning. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-02 07:40:54 -07:00
Denis V. Lunev	856f6ff7a3	[NETNS]: Remove ifdef CONFIG_NET braces in fs/proc/proc_net.c. They are redundant as this file is linked in iff CONFIG_NET is turned on. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-02 00:10:04 -07:00
Julia Lawall	773adff8e9	[GFS2] test for IS_ERR rather than 0 The function gfs2_inode_lookup always returns either a valid pointer or a value made with ERR_PTR, so its result should be tested with IS_ERR, not with a test for 0. The problem was found using the following semantic match. (http://www.emn.fr/x-info/coccinelle/) //<smpl> @a@ expression E, E1; statement S,S1; position p; @@ E = gfs2_inode_lookup(...) ... when != E = E1 if@p (E) S else S1 @n@ position a.p; expression E,E1; statement S,S1; @@ E = NULL ... when != E = E1 if@p (E) S else S1 @depends on !n@ expression E; statement S,S1; position a.p; @@ * if@p (E) S else S1 //</smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:46 +01:00
Benjamin Marzinski	58e9fee13e	[GFS2] Invalidate cache at correct point GFS2 wasn't invalidating its cache before it called into the lock manager with a request that could potentially drop a lock. This was leaving a window where the lock could be actually be held by another node, but the file's page cache would still appear valid, causing coherency problems. This patch moves the cache invalidation to before the lock manager call when dropping a lock. It also adds the option to the lock_dlm lock manager to not use conversion mode deadlock avoidance, which, on a conversion from shared to exclusive, could internally drop the lock, and then reacquire in. GFS2 now asks lock_dlm to not do this. Instead, GFS2 manually drops the lock and reacquires it. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:44 +01:00
akpm@linux-foundation.org	f5a8cd0201	[GFS2] fs/gfs2/recovery.c: suppress warnings fs/gfs2/recovery.c: In function 'get_log_header': fs/gfs2/recovery.c:152: warning: 'lh.lh_sequence' may be used uninitialized in this function fs/gfs2/recovery.c:152: warning: 'lh.lh_flags' may be used uninitialized in this function fs/gfs2/recovery.c:152: warning: 'lh.lh_tail' may be used uninitialized in this function fs/gfs2/recovery.c:152: warning: 'lh.lh_blkno' may be used uninitialized in this function fs/gfs2/recovery.c:152: warning: 'lh.lh_hash' may be used uninitialized in this function Cc: David Teigland <teigland@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:41 +01:00
Bob Peterson	1f466a47e8	[GFS2] Faster gfs2_bitfit algorithm This version of the gfs2_bitfit algorithm includes the latest suggestions from Steve Whitehouse. It is typically eight to ten times faster than the version we're using today. If there is a lot of metadata mixed in (lots of small files) the algorithm is often 15 times faster, and given the right conditions, I've seen peaks of 20 times faster. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:39 +01:00
Steven Whitehouse	d82661d969	[GFS2] Streamline quota lock/check for no-quota case This patch streamlines the quota checking in the "no quota" case by making the check inline in the calling function, thus reducing the number of function calls. Eventually we might be able to remove the checks from the gfs2_quota_lock() and gfs2_quota_check() functions, but currently we can't as there are a very few places in the code which need to call these functions directly still. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>	2008-03-31 10:41:36 +01:00
Steven Whitehouse	860b25d4a9	[GFS2] Remove drop of module ref where not needed In an earlier patch "[GFS2] fix file_system_type leak on gfs2meta mount" we removed the code to grab a ref to the module which was not needed (since we know that the module cannot be unloaded at that time) so this patch removes the code to drop that reference. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:33 +01:00
Abhijith Das	20b95bf2c4	[GFS2] gfs2_adjust_quota has broken unstuffing code This patch combines the 2 patches in bug 434736 to correct the lock ordering in the unstuffing of the quota inode in gfs2_adjust_quota and adjusting the number of revokes in gfs2_write_jdata_pagevec Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:30 +01:00
Cyrill Gorcunov	182fe5abd8	[GFS2] possible null pointer dereference fixup gfs2_alloc_get may fail so we have to check it to prevent NULL pointer dereference. Signed-off-by: Cyrill Gorcunov <gorcunov@gamil.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:28 +01:00
Steven Whitehouse	105284970b	[GFS2] Need to ensure that sector_t is 64bits for GFS2 We need to ensure that sector_t is 64bits for GFS2, so that we need to depend on LBD as well as LSF. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:25 +01:00
Denis Cheng	43a33c53cc	[GFS2] re-support special inode a previous commit removed call to init_special_inode from inode lookuping, this cause problems as: # mknod /mnt/gfs2/dev/null c 1 3 # cat /mnt/gfs2/dev/null cat: /mnt/gfs2/dev/null: Invalid argument without special inode, GFS2 cannot support char device file, block device file, fifo pipe, and socket file, lose many important features as a common file system. this one line patch re add special inode support. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:22 +01:00
Denis Cheng	d83225d45d	[GFS2] remove gfs2_dev_iops struct inode_operations gfs2_dev_iops is always the same as gfs2_file_iops, since Jan 2006, when GFS2 merged into mainstream kernel. So one of them could be removed. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:20 +01:00
Christoph Hellwig	7dc2cf1c8f	[GFS2] fix file_system_type leak on gfs2meta mount get_gfs2_sb does a get_fs_type without doing a put_filesystem and thus leaking a file_system_type reference everytime it's called. Just use gfs2_fs_type directly instead of doing the lookup and thus fix the problem. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:17 +01:00
Steven Whitehouse	9b8c81d1de	[GFS2] Allow bmap to allocate extents We've supported mapping of extents when no block allocation is required for some time. This patch extends that to mapping of extents when an allocation has been requested. In that case we try to allocate as many blocks as are requested, but we might return fewer in case there is something preventing us from returning the complete amount (e.g. an already allocated block is in the way). Currently the only code path which can actually request multiple data blocks in a single bmap call is the page_mkwrite path and even then it only happens if there are multiple blocks per page. What this patch does do however, is merge the allocation requests for metadata (growing the metadata tree in either height or depth) with the allocation of the data blocks in the case that both are needed. This results in lower overheads even in the single block allocation case. The one thing which we can't handle here at the moment is unstuffing. I would like to be able to do that, but the problem which arises is that in order to unstuff one has to get a locked page from the page cache which results in locking problems in the (usual) case that the caller is holding the page lock on the page it wishes to map. So that case will have to be addressed in future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:14 +01:00
Steven Whitehouse	7afd88d916	[GFS2] Fix a page lock / glock deadlock We've previously been using a "try lock" in readpage on the basis that it would prevent deadlocks due to the inverted lock ordering (our normal lock ordering is glock first and then page lock). Unfortunately tests have shown that this isn't enough. If the glock has a demote request queued such that run_queue() in the glock code tries to do a demote when its called under readpage then it will try and write out all the dirty pages which requires locking them. This then deadlocks with the page locked by readpage. The solution is to always require two calls into readpage. The first unlocks the page, gets the glock and returns AOP_TRUNCATED_PAGE, the second does the actual readpage and unlocks the glock & page as required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:12 +01:00
Adrian Bunk	60b779cfc1	[GFS2] proper extern for gfs2/locking/dlm/mount.c:gdlm_ops This patch adds a proper extern declaration for gdlm_ops in fs/gfs2/locking/dlm/lock_dlm.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:09 +01:00
Adrian Bunk	8af4c72f7d	[GFS2] gfs2/ops_file.c should #include "ops_inode.h" Every file should include the headers containing the prototypes for its global functions (in this case for gfs2_set_inode_flags()). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:06 +01:00
Marcin Slusarz	bb16b342b2	[GFS2] be*_add_cpu conversion replace all: big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) + expression_in_cpu_byteorder); with: beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder); generated with semantic patch Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:41:03 +01:00
Steven Whitehouse	840ca0ec70	[GFS2] Fix bug where we called drop_bh incorrectly As a result of an earlier patch, drop_bh was being called in cases when it shouldn't have been. Since we never have a gh in the drop case and we always have a gh in the promote case, we can use that extra information to tell which case has been seen. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>	2008-03-31 10:41:01 +01:00
Steven Whitehouse	e23159d2a7	[GFS2] Get inode buffer only once per block map call In the case that we needed to grow the height of the metadata tree we were looking up the inode buffer and then brelse()ing it despite the fact that it is needed later in the block map process. This patch ensures that we look up the inode's buffer once and only once during the block map process. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:58 +01:00
Steven Whitehouse	77658aad22	[GFS2] Eliminate (almost) duplicate field from gfs2_inode The blocks counter is almost a duplicate of the i_blocks field in the VFS inode. The only difference is that i_blocks can be only 32bits long for 32bit arch without large single file support. Since GFS2 doesn't handle the non-large single file case (for 32 bit anyway) this adds a new config dependency on 64BIT \|\| LSF. This has always been the case, however we've never explicitly said so before. Even if we do add support for the non-LSF case, we will still not require this field to be duplicated since we will not be able to access oversized files anyway. So the net result of all this is that we shave 8 bytes from a gfs2_inode and get our config deps correct. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:55 +01:00
Steven Whitehouse	30cbf189cd	[GFS2] Add a function to interate over an extent This adds a function (currently the only use is during mapping of already allocated blocks, but watch this space) which iterates over a number of pointers in a block and returns the extent length. If the initial pointer is 0 (i.e. unallocated) it will return the number of unallocated blocks in the extent. If the initial pointer is allocated, then it returns the number of contiguously allocated blocks in the extent. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:53 +01:00
Steven Whitehouse	c85a665f06	[GFS2] The case of the missing asterisk A dereference was forgotten. This adds it back correctly. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:50 +01:00
Steven Whitehouse	b45e41d7d5	[GFS2] Add extent allocation to block allocator Rather than having to allocate a single block at a time, this patch allows the block allocator to allocate an extent. Since there is no difference (so far as the block allocator is concerned) between data blocks and indirect blocks, it is posible to allocate a single extent and for the caller to unrevoke just the blocks required for indirect blocks. Currently the only bit of GFS2 to make use of this feature is the build height function. The intention is that gfs2_block_map will be changed to make use of this feature in future patches. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:47 +01:00
Steven Whitehouse	1639431a3f	[GFS2] Merge gfs2_alloc_meta and gfs2_alloc_data Thanks to the preceeding patches, the only difference between these two functions is their name. We can thus merge them and call the new function gfs2_alloc_block to reflect the fact that it can allocate either kind of block. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:45 +01:00
Steven Whitehouse	5731be53e3	[GFS2] Update gfs2_trans_add_unrevoke to accept extents By adding an extra argument to gfs2_trans_add_unrevoke we can now specify an extent length of blocks to unrevoke. This means that we only need to make one pass through the list for each extent rather than each block. Currently the only extent length which is used is 1, but that will change in the future. Also gfs2_trans_add_unrevoke is removed from gfs2_alloc_meta since its the only difference between this and gfs2_alloc_data which is left. This will allow a future patch to merge these two functions into one (i.e. one call to allocate both data and metadata in a single extent in the future). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:42 +01:00
Steven Whitehouse	ac576cc5be	[GFS2] Merge the rd_last_alloc_meta and rd_last_alloc_data fields We don't need to keep track of when we last allocated data and metadata separately since the only thing thats important when searching for a free block is whether its free or not, which is independent from what type of block it is. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:39 +01:00
Steven Whitehouse	ce276b06e8	[GFS2] Reduce inode size by merging fields There were three fields being used to keep track of the location of the most recently allocated block for each inode. These have been merged into a single field in order to better keep the data and metadata for an inode close on disk, and also to reduce the space required for storage. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:37 +01:00
Bob Peterson	9feb7c889f	[GFS2] Remove unused counters This is kind of trivial in the greater scheme of things, but this removes three counters that AFAICT are never used. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:34 +01:00
Steven Whitehouse	9a0045088d	[GFS2] Shrink & rename di_depth This patch forms a pair with the previous patch which shrunk di_height. Like that patch di_depth is renamed i_depth and moved into struct gfs2_inode directly. Also the field goes from 16 bits to 8 bits since it is also limited to a max value which is rather small (17 in this case). In addition we also now validate the field against this maximum value when its read in. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:31 +01:00
Bob Peterson	cf45b752c9	[GFS2] Remove rgrp and glock version numbers This patch further reduces GFS2's memory requirements by eliminating the 64-bit version number fields in lieu of a couple bits. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:29 +01:00
Steven Whitehouse	da755fdb41	[GFS2] Remove lm.[ch] and distribute content The functions in lm.c were just wrappers which were mostly only used in one other file. By moving the functions to the files where they are being used, they can be marked static and also this will usually result in them being inlined since they are often only used from one point in the code. A couple of really trivial functions have been inlined by hand into the function which called them as it makes the code clearer to do that. We also gain from one fewer function call in the glock lock and unlock paths. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:26 +01:00
Bob Peterson	ab0d756681	[GFS2] Eliminate gl_req_bh This patch further reduces the memory needs of GFS2 by eliminating the gl_req_bh variable from struct gfs2_glock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:23 +01:00
Steven Whitehouse	110acf3837	[GFS2] Add consts to various bits of rgrp.c There are a couple of routines which scan bitmaps where we can mark the bitmaps const, plus a couple of call sites that can be updated too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:21 +01:00
Steven Whitehouse	dbac6710a6	[GFS2] Introduce array of buffers to struct metapath The reason for doing this is to allow all the block mapping code to share the same array. As a result we can remove two arguments from lookup_metapath since they are now returned via the array. We also add a function to drop all refs to buffer heads when we are done with the metapath. The build_height function shares the struct metapath, but currently still frees its own buffers, and this will change in a future patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:18 +01:00
Steven Whitehouse	11707ea05e	[GFS2] Move part of gfs2_block_map into a separate function This is required to enable future changes to the block mapping code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:15 +01:00
Bob Peterson	29d38cd163	[GFS2] Get rid of gl_waiters2 This patch reduces memory by replacing the int variable gl_waiters2 by a single bit in the gl_flags. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:13 +01:00
Bob Peterson	42d52e3818	[GFS2] Combine rg_flags and rd_flags This patch reduces the memory required by GFS2 by combining the rd_flags and rg_flags (in core only). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:10 +01:00
Bob Peterson	6bdd9be628	[GFS2] Allocate gfs2_rgrpd from slab memory This patch moves the gfs2_rgrpd structure to its own slab memory. This makes it easier to control and monitor, and yields less memory fragmentation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:07 +01:00
Bob Peterson	3ad62e87cd	[GFS2] Plug an unlikely leak Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:05 +01:00
Adrian Bunk	048786f1e6	[GFS2] make gfs2_glock_hold() static gfs2_glock_hold() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:02 +01:00
Bob Peterson	ef8c441cb7	[GFS2] Only wake the reclaim daemon if we need to This patch only wakes up the glock reclaim daemon if there is actually something to be reclaimed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:40:00 +01:00
Bob Peterson	7eabb77e65	[GFS2] Misc fixups This patch contains two small fixups that didn't fit elsewhere. They are: (1) get rid of temp variable in find_metapath. (2) Remove vestigial "ret" variable from gfs2_writepage_common. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:57 +01:00
Bob Peterson	d0109bfa84	[GFS2] Only do lo_incore_commit once This patch is performance related. When we're doing a log flush, I noticed we were calling buf_lo_incore_commit twice: once for data bufs and once for metadata bufs. Since this is the same function and does the same thing in both cases, there should be no reason to call it twice. Since we only need to call it once, we can also make it faster by removing it from the generic "lops" code and making it a stand-along static function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:54 +01:00
Bob Peterson	ca390601a8	[GFS2] Fix debug inode printing I noticed that the latest change to i_height got rid of the value from the inode dump. This patch adds it back. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:52 +01:00
Bob Peterson	fe6c991c52	[GFS2] Get rid of unneeded parameter in gfs2_rlist_alloc This patch removed the unnecessary parameter from function gfs2_rlist_alloc. The parameter was always passed in as 0. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:49 +01:00
Steven Whitehouse	ecc30c7915	[GFS2] Streamline indirect pointer tree height calculation This patch improves the calculation of the tree height in order to reduce the number of operations which are carried out on each call to gfs2_block_map. In the common case, we now make a single comparison, rather than calculating the required tree height from scratch each time. Also in the case that the tree does need some extra height, we start from the current height rather from zero when we work out what the new height ought to be. In addition the di_height field is moved into the inode proper and reduced in size to a u8 since the value must be between 0 and GFS2_MAX_META_HEIGHT (10). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:46 +01:00
Steven Whitehouse	941e6d7d09	[GFS2] Speed up gfs2_write_alloc_required, deprecate gfs2_extent_map This patch removes the call to gfs2_extent_map from gfs2_write_alloc_required, instead we call gfs2_block_map directly. This results in fewer overall calls to gfs2_block_map in the multi-block case. Also, gfs2_extent_map is marked as deprecated so that people know that its going away as soon as all the callers have been converted. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2008-03-31 10:39:44 +01:00
Al Viro	2b210adcb0	cifs: fix misannotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:20:23 -07:00
Al Viro	9dce07f1a4	NULL noise: fs/, mm/, kernel/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:18:41 -07:00
Al Viro	1076d17ac7	jbd/jbd2 NULL noise Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-30 14:18:41 -07:00
Linus Torvalds	af8be4e4b3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock [PATCH] do shrink_submounts() for all fs types [PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts() [PATCH] count ghost references to vfsmounts [PATCH] reduce stack footprint in namespace.c	2008-03-28 15:23:01 -07:00
Sven Schnelle	5214b729e1	afs: prevent double cell registration kafs doesn't check if the cell already exists - so if you do an echo "add newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell again. kobject will also complain about a double registration. To prevent such problems, return -EEXIST in that case. Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-28 14:45:21 -07:00
Dmitri Monakhov	5b41e74ad1	vfs: fix data leak in nobh_write_end() Current nobh_write_end() implementation ignore partial writes(copied < len) case if page was fully mapped and simply mark page as Uptodate, which is totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in previous write_begin call. It simply contains garbage from pagecache and result in data leakage. #TEST_CASE_BEGIN: ~~~~~~~~~~~~~~~~ In fact issue triggered by classical testcase open("/mnt/test", O_RDWR\|O_CREAT\|O_TRUNC, 0666) = 3 ftruncate(3, 409600) = 0 writev(3, [{"a", 1}, {NULL, 4095}], 2) = 1 ##TESTCASE_SOURCE: ~~~~~~~~~~~~~~~~~ #include <stdio.h> #include <stdlib.h> #include <fcntl.h> #include <sys/uio.h> #include <sys/mman.h> #include <errno.h> int main(int argc, char *argv) { int fd, ret; void p; struct iovec iov[2]; fd = open(argv[1], O_RDWR\|O_CREAT\|O_TRUNC, 0666); ftruncate(fd, 409600); iov[0].iov_base="a"; iov[0].iov_len=1; iov[1].iov_base=NULL; iov[1].iov_len=4096; ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec)); printf("writev = %d, err = %d\n", ret, errno); return 0; } ##TESTCASE RESULT: ~~~~~~~~~~~~~~~~~~ [root@ts63 ~]# mount \| grep mnt2 /dev/mapper/test on /mnt2 type ext2 (rw,nobh) [root@ts63 ~]# /tmp/writev /mnt2/test writev = 1, err = 0 [root@ts63 ~]# hexdump -C /mnt2/test 00000000 61 65 62 6f 6f 74 00 00 f0 b9 b4 59 3a 00 00 00 \|aeboot.....Y:...\| 00000010 20 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 \| .......!.......\| 00000020 df df df df df df df df df df df df df df df df \|................\| 00000030 3a 00 00 00 2a 00 00 00 21 00 00 00 00 00 00 00 \|:......!.......\| 00000040 60 c0 8c 00 00 00 00 00 40 4a 8d 00 00 00 00 00 \|`.......@J......\| 00000050 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 \|........A.......\| 00000060 74 69 6d 65 20 64 64 20 69 66 3d 2f 64 65 76 2f \|time dd if=/dev/\| 00000070 6c 6f 6f 70 30 20 20 6f 66 3d 2f 64 65 76 2f 6e \|loop0 of=/dev/n\| skip.. 00000f50 00 00 00 00 00 00 00 00 31 00 00 00 00 00 00 00 \|........1.......\| 00000f60 6d 6b 66 73 2e 65 78 74 33 20 2f 64 65 76 2f 76 \|mkfs.ext3 /dev/v\| 00000f70 7a 76 67 2f 74 65 73 74 20 2d 62 34 30 39 36 00 \|zvg/test -b4096.\| 00000f80 a0 fe 8c 00 00 00 00 00 21 00 00 00 00 00 00 00 \|........!.......\| 00000f90 23 31 32 30 35 39 35 30 34 30 34 00 3a 00 00 00 \|#1205950404.:...\| 00000fa0 20 00 8d 00 00 00 00 00 21 00 00 00 00 00 00 00 \| .......!.......\| 00000fb0 d0 cf 8c 00 00 00 00 00 10 d0 8c 00 00 00 00 00 \|................\| 00000fc0 00 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 \|........A.......\| 00000fd0 6d 6f 75 6e 74 20 2f 64 65 76 2f 76 7a 76 67 2f \|mount /dev/vzvg/\| 00000fe0 74 65 73 74 20 20 2f 76 7a 20 2d 6f 20 64 61 74 \|test /vz -o dat\| 00000ff0 61 3d 77 72 69 74 65 62 61 63 6b 00 00 00 00 00 \|a=writeback.....\| 00001000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \|................\| As you can see file's page contains garbage from pagecache instead of zeros. #TEST_CASE_END Attached patch: - Add sanity check BUG_ON in order to prevent incorrect usage by caller, This is function invariant because page can has buffers and in no zero fadata pointer at the same time. - Always attach buffers to page is it is partial write case. - Always switch back to generic_write_end if page has buffers. This is reasonable because if page already has buffer then generic_write_begin was called previously. Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Reviewed-by: Nick Piggin <npiggin@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-28 14:45:21 -07:00
Al Viro	6758f953d0	[PATCH] mnt_expire is protected by namespace_sem, no need for vfsmount_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:48:04 -04:00
Al Viro	c35038beca	[PATCH] do shrink_submounts() for all fs types ... and take it out of ->umount_begin() instances. Call with all locks already taken (by do_umount()) and leave calling release_mounts() to caller (it will do release_mounts() anyway, so we can just put into the same list). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:58 -04:00
Al Viro	bcc5c7d2b6	[PATCH] sanitize locking in mark_mounts_for_expiry() and shrink_submounts() ... and fix a race on access of ->mnt_share et.al. without namespace_sem in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:52 -04:00
Al Viro	7c4b93d826	[PATCH] count ghost references to vfsmounts make propagate_mount_busy() exclude references from the vfsmounts that had been isolated by umount_tree() and are just waiting for release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:46 -04:00
Al Viro	1a39068954	[PATCH] reduce stack footprint in namespace.c A lot of places misuse struct nameidata when they need struct path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-27 20:47:40 -04:00
Denis V. Lunev	0e5f8be138	[NETNS]: Compile NET /proc support only if CONFIG_NET is set. This fix broken compilation for 'allnoconfig'. This was introduced by Introduced by commit `1218854afa` ("[NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS.") Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-27 14:25:53 -07:00
YOSHIFUJI Hideaki	1218854afa	[NET] NETNS: Omit seq_net_private->net without CONFIG_NET_NS. Without CONFIG_NET_NS, no namespace other than &init_net exists, no need to store net in seq_net_private. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-03-26 04:39:56 +09:00
Linus Torvalds	7ed7fe5e82	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] get stack footprint of pathname resolution back to relative sanity [PATCH] double iput() on failure exit in hugetlb [PATCH] double dput() on failure exit in tiny-shmem [PATCH] fix up new filp allocators [PATCH] check for null vfsmount in dentry_open() [PATCH] reiserfs: eliminate private use of struct file in xattr [PATCH] sanitize hppfs hppfs pass vfsmount to dentry_open() [PATCH] restore export of do_kern_mount()	2008-03-25 08:57:47 -07:00
Andrew Morton	815d2d50da	driver core: debug for bad dev_attr_show() return value. Try to find the culprit who caused http://bugzilla.kernel.org/show_bug.cgi?id=10150 Cc: <balajirrao@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-03-24 22:33:49 -07:00
Linus Torvalds	0d995b2b44	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] Fix mem leak on dfs referral [CIFS] file create with acl support enabled is slow [CIFS] Fix mtime on cp -p when file data cached but written out too late [CIFS] Fix build problem [CIFS] cifs: replace remaining __FUNCTION__ occurrences [CIFS] DFS patch that connects inode with dfs handling ops	2008-03-22 17:05:31 -07:00
Hans Rosenfeld	f16278c679	Change pagemap output format to allow for future reporting of huge pages Change pagemap output format to allow for future reporting of huge pages. (Format comment and minor cleanups: mpm@selenic.com) Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-22 17:03:10 -07:00
Steve French	04b6e6ec1a	[CIFS] Fix mem leak on dfs referral Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-22 22:57:44 +00:00
Linus Torvalds	7d3628b230	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits) [NET] ifb: set separate lockdep classes for queue locks [IPV6] KCONFIG: Fix description about IPV6_TUNNEL. [TCP]: Fix shrinking windows with window scaling netpoll: zap_completion_queue: adjust skb->users counter bridge: use time_before() in br_fdb_cleanup() [TG3]: Fix build warning on sparc32. MAINTAINERS: bluez-devel is subscribers-only audit: netlink socket can be auto-bound to pid other than current->pid (v2) [NET]: Fix permissions of /proc/net [SCTP]: Fix a race between module load and protosw access [NETFILTER]: ipt_recent: sanity check hit count [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup() [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning [IPV4]: esp_output() misannotations [8021Q]: vlan_dev misannotations xfrm: ->eth_proto is __be16 [IPV4]: ipv4_is_lbcast() misannotations [SUNRPC]: net/* NULL noise [SCTP]: fix misannotated __sctp_rcv_asconf_lookup() [PKT_SCHED]: annotate cls_u32 ...	2008-03-21 07:57:45 -07:00
Andre Noll	4f42c288e6	[NET]: Fix permissions of /proc/net commit `e9720ac` ([NET]: Make /proc/net a symlink on /proc/self/net (v3)) broke ganglia and probably other applications that read /proc/net/dev. This is due to the change of permissions of /proc/net that was introduced in that commit. Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net This patch restores the permissions to the old value which makes ganglia happy again. Pavel Emelyanov says: This also broke the postfix, as it was reported in bug #10286 and described in detail by Benjamin. Signed-off-by: Andre Noll <maan@systemlinux.org> Acked-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-20 15:27:28 -07:00
Linus Torvalds	49ccf74aaf	Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: nfs: don't ignore return value from nfs_pageio_add_request	2008-03-20 11:59:34 -07:00
Andrew Morton	9df130392f	fs/ufs/balloc.c: fix sparc64 printk warning fs/ufs/balloc.c: In function `ufs_change_blocknr': fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 2) fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 3) sector_t is u64 and we don't know what type the architecture uses to implement u64. Cc: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:37 -07:00
Dave Young	08ca0db8aa	zisofs: fix readpage() outside i_size A read request outside i_size will be handled in do_generic_file_read(). So we just return 0 to avoid getting -EIO as normal reading, let do_generic_file_read do the rest. At the same time we need unlock the page to avoid system stuck. Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227 Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Report-by: Christian Perle <chris@linuxinfotag.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Randy Dunlap	a6b91919e0	fs: fix kernel-doc notation warnings Fix kernel-doc notation warnings in fs/. Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line: * mark_files_ro Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line: * lease_get_mtime Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line: * lookup_one_len: filesystem helper to lookup single pathname component Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line: * bh_uptodate_or_lock: Test whether the buffer is uptodate Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line: * bh_submit_read: Submit a locked buffer for reading Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line: * writeback_acquire: attempt to get exclusive writeback access to a device Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line: * writeback_in_progress: determine whether there is writeback in progress Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line: * writeback_release: relinquish exclusive writeback access against a device. Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line: * void journal_invalidatepage() Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Michael Halcrow	5366dc9fd1	eCryptfs: Swap dput() and mntput() ecryptfs_d_release() is doing a mntput before doing the dput. This patch moves the dput before the mntput. Thanks to Rajouri Jammu for reporting this. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Rajouri Jammu <rajouri.jammu@gmail.com> Cc: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Duane Griffin	d00256766a	jbd2: correctly unescape journal data blocks Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Duane Griffin	439aeec639	jbd: correctly unescape journal data blocks Fix a long-standing typo (predating git) that will cause data corruption if a journal data block needs unescaping. At the moment the wrong buffer head's data is being unescaped. To test this case mount a filesystem with data=journal, start creating and deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then pull the plug on the device. Without this patch the files will contain zeros instead of the correct data after recovery. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
David Howells	4ebf89845b	ROMFS: Fix up an error in iget removal Fix up an error in iget removal in which romfs_lookup() making a successful call to romfs_iget() continues through the negative/error handling (previously the successful case jumped around the negative/error handling case): (1) inode is initialised to NULL at the top of the function, eliminating the need for specific negative-inode handling. This means the positive success handling now flows straight through. (2) Rename the labels to be clearer about what they mean. Also make romfs_lookup()'s result variable of type long so as to avoid 32-bit/64-bit conversions with PTR_ERR() and friends. Based upon a report and patch from Adam Richter. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Adam J. Richter" <adam@yggdrasil.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Josef Bacik	c587f0c0a6	ext3: fix wrong gfp type under transaction There are several places where we make allocations with GFP_KERNEL while under a transaction, which could lead to an assertion panic or lockup if under memory pressure. This patch switches these problem areas to use GFP_NOFS to keep these problems from happening. Signed-off-by: Josef Bacik <jbacik@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:36 -07:00
Jan Kara	87cb055bc1	quota: add possibly missing iput() when quotaon and quotaoff races We should always put inode we have reference to, even if quota was reenabled in the mean time. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Randy Dunlap	0cf01f6685	jbd: fix jbd kernel-doc notation Fix kernel-doc notation in jbd. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Quentin Barnes	6cb2a21049	aio: bad AIO race in aio_complete() leads to process hang My group ran into a AIO process hang on a 2.6.24 kernel with the process sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come and it never would. We ran the tests on x86_64 SMP. The hang only occurred on a Xeon box ("Clovertown") but not a Core2Duo ("Conroe"). On the Xeon, the L2 cache isn't shared between all eight processors, but is L2 is shared between between all two processors on the Core2Duo we use. My analysis of the hang is if you go down to the second while-loop in read_events(), what happens on processor #1: 1) add_wait_queue_exclusive() adds thread to ctx->wait 2) aio_read_evt() to check tail 3) if aio_read_evt() returned 0, call [io_]schedule() and sleep In aio_complete() with processor #2: A) info->tail = tail; B) waitqueue_active(&ctx->wait) C) if waitqueue_active() returned non-0, call wake_up() The way the code is written, step 1 must be seen by all other processors before processor 1 checks for pending events in step 2 (that were recorded by step A) and step A by processor 2 must be seen by all other processors (checked in step 2) before step B is done. The race I believed I was seeing is that steps 1 and 2 were effectively swapped due to the __list_add() being delayed by the L2 cache not shared by some of the other processors. Imagine: proc 2: just before step A proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet proc 1, step 2: checks tail and sees no pending events proc 2, step A: updates tail proc 1, step 3: calls [io_]schedule() and sleeps proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup so proc 1 sleeps indefinitely My patch adds a memory barrier between steps A and B. It ensures that the update in step 1 gets seen on processor 2 before continuing. If processor 1 was just before step 1, the memory barrier makes sure that step A (update tail) gets seen by the time processor 1 makes it to step 2 (check tail). Before the patch our AIO process would hang virtually 100% of the time. After the patch, we have yet to see the process ever hang. Signed-off-by: Quentin Barnes <qbarnes+linux@yahoo-inc.com> Reviewed-by: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ We should probably disallow that "if (waitqueue_active()) wake_up()" coding pattern, because it's so often buggy wrt memory ordering ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-19 18:53:35 -07:00
Fred Isaman	f8512ad0da	nfs: don't ignore return value from nfs_pageio_add_request Ignoring the return value from nfs_pageio_add_request can cause deadlocks. In read path: call nfs_pageio_add_request from readpage_async_filler assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_pageio_add_request to return 0, WITHOUT adding the original request. BUT, since return code is ignored, readpage_async_filler assumes it has been added, and does nothing further, leaving page locked. do_generic_mapping_read will eventually call lock_page, resulting in deadlock In write path: page is marked dirty by generic_perform_write nfs_writepages is called call nfs_pageio_add_request from nfs_page_async_flush assume at this point that there are requests already in desc, that can't be merged with the current request. so nfs_pageio_doio is fired up to clear out desc. assume something goes wrong in setting up the io, so desc->pg_error is set. This causes nfs_page_async_flush to return 0, WITHOUT adding the original request, yet marking the request as locked (PG_BUSY) and in writeback, clearing dirty marks. The next time a write is done to the page, deadlock will result as nfs_write_end calls nfs_update_request Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-19 17:59:02 -04:00
Al Viro	a02f76c34d	[PATCH] get stack footprint of pathname resolution back to relative sanity Somebody had put struct nameidata in stack frame of link_path_walk(). Unfortunately, there are certain realities to deal with: * It's in the middle of recursion. Depth is equal to the nesting depth of symlinks, i.e. up to 8. * struct namiedata is, even if one discards the intent junk, at least 12 pointers + 5 ints. * moreover, adding a stack frame is not free in that situation. * there are fs methods called on top of that, and they also have stack footprint. * kernel stack is not infinite. The thing is, even if one chooses to deal with -ESTALE that way (and it's one hell of an overkill), the only thing that needs to be preserved is vfsmount + dentry, not the entire struct nameidata. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:55:46 -04:00
Al Viro	b4d232e65f	[PATCH] double iput() on failure exit in hugetlb once we'd done d_instantiate(), we should only do dput(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:55:01 -04:00
Dave Hansen	430e285e08	[PATCH] fix up new filp allocators Some new uses of get_empty_filp() have crept in; switched to alloc_file() to make sure that pieces of initialization won't be missing. We really need to kill get_empty_filp(). [AV] fixed dentry leak on failure exit in anon_inode_getfd() Cc: Erez Zadok <ezk@cs.sunysb.edu> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J Bruce Fields" <bfields@fieldses.org> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:54:05 -04:00
Christoph Hellwig	322ee5b36e	[PATCH] check for null vfsmount in dentry_open() Make sure no-one calls dentry_open with a NULL vfsmount argument and crap out with a stacktrace otherwise. A NULL file->f_vfsmnt has always been problematic, but with the per-mount r/o tracking we can't accept anymore at all. [AV] the last place that passed NULL had been eliminated by the previous patch (reiserfs xattr stuff) Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:50:44 -04:00
Jeff Mahoney	3227e14c3c	[PATCH] reiserfs: eliminate private use of struct file in xattr After several posts and bug reports regarding interaction with the NULL nameidata, here's a patch to clean up the mess with struct file in the reiserfs xattr code. As observed in several of the posts, there's really no need for struct file to exist in the xattr code. It was really only passed around due to the f_op->readdir() and a_ops->{prepare,commit}_write prototypes requiring it. reiserfs_prepare_write() and reiserfs_commit_write() don't actually use the struct file passed to it, and the xattr code uses a private version of reiserfs_readdir() to enumerate the xattr directories. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:49:36 -04:00
Al Viro	f382d6e631	[PATCH] sanitize hppfs * hppfs_iget() and its users are racy; there's no need to pollute icache anyway, new_inode() works fine and is safe, unlike the current kludges (these relied on overwriting ->i_ino before another iget_locked() gets to that one - and did it after unlocking). * merge hppfs_iget()/init_inode()/hppfs_read_inode(), while we are at it. * to pass proper vfsmount to dentry_open() store the reference in hppfs superblock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> --	2008-03-19 06:42:18 -04:00
Dave Hansen	1dd0dd111f	hppfs pass vfsmount to dentry_open() Here's patch for hppfs that uses vfs_kern_mount to make sure it always has a procfs instance and passed the vfsmount on through the inode private data. Also fixes a procfs file_system_type leak for every attempted hppfs mount. [ jdike - gave this file a style workover, plus deleted hppfs_dentry_ops ] Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-19 06:39:56 -04:00
Linus Torvalds	f920bb6f5f	Merge branch 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b49' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] export sessionid alongside the loginuid in procfs	2008-03-18 08:43:59 -07:00
Eric Paris	1e0bd7550e	[PATCH] export sessionid alongside the loginuid in procfs Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-18 10:51:22 -04:00
Linus Torvalds	92f53c6f1e	Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Revert "unexport bio_{,un}map_user" relay: fix subbuf_splice_actor() adding too many pages The ps2esdi driver was marked as BROKEN more than two years ago due to being	2008-03-18 07:43:14 -07:00
David S. Miller	2f633928cb	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-17 23:44:31 -07:00
Al Viro	e6f1cebf71	[NET] endianness noise: INADDR_ANY Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-17 22:44:53 -07:00
Al Viro	8a4e98d9d7	[PATCH] restore export of do_kern_mount() vfs_kern_mount() requires having a reference to fs type, which makes it impossible for module to create procfs, etc. private mount. Open-coding is not an option, since e.g. put_filesystem() is _not_ exported, and for a good reason. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2008-03-17 22:58:04 -04:00
Jens Axboe	40044ce0bf	Revert "unexport bio_{,un}map_user" Outside users like asmlib uses the mapping functions. API wise, the export is definitely sane. It's a better idea to keep this export than to require external users to open-code this piece of code instead. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-03-17 21:14:40 +01:00
Al Viro	3d10a15d69	hfs_bnode_find() can fail, resulting in hfs_bnode_split() breakage oops and fs corruption; the latter can happen even on valid fs in case of oom. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-17 09:46:55 -07:00
J. Bruce Fields	b663c6fd98	nfsd: fix oops on access from high-numbered ports This bug was always here, but before my commit `6fa02839bf` ("recheck for secure ports in fh_verify"), it could only be triggered by failure of a kmalloc(). After that commit it could be triggered by a client making a request from a non-reserved port for access to an export marked "secure". (Exports are "secure" by default.) The result is a struct svc_export with a reference count one too low, resulting in likely oopses next time the export is accessed. The reference counting here is not straightforward; a later patch will clean up fh_verify(). Thanks to Lukas Hejtmanek for the bug report and followup. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Lukas Hejtmanek <xhejtman@ics.muni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-14 16:49:15 -07:00
Steve French	8b1327f6ed	[CIFS] file create with acl support enabled is slow Shirish Pargaonkar noted: With cifsacl mount option, when a file is created on the Windows server, exclusive oplock is broken right away because the get cifs acl code again opens the file to obtain security descriptor. The client does not have the newly created file handle or inode in any of its lists yet so it does not respond to oplock break and server waits for its duration and then responds to the second open. This slows down file creation signficantly. The fix is to pass the file descriptor to the get cifsacl code wherever available so that get cifs acl code does not send second open (NT Create ANDX) and oplock is not broken. CC: Shirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-14 22:37:16 +00:00
Steve French	ebe8912be2	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-14 19:29:18 +00:00
Steve French	50531444fa	[CIFS] Fix mtime on cp -p when file data cached but written out too late Kukks noticed that cp -p can write out file data too late, after the timestamp is already set. This was introduced as an unintentional sideeffect of the change in an earlier patch (see below) which fixed some delayed return code propagation. `cea218054a` Author: Jeff Layton <jlayton@redhat.com> Date: Tue Nov 20 23:19:03 2007 +0000 Acked-by: Shirish Pargaonkar <shirishp@us.ibm.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-14 19:21:31 +00:00
Marcelo Tosatti	fb39380b8d	pagemap: proper read error handling Fix pagemap_read() error handling by releasing acquired resources and checking for get_user_pages() partial failure. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-03-13 13:11:43 -07:00
Linus Torvalds	609eb39c8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) [SCTP]: Fix local_addr deletions during list traversals. net: fix build with CONFIG_NET=n [TCP]: Prevent sending past receiver window with TSO (at last skb) rt2x00: Add new D-Link USB ID rt2x00: never disable multicast because it disables broadcast too libertas: fix the 'compare command with itself' properly drivers/net/Kconfig: fix whitespace for GELIC_WIRELESS entry [NETFILTER]: nf_queue: don't return error when unregistering a non-existant handler [NETFILTER]: nfnetlink_queue: fix EPERM when binding/unbinding and instance 0 exists [NETFILTER]: nfnetlink_log: fix EPERM when binding/unbinding and instance 0 exists [NETFILTER]: nf_conntrack: replace horrible hack with ksize() [NETFILTER]: nf_conntrack: add \n to "expectation table full" message [NETFILTER]: xt_time: fix failure to match on Sundays [NETFILTER]: nfnetlink_log: fix computation of netlink skb size [NETFILTER]: nfnetlink_queue: fix computation of allocated size for netlink skb. [NETFILTER]: nfnetlink: fix ifdef in nfnetlink_compat.h [NET]: include <linux/types.h> into linux/ethtool.h for __u* typedef [NET]: Make /proc/net a symlink on /proc/self/net (v3) RxRPC: fix rxrpc_recvmsg()'s returning of msg_name net/enc28j60: oops fix ...	2008-03-12 13:08:09 -07:00
Andrew Morton	b2211a361a	net: fix build with CONFIG_NET=n fs/built-in.o:(.rodata+0x1134): undefined reference to `proc_net_inode_operations' fs/built-in.o:(.rodata+0x1138): undefined reference to `proc_net_operations' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-11 18:03:35 -07:00
Steve French	bc5b6e24a1	[CIFS] Fix build problem Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-11 21:07:48 +00:00
Steve French	5b4d4771e2	Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-03-11 19:15:41 +00:00
Tao Ma	cdef59a94c	ocfs2: Fix NULL pointer dereferences in o2net In some situations, ocfs2_set_nn_state might get called with sc = NULL and valid = 0. If sc = NULL, we can't dereference it to get the o2nm_node member. Instead, do what o2net_initialize_handshake does and use NULL when calling o2net_reconnect_delay and o2net_idle_timeout. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:19 -07:00
Sunil Mushran	c824c3c723	ocfs2/dlm: dlm_thread should not sleep while holding the dlm_spinlock This patch addresses the bug in which the dlm_thread could go to sleep while holding the dlm_spinlock. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:17 -07:00
Sunil Mushran	535f7026fd	ocfs2/dlm: Print message showing the recovery master Knowing the dlm recovery master helps in debugging recovery issues. This patch prints a message on the recovery master node. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:14 -07:00
Sunil Mushran	b31cfc0237	ocfs2/dlm: Add missing dlm_lockres_put()s dlm_master_request_handler() forgot to put a lockres when dlm_assert_master_worker() failed or was skipped. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:12 -07:00
Sunil Mushran	52987e2ab4	ocfs2/dlm: Add missing dlm_lockres_put()s in migration path During migration, the recovery master node may be asked to master a lockres it may not know about. In that case, it would not only have to create a lockres and add it to the hash, but also remember to to do the _put_ corresponding to the kref_init in dlm_init_lockres(), as soon as the migration is completed. Yes, we don't wait for the dlm_purge_lockres() to do that matching put. Note the ref added for it being in the hash protects the lockres from being freed prematurely. This patch adds that missing put, as described above, to plug a memleak. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:11 -07:00
Sunil Mushran	2c5c54aca9	ocfs2/dlm: Add missing dlm_lock_put()s Normally locks for remote nodes are freed when that node sends an UNLOCK message to the master. The master node tags an DLM_UNLOCK_FREE_LOCK action to do an extra put on the lock at the end. However, there are times when the master node has to free the locks for the remote nodes forcibly. Two cases when this happens are: 1. When the master has migrated the lockres plus all locks to another node. 2. When the master is clearing all the locks of a dead node. It was in the above two conditions that the dlm was missing the extra put. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:09 -07:00
Tao Ma	4338ab6a75	ocfs2: Fix an endian bug in online resize. In ocfs2_group_add, 'cr' is a disk field of type 'ocfs2_chain_rec', and we were putting cpu byteorder values into it. Swap things to the right endian before storing. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:14:07 -07:00
Jan Engelhardt	90d99779a4	[PATCH] [OCFS2]: constify function pointer tables Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:57 -07:00
Joel Becker	0f71b7b40f	ocfs2: Fix endian bug in o2dlm protocol negotiation. struct dlm_query_join_packet is made up of four one-byte fields. They are effectively in big-endian order already. However, little-endian machines swap them before putting the packet on the wire (because query_join's response is a status, and that status is treated as a u32 on the wire). Thus, a big-endian and little-endian machines will treat this structure differently. The solution is to have little-endian machines swap the structure when converting from the structure to the u32 representation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:54 -07:00
Tao Ma	2af37ce82d	ocfs2: Use dlm_print_one_lock_resource for lock resource print __dlm_print_one_lock_resource must be called with spin_lock the res->spinlock. While in some cases, we use it without this precondition and lead to the failure of assert_spin_locked. So call dlm_print_one_lock_resource instead. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:50 -07:00
Andrew Morton	3a4780a85d	[PATCH] fs/ocfs2/dlm/dlmdomain.c: fix printk warning fs/ocfs2/dlm/dlmdomain.c: In function 'dlm_send_join_cancels': fs/ocfs2/dlm/dlmdomain.c:983: warning: format '%u' expects type 'unsigned int', but argument 7 has type 'long unsigned int' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>	2008-03-10 15:13:39 -07:00
Harvey Harrison	55f78e1771	[CIFS] cifs: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-10 17:14:34 +00:00
Igor Mammedov	7962670e64	[CIFS] DFS patch that connects inode with dfs handling ops if DFS junction point Signed-off-by: Igor Mammedov <niallain@gmail.com> Signed-off-by: Steve French <sfrench@us.ibm.com>	2008-03-09 03:44:18 +00:00
Linus Torvalds	4c1aa6f8b9	Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings NFS: Fix the fsid revalidation in nfs_update_inode() SUNRPC: Fix a nfs4 over rdma transport oops NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c	2008-03-07 12:08:07 -08:00
Trond Myklebust	4e99a1ff34	NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings As long as the directory contents haven't changed, we should just let the path walk proceed to cross the mountpoint. Apart from being an optimisation in the case of 'nohide' mountpoint traversals, it also fixes an issue with referrals: referral inodes don't have valid filehandles, so calling nfs_revalidate_inode() on them is a bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-07 14:35:41 -05:00
Trond Myklebust	c37dcd334c	NFS: Fix the fsid revalidation in nfs_update_inode() When we detect that we've crossed a mountpoint on the remote server, we must take care not to use that inode to revalidate the fsid on our current superblock. To do so, we label the inode as a remote mountpoint, and check for that in nfs_update_inode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-07 14:35:37 -05:00
Trond Myklebust	af1b8c2ff7	NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c O_SYNC is stored in filp->f_flags. Thanks to Al Viro for pointing out the bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2008-03-07 14:33:40 -05:00
Pavel Emelyanov	e9720acd72	[NET]: Make /proc/net a symlink on /proc/self/net (v3) Current /proc/net is done with so called "shadows", but current implementation is broken and has little chances to get fixed. The problem is that dentries subtree of /proc/net directory has fancy revalidation rules to make processes living in different net namespaces see different entries in /proc/net subtree, but currently, tasks see in the /proc/net subdir the contents of any other namespace, depending on who opened the file first. The proposed fix is to turn /proc/net into a symlink, which points to /proc/self/net, which in turn shows what previously was in /proc/net - the network-related info, from the net namespace the appropriate task lives in. # ls -l /proc/net lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net In other words - this behaves like /proc/mounts, but unlike "mounts", "net" is not a file, but a directory. Changes from v2: * Fixed discrepancy of /proc/net nlink count and selinux labeling screwup pointed out by Stephen. To get the correct nlink count the ->getattr callback for /proc/net is overridden to read one from the net->proc_net entry. To make selinux still work the net->proc_net entry is initialized properly, i.e. with the "net" name and the proc_net parent. Selinux fixes are Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Changes from v1: * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-07 11:08:40 -08:00
Linus Torvalds	910da1a48e	Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 * 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6: [XFS] fix inode leak in xfs_iget_core() [XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many	2008-03-06 08:14:00 -08:00
David Chinner	72772a3b5b	[XFS] fix inode leak in xfs_iget_core() If the radix_tree_preload() fails, we need to destroy the inode we just read in before trying again. This could leak xfs_vnode structures when there is memory pressure. Noticed by Christoph Hellwig. SGI-PV: 977823 SGI-Modid: xfs-linux-melb:xfs-kern:30606a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-03-06 16:38:50 +11:00
David Chinner	92d9cd1059	[XFS] 977545 977545 977545 977545 977545 977545 xfsaild causing too many wakeups Idle state is not being detected properly by the xfsaild push code. The current idle state is detected by an empty list which may never happen with mostly idle filesystem or one using lazy superblock counters. A single dirty item in the list that exists beyond the push target can result repeated looping attempting to push up to the target because it fails to check if the push target has been acheived or not. Fix by considering a dirty list with everything past the target as an idle state and set the timeout appropriately. SGI-PV: 977545 SGI-Modid: xfs-linux-melb:xfs-kern:30532a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-03-06 16:38:17 +11:00
Eric Paris	f9c3a38021	NFS: use new LSM interfaces to explicitly set mount options NFS and SELinux worked together previously because SELinux had NFS specific knowledge built in. This design was approved by both groups back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and the usage of nfs_clone_mount_data showed this to be a poor fragile solution. This patch fixes the NFS functionality regression by making use of the new LSM interfaces to allow an FS to explicitly set its own mount options. The explicit setting of mount options is done in the nfs get_sb functions which are called before the generic vfs hooks try to set mount options for filesystems which use text mount data. This does not currently support NFSv4 as that functionality did not exist in previous kernels and thus there is no regression. I will be adding the needed code, which I believe to be the exact same as the v3 code, in nfs4_get_sb for 2.6.26. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-03-06 08:40:59 +11:00
Eric Paris	e000752989	LSM/SELinux: Interfaces to allow FS to control mount options Introduce new LSM interfaces to allow an FS to deal with their own mount options. This includes a new string parsing function exported from the LSM that an FS can use to get a security data blob and a new security data blob. This is particularly useful for an FS which uses binary mount data, like NFS, which does not pass strings into the vfs to be handled by the loaded LSM. Also fix a BUG() in both SELinux and SMACK when dealing with binary mount data. If the binary mount data is less than one page the copy_page() in security_sb_copy_data() can cause an illegal page fault and boom. Remove all NFSisms from the SELinux code since they were broken by past NFS changes. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-03-06 08:40:53 +11:00

... 2 3 4 5 6 ...

8440 Commits