linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-18 17:17:54 +07:00

Author	SHA1	Message	Date
Christoph Hellwig	5b03ff1b24	xfs: remove xfs_trans_unlocked_item There is no reason to wake up log space waiters when unlocking inodes or dquots, and the commit log has no explanation for this function either. Given that we now have exact log space wakeups everywhere we can assume the reason for this function was to paper over log space races in earlier XFS versions. Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-02-22 22:17:00 -06:00
Christoph Hellwig	2813d682e8	xfs: remove the i_new_size field in struct xfs_inode Now that we use the VFS i_size field throughout XFS there is no need for the i_new_size field any more given that the VFS i_size field gets updated in ->write_end before unlocking the page, and thus is always uptodate when writeback could see a page. Removing i_new_size also has the advantage that we will never have to trim back di_size during a failed buffered write, given that it never gets updated past i_size. Note that currently the generic direct I/O code only updates i_size after calling our end_io handler, which requires a small workaround to make sure di_size actually makes it to disk. I hope to fix this properly in the generic code. A downside is that we lose the support for parallel non-overlapping O_DIRECT appending writes that recently was added. I don't think keeping the complex and fragile i_new_size infrastructure for this is a good tradeoff - if we really care about parallel appending writers we should investigate turning the iolock into a range lock, which would also allow for parallel non-overlapping buffered writers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-17 15:10:19 -06:00
Christoph Hellwig	ce7ae151dd	xfs: remove the i_size field in struct xfs_inode There is no fundamental need to keep an in-memory inode size copy in the XFS inode. We already have the on-disk value in the dinode, and the separate in-memory copy that we need for regular files only in the XFS inode. Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the VFS inode i_size field for regular files. Switch code that was directly accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where we are limited to regular files direct access of the VFS inode i_size field. This also allows dropping some fairly complicated code in the write path which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size that is getting updated inside ->write_end. Note that we do not bother resetting the VFS i_size when truncating a file that gets freed to zero as there is no point in doing so because the VFS inode is no longer in use at this point. Just relax the assert in xfs_ifree to only check the on-disk size instead. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-17 15:08:53 -06:00
Christoph Hellwig	f392e6319a	xfs: replace i_pin_wait with a bit waitqueue Replace i_pin_wait, which is only used during synchronous inode flushing with a bit waitqueue. This trades off a much smaller inode against slightly slower wakeup performance, and saves 12 (32-bit) or 20 (64-bit) bytes in the XFS inode. Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-17 15:07:54 -06:00
Christoph Hellwig	474fce0675	xfs: replace i_flock with a sleeping bitlock We almost never block on i_flock, the exception is synchronous inode flushing. Instead of bloating the inode with a 16/24-byte completion that we abuse as a semaphore just implement it as a bitlock that uses a bit waitqueue for the rare sleeping path. This primarily is a tradeoff between a much smaller inode and a faster non-blocking path vs faster wakeups, and we are much better off with the former. A small downside is that we will lose lockdep checking for i_flock, but given that it's always taken inside the ilock that should be acceptable. Note that for example the inode writeback locking is implemented in a very similar way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-17 15:06:45 -06:00
Christoph Hellwig	49e4c70e52	xfs: make i_flags an unsigned long To be used for bit wakeup i_flags needs to be an unsigned long or we'll run into trouble on big endian systems. Because of the 1-byte i_update field right after it this actually causes a fairly large size increase on its own (4 or 8 bytes), but that increase will be more than offset by the next two patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-17 15:03:50 -06:00
Christoph Hellwig	8096b1ebb5	xfs: remove the if_ext_max field in struct xfs_ifork We spent a lot of effort to maintain this field, but it always equals to the fork size divided by the constant size of an extent. The prime use of it is to assert that the two stay in sync. Just divide the fork size by the extent size in the few places that we actually use it and remove the overhead of maintaining it. Also introduce a few helpers to consolidate the places where we actually care about the value. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-17 15:02:28 -06:00
Christoph Hellwig	3d2b3129c2	xfs: remove the unused dm_attrs structure .. and the just as dead bhv_desc forward declaration while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-13 12:11:46 -06:00
Christoph Hellwig	673e8e597c	xfs: remove xfs_itruncate_data This wrapper isn't overly useful, not to say rather confusing. Around the call to xfs_itruncate_extents it does: - add tracing - add a few asserts in debug builds - conditionally update the inode size in two places - log the inode Both the tracing and the inode logging can be moved to xfs_itruncate_extents as they are useful for the attribute fork as well - in fact the attr code already does an equivalent xfs_trans_log_inode call just after calling xfs_itruncate_extents. The conditional size updates are a mess, and there was no reason to do them in two places anyway, as the first one was conditional on the inode having extents - but without extents we xfs_itruncate_extents would be a no-op and the placement wouldn't matter anyway. Instead move the size assignments and the asserts that make sense to the callers that want it. As a side effect of this clean up xfs_setattr_size by introducing variables for the old and new inode size, and moving the size updates into a common place. Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-01-13 12:11:45 -06:00
Al Viro	576b1d67ce	xfs: propagate umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-01-03 22:55:00 -05:00
Christoph Hellwig	4dd2cb4a28	xfs: force buffer writeback before blocking on the ilock in inode reclaim If we are doing synchronous inode reclaim we block the VM from making progress in memory reclaim. So if we encouter a flush locked inode promote it in the delwri list and wake up xfsbufd to write it out now. Without this we can get hangs of up to 30 seconds during workloads hitting synchronous inode reclaim. The scheme is copied from what we do for dquot reclaims. Reported-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Simon Kirby <sim@hostway.ca> Signed-off-by: Ben Myers <bpm@sgi.com>	2011-11-29 12:06:14 -06:00
Christoph Hellwig	4a06fd262d	xfs: remove i_iocount We now have an i_dio_count filed and surrounding infrastructure to wait for direct I/O completion instead of i_icount, and we have never needed to iocount waits for buffered I/O given that we only set the page uptodate after finishing all required work. Thus remove i_iocount, and replace the actually needed waits with calls to inode_dio_wait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-10-11 21:15:01 -05:00
Al Viro	abbede1b3a	xfs: get rid of open-coded S_ISREG(), etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-07-26 15:05:16 -04:00
Christoph Hellwig	f3ca87389d	xfs: remove i_transp Remove the transaction pointer in the inode. It's only used to avoid passing down an argument in the bmap code, and for a few asserts in the transaction code right now. Also use the local variable ip in a few more places in xfs_inode_item_unlock, so that it isn't only used for debug builds after the above change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:47 +02:00
Christoph Hellwig	8f04c47aa9	xfs: split xfs_itruncate_finish Split the guts of xfs_itruncate_finish that loop over the existing extents and calls xfs_bunmapi on them into a new helper, xfs_itruncate_externs. Make xfs_attr_inactive call it directly instead of xfs_itruncate_finish, which allows to simplify the latter a lot, by only letting it deal with the data fork. As a result xfs_itruncate_finish is renamed to xfs_itruncate_data to make its use case more obvious. Also remove the sync parameter from xfs_itruncate_data, which has been unessecary since the introduction of the busy extent list in 2002, and completely dead code since 2003 when the XFS_BMAPI_ASYNC parameter was made a no-op. I can't actually see why the xfs_attr_inactive needs to set the transaction sync, but let's keep this patch simple and without changes in behaviour. Also avoid passing a useless argument to xfs_isize_check, and make it private to xfs_inode.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:34 +02:00
Christoph Hellwig	857b9778d8	xfs: kill xfs_itruncate_start xfs_itruncate_start is a rather length wrapper that evaluates to a call to xfs_ioend_wait and xfs_tosspages, and only has two callers. Instead of using the complicated checks left over from IRIX where we can to truncate the pagecache just call xfs_tosspages (aka truncate_inode_pages) directly as we want to get rid of all data after i_size, and truncate_inode_pages handles incorrect alignments and too large offsets just fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2011-07-08 14:34:30 +02:00
Dave Chinner	778e24bb6d	xfs: reset inode per-lifetime state when recycling it XFS inodes has several per-lifetime state fields that determine the behaviour of the inode. These state fields are not all reset when an inode is reused from the reclaimable state. This can lead to unexpected behaviour of the new inode such as speculative preallocation not being truncated away in the expected manner for local files until the inode is subsequently truncated, freed or cycles out of the cache. It can also lead to an inode being considered to be a filestream inode or having been truncated when that is not the case. Rework the reinitialisation of the inode when it is recycled to ensure that it is pristine before it is reused. While there, also fix the resetting of state flags in the recycling error paths so the inode does not become unreclaimable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-06-23 22:13:31 -05:00
Christoph Hellwig	ec90c55634	xfs: remove if_lastex The if_lastex field in struct xfs_ifork is only used as a temporary index during xfs_bmapi and xfs_bunmapi. Instead of using the inode fork to store it keep it local in the callchain. Fortunately this is very easy as we already pass a stack copy of it down the whole chain which can simplify be changed to be passed by reference. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-05-25 10:48:37 -05:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Christoph Hellwig	9681153b46	xfs: add lockdep annotations for the rt inodes The rt bitmap and summary inodes do not participate in the normal inode locking protocol. Instead the rt bitmap inode can be locked in any transaction involving rt allocations, and the both of the rt inodes can be locked at the same time. Add specific lockdep subclasses for the rt inodes to prevent lockdep from blowing up. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2011-02-07 13:29:18 -06:00
Dave Chinner	6e857567db	xfs: don't truncate prealloc from frequently accessed inodes A long standing problem for streaming writeѕ through the NFS server has been that the NFS server opens and closes file descriptors on an inode for every write. The result of this behaviour is that the ->release() function is called on every close and that results in XFS truncating speculative preallocation beyond the EOF. This has an adverse effect on file layout when multiple files are being written at the same time - they interleave their extents and can result in severe fragmentation. To avoid this problem, keep track of ->release calls made on a dirty inode. For most cases, an inode is only going to be opened once for writing and then closed again during it's lifetime in cache. Hence if there are multiple ->release calls when the inode is dirty, there is a good chance that the inode is being accessed by the NFS server. Hence set a flag the first time ->release is called while there are delalloc blocks still outstanding on the inode. If this flag is set when ->release is next called, then do no truncate away the speculative preallocation - leave it there so that subsequent writes do not need to reallocate the delalloc space. This will prevent interleaving of extents of different inodes written concurrently to the same AG. If we get this wrong, it is not a big deal as we truncate speculative allocation beyond EOF anyway in xfs_inactive() when the inode is thrown out of the cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-23 12:02:31 +11:00
Dave Chinner	dcfcf20512	xfs: provide a inode iolock lockdep class The XFS iolock needs to be re-initialised to a new lock class before it enters reclaim to prevent lockdep false positives. Unfortunately, this is not sufficient protection as inodes in the XFS_IRECLAIMABLE state can be recycled and not re-initialised before being reused. We need to re-initialise the lock state when transfering out of XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same class as if the inode was just allocated. Hence we need a specific lockdep class variable for the iolock so that both initialisations use the same class. While there, add a specific class for inodes in the reclaim state so that it is easy to tell from lockdep reports what state the inode was in that generated the report. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-12-23 11:57:13 +11:00
Al Viro	7de9c6ee3e	new helper: ihold() Clones an existing reference to inode; caller must already hold one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2010-10-25 21:26:11 -04:00
Arkadiusz Mi?kiewicz	6743099ce5	xfs: Extend project quotas to support 32bit project ids This patch adds support for 32bit project quota identifiers. On disk format is backward compatible with 16bit projid numbers. projid on disk is now kept in two 16bit values - di_projid_lo (which holds the same position as old 16bit projid value) and new di_projid_hi (takes existing padding) and converts from/to 32bit value on the fly. xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used to enable PROJID32BIT support. Signed-off-by: Arkadiusz Miśkiewicz <arekm@maven.pl> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-10-18 15:08:08 -05:00
Christoph Hellwig	6c77b0ea1b	xfs: remove xfs_cred.h We're not actually passing around credentials inside XFS for a while now, so remove all xfs_cred.h with it's cred_t typedef and all instances of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-10-18 15:08:06 -05:00
Dave Chinner	dcd79a1423	xfs: don't use vfs writeback for pure metadata modifications Under heavy multi-way parallel create workloads, the VFS struggles to write back all the inodes that have been changed in age order. The bdi flusher thread becomes CPU bound, spending 85% of it's time in the VFS code, mostly traversing the superblock dirty inode list to separate dirty inodes old enough to flush. We already keep an index of all metadata changes in age order - in the AIL - and continued log pressure will do age ordered writeback without any extra overhead at all. If there is no pressure on the log, the xfssyncd will periodically write back metadata in ascending disk address offset order so will be very efficient. Hence we can stop marking VFS inodes dirty during transaction commit or when changing timestamps during transactions. This will keep the inodes in the superblock dirty list to those containing data or unlogged metadata changes. However, the timstamp changes are slightly more complex than this - there are a couple of places that do unlogged updates of the timestamps, and the VFS need to be informed of these. Hence add a new function xfs_trans_ichgtime() for transactional changes, and leave xfs_ichgtime() for the non-transactional changes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <aelder@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-10-18 15:07:45 -05:00
Dave Chinner	2f11feabb1	xfs: simplify and remove xfs_ireclaim xfs_ireclaim has to get and put te pag structure because it is only called with the inode to reclaim. The one caller of this function already has a reference on the pag and a pointer to is, so move the radix tree delete to the caller and remove xfs_ireclaim completely. This avoids a xfs_perag_get/put on every inode being reclaimed. The overhead was noticed in a bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=16348 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2010-07-26 13:16:48 -05:00
Christoph Hellwig	f2d6761433	xfs: remove xfs_iput xfs_iput is just a small wrapper for xfs_iunlock + IRELE. Having this out of line wrapper means the trace events in those two can't track their caller properly. So just remove the wrapper and opencode the unlock + rele in the few callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:44 -05:00
Christoph Hellwig	ef35e9255d	xfs: remove xfs_iput_new We never get an i_mode of 0 or a locked VFS inode until we pass in the XFS_IGET_CREATE flag to xfs_iget, which makes xfs_iput_new equivalent to xfs_iput for the only caller. In addition to that xfs_nfs_get_inode does not even need to lock the inode given that the generation never changes for a life inode, so just pass a 0 lock_flags to xfs_iget and release the inode using IRELE in the error path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2010-07-26 13:16:44 -05:00
Dave Chinner	7b6259e7a8	xfs: remove block number from inode lookup code The block number comes from bulkstat based inode lookups to shortcut the mapping calculations. We ar enot able to trust anything from bulkstat, so drop the block number as well so that the correct lookups and mappings are always done. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:35:17 +10:00
Dave Chinner	1920779e67	xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED Inode numbers may come from somewhere external to the filesystem (e.g. file handles, bulkstat information) and so are inherently untrusted. Rename the flag we use for these lookups to make it obvious we are doing a lookup of an untrusted inode number and need to verify it completely before trying to read it from disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-06-24 11:15:47 +10:00
Christoph Hellwig	a14a5ab58f	xfs: remove xfs_ipin/xfs_iunpin Inodes are only pinned/unpinned via the inode item methods, and lots of code relies on that fact. So remove the separate xfs_ipin/xfs_iunpin helpers and merge them into their only callers. This also fixes up various duplicate and/or incorrect comments. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-03-01 16:35:56 -06:00
Christoph Hellwig	66d834ea60	xfs: implement optimized fdatasync Allow us to track the difference between timestamp and size updates by using mark_inode_dirty from the I/O completion code, and checking the VFS inode flags in xfs_file_fsync. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2010-03-01 16:34:45 -06:00
Dave Chinner	c854363e80	xfs: Use delayed write for inodes rather than async V2 We currently do background inode flush asynchronously, resulting in inodes being written in whatever order the background writeback issues them. Not only that, there are also blocking and non-blocking asynchronous inode flushes, depending on where the flush comes from. This patch completely removes asynchronous inode writeback. It removes all the strange writeback modes and replaces them with either a synchronous flush or a non-blocking delayed write flush. That is, inode flushes will only issue IO directly if they are synchronous, and background flushing may do nothing if the operation would block (e.g. on a pinned inode or buffer lock). Delayed write flushes will now result in the inode buffer sitting in the delwri queue of the buffer cache to be flushed by either an AIL push or by the xfsbufd timing out the buffer. This will allow accumulation of dirty inode buffers in memory and allow optimisation of inode cluster writeback at the xfsbufd level where we have much greater queue depths than the block layer elevators. We will also get adjacent inode cluster buffer IO merging for free when a later patch in the series allows sorting of the delayed write buffers before dispatch. This effectively means that any inode that is written back by background writeback will be seen as flush locked during AIL pushing, and will result in the buffers being pushed from there. This writeback path is currently non-optimal, but the next patch in the series will fix that problem. A side effect of this delayed write mechanism is that background inode reclaim will no longer directly flush inodes, nor can it wait on the flush lock. The result is that inode reclaim must leave the inode in the reclaimable state until it is clean. Hence attempts to reclaim a dirty inode in the background will simply skip the inode until it is clean and this allows other mechanisms (i.e. xfsbufd) to do more optimal writeback of the dirty buffers. As a result, the inode reclaim code has been rewritten so that it no longer relies on the ambiguous return values of xfs_iflush() to determine whether it is safe to reclaim an inode. Portions of this patch are derived from patches by Christoph Hellwig. Version 2: - cleanup reclaim code as suggested by Christoph - log background reclaim inode flush errors - just pass sync flags to xfs_iflush Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-02-06 12:39:36 +11:00
Dave Chinner	777df5afdb	xfs: Make inode reclaim states explicit A.K.A.: don't rely on xfs_iflush() return value in reclaim We have gradually been moving checks out of the reclaim code because they are duplicated in xfs_iflush(). We've had a history of problems in this area, and many of them stem from the overloading of the return values from xfs_iflush() and interaction with inode flush locking to determine if the inode is safe to reclaim. With the desire to move to delayed write flushing of inodes and non-blocking inode tree reclaim walks, the overloading of the return value of xfs_iflush makes it very difficult to determine the correct thing to do next. This patch explicitly re-adds the checks to the inode reclaim code, removing the reliance on the return value of xfs_iflush() to determine what to do next. It also means that we can clearly document all the inode states that reclaim must handle and hence we can easily see that we handled all the necessary cases. This also removes the need for the xfs_inode_clean() check in xfs_iflush() as all callers now check this first (safely). Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2010-02-06 12:37:26 +11:00
Christoph Hellwig	0b1b213fcf	xfs: event tracing support Convert the old xfs tracing support that could only be used with the out of tree kdb and xfsidbg patches to use the generic event tracer. To use it make sure CONFIG_EVENT_TRACING is enabled and then enable all xfs trace channels by: echo 1 > /sys/kernel/debug/tracing/events/xfs/enable or alternatively enable single events by just doing the same in one event subdirectory, e.g. echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable or set more complex filters, etc. In Documentation/trace/events.txt all this is desctribed in more detail. To reads the events do a cat /sys/kernel/debug/tracing/trace Compared to the last posting this patch converts the tracing mostly to the one tracepoint per callsite model that other users of the new tracing facility also employ. This allows a very fine-grained control of the tracing, a cleaner output of the traces and also enables the perf tool to use each tracepoint as a virtual performance counter, allowing us to e.g. count how often certain workloads git various spots in XFS. Take a look at http://lwn.net/Articles/346470/ for some examples. Also the btree tracing isn't included at all yet, as it will require additional core tracing features not in mainline yet, I plan to deliver it later. And the really nice thing about this patch is that it actually removes many lines of code while adding this nice functionality: fs/xfs/Makefile \| 8 fs/xfs/linux-2.6/xfs_acl.c \| 1 fs/xfs/linux-2.6/xfs_aops.c \| 52 - fs/xfs/linux-2.6/xfs_aops.h \| 2 fs/xfs/linux-2.6/xfs_buf.c \| 117 +-- fs/xfs/linux-2.6/xfs_buf.h \| 33 fs/xfs/linux-2.6/xfs_fs_subr.c \| 3 fs/xfs/linux-2.6/xfs_ioctl.c \| 1 fs/xfs/linux-2.6/xfs_ioctl32.c \| 1 fs/xfs/linux-2.6/xfs_iops.c \| 1 fs/xfs/linux-2.6/xfs_linux.h \| 1 fs/xfs/linux-2.6/xfs_lrw.c \| 87 -- fs/xfs/linux-2.6/xfs_lrw.h \| 45 - fs/xfs/linux-2.6/xfs_super.c \| 104 --- fs/xfs/linux-2.6/xfs_super.h \| 7 fs/xfs/linux-2.6/xfs_sync.c \| 1 fs/xfs/linux-2.6/xfs_trace.c \| 75 ++ fs/xfs/linux-2.6/xfs_trace.h \| 1369 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/linux-2.6/xfs_vnode.h \| 4 fs/xfs/quota/xfs_dquot.c \| 110 --- fs/xfs/quota/xfs_dquot.h \| 21 fs/xfs/quota/xfs_qm.c \| 40 - fs/xfs/quota/xfs_qm_syscalls.c \| 4 fs/xfs/support/ktrace.c \| 323 --------- fs/xfs/support/ktrace.h \| 85 -- fs/xfs/xfs.h \| 16 fs/xfs/xfs_ag.h \| 14 fs/xfs/xfs_alloc.c \| 230 +----- fs/xfs/xfs_alloc.h \| 27 fs/xfs/xfs_alloc_btree.c \| 1 fs/xfs/xfs_attr.c \| 107 --- fs/xfs/xfs_attr.h \| 10 fs/xfs/xfs_attr_leaf.c \| 14 fs/xfs/xfs_attr_sf.h \| 40 - fs/xfs/xfs_bmap.c \| 507 +++------------ fs/xfs/xfs_bmap.h \| 49 - fs/xfs/xfs_bmap_btree.c \| 6 fs/xfs/xfs_btree.c \| 5 fs/xfs/xfs_btree_trace.h \| 17 fs/xfs/xfs_buf_item.c \| 87 -- fs/xfs/xfs_buf_item.h \| 20 fs/xfs/xfs_da_btree.c \| 3 fs/xfs/xfs_da_btree.h \| 7 fs/xfs/xfs_dfrag.c \| 2 fs/xfs/xfs_dir2.c \| 8 fs/xfs/xfs_dir2_block.c \| 20 fs/xfs/xfs_dir2_leaf.c \| 21 fs/xfs/xfs_dir2_node.c \| 27 fs/xfs/xfs_dir2_sf.c \| 26 fs/xfs/xfs_dir2_trace.c \| 216 ------ fs/xfs/xfs_dir2_trace.h \| 72 -- fs/xfs/xfs_filestream.c \| 8 fs/xfs/xfs_fsops.c \| 2 fs/xfs/xfs_iget.c \| 111 --- fs/xfs/xfs_inode.c \| 67 -- fs/xfs/xfs_inode.h \| 76 -- fs/xfs/xfs_inode_item.c \| 5 fs/xfs/xfs_iomap.c \| 85 -- fs/xfs/xfs_iomap.h \| 8 fs/xfs/xfs_log.c \| 181 +---- fs/xfs/xfs_log_priv.h \| 20 fs/xfs/xfs_log_recover.c \| 1 fs/xfs/xfs_mount.c \| 2 fs/xfs/xfs_quota.h \| 8 fs/xfs/xfs_rename.c \| 1 fs/xfs/xfs_rtalloc.c \| 1 fs/xfs/xfs_rw.c \| 3 fs/xfs/xfs_trans.h \| 47 + fs/xfs/xfs_trans_buf.c \| 62 - fs/xfs/xfs_vnodeops.c \| 8 70 files changed, 2151 insertions(+), 2592 deletions(-) Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-14 23:08:16 -06:00
Christoph Hellwig	6ef3554422	xfs: change the xfs_iext_insert / xfs_iext_remove Change the xfs_iext_insert / xfs_iext_remove prototypes to pass more information which will allow pushing the trace points from the callers into those functions. This includes folding the whichfork information into the state variable to minimize the addition stack footprint. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-12-14 23:08:15 -06:00
Christoph Hellwig	f9581b1443	xfs: implement ->dirty_inode to fix timestamp handling This is picking up on Felix's repost of Dave's patch to implement a .dirty_inode method. We really need this notification because the VFS keeps writing directly into the inode structure instead of going through methods to update this state. In addition to the long-known atime issue we now also have a caller in VM code that updates c/mtime that way for shared writeable mmaps. And I found another one that no one has noticed in practice in the FIFO code. So implement ->dirty_inode to set i_update_core whenever the inode gets externally dirtied, and switch the c/mtime handling to the same scheme we already use for atime (always picking up the value from the Linux inode). Note that this patch also removes the xfs_synchronize_atime call in xfs_reclaim it was superflous as we already synchronize the time when writing the inode via the log (xfs_inode_item_format) or the normal buffers (xfs_iflush_int). In addition also remove the I_CLEAR check before copying the Linux timestamps - now that we always have the Linux inode available we can always use the timestamps in it. Also switch to just using file_update_time for regular reads/writes - that will get us all optimization done to it for free and make sure we notice early when it breaks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Alex Elder <aelder@sgi.com>	2009-10-08 12:00:03 -05:00
Christoph Hellwig	aa72a5cf00	xfs: simplify xfs_trans_iget xfs_trans_iget is a wrapper for xfs_iget that adds the inode to the transaction after it is read. Except when the inode already is in the inode cache, in which case it returns the existing locked inode with increment lock recursion counts. Now, no one in the tree every decrements these lock recursion counts, so any user of this gets a potential double unlock when both the original owner of the inode and the xfs_trans_iget caller unlock it. When looking back in a git bisect in the historic XFS tree there was only one place that decremented these counts, xfs_trans_iput. Introduced in commit ca25df7a840f426eb566d52667b6950b92bb84b5 by Adam Sweeney in 1993, and removed in commit 19f899a3ab155ff6a49c0c79b06f2f61059afaf3 by Steve Lord in 2003. And as long as it didn't slip through git bisects cracks never actually used in that time frame. A quick audit of the callers of xfs_trans_iget shows that no caller really relies on this behaviour fortunately - xfs_ialloc allows this inode from disk so it must not be there before, and all the RT allocator routines only every add each RT bitmap inode once. In addition to removing lots of code and reducing the size of the inode item this patch also avoids the double inode cache lookup in each create/mkdir/mknod transaction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Alex Elder <aelder@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:46:16 -05:00
Christoph Hellwig	13e6d5cdde	xfs: merge fsync and O_SYNC handling The guarantees for O_SYNC are exactly the same as the ones we need to make for an fsync call (and given that Linux O_SYNC is O_DSYNC the equivalent is fdadatasync, but we treat both the same in XFS), except with a range data writeout. Jan Kara has started unifying these two path for filesystems using the generic helpers, and I've started to look at XFS. The actual transaction commited by xfs_fsync and xfs_write_sync_logforce has a different transaction number, but actually is exactly the same. We'll only use the fsync transaction going forward. One major difference is that xfs_write_sync_logforce never issues a cache flush unless we commit a transaction causing that as a side-effect, which is an obvious bug in the O_SYNC handling. Second all the locking and i_update_size vs i_update_core changes from `978b723712` never made it to xfs_write_sync_logforce, so we add them back. To make xfs_fsync easily usable from the O_SYNC path, the filemap_fdatawait call is moved up to xfs_file_fsync, so that we don't wait on the whole file after we already waited for our portion in xfs_write. We'll also use a plain call to filemap_write_and_wait_range instead of the previous sync_page_rang which did it in two steps including an half-hearted inode write out that doesn't help us. Once we're done with this also remove the now useless i_update_size tracking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Felix Blyakher <felixb@sgi.com> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-09-01 12:45:57 -05:00
Eric Sandeen	d96f8f891f	xfs: add more statics & drop some unused functions A lot more functions could be made static, but they need forward declarations; this does some easy ones, and also found a few unused functions in the process. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Felix Blyakher <felixb@sgi.com>	2009-08-31 14:46:20 -05:00
Christoph Hellwig	b36ec0428a	xfs: fix freeing of inodes not yet added to the inode cache When freeing an inode that lost race getting added to the inode cache we must not call into ->destroy_inode, because that would delete the inode that won the race from the inode cache radix tree. This patch uses splits a new xfs_inode_free helper out of xfs_ireclaim and uses that plus __destroy_inode to make sure we really only free the memory allocted for the inode that lost the race, and not mess with the inode cache state. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reported-by: Alex Samad <alex@samad.com.au> Reported-by: Andrew Randrianasulu <randrik@mail.ru> Reported-by: Stephane <sharnois@max-t.com> Reported-by: Tommy <tommy@news-service.com> Reported-by: Miah Gregory <mace@darksilence.net> Reported-by: Gabriel Barazer <gabriel@oxeva.fr> Reported-by: Leandro Lucarella <llucax@gmail.com> Reported-by: Daniel Burr <dburr@fami.com.au> Reported-by: Nickolay <newmail@spaces.ru> Reported-by: Michael Guntsche <mike@it-loops.com> Reported-by: Dan Carley <dan.carley+linuxkern-bugs@gmail.com> Reported-by: Michael Ole Olsen <gnu@gmx.net> Reported-by: Michael Weissenbacher <mw@dermichi.com> Reported-by: Martin Spott <Martin.Spott@mgras.net> Reported-by: Christian Kujau <lists@nerdbynature.de> Tested-by: Michael Guntsche <mike@it-loops.com> Tested-by: Dan Carley <dan.carley+linuxkern-bugs@gmail.com> Tested-by: Christian Kujau <lists@nerdbynature.de>	2009-08-07 14:38:34 -03:00
Al Viro	1cbd20d820	switch xfs to generic acl caching helpers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2009-06-24 08:17:07 -04:00
Christoph Hellwig	ef14f0c157	xfs: use generic Posix ACL code This patch rips out the XFS ACL handling code and uses the generic fs/posix_acl.c code instead. The ondisk format is of course left unchanged. This also introduces the same ACL caching all other Linux filesystems do by adding pointers to the acl and default acl in struct xfs_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net>	2009-06-10 17:07:47 +02:00
Malcolm Parsons	9da096fd13	xfs: fix various typos Signed-off-by: Malcolm Parsons <malcolm.parsons@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2009-03-29 09:55:42 +02:00
Lachlan McIlroy	0a8c5395f9	[XFS] Fix merge failures Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: fs/xfs/linux-2.6/xfs_cred.h fs/xfs/linux-2.6/xfs_globals.h fs/xfs/linux-2.6/xfs_ioctl.c fs/xfs/xfs_vnodeops.h Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-29 16:47:18 +11:00
Christoph Hellwig	6d73cf133c	[XFS] resync headers with libxfs - xfs_sb.h add the XFS_SB_VERSION2_PARENTBIT features2 that has been around in userspace for some time - xfs_inode.h: move a few things out of __KERNEL__ that are needed by userspace - xfs_mount.h: only include xfs_sync.h under __KERNEL__ - xfs_inode.c: minor whitespace fixup. I accidentaly changes this when importing this file for use by userspace. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-11 13:14:17 +11:00
Lachlan McIlroy	e055f13a6d	[XFS] Remove unused tracing code None of this code appears to be used anywhere so remove it. Reviewed-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-12-10 11:51:54 +11:00
Christoph Hellwig	5a8d0f3c7a	move inode tracing out of xfs_vnode. Move the inode tracing into xfs_iget.c / xfs_inode.h and kill xfs_vnode.c now that it's empty. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:25 +11:00
Christoph Hellwig	6bd16ff270	kill dead inode flags There are a few inode flags around that aren't used anywhere, so remove them. Also update xfsidbg to display all used inode flags correctly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:22 +11:00
Christoph Hellwig	5cafdeb289	cleanup the inode reclaim path Merge xfs_iextract and xfs_idestroy into xfs_ireclaim as they are never called individually. Also rewrite most comments in this area as they were severly out of date. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	ccd0be6cfc	remove unused prototypes for xfs_ihash_init / xfs_ihash_free Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-04 15:39:20 +11:00
Christoph Hellwig	24f211bad0	[XFS] move inode allocation out xfs_iread Allocate the inode in xfs_iget_cache_miss and pass it into xfs_iread. This simplifies the error handling and allows xfs_iread to be shared with userspace which already uses these semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:17 +11:00
Christoph Hellwig	b48d8d6437	[XFS] kill the XFS_IMAP_BULKSTAT flag Just pass down the XFS_IGET_* flags all the way down to xfs_imap instead of translating them mid-way. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:13 +11:00
Christoph Hellwig	92bfc6e7c4	[XFS] embededd struct xfs_imap into xfs_inode Most uses of struct xfs_imap are to map and inode to a buffer. To avoid copying around the inode location information we should just embedd a strcut xfs_imap into the xfs_inode. To make sure it doesn't bloat an inode the im_len is changed to a ushort, which is fine as that's what the users exepect anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:08 +11:00
Christoph Hellwig	94e1b69d1a	[XFS] merge xfs_imap into xfs_dilocate xfs_imap is the only caller of xfs_dilocate and doesn't add any significant value. Merge the two functions and document the various cases we have for inode cluster lookup in the new xfs_imap. Also remove the unused im_agblkno and im_ioffset fields from struct xfs_imap while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:38:03 +11:00
Christoph Hellwig	a194189503	[XFS] remove dead code for old inode item recovery We have removed the support for old-style inode items a while ago and xlog_recover_do_inode_trans is now only called for XFS_LI_INODE items. That means we can remove the call to xfs_imap there and with it the XFS_IMAP_LOOKUP that is set by all other callers. We can also mark xfs_imap static now. (First sent on October 21st) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:58 +11:00
Christoph Hellwig	76d8b277f7	[XFS] stop using xfs_itobp in xfs_iread The only caller of xfs_itobp that doesn't have i_blkno setup is now the initial inode read. It needs access to the whole xfs_imap so using xfs_inotobp is not an option. Instead opencode the buffer lookup in xfs_iread and kill all the functionality for the initial map from xfs_itobp. (First sent on October 21st) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:52 +11:00
Christoph Hellwig	81591fe2db	[XFS] kill xfs_dinode_core_t Now that we have a separate xfs_icdinode_t for the in-core inode which gets logged there is no need anymore for the xfs_dinode vs xfs_dinode_core split - the fact that part of the structure gets logged through the inode log item and a small part not can better be described in a comment. All sizeof operations on the dinode_core either really wanted the icdinode and are switched to that one, or had already added the size of the agi unlinked list pointer. Later both will be replaced with helpers once we get the larger CRC-enabled dinode. Removing the data and attribute fork unions also has the advantage that xfs_dinode.h doesn't need to pull in every header under the sun. While we're at it also add some more comments describing the dinode structure. (First sent on October 7th) Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:35 +11:00
Dave Chinner	26c5295135	[XFS] remove i_gen from incore inode i_gen is incremented in directory operations when the directory is changed. It is never read or otherwise used so it should be removed to help reduce the size of the struct xfs_inode. The patch also removes a duplicate logging of the directory inode core. We only need to do this once per transaction so kill the one associated with the i_gen increment. Signed-off-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Niv Sardi <xaiki@sgi.com>	2008-12-01 11:37:10 +11:00
David Howells	b6dff3ec5e	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:39:16 +11:00
Christoph Hellwig	9ed0451ee0	[XFS] free partially initialized inodes using destroy_inode To make sure we free the security data inodes need to be freed using the proper VFS helper (which we also need to export for this). We mark these inodes bad so we can skip the flush path for them. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32398a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 18:26:04 +11:00
Christoph Hellwig	c679eef052	[XFS] stop using xfs_itobp in xfs_bulkstat xfs_bulkstat only wants the dinode, offset and buffer from a given inode number. Instead of using xfs_itobp on a fake inode which is complicated and currently leads to leaks of the security data just use xfs_inotobp which is designed to do exactly the kind of lookup xfs_bulkstat wants. The only thing that's missing in xfs_inotobp is a flags paramter that let's us pass down XFS_IMAP_BULKSTAT, but that can easily added. SGI-PV: 987246 SGI-Modid: xfs-linux-melb:xfs-kern:32397a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 18:04:13 +11:00
David Chinner	116545130c	[XFS] kill deleted inodes list Now that the deleted inodes list is unused, kill it. This also removes the i_reclaim list head from the xfs_inode, shrinking it by two pointers. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32334a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:49 +11:00
David Chinner	fce08f2f3b	[XFS] move inode reclaim functions to xfs_sync.c Background inode reclaim is run by the xfssyncd. Move the reclaim worker functions to be close to the sync code as the are very similar in structure and are both run from the same background thread. SGI-PV: 988142 SGI-Modid: xfs-linux-melb:xfs-kern:32329a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:37:03 +11:00
David Chinner	bf904248a2	[XFS] Combine the XFS and Linux inodes To avoid issues with different lifecycles of XFS and Linux inodes, embedd the linux inode inside the XFS inode. This means that the linux inode has the same lifecycle as the XFS inode, even when it has been released by the OS. XFS inodes don't live much longer than this (a short stint in reclaim at most), so there isn't significant memory usage penalties here. Version 3 o kill xfs_icount() Version 2 o remove unused commented out code from xfs_iget(). o kill useless cast in VFS_I() SGI-PV: 988141 SGI-Modid: xfs-linux-melb:xfs-kern:32323a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:36:14 +11:00
Christoph Hellwig	7cc95a821d	[XFS] Always use struct xfs_btree_block instead of short / longform structures. Always use the generic xfs_btree_block type instead of the short / long structures. Add XFS_BTREE_SBLOCK_LEN / XFS_BTREE_LBLOCK_LEN defines for the length of a short / long form block. The rationale for this is that we will grow more btree block header variants to support CRCs and other RAS information, and always accessing them through the same datatype with unions for the short / long form pointers makes implementing this much easier. SGI-PV: 988146 SGI-Modid: xfs-linux-melb:xfs-kern:32300a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:14:34 +11:00
David Chinner	6c7699c047	[XFS] remove the mount inode list Now we've removed all users of the mount inode list, we can kill it. This reduces the size of the xfs_inode by 2 pointers. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32293a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:11:29 +11:00
David Chinner	75c68f411b	[XFS] Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() xfs_iflush_all() walks the m_inodes list to find inodes that need reclaiming. We already have such a list - the m_del_inodes list. Replace xfs_iflush_all() with a call to xfs_finish_reclaim_all() and clean up xfs_finish_reclaim_all() to handle the different flush modes now needed. Originally based on a patch from Christoph Hellwig. Version 3 o rediff against new linux-2.6/xfs_sync.c code Version 2 o revert xfs_syncsub() inode reclaim behaviour back to original code o xfs_quiesce_fs() should use XFS_IFLUSH_DELWRI_ELSE_ASYNC, not XFS_IFLUSH_ASYNC, to prevent change of behaviour. SGI-PV: 988139 SGI-Modid: xfs-linux-melb:xfs-kern:32284a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 17:06:28 +11:00
Barry Naujok	847fff5ca8	[XFS] Sync up kernel and user-space headers SGI-PV: 986558 SGI-Modid: xfs-linux-melb:xfs-kern:32231a Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-10-30 17:05:38 +11:00
Christoph Hellwig	8c4ed633e6	[XFS] make btree tracing generic Make the existing bmap btree tracing generic so that it applies to all btree types. Some fragments lifted from a patch by Dave Chinner. SGI-PV: 985583 SGI-Modid: xfs-linux-melb:xfs-kern:32187a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Bill O'Donnell <billodo@sgi.com> Signed-off-by: David Chinner <david@fromorbit.com>	2008-10-30 16:55:13 +11:00
David Chinner	07c8f67587	[XFS] Make use of the init-once slab optimisation. To avoid having to initialise some fields of the XFS inode on every allocation, we can use the slab init-once feature to initialise them. All we have to guarantee is that when we free the inode, all it's entries are in the initial state. Add asserts where possible to ensure debug kernels check this initial state before freeing and after allocation. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31925a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org>	2008-10-30 16:11:59 +11:00
Christoph Hellwig	dff35fd41f	[XFS] update timestamp in xfs_ialloc manually In xfs_ialloc we just want to set all timestamps to the current time. We don't need to mark the inode dirty like xfs_ichgtime does, and we don't need nor want the opimizations in xfs_ichgtime that I will introduce in the next patch. So just opencode the timestamp update in xfs_ialloc, and remove the new unused XFS_ICHGTIME_ACC case in xfs_ichgtime. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31825a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:44:15 +10:00
David Chinner	c63942d3ee	[XFS] replace inode flush semaphore with a completion Use the new completion flush code to implement the inode flush lock. Removes one of the final users of semaphores in the XFS code base. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31817a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:41:16 +10:00
Christoph Hellwig	5ec7f8c7d1	[XFS] kill bhv_vnode_t All remaining bhv_vnode_t instance are in code that's more or less Linux specific. (Well, for xfs_acl.c that could be argued, but that code is on the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the whole tree. We can clean up variable naming and some useless helpers later. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31781a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:22:40 +10:00
Christoph Hellwig	e1cccd917b	[XFS] kill xfs_lock_dir_and_entry When multiple inodes are locked in XFS it happens in order of the inode number, with the everything but the first inode trylocked if any of the previous inodes is in the AIL. Except for the sorting of the inodes this logic is implemented in xfs_lock_inodes, but also partially duplicated in xfs_lock_dir_and_entry in a particularly stupid way adds a lock roundtrip if the inode ordering is not optimal. This patch adds a new helper xfs_lock_two_inodes that takes two inodes and locks them in the most optimal way according to the above locking protocol and uses it for all places that want to lock two inodes. The only caller of xfs_lock_inodes is xfs_rename which might lock up to four inodes. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31772a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:18:07 +10:00
David Chinner	e6064d30c3	[XFS] XFS: Kill xfs_vtoi() xfs_vtoi() is redundant and only unsed in small sections of code. Replace them with widely used XFS_I() inline and kill xfs_vtoi(). SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31725a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:01:45 +10:00
David Chinner	e4f7529108	[XFS] Kill shouty XFS_ITOV() macro Replace XFS_ITOV() with the new VFS_I() inline. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31724a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 16:00:45 +10:00
David Chinner	705db4a24e	[XFS] kill shouty XFS_ITOV_NULL macro Replace XFS_ITOV_NULL() with the new VFS_I() inline. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31722a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 15:47:43 +10:00
David Chinner	0165164625	[XFS] Avoid directly referencing the VFS inode. In several places we directly convert from the XFS inode to the linux (VFS) inode by a simple deference of ip->i_vnode. We should not do this - a helper function should be used to extract the VFS inode from the XFS inode. Introduce the function VFS_I() to extract the VFS inode from the XFS inode. The name was chosen to match XFS_I() which is used to extract the XFS inode from the VFS inode. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31720a Signed-off-by: David Chinner <david@fromorbit.com> Signed-off-by: Niv Sardi <xaiki@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-08-13 15:45:15 +10:00
Christoph Hellwig	61436febae	[XFS] kill xfs_igrow_start and xfs_igrow_finish xfs_igrow_start just expands to xfs_zero_eof with two asserts that are useless in the context of the only caller and some rather confusing comments. xfs_igrow_finish is just a few lines of code decorated again with useless asserts and confusing comments. Just kill those two and merge them into xfs_setattr. SGI-PV: 981498 SGI-Modid: xfs-linux-melb:xfs-kern:31186a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-07-28 16:58:18 +10:00
Christoph Hellwig	cfa853e47d	[XFS] remove manual lookup from xfs_rename and simplify locking ->rename already gets the target inode passed if it exits. Pass it down to xfs_rename so that we can avoid looking it up again. Also simplify locking as the first lock section in xfs_rename can go away now: the isdir is an invariant over the lifetime of the inode, and new_parent and the nlink check are namespace topology protected by i_mutex in the VFS. The projid check needs to move into the second lock section anyway to not be racy. Also kill the now unused xfs_dir_lookup_int and remove the now-unused first_locked argumet to xfs_lock_inodes. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30903a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-29 15:54:12 +10:00
Christoph Hellwig	579aa9caf5	[XFS] shrink mrlock_t The writer field is not needed for non_DEBU builds so remove it. While we're at i also clean up the interface for is locked asserts to go through and xfs_iget.c helper with an interface like the xfs_ilock routines to isolated the XFS codebase from mrlock internals. That way we can kill mrlock_t entirely once rw_semaphores grow an islocked facility. Also remove unused flags to the ilock family of functions. SGI-PV: 976035 SGI-Modid: xfs-linux-melb:xfs-kern:30902a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-29 15:54:02 +10:00
David Chinner	bad5584332	[XFS] Remove the xfs_icluster structure Remove the xfs_icluster structure and replace with a radix tree lookup. We don't need to keep a list of inodes in each cluster around anymore as we can look them up quickly when we need to. The only time we need to do this now is during inode writeback. Factor the inode cluster writeback code out of xfs_iflush and convert it to use radix_tree_gang_lookup() instead of walking a list of inodes built when we first read in the inodes. This remove 3 pointers from each xfs_inode structure and the xfs_icluster structure per inode cluster. Hence we reduce the cache footprint of the xfs_inodes by between 5-10% depending on cluster sparseness. To be truly efficient we need a radix_tree_gang_lookup_range() call to stop searching once we are past the end of the cluster instead of trying to find a full cluster's worth of inodes. Before (ia64): $ cat /sys/slab/xfs_inode/object_size 536 After: $ cat /sys/slab/xfs_inode/object_size 512 SGI-PV: 977460 SGI-Modid: xfs-linux-melb:xfs-kern:30502a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:37:41 +10:00
David Chinner	a3f74ffb6d	[XFS] Don't block pdflush when writing back inodes When pdflush is writing back inodes, it can get stuck on inode cluster buffers that are currently under I/O. This occurs when we write data to multiple inodes in the same inode cluster at the same time. Effectively, delayed allocation marks the inode dirty during the data writeback. Hence if the inode cluster was flushed during the writeback of the first inode, the writeback of the second inode will block waiting for the inode cluster write to complete before writing it again for the newly dirtied inode. Basically, we want to avoid this from happening so we don't block pdflush and slow down all of writeback. Hence we introduce a non-blocking async inode flush flag that pdflush uses. If this flag is set, we use non-blocking operations (e.g. try locks) whereever we can to avoid blocking or extra I/O being issued. SGI-PV: 970925 SGI-Modid: xfs-linux-melb:xfs-kern:30501a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:37:32 +10:00
Donald Douwsma	163d3686bb	[XFS] Remove the xfs_refcache Remove the xfs_refcache, it was only needed while we were still building for 2.4 kernels. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30472a Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-04-18 11:36:55 +10:00
Christoph Hellwig	4576758db5	[XFS] use generic_permission Now that all direct caller of xfs_iaccess are gone we can kill xfs_iaccess and xfs_access and just use generic_permission with a check_acl callback. This is required for the per-mount read-only patchset in -mm to work properly with XFS. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30370a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:22:38 +11:00
Christoph Hellwig	45ba598e56	[XFS] Remove CFORK macros and use code directly in IFORK and DFORK macros. Currently XFS_IFORK_* and XFS_DFORK* are implemented by means of XFS_CFORK* macros. But given that XFS_IFORK_* operates on an xfs_inode that embedds and xfs_icdinode_core and XFS_DFORK_* operates on an xfs_dinode that embedds a xfs_dinode_core one will have to do endian swapping while the other doesn't. Instead of having the current mess with the CFORK macros that have byteswapping and non-byteswapping version (which are inconsistantly named while we're at it) just define each family of the macros to stand by itself and simplify the whole matter. A few direct references to the CFORK variants were cleaned up to use IFORK or DFORK to make this possible. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30163a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:19:24 +11:00
Robert P. J. Day	40ebd81d1a	[XFS] Use kernel-supplied "roundup_pow_of_two" for simplicity SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:30098a Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:18:19 +11:00
David Chinner	5d51eff453	[XFS] Fix inode allocation latency The log force added in xfs_iget_core() has been a performance issue since it was introduced for tight loops that allocate then unlink a single file. under heavy writeback, this can introduce unnecessary latency due tothe log I/o getting stuck behind bulk data writes. Fix this latency problem by avoinding the need for the log force by moving the place we mark linux inode dirty to the transaction commit rather than on transaction completion. This also closes a potential hole in the sync code where a linux inode is not dirty between the time it is modified and the time the log buffer has been written to disk. SGI-PV: 972753 SGI-Modid: xfs-linux-melb:xfs-kern:30007a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>	2008-02-07 18:16:07 +11:00
Christoph Hellwig	10090be25c	[XFS] cleanup vnode useage in xfs_iget.c Get rid of vnode useage in xfs_iget.c and pass Linux inode / xfs_inode where apropinquate. And kill some useless helpers while we're at it. SGI-PV: 971186 SGI-Modid: xfs-linux-melb:xfs-kern:29808a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:55:46 +11:00
Christoph Hellwig	613d70436c	[XFS] kill xfs_iocore_t xfs_iocore_t is a structure embedded in xfs_inode. Except for one field it just duplicates fields already in xfs_inode, and there is nothing this abstraction buys us on XFS/Linux. This patch removes it and shrinks source and binary size of xfs aswell as shrinking the size of xfs_inode by 60/44 bytes in debug/non-debug builds. SGI-PV: 970852 SGI-Modid: xfs-linux-melb:xfs-kern:29754a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:48:58 +11:00
Eric Sandeen	36e41eebda	[XFS] Cleanup lock goop. Switch last couple lock_t's to spinlock_t's. Remove now-unused spinlock-related macros & types. SGI-PV: 970382 SGI-Modid: xfs-linux-melb:xfs-kern:29748a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:47:35 +11:00
Lachlan McIlroy	cf441eeb79	[XFS] clean up vnode/inode tracing Simplify vnode tracing calls by embedding function name & return addr in the calling macro. Also do a lot of vnode->inode renaming for consistency, while we're at it. SGI-PV: 970335 SGI-Modid: xfs-linux-melb:xfs-kern:29650a Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2008-02-07 16:42:19 +11:00
Christoph Hellwig	bd186aa901	[XFS] kill the vfs_flags member in struct bhv_vfs All flags are added to xfs_mount's m_flag instead. Note that the 32bit inode flag was duplicated in both of them, but only cleared in the mount when it was not nessecary due to the filesystem beeing small enough. Two flags are still required here - one to indicate the mount option setting, and one to indicate if it applies or not. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29507a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:45:57 +10:00
Christoph Hellwig	0a74cd1964	[XFS] kill struct bhv_vnode Now that struct bhv_vnode is empty we can just kill it. Retain bhv_vnode_t as a typedef for struct inode for the time being until all the fallout is cleaned up. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29500a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:40:24 +10:00
Christoph Hellwig	1543d79c45	[XFS] move v_trace from bhv_vnode to xfs_inode struct bhv_vnode is on it's way out, so move the trace buffer to the XFS inode. Note that this makes the tracing macros rather misnamed, but this kind of fallout will be fixed up incrementally later on. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29498a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:39:25 +10:00
Christoph Hellwig	b677c210ce	[XFS] move v_iocount from bhv_vnode to xfs_inode struct bhv_vnode is on it's way out, so move the I/O count to the XFS inode. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29497a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:38:56 +10:00
Christoph Hellwig	09262b4339	[XFS] Create xfs_iflags_test_and_clear helper function SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29496a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:38:36 +10:00
Christoph Hellwig	b3aea4edc2	[XFS] kill the v_flag member in struct bhv_vnode All flags previously handled at the vnode level are not in the xfs_inode where we already have a flags mechanisms and free bits for flags previously in the vnode. SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29495a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 11:37:29 +10:00
Christoph Hellwig	739bfb2a7d	[XFS] call common xfs vnode-level helpers directly and remove vnode operations SGI-PV: 969608 SGI-Modid: xfs-linux-melb:xfs-kern:29493a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-16 10:40:00 +10:00
David Chinner	da353b0d64	[XFS] Radix tree based inode caching One of the perpetual scaling problems XFS has is indexing it's incore inodes. We currently uses hashes and the default hash sizes chosen can only ever be a tradeoff between memory consumption and the maximum realistic size of the cache. As a result, anyone who has millions of inodes cached on a filesystem needs to tunes the size of the cache via the ihashsize mount option to allow decent scalability with inode cache operations. A further problem is the separate inode cluster hash, whose size is based on the ihashsize but is smaller, and so under certain conditions (sparse cluster cache population) this can become a limitation long before the inode hash is causing issues. The following patchset removes the inode hash and cluster hash and replaces them with radix trees to avoid the scalability limitations of the hashes. It also reduces the size of the inodes by 3 pointers.... SGI-PV: 969561 SGI-Modid: xfs-linux-melb:xfs-kern:29481a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-15 16:50:50 +10:00
Christoph Hellwig	347d1c0195	[XFS] dinode endianess annotations Biggest bit is duplicating the dinode structure so we have one annotated for native endianess and one for disk endianess. The other significant change is that xfs_xlate_dinode_core is split into one helper per direction to allow for proper annotations, everything else is trivial. As a sidenode splitting out the incore dinode means we can move it into xfs_inode.h in a later patch and severely improving on the include hell in xfs. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29476a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-15 16:48:30 +10:00
Christoph Hellwig	a6f64d4aea	[XFS] split ondisk vs incore versions of xfs_bmbt_rec_t currently xfs_bmbt_rec_t is used both for ondisk extents as well as host-endian ones. This patch adds a new xfs_bmbt_rec_host_t for the native endian ones and cleans up the fallout. There have been various endianess issues in the tracing / debug printf code that are fixed by this patch. SGI-PV: 968563 SGI-Modid: xfs-linux-melb:xfs-kern:29318a Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-10-15 16:25:51 +10:00
David Chinner	0f1145cc18	[XFS] Fix lockdep annotations for xfs_lock_inodes SGI-PV: 967035 SGI-Modid: xfs-linux-melb:xfs-kern:29026a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-07-14 18:09:42 +10:00
David Chinner	2a82b8be8a	[XFS] Concurrent Multi-File Data Streams In media spaces, video is often stored in a frame-per-file format. When dealing with uncompressed realtime HD video streams in this format, it is crucial that files do not get fragmented and that multiple files a placed contiguously on disk. When multiple streams are being ingested and played out at the same time, it is critical that the filesystem does not cross the streams and interleave them together as this creates seek and readahead cache miss latency and prevents both ingest and playout from meeting frame rate targets. This patch set creates a "stream of files" concept into the allocator to place all the data from a single stream contiguously on disk so that RAID array readahead can be used effectively. Each additional stream gets placed in different allocation groups within the filesystem, thereby ensuring that we don't cross any streams. When an AG fills up, we select a new AG for the stream that is not in use. The core of the functionality is the stream tracking - each inode that we create in a directory needs to be associated with the directories' stream. Hence every time we create a file, we look up the directories' stream object and associate the new file with that object. Once we have a stream object for a file, we use the AG that the stream object point to for allocations. If we can't allocate in that AG (e.g. it is full) we move the entire stream to another AG. Other inodes in the same stream are moved to the new AG on their next allocation (i.e. lazy update). Stream objects are kept in a cache and hold a reference on the inode. Hence the inode cannot be reclaimed while there is an outstanding stream reference. This means that on unlink we need to remove the stream association and we also need to flush all the associations on certain events that want to reclaim all unreferenced inodes (e.g. filesystem freeze). SGI-PV: 964469 SGI-Modid: xfs-linux-melb:xfs-kern:29096a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Barry Naujok <bnaujok@sgi.com> Signed-off-by: Donald Douwsma <donaldd@sgi.com> Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Tim Shimmin <tes@sgi.com> Signed-off-by: Vlad Apostolov <vapo@sgi.com>	2007-07-14 15:40:53 +10:00
Lachlan McIlroy	f7c66ce3f7	[XFS] Add lockdep support for XFS SGI-PV: 963965 SGI-Modid: xfs-linux-melb:xfs-kern:28485a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:50:19 +10:00
Lachlan McIlroy	ba87ea699e	[XFS] Fix to prevent the notorious 'NULL files' problem after a crash. The problem that has been addressed is that of synchronising updates of the file size with writes that extend a file. Without the fix the update of a file's size, as a result of a write beyond eof, is independent of when the cached data is flushed to disk. Often the file size update would be written to the filesystem log before the data is flushed to disk. When a system crashes between these two events and the filesystem log is replayed on mount the file's size will be set but since the contents never made it to disk the file is full of holes. If some of the cached data was flushed to disk then it may just be a section of the file at the end that has holes. There are existing fixes to help alleviate this problem, particularly in the case where a file has been truncated, that force cached data to be flushed to disk when the file is closed. If the system crashes while the file(s) are still open then this flushing will never occur. The fix that we have implemented is to introduce a second file size, called the in-memory file size, that represents the current file size as viewed by the user. The existing file size, called the on-disk file size, is the one that get's written to the filesystem log and we only update it when it is safe to do so. When we write to a file beyond eof we only update the in- memory file size in the write operation. Later when the I/O operation, that flushes the cached data to disk completes, an I/O completion routine will update the on-disk file size. The on-disk file size will be updated to the maximum offset of the I/O or to the value of the in-memory file size if the I/O includes eof. SGI-PV: 958522 SGI-Modid: xfs-linux-melb:xfs-kern:28322a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:46 +10:00
Lachlan McIlroy	d3cf209476	[XFS] propogate return codes from flush routines This patch handles error return values in fs_flush_pages and fs_flushinval_pages. It changes the prototype of fs_flushinval_pages so we can propogate the errors and handle them at higher layers. I also modified xfs_itruncate_start so that it could propogate the error further. SGI-PV: 961990 SGI-Modid: xfs-linux-melb:xfs-kern:28231a Signed-off-by: Lachlan McIlroy <lachlan@sgi.com> Signed-off-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2007-05-08 13:49:27 +10:00
David Chinner	7a18c38607	[XFS] Clean up i_flags and i_flags_lock handling. SGI-PV: 956832 SGI-Modid: xfs-linux-melb:xfs-kern:27358a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nscott@aconex.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-11-11 18:04:54 +11:00
David Chinner	f273ab848b	[XFS] Really fix use after free in xfs_iunpin. The previous attempts to fix the linux inode use-after-free in xfs_iunpin simply made the problem harder to hit. We actually need complete exclusion between xfs_reclaim and xfs_iunpin, as well as ensuring that the i_flags are consistent during both of these functions. Introduce a new spinlock for exclusion and the i_flags, and fix up xfs_iunpin to use igrab before marking the inode dirty. SGI-PV: 952967 SGI-Modid: xfs-linux-melb:xfs-kern:26964a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:06:03 +10:00
Nathan Scott	745b1f47fc	[XFS] Remove last bulkstat false-positives with debug kernels. SGI-PV: 953819 SGI-Modid: xfs-linux-melb:xfs-kern:26628a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Tim Shimmin <tes@sgi.com>	2006-09-28 11:02:23 +10:00
Nathan Scott	67fcaa73ad	[XFS] Resolve a namespace collision on vnode/vnodeops for FreeBSD porters. SGI-PV: 953338 SGI-Modid: xfs-linux-melb:xfs-kern:26107a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 17:00:52 +10:00
Nathan Scott	b83bd13881	[XFS] Resolve a namespace collision on vfs/vfsops for FreeBSD porters. SGI-PV: 9533338 SGI-Modid: xfs-linux-melb:xfs-kern:26106a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-06-09 16:48:30 +10:00
David Chinner	1fc5d959d8	[XFS] Fix inode reclaim scalability regression. When a filesystem has millions of inodes cached and has sparse cluster population, removing inodes from the cluster hash consumes excessive amounts of CPU time. Reduce the CPU cost by making removal O(1) via use of a double linked list for the hash chains. SGI-PV: 951551 SGI-Modid: xfs-linux-melb:xfs-kern:25683a Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-04-11 15:11:12 +10:00
Nathan Scott	b12dd34298	[XFS] Fix an infinite loop issue in bulkstat when a corrupt inode is detected. Thanks to Roger Willcocks. SGI-PV: 951054 SGI-Modid: xfs-linux-melb:xfs-kern:25477a Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-17 17:26:04 +11:00
Mandy Kirkconnell	8867bc9bf0	[XFS] There are a few problems with the new xfs_bmap_search_multi_extents() wrapper function that I introduced in mod xfs-linux:xfs-kern:207393a. The function was added as a wrapper around xfs_bmap_do_search_extents() to avoid breaking the top-of-tree CXFS interface. The idea of the function was basically to extract the target extent buffer (if muli- level extent allocation mode), then call xfs_bmap_do_search_extents() with either a pointer to the first extent in the target buffer or a pointer to the first extent in the file, depending on which extent mode was being used. However, in addition to locating the target extent record for block bno, xfs_bmap_do_search_extents() also sets four parameters needed by the caller: lastx, eofp, gotp, prevp. Passing only the target extent buffer to xfs_bmap_do_search_extents() causes eofp to be set incorrectly if the extent is at the end of the target list but there are actually more extents in the next er_extbuf. Likewise, if the extent is the first one in the buffer but NOT the first in the file, prevp is incorrectly set to NULL. Adding the needed functionality to xfs_bmap_search_multi_extents() to re-set any incorrectly set fields is redundant and makes the call to xfs_bmap_do_search_extents() not make much sense when multi-level extent allocation mode is being used. This mod basically extracts the two functional components from xfs_bmap_do_search_extents(), with the intent of obsoleting/removing xfs_bmap_do_search_extents() after the CXFS mult-level in-core extent changes are checked in. The two components are: 1) The binary search to locate the target extent record, and 2) Setting the four parameters needed by the caller (lastx, eofp, gotp, prevp). Component 1: I created a new function in xfs_inode.c called xfs_iext_bno_to_ext(), which executes the binary search to find the target extent record. xfs_bmap_search_multi_extents() has been modified to call xfs_iext_bno_to_ext() rather than xfs_bmap_do_search_extents(). Component 2: The parameter setting functionality has been added to xfs_bmap_search_multi_extents(), eliminating the need for xfs_bmap_do_search_extents(). These changes make the removal of xfs_bmap_do_search_extents() trival once the CXFS changes are in place. They also allow us to maintain the current XFS interface, using the new search function introduced in mod xfs-linux:xfs-kern:207393a. SGI-PV: 928864 SGI-Modid: xfs-linux-melb:xfs-kern:207866a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-17 17:25:04 +11:00
Mandy Kirkconnell	0293ce3a9f	[XFS] 929045 567344 This mod introduces multi-level in-core file extent functionality, building upon the new layout introduced in mod xfs-linux:xfs-kern:207390a. The new multi-level extent allocations are only required for heavily fragmented files, so the old-style linear extent list is used on files until the extents reach a pre-determined size of 4k. 4k buffers are used because this is the system page size on Linux i386 and systems with larger page sizes don't seem to gain much, if anything, by using their native page size as the extent buffer size. Also, using 4k extent buffers everywhere provides a consistent interface for CXFS across different platforms. The 4k extent buffers are managed by an indirection array (xfs_ext_irec_t) which is basically just a pointer array with a bit of extra information to keep track of the number of extents in each buffer as well as the extent offset of each buffer. Major changes include: - Add multi-level in-core file extent functionality to the xfs_iext_ subroutines introduced in mod: xfs-linux:xfs-kern:207390a - Introduce 13 new subroutines which add functionality for multi-level in-core file extents: xfs_iext_add_indirect_multi() xfs_iext_remove_indirect() xfs_iext_realloc_indirect() xfs_iext_indirect_to_direct() xfs_iext_bno_to_irec() xfs_iext_idx_to_irec() xfs_iext_irec_init() xfs_iext_irec_new() xfs_iext_irec_remove() xfs_iext_irec_compact() xfs_iext_irec_compact_pages() xfs_iext_irec_compact_full() xfs_iext_irec_update_extoffs() SGI-PV: 928864 SGI-Modid: xfs-linux-melb:xfs-kern:207393a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:30:23 +11:00
Mandy Kirkconnell	4eea22f01b	[XFS] 929045 567344 This mod re-organizes some of the in-core file extent code to prepare for an upcoming mod which will introduce multi-level in-core extent allocations. Although the in-core extent management is using a new code path in this mod, the functionality remains the same. Major changes include: - Introduce 10 new subroutines which re-orgainze the existing code but do NOT change functionality: xfs_iext_get_ext() xfs_iext_insert() xfs_iext_add() xfs_iext_remove() xfs_iext_remove_inline() xfs_iext_remove_direct() xfs_iext_realloc_direct() xfs_iext_direct_to_inline() xfs_iext_inline_to_direct() xfs_iext_destroy() - Remove 2 subroutines (functionality moved to new subroutines above): xfs_iext_realloc() -replaced by xfs_iext_add() and xfs_iext_remove() xfs_bmap_insert_exlist() - replaced by xfs_iext_insert() xfs_bmap_delete_exlist() - replaced by xfs_iext_remove() - Replace all hard-coded (indexed) extent assignments with a call to xfs_iext_get_ext() - Replace all extent record pointer arithmetic (ep++, ep--, base + lastx,..) with calls to xfs_iext_get_ext() - Update comments to remove the idea of a single "extent list" and introduce "extent record" terminology instead SGI-PV: 928864 SGI-Modid: xfs-linux-melb:xfs-kern:207390a Signed-off-by: Mandy Kirkconnell <alkirkco@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-03-14 13:29:52 +11:00
Christoph Hellwig	75e17b3caf	[XFS] add helper to get xfs_inode from vnode SGI-PV: 947206 SGI-Modid: xfs-linux-melb:xfs-kern:203960a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-01-11 20:58:44 +11:00
Christoph Hellwig	42fe2b1f7f	[XFS] fix, speedup and simplify atime handling let the VFS handle atime updates and only sync back to the xfs inode when nessecary SGI-PV: 946679 SGI-Modid: xfs-linux-melb:xfs-kern:203362a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2006-01-11 15:35:17 +11:00
Nathan Scott	9dac13e7ff	[XFS] Remove unused type, xfs_gap_t. SGI-PV: 907752 SGI-Modid: xfs-linux:xfs-kern:23932a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 15:05:34 +11:00
Nathan Scott	7b71876980	[XFS] Update license/copyright notices to match the prefered SGI boilerplate. SGI-PV: 913862 SGI-Modid: xfs-linux:xfs-kern:23903a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 14:58:39 +11:00
Nathan Scott	a844f4510d	[XFS] Remove xfs_macros.c, xfs_macros.h, rework headers a whole lot. SGI-PV: 943122 SGI-Modid: xfs-linux:xfs-kern:23901a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 14:38:42 +11:00
Nathan Scott	30dab21abb	[XFS] Add a comment about the use of XFS_SIZE_TOKEN_WANT. SGI-PV: 936331 SGI-Modid: xfs-linux:xfs-kern:23827a Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-11-02 10:31:13 +11:00
Christoph Hellwig	efa8027804	[XFS] rewrite xfs_iflush_all SGI-PV: 936890 SGI-Modid: xfs-linux:xfs-kern:193349a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-06-21 15:37:17 +10:00
Christoph Hellwig	ba0f32d460	[XFS] mark various symbols static Patch from Adrian Bunk SGI-PV: 936255 SGI-Modid: xfs-linux:xfs-kern:192760a Signed-off-by: Christoph Hellwig <hch@sgi.com> Signed-off-by: Nathan Scott <nathans@sgi.com>	2005-06-21 15:36:52 +10:00
Nathan Scott	31b084aef3	[XFS] Fix up uses of nlink_t incorrectly restricting us to 2^16 links for some platforms SGI Modid: xfs-linux:xfs-kern:22032a Signed-off-by: Nathan Scott <nathans@sgi.com> Signed-off-by: Christoph Hellwig <hch@sgi.com>	2005-05-05 13:25:00 -07:00
Linus Torvalds	1da177e4c3	Linux-2.6.12-rc2 Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!	2005-04-16 15:20:36 -07:00

1 2 3 4 5

229 Commits