Commit Graph

28608 Commits

Author SHA1 Message Date
Pavel Shilovsky
24985c53d5 CIFS: Move r/wsize negotiating to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:27 -05:00
Pavel Shilovsky
7a5cfb1965 CIFS: Add SMB2 support for flush
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:27 -05:00
Pavel Shilovsky
1d8c4c0009 CIFS: Make flush code use ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:27 -05:00
Pavel Shilovsky
2ae78ba85c CIFS: Move reopen code to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:27 -05:00
Pavel Shilovsky
253641388a CIFS: Move create code use ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:27 -05:00
Pavel Shilovsky
b7546bc54c CIFS: Add SMB2 support for query_file_info
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
4ad6504453 CIFS: Move guery file info code to ops struct
and make cifs_get_file_info(_unix) calls static.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
f0df737ee8 CIFS: Add open/close file support for SMB2
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
0ff78a221b CIFS: Move close code to ops struct
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
fb1214e48f CIFS: Move open code to ops struct
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
4b4de76e35 CIFS: Replace netfid with cifs_fid struct in cifsFileInfo
This is help us to extend the code for future protocols that can use
another fid mechanism (as SMB2 that has it divided into two parts:
persistent and violatile).

Also rename variables and refactor the code around the changes.

Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
cbe6f439f5 CIFS: Add SMB2 support for unlink
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Pavel Shilovsky
ed6875e0d6 CIFS: Move unlink code to ops struct
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-24 21:46:26 -05:00
Benjamin Marzinski
2216db70c9 GFS2: Write out dirty inode metadata in delayed deletes
If a dirty GFS2 inode was being deleted but was in use by another node, its
metadata was not getting written out before GFS2 checked for dirty buffers in
gfs2_ail_flush().  GFS2 was relying on inode_go_sync() to write out the
metadata when the other node tried to free the file, but it failed the error
check before it got that far. This patch writes out the metadata before calling
gfs2_ail_flush()

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:30 +01:00
Eric Sandeen
a0b4df2943 GFS2: fix s_writers.counter imbalance in gfs2_ail_empty_gl
gfs2_ail_empty_gl() contains an "inline version" of gfs2_trans_begin(),
so it needs an explicit sb_start_intwrite() as well, to balance the
sb_end_intwrite() which will be called by gfs2_trans_end().

With this, xfstest 068 passes on lock_nolock local gfs2.
Without it, we reach a writer count of -1 and get stuck.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:29 +01:00
Bob Peterson
3701530aed GFS2: Fix infinite loop in rbm_find
This patch fixes an infinite loop in gfs2_rbm_find that was introduced
by the previous patch. The problem occurred when the length was less
than 3 but the rbm block was byte-aligned, causing it to improperly
return a extent length of zero, which caused it to spin.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Barry Marson <bmarson@redhat.com>
2012-09-24 10:47:27 +01:00
Steven Whitehouse
ff7f4cb461 GFS2: Consolidate free block searching functions
With the recently added block reservation code, an additional function
was added to search for free blocks. This had a restriction of only being
able to search for aligned extents of free blocks. As a result the
allocation patterns when reserving blocks were suboptimal when the
existing allocation of blocks for an inode was not aligned to the same
boundary.

This patch resolves that problem by adding the ability for gfs2_rbm_find
to search for extents of a particular minimum size. We can then use
gfs2_rbm_find for both looking for reservations, and also looking for
free blocks on an individual basis when we actually come to do the
allocation later on. As a result we only need a single set of code
to deal with both situations.

The function gfs2_rbm_from_block() is moved up rgrp.c so that it
occurs before all of its callers.

Many thanks are due to Bob for helping track down the final issue in
this patch. That fix to the rb_tree traversal and to not share
block reservations from a dirctory to its children is included here.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:26 +01:00
Jan Kara
56aa72d0fc GFS2: Get rid of I_MUTEX_QUOTA usage
GFS2 uses i_mutex on its system quota inode to synchronize writes to
quota file. Since this is an internal inode to GFS2 (not part of directory
hiearchy or visible by user) we are safe to define locking rules for it. So
let's just get it its own locking class to make it clear.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:24 +01:00
Bob Peterson
0688a5ecea GFS2: Stop block extents at the end of bitmaps
This patch stops multiple block allocations if a nonzero
return code is received from gfs2_rbm_from_block. Without
this patch, if enough pressure is put on the file system,
you get a kernel warning quickly followed by:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa04f47e8>] gfs2_alloc_blocks+0x2c8/0x880 [gfs2]
With this patch, things run normally.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:23 +01:00
Steven Whitehouse
c743ffd09f GFS2: Fix unclaimed_blocks() wrapping bug and clean up
When rgd->rd_free_clone is less than rgd->rd_reserved, the
unclaimed_blocks() calculation would wrap and produce
incorrect results. This patch checks for this condition
when this function is called from gfs2_mblk_search()

In addition, the use of this particular function in other
places in the code has been dropped by means of a general
clean up of gfs2_inplace_reserve(). This function is now
much easier to follow.

Also the setting of the rgd->rd_last_alloc field is corrected.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:21 +01:00
Steven Whitehouse
9e733d3923 GFS2: Improve block reservation tracing
This patch improves the tracing of block reservations by
removing some corner cases and also providing more useful
detail in the traces.

A new field is added to the reservation structure to contain
the inode number. This is used since in certain contexts it is
not possible to access the inode itself to obtain this information.
As a result we can then display the inode number for all tracepoints
and also in case we dump the resource group.

The "del" tracepoint operation has been removed. This could be called
with the reservation rgrp set to NULL. That resulted in not printing
the device number, and thus making the information largely useless
anyway. Also, the conditional on the rgrp being NULL can then be
removed from the tracepoint. After this change, all the block
reservation tracepoint calls will be called with the rgrp information.

The existing ins,clm and tdel calls to the block reservation tracepoint
are sufficient to track the entire life of the block reservation.

In gfs2_block_alloc() the error detection is updated to print out
the inode number of the problematic inode. This can then be compared
against the information in the glock dump,tracepoints, etc.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:20 +01:00
Steven Whitehouse
137834a696 GFS2: Fall back to ignoring reservations, if there are no other blocks left
When we get to the stage of allocating blocks, we know that the
resource group in question must contain enough free blocks, otherwise
gfs2_inplace_reserve() would have failed. So if we are left with only
free blocks which are reserved, then we must use those. This can happen
if another node has sneeked in and use some blocks reserved on this
node, for example. Generally this will happen very rarely and only
when the resouce group is nearly full.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:19 +01:00
Steven Whitehouse
2b9731e8bb GFS2: Fix ->show_options() for statfs slow
The ->show_options() function for GFS2 was not correctly displaying
the value when statfs slow in in use.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Milos Jakubicek <xjakub@fi.muni.cz>
2012-09-24 10:47:17 +01:00
Steven Whitehouse
3e6339dd28 GFS2: Use rbm for gfs2_setbit()
Use the rbm structure for gfs2_setbit() in order to simplify the
arguments to the function. We have to add a bool to control whether
the clone bitmap should be updated (if it exists) but otherwise it
is a more or less direct substitution.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:16 +01:00
Steven Whitehouse
c04a2ef3a8 GFS2: Use rbm for gfs2_testbit()
Change the arguments to gfs2_testbit() so that it now just takes an
rbm specifying the position of the two bit entry to return.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:14 +01:00
Bob Peterson
29c05b205d GFS2: Eliminate unnecessary check for state > 3 in bitfit
Function gfs2_bitfit was checking for state > 3, but that's
impossible since it is only called from rgblk_search, which receives
only GFS2_BLKST_ constants.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:13 +01:00
Bob Peterson
e5dc76b9af GFS2: Eliminate redundant calls to may_grant
Function add_to_queue was checking may_grant for the passed-in
holder for every iteration of its gh2 loop. Now it only checks it
once at the beginning to see if a try lock is futile.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:12 +01:00
Bob Peterson
81e1d45061 GFS2: Combine functions gfs2_glock_dq_wait and wait_on_demote
Function gfs2_glock_dq_wait called two-line function wait_on_demote,
so they were combined.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:10 +01:00
Bob Peterson
07a7904942 GFS2: Combine functions gfs2_glock_wait and wait_on_holder
Function gfs2_glock_wait only called function wait_on_holder and
returned its return code, so they were combined for readability.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:09 +01:00
Bob Peterson
4abb6ad9ea GFS2: inline __gfs2_glock_schedule_for_reclaim
Since function gfs2_glock_schedule_for_reclaim is only two
significant lines, we can eliminate it, simplifying the code
and making it more readable.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:07 +01:00
Bob Peterson
8e711e100f GFS2: change function gfs2_direct_IO to use a normal gfs2_glock_dq
This patch changes function gfs2_direct_IO so that it uses a normal
call to gfs2_glock_dq rather than a call to a multiple-dq of one item.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:06 +01:00
Bob Peterson
8d8b752a0f GFS2: rbm code cleanup
This patch fixes a few small rbm related things. First, it fixes
a corner case where the rbm needs to switch bitmaps and wasn't
adjusting its buffer pointer. Second, there's a white space issue
fixed. Third, the logic in function gfs2_rbm_from_block was optimized
a bit. Lastly, a check for goal block overflows was added to function
gfs2_alloc_blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:04 +01:00
Steven Whitehouse
5d50d53246 GFS2: Fix case where reservation finished at end of rgrp
One corner case which the original patch failed to take into
account was when there is a reservation which ended such that
the following block was one beyond the end of the rgrp in
question. This extra test fixes that case.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
2012-09-24 10:47:03 +01:00
Michel Lespinasse
24d634e8f3 GFS2: Use RB_CLEAR_NODE() rather than rb_init_node()
gfs2 calls RB_EMPTY_NODE() to check if nodes are not on an rbtree.
The corresponding initialization function is RB_CLEAR_NODE().
rb_init_node() was never clearly defined and is going away.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:02 +01:00
Steven Whitehouse
3b1d0b9d0b GFS2: Update rgblk_free() to use rbm
Replace open coded version with a call to gfs2_rbm_from_block()

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:47:00 +01:00
Steven Whitehouse
3983903a71 GFS2: Update gfs2_get_block_type() to use rbm
Use the new gfs2_rbm_from_block() function to replace an open
coded version of the same code.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:59 +01:00
Steven Whitehouse
5b924ae2dc GFS2: Replace rgblk_search with gfs2_rbm_find
This is part of a series of patches which are introducing the
gfs2_rbm structure throughout the block allocation code. The
main aim of this part is to create a search function which can
deal directly with struct gfs2_rbm. In this case it specifies
the initial position at which to start the search and also the
point at which the search terminates.

The net result of this is to clean up the search code and make
it rather more readable, and the various possible exceptions which
may occur during the search are partitioned into their own functions.

There are some bug fixes too. We should not be checking the reservations
while allocating extents - the time for that is when we are searching
for where to put the extent, not when we've already made that decision.

Also, rgblk_search had two uses, and in only one of those cases did
it make sense to check for reservations. This is fixed in the new
gfs2_rbm_find function, which has a cleaner interface.

The reservation checking has been improved by always checking for
contiguous reservations, and returning the first free block after
all contiguous reservations. This is done under the spin lock to
ensure consistancy of the tree.

The allocation of extents is now in all cases done by the existing
allocation code, and if there is an active reservation, that is updated
after the fact. Again this is done under the spin lock, since it entails
changing the lookup key for the reservation in question.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:57 +01:00
Steven Whitehouse
4a993fb150 GFS2: Add structure to contain rgrp, bitmap, offset tuple
This patch introduces a new structure, gfs2_rbm, which is a
tuple of a resource group, a bitmap within the resource group
and an offset within that bitmap. This is designed to make
manipulating these sets of variables easier. There is also a
new helper function which converts this representation back
to a disk block address.

In addition, the rbtree nodes which are used for the reservations
were not being correctly initialised, which is now fixed. Also,
the tracing was not passing through the inode where it should
have been. That is mostly fixed aside from one corner case. This
needs to be revisited since there can also be a NULL rgrp in
some cases which results in the device being incorrect in the
trace.

This is intended to be the first step towards cleaning up some
of the allocation code, and some further bug fixes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:56 +01:00
Steven Whitehouse
71f890f7f7 GFS2: Remove rs_requested field from reservations
The rs_requested field is left over from the original allocation
code, however this should have been a parameter passed to the
various functions from gfs2_inplace_reserve() and not a member of the
reservation structure as the value is not required after the
initial allocation.

This also helps simplify the code since we no longer need to set
the rs_requested to zero. Also the gfs2_inplace_release()
function can also be simplified since the reservation structure
will always be defined when it is called, and the only remaining
task is to unlock the rgrp if required. It can also now be
called unconditionally too, resulting in a further simplification.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:54 +01:00
Steven Whitehouse
1f98169743 GFS2: Merge two nearly identical xattr functions
There were two functions in the xattr code which were nearly
identical, the only difference being that one was copy data into
the unstuffed xattrs and the other was copying data out from it.

This patch merges the two functions such that the code which deal
with iteration over the unstuffed xattrs is no longer duplicated.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:53 +01:00
Al Viro
c5aa1e554a close the race in nlmsvc_free_block()
we need to grab mutex before the reference counter reaches 0

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-22 20:48:20 -04:00
Al Viro
156cacb1d0 do_add_mount()/umount -l races
normally we deal with lock_mount()/umount races by checking that
mountpoint to be is still in our namespace after lock_mount() has
been done.  However, do_add_mount() skips that check when called
with MNT_SHRINKABLE in flags (i.e. from finish_automount()).  The
reason is that ->mnt_ns may be a temporary namespace created exactly
to contain automounts a-la NFS4 referral handling.  It's not the
namespace of the caller, though, so check_mnt() would fail here.
We still need to check that ->mnt_ns is non-NULL in that case,
though.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-22 20:48:18 -04:00
Linus Torvalds
a4be6c77b5 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fix from Steve French.

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix return value in cifsConvertToUTF16
2012-09-22 12:36:57 -07:00
Linus Torvalds
789f95b788 xfs: bugfixes for 3.6-rc7
- fix a regression related to xfs_sync_worker racing with unmount.
 - fix a race while discarding xfs buffers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQWO3uAAoJENaLyazVq6ZOjfcP+gJkcJLS5+qmyNEcW2IUH0+E
 4WptMdBCLgZGa54aGAJ2mwg0FysyyTiTXjOSETRiBU+N3bAhgweucRsxc8z+awen
 L+InHr8YgQyAoY0nhEcXI/EuHaF9OlgVT6YCOqr/V4gtLO+aczovQS1wA3w/pjAk
 RWa4z+VlH+D9KenatoCcHSY6PIPO9pLs4Gfb7D/9BLFN+f6OnIaUlkwIQSuumuaw
 Lt/sw24/FEBYyzspmGfJT1fjDZK4VI4QoPEAVuvGiJCGFzSW2RDmlb48ZXsnGBbM
 f83tKjB7praQhXnBt56/S5YThgWzt8eaJVIhSExtEh1tisb5iWNQzVPk+USXUE9t
 DNTxtJjwiECbslyVYkTDUKnhdPGtHkpQSN96RBUDvQYfoLHQ/aXbxfPIZGEt24YM
 A/TbCFDFQrI91Rn3TkAxygvfOkxWxE9TB1PmwfgrJGFDWNxg84OBiCX9IMNi3NUF
 glqoKn6aI5fZH6gHVU7xA+bnfJYYRIxUtgIHJ1sYH6dH185G5Yj3m9bojcN7DnmM
 x1kLf0lscumgdB3OGLgpe5IrrFKM+ncclkS24X3eWOCvnWiEXBwajPqA8LloekZA
 X+IyGhoSfg2yRJAYEipRD+H0XouNM/AsLMcI/VbEoLGebxpsKCkg0VwCbd/4xISO
 90Q9jWXC4dzUVRc60rPw
 =ZcGP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.6-rc7' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 - fix a regression related to xfs_sync_worker racing with unmount.
 - fix a race while discarding xfs buffers.

* tag 'for-linus-v3.6-rc7' of git://oss.sgi.com/xfs/xfs:
  xfs: stop the sync worker before xfs_unmountfs
  xfs: fix race while discarding buffers [V4]
2012-09-21 12:43:01 -07:00
Linus Torvalds
e05e279e6f debugfs: fix u32_array race in format_array_alloc
The format_array_alloc() function is fundamentally racy, in that it
prints the array twice: once to figure out how much space to allocate
for the buffer, and the second time to actually print out the data.

If any of the array contents changes in between, the allocation size may
be wrong, and the end result may be truncated in odd ways.

Just don't do it.  Allocate a maximum-sized array up-front, and just
format the array contents once.  The only user of the u32_array
interfaces is the Xen spinlock statistics code, and it has 31 entries in
the arrays, so the maximum size really isn't that big, and the end
result is much simpler code without the bug.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-21 11:48:05 -07:00
David Rientjes
36048853c5 debugfs: fix race in u32_array_read and allocate array at open
u32_array_open() is racy when multiple threads read from a file with a
seek position of zero, i.e. when two or more simultaneous reads are
occurring after the non-seekable files are created.  It is possible that
file->private_data is double-freed because the threads races between

	kfree(file->private-data);

and

	file->private_data = NULL;

The fix is to only do format_array_alloc() when the file is opened and
free it when it is closed.

Note that because the file has always been non-seekable, you can't open
it and read it multiple times anyway, so the data has always been
generated just once.  The difference is that now it is generated at open
time rather than at the time of the first read, and that avoids the
race.

Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-21 10:28:17 -07:00
Jaeden Amero
84c3b84860 compat_ioctl: Add RS-485 IOCTLs to the list
The RS-485 TIOCSRS485 and TIOCGRS485 ioctls are 32-bit compatible, so
in order to call them on 64-bit systems from 32-bit user mode, we add
them to the ioctl pointer list as compatible ioctls.

Signed-off-by: Jaeden Amero <jaeden.amero@ni.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-21 09:51:09 -07:00
Eric W. Biederman
7223546586 userns: Convert the ufs filesystem to use kuid/kgid where appropriate
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 04:28:00 -07:00
Eric W. Biederman
c2ba138a27 userns: Convert the udf filesystem to use kuid/kgid where appropriate
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 04:18:54 -07:00
Eric W. Biederman
39241beb78 userns: Convert ubifs to use kuid/kgid
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:36 -07:00
Eric W. Biederman
61293ee274 userns: Convert squashfs to use kuid/kgid where appropriate
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:35 -07:00
Eric W. Biederman
df814654f3 userns: Convert reiserfs to use kuid and kgid where appropriate
Cc: reiserfs-devel@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:34 -07:00
Eric W. Biederman
c18cdc1a3e userns: Convert jfs to use kuid/kgid where appropriate
Cc: Dave Kleikamp <shaggy@kernel.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:33 -07:00
Eric W. Biederman
0cfe53d3c3 userns: Convert jffs2 to use kuid and kgid where appropriate
- General routine uid/gid conversion work
- When storing posix acls treat ACL_USER and ACL_GROUP separately
  so I can call from_kuid or from_kgid as appropriate.
- When reading posix acls treat ACL_USER and ACL_GROUP separately
  so I can call make_kuid or make_kgid as appropriate.

Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:33 -07:00
Eric W. Biederman
0e1a43c716 userns: Convert hpfs to use kuid and kgid where appropriate
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:32 -07:00
Eric W. Biederman
2f2f43d3c7 userns: Convert btrfs to use kuid/kgid where appropriate
Cc: Chris Mason <chris.mason@fusionio.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:31 -07:00
Eric W. Biederman
7f5b82b835 userns: Convert bfs to use kuid/kgid where appropriate
Cc: "Tigran A. Aivazian" <tigran@aivazian.fsnet.co.uk>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:31 -07:00
Eric W. Biederman
8fed10be00 userns: Convert affs to use kuid/kgid wherwe appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:30 -07:00
Eric W. Biederman
d2b31ca644 userns: Teach security_path_chown to take kuids and kgids
Don't make the security modules deal with raw user space uid and
gids instead pass in a kuid_t and a kgid_t so that security modules
only have to deal with internal kernel uids and gids.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: James Morris <james.l.morris@oracle.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:25 -07:00
Eric W. Biederman
29f82ae56e userns: Convert hostfs to use kuid and kgid where appropriate
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:23 -07:00
Eric W. Biederman
2f83ffa874 userns: Convert freevxfs to use kuid/kgid where appropriate
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:19 -07:00
Eric W. Biederman
a726ecce75 userns: Convert the sysv filesystem to use kuid/kgid where appropriate
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:18 -07:00
Eric W. Biederman
85a03d1bba userns: Convert the qnx6 filesystem to use kuid/kgid where appropriate
Cc: Kai Bankett <chaosman@ontika.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:17 -07:00
Eric W. Biederman
511728d778 userns: Convert the qnx4 filesystem to use kuid/kgid where appropriate
Acked-by: Anders Larsen <al@alarsen.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:17 -07:00
Eric W. Biederman
80fcbe751f userns: Convert omfs to use kuid and kgid where appropriate
Acked-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:16 -07:00
Eric W. Biederman
b29f7751c9 userns: Convert ntfs to use kuid and kgid where appropriate
Cc: Anton Altaparmakov <anton@tuxera.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:15 -07:00
Eric W. Biederman
305d3d0dbc userns: Convert nillfs2 to use kuid/kgid where appropriate
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:15 -07:00
Eric W. Biederman
f303bdc55e userns: Convert minix to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:14 -07:00
Eric W. Biederman
1a0a994ebe userns: Convert logfs to use kuid/kgid where appropriate
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:13 -07:00
Eric W. Biederman
ba64e2b9e3 userns: Convert isofs to use kuid/kgid where appropriate
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:12 -07:00
Eric W. Biederman
16525e3f14 userns: Convert hfsplus to use kuid and kgid where appropriate
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:12 -07:00
Eric W. Biederman
43b5e4ccd4 userns: Convert hfs to use kuid and kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:11 -07:00
Eric W. Biederman
d001b05365 userns: Convert exofs to use kuid/kgid where appropriate
Cc: Benny Halevy <bhalevy@tonian.com>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:10 -07:00
Eric W. Biederman
5d4ea4da6a userns: Convert efs to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:10 -07:00
Eric W. Biederman
cdf8c58a35 userns: Convert ecryptfs to use kuid/kgid where appropriate
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:09 -07:00
Eric W. Biederman
a7d9cfe97b userns: Convert cramfs to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:08 -07:00
Eric W. Biederman
31aba059bb userns: Convert befs to use kuid/kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:08 -07:00
Eric W. Biederman
c010d1ff4f userns: Convert adfs to use kuid and kgid where appropriate
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:07 -07:00
Eric W. Biederman
a0eb3a05a8 userns: Convert hugetlbfs to use kuid/kgid where appropriate
Note sysctl_hugetlb_shm_group can only be written in the root user
in the initial user namespace, so we can assume sysctl_hugetlb_shm_group
is in the initial user namespace.

Cc: William Irwin <wli@holomorphy.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:05 -07:00
Liu Bo
7dfd8cc536 fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
Argument @only_this_sb has been removed.

Signed-off-by: Liu Bo <liub.liubo@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-21 12:04:36 +02:00
Wang Sheng-Hui
44a075bde9 btrfs: fix the commment for the action flags in delayed-ref.h
The action field has been merged into struct btrfs_delayed_ref_node,
and no struct btrfs_delayed_ref is available now.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-21 08:07:12 +02:00
Eric W. Biederman
170782eb89 userns: Convert fat to use kuid/kgid where appropriate
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-20 06:11:55 -07:00
Ben Myers
0ba6e5368c xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
2012-09-18 16:51:26 -05:00
Jeff Layton
c73f693989 cifs: fix return value in cifsConvertToUTF16
This function returns the wrong value, which causes the callers to get
the length of the resulting pathname wrong when it contains non-ASCII
characters.

This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767

Cc: <stable@vger.kernel.org>
Reported-by: Baldvin Kovacs <baldvin.kovacs@gmail.com>
Reported-and-Tested-by: Nicolas Lefebvre <nico.lefebvre@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-18 15:35:25 -05:00
Miklos Szeredi
b161dfa693 vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()
IBM reported a soft lockup after applying the fix for the rename_lock
deadlock.  Commit c83ce989cb ("VFS: Fix the nfs sillyrename regression
in kernel 2.6.38") was found to be the culprit.

The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
dentry was killed.  This flag can be set on non-killed dentries too,
which results in infinite retries when trying to traverse the dentry
tree.

This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
only set in d_kill() and makes try_to_ascend() test only this flag.

IBM reported successful test results with this patch.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-18 11:23:51 -07:00
Eric W. Biederman
1a06d420ce userns: Convert quota
Now that the type changes are done, here is the final set of
changes to make the quota code work when user namespaces are enabled.

Small cleanups and fixes to make the code build when user namespaces
are enabled.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:42 -07:00
Eric W. Biederman
7b9c7321ca userns: Convert struct dquot_warn
Convert w_dq_id to be a struct kquid and remove the now unncessary
w_dq_type.

This is a simple conversion and enough other places have already
been converted that this actually reduces the code complexity
by a little bit, when removing now unnecessary type conversions.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:42 -07:00
Eric W. Biederman
4c376dcae8 userns: Convert struct dquot dq_id to be a struct kqid
Change struct dquot dq_id to a struct kqid and remove the now
unecessary dq_type.

Make minimal changes to dquot, quota_tree, quota_v1, quota_v2, ext3,
ext4, and ocfs2 to deal with the change in quota structures and
signatures.  The ocfs2 changes are larger than most because of the
extensive tracing throughout the ocfs2 quota code that prints out
dq_id.

quota_tree.c:get_index is modified to take a struct kqid instead of a
qid_t because all of it's callers pass in dquot->dq_id and it allows
me to introduce only a single conversion.

The rest of the changes are either just replacing dq_type with dq_id.type,
adding conversions to deal with the change in type and occassionally
adding qid_eq to allow quota id comparisons in a user namespace safe way.

Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Theodore Tso <tytso@mit.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:41 -07:00
Eric W. Biederman
aca645a6a5 userns: Modify dqget to take struct kqid
Modify dqget to take struct kqid instead of a type and an identifier
pair.

Modify the callers of dqget in ocfs2 and dquot to take generate
a struct kqid so they can continue to call dqget.  The conversion
to create struct kqid should all be the final conversions that
are needed in those code paths.

Cc: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:40 -07:00
Eric W. Biederman
431f19744d userns: Convert quota netlink aka quota_send_warning
Modify quota_send_warning to take struct kqid instead a type and
identifier pair.

When sending netlink broadcasts always convert uids and quota
identifiers into the intial user namespace.  There is as yet no way to
send a netlink broadcast message with different contents to receivers
in different namespaces, so for the time being just map all of the
identifiers into the initial user namespace which preserves the
current behavior.

Change the callers of quota_send_warning in gfs2, xfs and dquot
to generate a struct kqid to pass to quota send warning.  When
all of the user namespaces convesions are complete a struct kqid
values will be availbe without need for conversion, but a conversion
is needed now to avoid needing to convert everything at once.

Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:40 -07:00
Eric W. Biederman
74a8a10378 userns: Convert qutoactl
Update the quotactl user space interface to successfull compile with
user namespaces support enabled and to hand off quota identifiers to
lower layers of the kernel in struct kqid instead of type and qid
pairs.

The quota on function is not converted because while it takes a quota
type and an id.  The id is the on disk quota format to use, which
is something completely different.

The signature of two struct quotactl_ops methods were changed to take
struct kqid argumetns get_dqblk and set_dqblk.

The dquot, xfs, and ocfs2 implementations of get_dqblk and set_dqblk
are minimally changed so that the code continues to work with
the change in parameter type.

This is the first in a series of changes to always store quota
identifiers in the kernel in struct kqid and only use raw type and qid
values when interacting with on disk structures or userspace.  Always
using struct kqid internally makes it hard to miss places that need
conversion to or from the kernel internal values.

Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: Alex Elder <elder@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:39 -07:00
Eric W. Biederman
e8a3e4719b userns: Implement struct kqid
Add the data type struct kqid which holds the kernel internal form of
the owning identifier of a quota.  struct kqid is a replacement for
the implicit union of uid, gid and project id stored in an unsigned
int and the quota type field that is was used in the quota data
structures.  Making the data type explicit allows the kuid_t and
kgid_t type safety to propogate more thoroughly through the code,
revealing more places where uid/gid conversions need be made.

Along with the data type struct kqid comes the helper functions
qid_eq, qid_lt, from_kqid, from_kqid_munged, qid_valid, make_kqid,
make_kqid_invalid, make_kqid_uid, make_kqid_gid.

Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:38 -07:00
Eric W. Biederman
f76d207a66 userns: Add kprojid_t and associated infrastructure in projid.h
Implement kprojid_t a cousin of the kuid_t and kgid_t.

The per user namespace mapping of project id values can be set with
/proc/<pid>/projid_map.

A full compliment of helpers is provided: make_kprojid, from_kprojid,
from_kprojid_munged, kporjid_has_mapping, projid_valid, projid_eq,
projid_eq, projid_lt.

Project identifiers are part of the generic disk quota interface,
although it appears only xfs implements project identifiers currently.

The xfs code allows anyone who has permission to set the project
identifier on a file to use any project identifier so when
setting up the user namespace project identifier mappings I do
not require a capability.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:37 -07:00
Eric W. Biederman
69552c0c50 userns: Convert configfs to use kuid and kgid where appropriate
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-18 01:01:37 -07:00
Eric W. Biederman
af84df93ff userns: Convert extN to support kuids and kgids in posix acls
Convert ext2, ext3, and ext4 to fully support the posix acl changes,
using e_uid e_gid instead e_id.

Enabled building with posix acls enabled, all filesystems supporting
user namespaces, now also support posix acls when user namespaces are enabled.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-18 01:01:36 -07:00
Eric W. Biederman
5f3a4a28ec userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr
- Pass the user namespace the uid and gid values in the xattr are stored
   in into posix_acl_from_xattr.

 - Pass the user namespace kuid and kgid values should be converted into
   when storing uid and gid values in an xattr in posix_acl_to_xattr.

- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
  pass in &init_user_ns.

In the short term this change is not strictly needed but it makes the
code clearer.  In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:35 -07:00
Eric W. Biederman
2f6f0654ab userns: Convert vfs posix_acl support to use kuids and kgids
- In setxattr if we are setting a posix acl convert uids and gids from
  the current user namespace into the initial user namespace, before
  the xattrs are passed to the underlying filesystem.

  Untranslatable uids and gids are represented as -1 which
  posix_acl_from_xattr will represent as INVALID_UID or INVALID_GID.
  posix_acl_valid will fail if an acl from userspace has any
  INVALID_UID or INVALID_GID values.  In net this guarantees that
  untranslatable posix acls will not be stored by filesystems.

- In getxattr if we are reading a posix acl convert uids and gids from
  the initial user namespace into the current user namespace.

  Uids and gids that can not be tranlsated into the current user namespace
  will be represented as -1.

- Replace e_id in struct posix_acl_entry with an anymouns union of
  e_uid and e_gid.  For the short term retain the e_id field
  until all of the users are converted.

- Don't set struct posix_acl.e_id in the cases where the acl type
  does not use e_id.  Greatly reducing the use of ACL_UNDEFINED_ID.

- Rework the ordering checks in posix_acl_valid so that I use kuid_t
  and kgid_t types throughout the code, and so that I don't need
  arithmetic on uid and gid types.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-18 01:01:35 -07:00
Eric W. Biederman
e1760bd5ff userns: Convert the audit loginuid to be a kuid
Always store audit loginuids in type kuid_t.

Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.

Modify audit_get_loginuid to return a kuid_t.

Modify audit_set_loginuid to take a kuid_t.

Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.

Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-17 18:08:54 -07:00
Francesco Ruggeri
6bf6104573 fs/proc: fix potential unregister_sysctl_table hang
The unregister_sysctl_table() function hangs if all references to its
ctl_table_header structure are not dropped.

This can happen sometimes because of a leak in proc_sys_lookup():
proc_sys_lookup() gets a reference to the table via lookup_entry(), but
it does not release it when a subsequent call to sysctl_follow_link()
fails.

This patch fixes this leak by making sure the reference is always
dropped on return.

See also commit 076c3eed2c ("sysctl: Rewrite proc_sys_lookup
introducing find_entry and lookup_entry") which reorganized this code in
3.4.

Tested in Linux 3.4.4.

Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-17 10:32:03 -07:00
Ben Myers
4026c9fde9 xfs: stop the sync worker before xfs_unmountfs
Cancel work of the xfs_sync_worker before teardown of the log in
xfs_unmountfs.  This prevents occasional crashes on unmount like so:

PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
 #0 [c5377d28] crash_kexec at c0292c94
 #1 [c5377d80] oops_end at c07090c2
 #2 [c5377d98] no_context at c06f614e
 #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
 #4 [c5377df4] bad_area_nosemaphore at c06f629b
 #5 [c5377e00] do_page_fault at c070b0cb
 #6 [c5377e7c] error_code (via page_fault) at c070892c
    EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
    CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
 #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
 #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
 #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
#10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
#11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
#12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
#13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
#14 [c5377f58] process_one_work at c024ee4c
#15 [c5377f98] worker_thread at c024f43d
#16 [c5377fbc] kthread at c025326b
#17 [c5377fe8] kernel_thread_helper at c070e834

PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
 #0 [cde0fda0] __schedule at c0706595
 #1 [cde0fe28] schedule at c0706b89
 #2 [cde0fe30] schedule_timeout at c0705600
 #3 [cde0fe94] __down_common at c0706098
 #4 [cde0fec8] __down at c0706122
 #5 [cde0fed0] down at c025936f
 #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
 #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
 #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
 #9 [cde0ff1c] generic_shutdown_super at c0333d7a
#10 [cde0ff38] kill_block_super at c0333e0f
#11 [cde0ff48] deactivate_locked_super at c0334218
#12 [cde0ff58] deactivate_super at c033495d
#13 [cde0ff68] mntput_no_expire at c034bc13
#14 [cde0ff7c] sys_umount at c034cc69
#15 [cde0ffa0] sys_oldumount at c034ccd4
#16 [cde0ffb0] system_call at c0707e66

commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
at a later date.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
2012-09-17 12:05:22 -05:00
Greg Kroah-Hartman
8f949b9a7e Merge 3.6-rc7 into driver-core-next
This pulls in the fixes in that branch that are needed here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-16 16:51:27 -07:00
Linus Torvalds
6167f81fd1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull a btrfs revert from Chris Mason:
 "My for-linus branch has one revert in the new quota code.

  We're building up more fixes at etc for the next merge window, but I'm
  keeping them out unless they are bigger regressions or have a huge
  impact."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"
2012-09-16 12:58:44 -07:00
David S. Miller
b48b63a1f6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/netfilter/nfnetlink_log.c
	net/netfilter/xt_LOG.c

Rather easy conflict resolution, the 'net' tree had bug fixes to make
sure we checked if a socket is a time-wait one or not and elide the
logging code if so.

Whereas on the 'net-next' side we are calculating the UID and GID from
the creds using different interfaces due to the user namespace changes
from Eric Biederman.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-15 11:43:53 -04:00
Linus Torvalds
3f0c3c8fe3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
 "Here are three GFS2 fixes for the current kernel tree.  These are all
  related to the block reservation code which was added at the merge
  window.  That code will be getting an update at the forthcoming merge
  window too.  In the mean time though there are a few smaller issues
  which should be fixed.

  The first patch resolves an issue with write sizes of greater than 32
  bits with the size hinting code.  The second ensures that the
  allocation data structure is initialised when using xattrs and the
  third takes into account allocations which may have been made by other
  nodes which affect a reservation on the local node."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Take account of blockages when using reserved blocks
  GFS2: Fix missing allocation data for set/remove xattr
  GFS2: Make write size hinting code common
2012-09-14 18:05:14 -07:00
Linus Torvalds
1547cb80db - Fixes a regression, introduced in 3.6-rc1, when a file is closed before its
shared memory mapping is dirtied and unmapped. The lower file was being
   released when the eCryptfs file was closed and the dirtied pages could not be
   written out.
 - Adds a call to the lower filesystem's ->flush() from ecryptfs_flush().
 - Fixes a regression, introduced in 2.6.39, when a file is renamed on top of
   another file. The target file's inode was not being evicted and the space
   taken by the file was not reclaimed until eCryptfs was unmounted.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCgAGBQJQU14UAAoJENaSAD2qAscKrpwQAK+Xu3RAHPLO16bEjhrFZaIr
 FY2z8yK5geFzHfmhDxUT7jovw8d2ghAjrE3XHf+PgrqfUqIZ6vo57GFmR8ZNS+30
 exNOb1DffaRqIejStrFkz+Ek4/FbnaPD/FvJYgC8D3Bv04YllyNI9Euq49VkkOiy
 v6jxAy4H+/MCGzYvR4K0jmKlOLuC47TtJ322jXTm0iNeJJH2+kNXm0WvE8dY7dYV
 2QMvv11M1HMI39S4yIY0eBPwbloP3AoO++H3Id7/XWcIlaaeZrVA7HmV8z04DEWq
 Nt2vceSbX4FVBxmLab5TzTftorskJRDLCtMaSM8opve+3Sxgx+wooLskgrHqJjZH
 8AtGfs/ZzvUkKcYMNv+iMax0/KqkGKOhdJMmqc37aI8qkBo5IYXpIPU1ZTy/S2T8
 t3fM8jEI4AOKFn7GlG0kZg7EXo0wWPw3A8X73tCQx8W6sJy5STvv07hFZspX/4nn
 +qhlM7Qzpr1XXPEv9cKo4oXzqOmlSi1Jd2ueGANNGJvche90uCWgmSZCfNNxGnPq
 mLiiiV6KJ94xjjyf/ZWiKLryk5rv0fWiYXzutaORADn6WWwIVKX9IyvJfFI8y+M6
 phREWNdOnyDFFA++h5QotWh3L7ZCL/HNeTpLDSD1ONxosVDhViviA2HrRlyl8LSc
 soDzN7w+yOh+QjlN4sO/
 =UC2V
 -----END PGP SIGNATURE-----

Merge tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:

 - Fixes a regression, introduced in 3.6-rc1, when a file is closed
   before its shared memory mapping is dirtied and unmapped.  The lower
   file was being released when the eCryptfs file was closed and the
   dirtied pages could not be written out.
 - Adds a call to the lower filesystem's ->flush() from
   ecryptfs_flush().
 - Fixes a regression, introduced in 2.6.39, when a file is renamed on
   top of another file.  The target file's inode was not being evicted
   and the space taken by the file was not reclaimed until eCryptfs was
   unmounted.

* tag 'ecryptfs-3.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: Copy up attributes of the lower target inode after rename
  eCryptfs: Call lower ->flush() from ecryptfs_flush()
  eCryptfs: Write out all dirty pages just before releasing the lower file
2012-09-14 17:53:55 -07:00
Chris Mason
f3a87f1b0c Revert "Btrfs: fix some error codes in btrfs_qgroup_inherit()"
This reverts commit 5986802c2f.

Both paths are not error paths but regular cases where non-qgroup
subvols are involved.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-09-14 20:06:30 -04:00
Linus Torvalds
55815f7014 vfs: make O_PATH file descriptors usable for 'fstat()'
We already use them for openat() and friends, but fstat() also wants to
be able to use O_PATH file descriptors.  This should make it more
directly comparable to the O_SEARCH of Solaris.

Note that you could already do the same thing with "fstatat()" and an
empty path, but just doing "fstat()" directly is simpler and faster, so
there is no reason not to just allow it directly.

See also commit 332a2e1244, which did the same thing for fchdir, for
the same reasons.

Reported-by: ольга крыжановская <olga.kryzhanovska@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org    # O_PATH introduced in 3.0+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-14 14:48:21 -07:00
Tyler Hicks
8335eafc28 eCryptfs: Copy up attributes of the lower target inode after rename
After calling into the lower filesystem to do a rename, the lower target
inode's attributes were not copied up to the eCryptfs target inode. This
resulted in the eCryptfs target inode staying around, rather than being
evicted, because i_nlink was not updated for the eCryptfs inode. This
also meant that eCryptfs didn't do the final iput() on the lower target
inode so it stayed around, as well. This would result in a failure to
free up space occupied by the target file in the rename() operation.
Both target inodes would eventually be evicted when the eCryptfs
filesystem was unmounted.

This patch calls fsstack_copy_attr_all() after the lower filesystem
does its ->rename() so that important inode attributes, such as i_nlink,
are updated at the eCryptfs layer. ecryptfs_evict_inode() is now called
and eCryptfs can drop its final reference on the lower inode.

http://launchpad.net/bugs/561129

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Tested-by: Colin Ian King <colin.king@canonical.com>
Cc: <stable@vger.kernel.org> [2.6.39+]
2012-09-14 09:36:03 -07:00
Tyler Hicks
64e6651dcc eCryptfs: Call lower ->flush() from ecryptfs_flush()
Since eCryptfs only calls fput() on the lower file in
ecryptfs_release(), eCryptfs should call the lower filesystem's
->flush() from ecryptfs_flush().

If the lower filesystem implements ->flush(), then eCryptfs should try
to flush out any dirty pages prior to calling the lower ->flush(). If
the lower filesystem does not implement ->flush(), then eCryptfs has no
need to do anything in ecryptfs_flush() since dirty pages are now
written out to the lower filesystem in ecryptfs_release().

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-09-14 09:35:54 -07:00
Catalin Marinas
0753f70f07 fs: Build sys_stat64() and friends if __ARCH_WANT_COMPAT_STAT64
On AArch64 Linux, we want the sys_stat64() and related functions for
compat support but do not need the generic struct stat64, enabled
automatically if __ARCH_WANT_STAT64.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
2012-09-14 17:15:34 +01:00
Tyler Hicks
7149f2558d eCryptfs: Write out all dirty pages just before releasing the lower file
Fixes a regression caused by:

821f749 eCryptfs: Revert to a writethrough cache model

That patch reverted some code (specifically, 32001d6f) that was
necessary to properly handle open() -> mmap() -> close() -> dirty pages
-> munmap(), because the lower file could be closed before the dirty
pages are written out.

Rather than reapplying 32001d6f, this approach is a better way of
ensuring that the lower file is still open in order to handle writing
out the dirty pages. It is called from ecryptfs_release(), while we have
a lock on the lower file pointer, just before the lower file gets the
final fput() and we overwrite the pointer.

https://launchpad.net/bugs/1047261

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Artemy Tregubenko <me@arty.name>
Tested-by: Artemy Tregubenko <me@arty.name>
Tested-by: Colin Ian King <colin.king@canonical.com>
2012-09-14 09:11:29 -07:00
Aristeu Rozanski
b9d6cfdeaf xattr: mark variable as uninitialized to make both gcc and smatch happy
new_xattr in __simple_xattr_set() is only initialized with a valid
pointer if value is not NULL, which only happens if this function is
called directly with the intention to remove an existing extended
attribute. Even being safe to be this way, smatch warns about possible
NULL dereference. Dan Carpenter suggested using uninitialized_var()
which will make both gcc and smatch happy.

Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-13 11:10:49 -07:00
Aristeu Rozanski
4895768b6a fs: add missing documentation to simple_xattr functions
v2: add function documentation instead of adding a separate file under
    Documentation/

tj: Updated comment a bit and rolled in Randy's suggestions.

Cc: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-09-13 11:08:47 -07:00
Steven Whitehouse
62e252eeef GFS2: Take account of blockages when using reserved blocks
The claim_reserved_blks() function was not taking account of
the possibility of "blockages" while performing allocation.
This can be caused by another node allocating something in
the same extent which has been reserved locally.

This patch tests for this condition and then skips the remainder
of the reservation in this case. This is a relatively rare event,
so that it should not affect the general performance improvement
which the block reservations provide.

The claim_reserved_blks() function also appears not to be able
to deal with reservations which cross bitmap boundaries, but
that can be dealt with in a future patch since we don't generate
boundary crossing reservations currently.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-by: David Teigland <teigland@redhat.com>
Cc: Bob Peterson <rpeterso@redhat.com>
2012-09-13 10:30:58 +01:00
Steven Whitehouse
645b2ccc75 GFS2: Fix missing allocation data for set/remove xattr
These entry points were missed in the original patch to allocate
this data structure.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-13 10:30:34 +01:00
Steven Whitehouse
da1dfb6af8 GFS2: Make write size hinting code common
This collects up the write size hinting code which is used by the
block reservation subsystem into a single function. At the same
time this also corrects the rounding for this calculation.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-13 10:30:00 +01:00
Linus Torvalds
22b4e63ebe NFS client bugfixes for Linux 3.6
- Final (hopefully) fix for the range checking code in NFSv4 getacl. This
   should fix the Oopses being seen when the acl size is close to PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQUN+KAAoJEGcL54qWCgDyHGcQAKj7MYVDIjhdmsVGGNWXUCnf
 X0LVg/ajh+vjusK+hmquzcJikZqgce5IU5DW4vcFr1X8BgP+R51UVvU0KksByD5H
 ourV2JVCztAQzQ4WWOsZAGqN0tooJUjyEjl4lEiDsQCF4Nk1HWbuCHeYuX74OToZ
 jrgedj0EZ6zb7TOizvbgU/7lI+FKu3Hlw6+u27M9phtSuefJdYSHZHYVMOX81qPh
 k0zgZ4tuLIaDuBB84iCrPwNt9icnevq6cIc+AGluI6xhDw+foPvUaUR+OUI420IZ
 tunNzP2So+nNoyjEiyMVENaCdEyA75XAmmGHTUUdBiVOsMV4HF/TqvTtSsjk2mN1
 FbZVvtjD6srjsQaKdVmqMIZBdhY9LSMLIQVqb4H2rYP6Mwq06WTuyCxf5YhzFfoy
 2tai7JuqBkTAWfKB8ESWywV6Qk/MkUWRAOBO6ksS66gAwpcFDj6nfeAdwaEmoYKc
 uzLUIRZaclPMZf661cs1fWeFV5XOnCL7je4owgTRGs7MHooWHPcC3273fEJqnhFz
 5MkC7nfmUiGcdO1v0mfYTEtMj9Pp9icBoZcVTGn4eZIHzvhhZOx//8LhyBfS+jll
 bKjaLZ1rErvIqwnSGcB7PK2yBYY9P6ZaxWjOrAAncZmiOxfhN0hvCo54jNOr/VZ+
 atsDEAuqSTeK7ouBqyO4
 =e5yE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Final (hopefully) fix for the range checking code in NFSv4 getacl.
   This should fix the Oopses being seen when the acl size is close to
   PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.

* tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: fsync() must exit with an error if page writeback failed
  SUNRPC: Fix a UDP transport regression
  NFS: return error from decode_getfh in decode open
  NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
  NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
  NFS: Fix a problem with the legacy binary mount code
  NFS: Fix the initialisation of the readdir 'cookieverf' array
2012-09-13 09:04:13 +08:00
Trond Myklebust
7b281ee026 NFS: fsync() must exit with an error if page writeback failed
We need to ensure that if the call to filemap_write_and_wait_range()
fails, then we report that error back to the application.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-11 15:38:32 -04:00
Linus Torvalds
44346cfe4d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull FUSE fixes from Miklos Szeredi:
 "This contains bugfixes for FUSE and CUSE and a compile warning fix."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: fix retrieve length
  fuse: mark variables uninitialized
  cuse: kill connection on initialization error
  cuse: fix fuse_conn_kill()
2012-09-11 09:29:17 +08:00
Linus Torvalds
e9bd8f1624 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix endianness conversion
  CIFS: Fix error handling in cifs_push_mandatory_locks
2012-09-11 05:52:49 +08:00
Linus Torvalds
1b0a9069de Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF and ext3 fixes from Jan Kara:
 "One UDF data corruption fix and one ext3 fix where we didn't write
  everything to disk on fsync in one corner case."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Fix data corruption for files in ICB
  ext3: Fix fdatasync() for files with only i_size changes
2012-09-11 05:51:35 +08:00
Eric W. Biederman
15e473046c netlink: Rename pid to portid to avoid confusion
It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10 15:30:41 -04:00
Sasha Levin
2b75bc9121 dlm: check the maximum size of a request from user
device_write only checks whether the request size is big enough, but it doesn't
check if the size is too big.

At that point, it also tries to allocate as much memory as the user has requested
even if it's too much. This can lead to OOM killer kicking in, or memory corruption
if (count + 1) overflows.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2012-09-10 09:50:27 -05:00
Mimi Zohar
9957a5043e ima: add inode_post_setattr call
Changing an inode's metadata may result in our not needing to appraise
the file.  In such cases, we must remove 'security.ima'.

Changelog v1:
- use ima_inode_post_setattr() stub function, if IMA_APPRAISE not configured

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
2012-09-07 14:57:46 -04:00
Mimi Zohar
4199d35cbc vfs: move ima_file_free before releasing the file
ima_file_free(), called on __fput(), currently flags files that have
changed, so that the file is re-measured.  For appraising a files's
integrity, the file's hash must be re-calculated and stored in the
'security.ima' xattr to reflect any changes.

This patch moves the ima_file_free() call to before releasing the file
in preparation of ima-appraisal measuring the file and updating the
'security.ima' xattr.

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
2012-09-07 14:57:27 -04:00
Mimi Zohar
2ab51f3721 vfs: extend vfs_removexattr locking
This patch takes the i_mutex lock before security_inode_removexattr(),
instead of after, in preparation of calling ima_inode_removexattr().

Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: Dmitry Kasatkin <dmitry.kasatkin@nokia.com>
2012-09-07 14:57:27 -04:00
Eric W. Biederman
7dc05881b6 userns: Convert debugfs to use kuid/kgid where appropriate.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-06 19:02:52 -07:00
Weston Andros Adamson
01913b49cf NFS: return error from decode_getfh in decode open
If decode_getfh failed, nfs4_xdr_dec_open would return 0 since the last
decode_* call must have succeeded.

Cc: stable@vger.kernel.org
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-06 16:01:33 -04:00
Pavel Shilovsky
b2ede58e98 CIFS: Fix endianness conversion
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-06 12:42:35 -05:00
Pavel Shilovsky
e2f2886a82 CIFS: Fix error handling in cifs_push_mandatory_locks
Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-09-06 12:42:31 -05:00
Trond Myklebust
1f1ea6c2d9 NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
Pass the checks made by decode_getacl back to __nfs4_get_acl_uncached
so that it knows if the acl has been truncated.

The current overflow checking is broken, resulting in Oopses on
user-triggered nfs4_getfacl calls, and is opaque to the point
where several attempts at fixing it have failed.
This patch tries to clean up the code in addition to fixing the
Oopses by ensuring that the overflow checks are performed in
a single place (decode_getacl). If the overflow check failed,
we will still be able to report the acl length, but at least
we will no longer attempt to cache the acl or copy the
truncated contents to user space.

Reported-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Sachin Prabhu <sprabhu@redhat.com>
2012-09-06 11:11:53 -04:00
Wang Sheng-Hui
527a136138 btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
It should be storing, not sotring.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-06 11:11:14 +02:00
Jan Kara
9c2fc0de1a udf: Fix data corruption for files in ICB
When a file is stored in ICB (inode), we overwrite part of the file, and
the page containing file's data is not in page cache, we end up corrupting
file's data by overwriting them with zeros. The problem is we use
simple_write_begin() which simply zeroes parts of the page which are not
written to. The problem has been introduced by be021ee4 (udf: convert to
new aops).

Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.

CC: <stable@vger.kernel.org> # >= 2.6.24
Reported-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-09-05 16:06:03 +02:00
Yanchuan Nian
ca1868309b vfs: fix kerneldoc for generic_fh_to_parent()
Wrong function name in the kerneldoc description of generic_fh_to_parent().

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-05 10:59:30 +02:00
Michael Schutte
caaa357dc0 Input: Add KD[GS]KBDIACRUC ioctls to the compatible list
Allow handling of Unicode compose sequences by 32-bit apps on a 64-bit
system.  The issue has been reported in <http://bugs.debian.org/540534>
and <http://lists.altlinux.org/pipermail/kbd/2009-December/000235.html>.

A formal check of the two affected ioctls in drivers/char/vt_ioctl.c
(introduced in 04c71976) and a test using x86 kbd 1.15.1 on a so patched
x86_64 kernel both confirm that KD[GS]KBDIACRUC are ioctl32()
compatible.

Signed-off-by: Michael Schutte <michi@uiae.at>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2012-09-04 23:21:46 -07:00
Robert P. J. Day
6f1cbd4a25 sysfs: Fix comment typo "sysf_create_link".
More pedantry.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-09-04 16:11:31 -07:00
Trond Myklebust
21f498c2f7 NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
Ensure that the user supplied buffer size doesn't cause us to overflow
the 'pages' array.

Also fix up some confusion between the use of PAGE_SIZE and
PAGE_CACHE_SIZE when calculating buffer sizes. We're not using
the page cache for anything here.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-04 14:52:43 -04:00
Trond Myklebust
872ece86ea NFS: Fix a problem with the legacy binary mount code
Apparently, am-utils is still using the legacy binary mountdata interface,
and is having trouble parsing /proc/mounts due to the 'port=' field being
incorrectly set.

The following patch should fix up the regression.

Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-09-04 14:52:43 -04:00
Trond Myklebust
c3f52af3e0 NFS: Fix the initialisation of the readdir 'cookieverf' array
When the NFS_COOKIEVERF helper macro was converted into a static
inline function in commit 99fadcd764 (nfs: convert NFS_*(inode)
helpers to static inline), we broke the initialisation of the
readdir cookies, since that depended on doing a memset with an
argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore
changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *).

At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.

Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881

Reported-by: Andi Kleen <andi@firstfloor.org>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2012-09-04 14:52:42 -04:00
Miklos Szeredi
c9e67d4837 fuse: fix retrieve length
In some cases fuse_retrieve() would return a short byte count if offset was
non-zero.  The data returned was correct, though.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
2012-09-04 18:45:54 +02:00
Jan Kara
156bddd8e5 ext3: Fix fdatasync() for files with only i_size changes
Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

CC: <stable@vger.kernel.org> # >= 2.6.32
Reported-by: Kristian Nielsen <knielsen@knielsen-hq.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-09-04 00:04:43 +02:00
Daniel Mack
381bf7cad9 fuse: mark variables uninitialized
gcc 4.6.3 complains about uninitialized variables in fs/fuse/control.c:

  CC      fs/fuse/control.o
fs/fuse/control.c: In function 'fuse_conn_congestion_threshold_write':
fs/fuse/control.c:165:29: warning: 'val' may be used uninitialized in this function [-Wuninitialized]
fs/fuse/control.c: In function 'fuse_conn_max_background_write':
fs/fuse/control.c:128:23: warning: 'val' may be used uninitialized in this function [-Wuninitialized]

fuse_conn_limit_write() will always return non-zero unless the &val
is modified, so the warning is misleading. Let the compiler know
about it by marking 'val' with 'uninitialized_var'.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Brian Foster <bfoster@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-09-03 17:44:06 +02:00
Linus Torvalds
5b716ac728 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix cifs_do_create error hadnling
  cifs: print error code if smb signature verification fails
  CIFS: Fix log messages in packet checking for SMB2
  CIFS: Protect i_nlink from being negative
2012-09-02 11:30:10 -07:00
Anatol Pomozov
4907cb7b19 treewide: fix comment/printk/variable typos
Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-01 10:33:05 -07:00
Peter Meerwald
1856b225ca nfs: comment fix
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-01 10:09:44 -07:00
Liu Bo
8bad3c0244 btrfs: fix comment typo in btrfs_finish_ordered_io
Fix typo errors in comments of btrfs_finish_ordered_io.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-09-01 08:27:57 -07:00
David S. Miller
c32f38619a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Merge the 'net' tree to get the recent set of netfilter bug fixes in
order to assist with some merge hassles Pablo is going to have to deal
with for upcoming changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 15:14:18 -04:00
Artem Bityutskiy
3668b70fcf UBIFS: print less
UBIFS currently prints a lot of information when it mounts a volume, which
bothers some people. Make it less chatty - print only important information
by default.

Get rid of 'dbg_msg()' macro completely.

Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-31 17:32:58 +03:00
Artem Bityutskiy
6b38d03f48 UBIFS: use pr_ helper instead of printk
Use 'pr_err()' instead of 'printk(KERN_ERR', etc.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-31 17:32:57 +03:00
Artem Bityutskiy
79fda5179a UBIFS: comply with coding style
Join all the split printk lines in order to stop checkpatch complaining.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-31 17:32:57 +03:00
Artem Bityutskiy
43457c60c8 UBIFS: use __aligned() attribute
.. instead of __attribute__((aligned())).

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-31 17:32:57 +03:00
Jiri Slaby
e2441b4319 UBIFS: remove __DATE__ and __TIME__
This tag is useless and it breaks automatic builds. It causes rebuilds
for packages that depend on kernel for no real reason.

Further, quoting Michal, who removed most of the users already:
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-31 17:32:57 +03:00
Artem Bityutskiy
8089ed7928 UBIFS: fix power cut emulation for mtdram
The power cut emulation did not work correctly because we corrupted more than
one max. I/O unit in the buffer and then wrote the entire buffer. This lead to
recovery errors because UBIFS complained about corrupted free space. And this
was easily reproducible on mtdram because max. write size is very small there
(64 bytes), and we could easily have a 1KiB buffer, corrupt 128 bytes there,
and then write the entire buffer.

The fix is to corrupt max. write size bytes at most, and write only up to the
last corrupted max. write size chunk, not the entire buffer.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2012-08-31 17:32:38 +03:00
Miklos Szeredi
8d39d801d6 cuse: kill connection on initialization error
Luca Risolia reported that a CUSE daemon will continue to run even if
initialization of the emulated device failes for some reason (e.g. the device
number is already registered by another driver).

This patch disconnects the fuse device on error, which will make the userspace
CUSE daemon exit, albeit without indication about what the problem was.

Reported-by: Luca Risolia <luca.risolia@studio.unibo.it>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-08-30 19:24:35 +02:00
Miklos Szeredi
bbd9979797 cuse: fix fuse_conn_kill()
fuse_conn_kill() removed fc->entry, called fuse_ctl_remove_conn() and
fuse_bdi_destroy().  None of which is appropriate for cuse cleanup.

The fuse_ctl_remove_conn() decrements the nlink on the control filesystem, which
is totally bogus.  The others are harmless but unnecessary.

So move these out from fuse_conn_kill() to fuse_put_super() where they belong.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2012-08-30 19:24:34 +02:00
Carlos Maiolino
6fb8a90aa3 xfs: fix race while discarding buffers [V4]
While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
buffers from lru list), there is a possibility to have xfs_buf_stale() racing
with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
it.

This happens because xfs_buftarg_shrink() handle the dispose list without
locking and the test condition in xfs_buf_stale() checks for the buffer being in
*any* list:

if (!list_empty(&bp->b_lru))

If the buffer happens to be on dispose list, this causes the buffer counter of
lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
and another in xfs_buf_stale()) causing a wrong account usage of the lru list.

This may cause xfs_buftarg_shrink() to return a wrong value to the memory
shrinker shrink_slab(), and such account error may also cause an underflowed
value to be returned; since the counter is lower than the current number of
items in the lru list, a decrement may happen when the counter is 0, causing
an underflow on the counter.

The fix uses a new flag field (and a new buffer flag) to serialize buffer
handling during the shrink process. The new flag field has been designed to use
btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.

dchinner, sandeen, aquini and aris also deserve credits for this.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-29 15:01:11 -05:00
Linus Torvalds
318e151019 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I've split out the big send/receive update from my last pull request
  and now have just the fixes in my for-linus branch.  The send/recv
  branch will wander over to linux-next shortly though.

  The largest patches in this pull are Josef's patches to fix DIO
  locking problems and his patch to fix a crash during balance.  They
  are both well tested.

  The rest are smaller fixes that we've had queued.  The last rc came
  out while I was hacking new and exciting ways to recover from a
  misplaced rm -rf on my dev box, so these missed rc3."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
  Btrfs: fix that repair code is spuriously executed for transid failures
  Btrfs: fix ordered extent leak when failing to start a transaction
  Btrfs: fix a dio write regression
  Btrfs: fix deadlock with freeze and sync V2
  Btrfs: revert checksum error statistic which can cause a BUG()
  Btrfs: remove superblock writing after fatal error
  Btrfs: allow delayed refs to be merged
  Btrfs: fix enospc problems when deleting a subvol
  Btrfs: fix wrong mtime and ctime when creating snapshots
  Btrfs: fix race in run_clustered_refs
  Btrfs: don't run __tree_mod_log_free_eb on leaves
  Btrfs: increase the size of the free space cache
  Btrfs: barrier before waitqueue_active
  Btrfs: fix deadlock in wait_for_more_refs
  btrfs: fix second lock in btrfs_delete_delayed_items()
  Btrfs: don't allocate a seperate csums array for direct reads
  Btrfs: do not strdup non existent strings
  Btrfs: do not use missing devices when showing devname
  Btrfs: fix that error value is changed by mistake
  Btrfs: lock extents as we map them in DIO
  ...
2012-08-29 11:36:22 -07:00
Stefan Behrens
256dd1bb37 Btrfs: fix that repair code is spuriously executed for transid failures
If verify_parent_transid() fails for all mirrors, the current code
calls repair_io_failure() anyway which means:
- that the disk block is rewritten without repairing anything and
- that a kernel log message is printed which misleadingly claims
  that a read error was corrected.

This is an example:
parent transid verify failed on 615015833600 wanted 110423 found 110424
parent transid verify failed on 615015833600 wanted 110423 found 110424
btrfs read error corrected: ino 1 off 615015833600 (dev /dev/...)

It is wrong to ignore the results from verify_parent_transid() and to
call repair_eb_io_failure() when the verification of the transids failed.
This commit fixes the issue.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28 16:53:43 -04:00
Liu Bo
d280e5be94 Btrfs: fix ordered extent leak when failing to start a transaction
We cannot just return error before freeing ordered extent and releasing reserved
space when we fail to start a transacion.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28 16:53:42 -04:00
Liu Bo
24c03fa5cf Btrfs: fix a dio write regression
This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
(Btrfs: lock extents as we map them in DIO).

In dio write, we should unlock the section which we didn't do IO on in case that
we fall back to buffered write.  But we need to not only unlock the section
but also cleanup reserved space for the section.

This bug was found while running xfstests 133, with this 133 no longer complains.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28 16:53:41 -04:00
Josef Bacik
bd7de2c9a4 Btrfs: fix deadlock with freeze and sync V2
We can deadlock with freeze right now because we unconditionally start a
transaction in our ->sync_fs() call.  To fix this just check and see if we
have a running transaction to commit.  This saves us from the deadlock
because at this point we'll have the umount sem for the sb so we're safe
from freezes coming in after we've done our check.  With this patch the
freeze xfstests no longer deadlocks.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28 16:53:40 -04:00
Stefan Behrens
5ee0844d64 Btrfs: revert checksum error statistic which can cause a BUG()
Commit 442a4f6308 added btrfs device
statistic counters for detected IO and checksum errors to Linux 3.5.
The statistic part that counts checksum errors in
end_bio_extent_readpage() can cause a BUG() in a subfunction:
"kernel BUG at fs/btrfs/volumes.c:3762!"
That part is reverted with the current patch.
However, the counting of checksum errors in the scrub context remains
active, and the counting of detected IO errors (read, write or flush
errors) in all contexts remains active.

Cc: stable <stable@vger.kernel.org> # 3.5
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-08-28 16:53:39 -04:00
Stefan Behrens
68ce9682a4 Btrfs: remove superblock writing after fatal error
With commit acce952b0, btrfs was changed to flag the filesystem with
BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal
error happened like a write I/O errors of all mirrors.
In such situations, on unmount, the superblock is written in
btrfs_error_commit_super(). This is done with the intention to be able
to evaluate the error flag on the next mount. A warning is printed
in this case during the next mount and the log tree is ignored.

The issue is that it is possible that the superblock points to a root
that was not written (due to write I/O errors).
The result is that the filesystem cannot be mounted. btrfsck also does
not start and all the other btrfs-progs tools fail to start as well.
However, mount -o recovery is working well and does the right things
to recover the filesystem (i.e., don't use the log root, clear the
free space cache and use the next mountable root that is stored in the
root backup array).

This patch removes the writing of the superblock when
BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error
flag in the mount function.

These lines can be used to reproduce the issue (using /dev/sdm):
SCRATCH_DEV=/dev/sdm
SCRATCH_MNT=/mnt
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo
ls -alLF /dev/mapper/foo
mkfs.btrfs /dev/mapper/foo
mount /dev/mapper/foo $SCRATCH_MNT
echo bar > $SCRATCH_MNT/foo
sync
echo 0 25165824 error | dmsetup reload foo
dmsetup resume foo
ls -alF $SCRATCH_MNT
touch $SCRATCH_MNT/1
ls -alF $SCRATCH_MNT
sleep 35
echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo
dmsetup resume foo
sleep 1
umount $SCRATCH_MNT
btrfsck /dev/mapper/foo
dmsetup remove foo

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-08-28 16:53:38 -04:00
Josef Bacik
ae1e206b80 Btrfs: allow delayed refs to be merged
Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
Basically what happens is the balance relocates a tree block which will drop
the implicit refs for all of its children and adds a full backref.  Once the
block is relocated we have to add the implicit refs back, so when we cow the
block again we add the implicit refs for its children back.  The problem
comes when the original drop ref doesn't get run before we add the implicit
refs back.  The delayed ref stuff will specifically prefer ADD operations
over DROP to keep us from freeing up an extent that will have references to
it, so we try to add the implicit ref before it is actually removed and we
panic.  This worked fine before because the add would have just canceled the
drop out and we would have been fine.  But the backref walking work needs to
be able to freeze the delayed ref stuff in time so we have this ever
increasing sequence number that gets attached to all new delayed ref updates
which makes us not merge refs and we run into this issue.

So to fix this we need to merge delayed refs.  So everytime we run a
clustered ref we need to try and merge all of its delayed refs.  The backref
walking stuff locks the delayed ref head before processing, so if we have it
locked we are safe to merge any refs inside of the sequence number.  If
there is no sequence number we can merge all refs.  Doing this not only
fixes our bug but keeps the delayed ref code from adding and removing
useless refs and batching together multiple refs into one search instead of
one search per delayed ref, which will really help our commit times.  I ran
this with Daniels test and 276 and I haven't seen any problems.  Thanks,

Reported-by: Daniel J Blueman <daniel@quora.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:38 -04:00
Josef Bacik
5a24e84c55 Btrfs: fix enospc problems when deleting a subvol
Subvol delete is a special kind of awful where we use the global reserve to
cover the ENOSPC requirements.  The problem is once we're done removing
everything we do a btrfs_update_inode(), which by default will try to do the
delayed update stuff which will use it's own reserve.  There will be no
space in this reserve and we'll return ENOSPC.  So instead use
btrfs_update_inode_fallback() which will just fallback to updating the inode
item in the case of enospc.  This is fine because the global reserve covers
the space requirements for this.  With this patch I can now delete a subvol
on a problem image Dave Sterba sent me.  Thanks,

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:37 -04:00
Miao Xie
c0f62dedd0 Btrfs: fix wrong mtime and ctime when creating snapshots
When we created a new snapshot, the mtime and ctime of its parent directory
were not updated. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:36 -04:00
Arne Jansen
22cd2e7de7 Btrfs: fix race in run_clustered_refs
With commit

commit d1270cd91f
Author: Arne Jansen <sensille@gmx.net>
Date:   Tue Sep 13 15:16:43 2011 +0200

     Btrfs: put back delayed refs that are too new

I added a window where the delayed_ref's head->ref_mod code can diverge
from the sum of the remaining refs, because we release the head->mutex
in the middle. This leads to btrfs_lookup_extent_info returning wrong
numbers. This patch fixes this by adjusting the head's ref_mod with each
delayed ref we run.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:35 -04:00
Chris Mason
b12a3b1ea2 Btrfs: don't run __tree_mod_log_free_eb on leaves
When we split a leaf, we may end up inserting a new root on top of that
leaf.  The reflog code was incorrectly assuming the old root was always
a node.  This makes sure we skip over leaves.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:34 -04:00
Josef Bacik
6fc823b10f Btrfs: increase the size of the free space cache
Arne was complaining about the space cache having mismatching generation
numbers when debugging a deadlock.  This is because we can run out of space
in our preallocated range for our space cache if you have a pretty
fragmented amount of space in your pinned space.  So just increase the
amount of space we preallocate for space cache so we can be sure to have
enough space.  This will only really affect data ranges since their the only
chunks that end up larger than 256MB.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:34 -04:00
Josef Bacik
66657b318e Btrfs: barrier before waitqueue_active
We need a barrir before calling waitqueue_active otherwise we will miss
wakeups.  So in places that do atomic_dec(); then atomic_read() use
atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
and then add an explicit memory barrier everywhere else that need them.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:33 -04:00
Arne Jansen
1fa11e265f Btrfs: fix deadlock in wait_for_more_refs
Commit a168650c introduced a waiting mechanism to prevent busy waiting in
btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
where a tree_mod_seq is held while waiting for the io to complete, while
the end_io calls btrfs_run_delayed_refs.
This whole mechanism is unnecessary. If not enough runnable refs are
available to satisfy count, just return as count is more like a guideline
than a strict requirement.
In case we have to run all refs, commit transaction makes sure that no
other threads are working in the transaction anymore, so we just assert
here that no refs are blocked.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:32 -04:00
Fengguang Wu
6209526531 btrfs: fix second lock in btrfs_delete_delayed_items()
Fix a real bug caught by coccinelle.

fs/btrfs/delayed-inode.c:1013:1-11: second lock on line 1013

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-08-28 16:53:31 -04:00
Josef Bacik
c329861da4 Btrfs: don't allocate a seperate csums array for direct reads
We've been allocating a big array for csums instead of storing them in the
io_tree like we do for buffered reads because previously we were locking the
entire range, so we didn't have an extent state for each sector of the
range.  But now that we do the range locking as we map the buffers we can
limit the mapping lenght to sectorsize and use the private part of the
io_tree for our csums.  This allows us to avoid an extra memory allocation
for direct reads which could incur latency.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:30 -04:00
Josef Bacik
99f5944b84 Btrfs: do not strdup non existent strings
When we close devices we add back empty devices for some reason that escapes
me.  In the case of a missing dev we don't allocate an rcu_string for it's
name, so check to see if the device has a name and if it doesn't don't
bother strdup()'ing it.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:29 -04:00
Josef Bacik
aa9ddcd4b5 Btrfs: do not use missing devices when showing devname
If you do the following

mkfs.btrfs /dev/sdb /dev/sdc
rmmod btrfs
dd if=/dev/zero of=/dev/sdb bs=1M count=1
mount -o degraded /dev/sdc /mnt/btrfs-test

the box will panic trying to deref the name for the missing dev since it is
the lower numbered devid.  So fix show_devname to not use missing devices.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:29 -04:00
Stefan Behrens
3627bf4503 Btrfs: fix that error value is changed by mistake
In iterate_inodes_from_logical() the error result from
extent_from_logical() is patched by mistake. Typically ENOENT is
patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK)
evaluates to true.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-08-28 16:53:28 -04:00
Josef Bacik
eb838e73dc Btrfs: lock extents as we map them in DIO
A deadlock in xfstests 113 was uncovered by commit

d187663ef2

This is because we would not return EIOCBQUEUED for short AIO reads, instead
we'd wait for the DIO to complete and then return the amount of data we
transferred, which would allow our stuff to unlock the remaning amount.  But
with this change this no longer happens, so if we have a short AIO read (for
example if we try to read past EOF), we could leave the section from EOF to
the end of where we tried to read locked.  Fixing this is tricky since there
is no clear way to know exactly how much data DIO truly submitted for IO, so
to make this less hard on ourselves and less combersome we need to lock the
extents as we try to map them, and then we unlock any areas we didn't
actually map.  This makes us completely safe from deadlocks and reliance on
a particular behavior of the DIO code.  This also lays the groundwork for
allowing us to use the normal csum storage method for reads which means we
can remove an allocation.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:27 -04:00
Dan Carpenter
dadd1105ca Btrfs: fix some endian bugs handling the root times
"trans->transid" is cpu endian but we want to store the data as little
endian.  "item->ctime.nsec" is only 32 bits, not 64.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:26 -04:00
Dan Carpenter
55e591ffde Btrfs: unlock on error in btrfs_delalloc_reserve_metadata()
We should release this mutex before returning the error code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:25 -04:00
Dan Carpenter
57a5a88203 Btrfs: checking for NULL instead of IS_ERR
add_qgroup_rb() never returns NULL, only error pointers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:25 -04:00
Dan Carpenter
5986802c2f Btrfs: fix some error codes in btrfs_qgroup_inherit()
These are returning zero when it should be returning a negative error
code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:24 -04:00
Stefan Behrens
aa2ffd0616 Btrfs: fix a misplaced address operator in a condition
This should obviously not be "if (&flag)" but "if (flag)".

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-08-28 16:53:23 -04:00
Kees Cook
82aceae4f0 debugfs: more tightly restrict default mount mode
Since the debugfs is mostly only used by root, make the default mount
mode 0700. Most system owners do not need a more permissive value,
but they can choose to weaken the restrictions via their fstab.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-27 13:42:02 -07:00
Greg Kroah-Hartman
9db48aaf18 Merge 3.6-rc3 into driver-core-next
This picks up the printk fixes in 3.6-rc3 that are needed in this branch.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-08-27 07:08:39 -07:00
Linus Torvalds
89a897fbd8 Pull request from git://github.com/prasad-joshi/logfs_upstream.git
There are few important bug fixes for LogFS
 
 9f0bbd8 logfs: query block device for number of pages to send with bio
 	This BUG was found when LogFS was used on KVM. The patch fixes
 	the problem by asking for underlaying block device the number
 	of pages to send with each BIO.
 
 41b93bc logfs: maintain the ordering of meta-inode destruction
 	LogFS maintains file system meta-data in special inodes. These
 	inodes are releated to each other, therefore they must be
 	destroyed in a proper order.
 
 ddb24bb logfs: create a pagecache page if it is not present
 
 cd8bfa9 logfs: initialize the number of iovecs in bio
 	LogFS used to panic when it was created on an encrypted LVM
 	volume. The patch fixes the problem by properly initializing
 	the BIO.
 
 d2dcd90 logfs: destroy the reserved inodes while unmounting
 
 Diffstat:
  fs/logfs/dev_bdev.c  |   15 ++++++++-------
  fs/logfs/inode.c     |   18 +-----------------
  fs/logfs/journal.c   |    2 +-
  fs/logfs/readwrite.c |    1 +
  fs/logfs/segment.c   |    2 +-
  5 files changed, 12 insertions(+), 26 deletions(-)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJQNmnnAAoJEDFA/f+3K+ZNhc0QAJWWtcvqCIU2UHKw0DaQieH8
 X+YwoqdH5UBR0vylyoHCgCRSoRuttYEy47DYDhkpUdqad60atDhwIwhFaknlNZbt
 +hv2ve6hZTzcs42HZhu+0WUzMkUs0s6Xjuu9JlZND59zk1wWBfttDlKI2/SdjMlm
 GnW0nx5gLFZ1rmpaL8bdRpyLXxUvb9FmUc+YWsuinj8A2Lnqg5bNOmJZ3CiMYNVk
 UvbHDYJmNaMZndbYeXZtxXtUo9Uk4HsU+7whpmgD+OPz7h+VMOgfXnwkpsgmtbY2
 qTXAjyFVpN23zhBTbpCMXvpfbrffdBJBfkEW+2sXqpr9IHbMX0Y9cb/XJ3Ub1Qz+
 HRLdBh0iAZewWVRsGOKG2OU1WUrTNzKUAJ795QFL0c0bZsODzQ+5OlcD7rBzEMpq
 ioN49UtJWlmckWaott6PinSl/OKWlvz771Zayh9+ttuL3Dvt6coV7K3ns5MI6glN
 M9DTMd8GAW2Kdz80EyAjZYWw2M3lbs0/GghJB4ozYg3mrDpBq1w5ouEvSNw8PuJw
 yoUEe2xLGpLZdzQDouKMlrai8kohCyMoHKH1Kimx+iG4LZkYLy8VJepY1mfzQbEY
 1zSB1nCV/c67qI3fxRdPNmJ7YtjTC9ero0qOouXwSJWwZ+OEikGXOKWnjRutcY1C
 Jfbfg6gNqOrZZbYWp1ke
 =eS/c
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream

Pull LogFS bugfixes from Prasad Joshi:

 - "logfs: query block device for number of pages to send with bio"

	This BUG was found when LogFS was used on KVM. The patch fixes
	the problem by asking for underlaying block device the number
	of pages to send with each BIO.

 - "logfs: maintain the ordering of meta-inode destruction"

	LogFS maintains file system meta-data in special inodes. These
	inodes are releated to each other, therefore they must be
	destroyed in a proper order.

 - "logfs: initialize the number of iovecs in bio"

	LogFS used to panic when it was created on an encrypted LVM
	volume. The patch fixes the problem by properly initializing
	the BIO.

Plus a couple more:
 - logfs: create a pagecache page if it is not present
 - logfs: destroy the reserved inodes while unmounting

* tag 'for-linus' of git://github.com/prasad-joshi/logfs_upstream:
  logfs: query block device for number of pages to send with bio
  logfs: maintain the ordering of meta-inode destruction
  logfs: create a pagecache page if it is not present
  logfs: initialize the number of iovecs in bio
  logfs: destroy the reserved inodes while unmounting
2012-08-26 10:14:11 -07:00
Linus Torvalds
e1d33a5c63 xfs: bugfixes for 3.6-rc4
- fix uninitialised variable in xfs_rtbuf_get()
 - unlock the AGI buffer when looping in xfs_dialloc
 - check for possible overflow in xfs_ioc_trim
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQN7FMAAoJENaLyazVq6ZOrwsQAMB2Cg/t/XeiPFD5QtvFX7Ua
 9cS3klSo9p29Xr5oeo54b3mF+5dycS61PQu2zdg4XhU5ZqoWoS6xUIFxzo3yWiVi
 nC/BbgMQTgkfEQs+2ulmXRgP8A2DdakSDmvAZDQNURZQwH4zykU0tPKD9V009TMW
 SaIprIWQqhRrsd9prrv364FgKv3/vyuFL1Agho304o3ynk+2SLDzm3eqdTaXTD9Z
 9OGP5Hs5Mh9CZmr46MIpRnoaHppqNRZh7EbDyMFL8kMN8F2yOdJk5MgA1m7gPJ5Z
 ADfnv3laxEp04hhD5lazePYJwZlg/KB77Y4YXD5hmTSznIxcQNlfRcMV9ju4FGgr
 Nb5SgA/6iBk5dkZyuyIdqCzj1Mt6SimpIWb3+/vcS8+VaqQUsyoufccO0zrgeoIA
 /O1PtCe//haFENlD+hMc9QrQSISsaR/YYUv0yz4YcsxKsdmyYpbHdTyFhVjnkKoS
 BtA0ZOgWYebgEzr7J1ijW085CrDvAc9jsPujOWmZlBJWcfo1rtVfqhXWll0ms5cr
 ZRoWqLDqoOhjlUDqwMN4dk6ENxG4QbSl6AWXXVXrUvtvaNzoi0MBNX3HPt/RVmF5
 rGKFvLuIxBHAfD1xYCqtCfP9yzeQzDy1SPdASForEZOyM+RVUnOf9i+1nhbsu/97
 AhiW4lozQetWzx+RQwU+
 =+FuN
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.6-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
 - fix uninitialised variable in xfs_rtbuf_get()
 - unlock the AGI buffer when looping in xfs_dialloc
 - check for possible overflow in xfs_ioc_trim

* tag 'for-linus-v3.6-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: check for possible overflow in xfs_ioc_trim
  xfs: unlock the AGI buffer when looping in xfs_dialloc
  xfs: fix uninitialised variable in xfs_rtbuf_get()
2012-08-25 11:47:06 -07:00
Linus Torvalds
8497ae61d0 Merge branch 'for-3.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie
  Heilman for their testing and debugging help."

* 'for-3.6' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
  svcrpc: sends on closed socket should stop immediately
  svcrpc: fix BUG() in svc_tcp_clear_pages
  nfsd4: fix security flavor of NFSv4.0 callback
2012-08-25 11:43:41 -07:00
Linus Torvalds
a7e546f175 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block-related fixes from Jens Axboe:

 - Improvements to the buffered and direct write IO plugging from
   Fengguang.

 - Abstract out the mapping of a bio in a request, and use that to
   provide a blk_bio_map_sg() helper.  Useful for mapping just a bio
   instead of a full request.

 - Regression fix from Hugh, fixing up a patch that went into the
   previous release cycle (and marked stable, too) attempting to prevent
   a loop in __getblk_slow().

 - Updates to discard requests, fixing up the sizing and how we align
   them.  Also a change to disallow merging of discard requests, since
   that doesn't really work properly yet.

 - A few drbd fixes.

 - Documentation updates.

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: replace __getblk_slow misfix by grow_dev_page fix
  drbd: Write all pages of the bitmap after an online resize
  drbd: Finish requests that completed while IO was frozen
  drbd: fix drbd wire compatibility for empty flushes
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update missing index files in block/00-INDEX
  block: move down direct IO plugging
  block: remove plugging at buffered write time
  block: disable discard request merge temporarily
  bio: Fix potential memory leak in bio_find_or_create_slab()
  block: Don't use static to define "void *p" in show_partition_start()
  block: Add blk_bio_map_sg() helper
  block: Introduce __blk_segment_map_sg() helper
  fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices
  block: split discard into aligned requests
  block: reorganize rounding of max_discard_sectors
2012-08-25 11:36:43 -07:00
Aristeu Rozanski
38f3865744 xattr: extract simple_xattr code from tmpfs
Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

$ size vmlinux.o
   text    data     bss     dec     hex filename
4658782  880729 5195032 10734543         a3cbcf vmlinux.o
$ size vmlinux.o
   text    data     bss     dec     hex filename
4658957  880729 5195032 10734718         a3cc7e vmlinux.o

v7:
- checkpatch warnings fixed
- Implement the changes requested by Hugh Dickins:
	- make simple_xattrs_init and simple_xattrs_free inline
	- get rid of locking and list reinitialization in simple_xattrs_free,
	  they're not needed
v6:
- no changes
v5:
- no changes
v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage

Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-24 15:55:33 -07:00
David S. Miller
e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
Jeff Liu
b686d1f79a xfs: xfs_seek_hole() refinement with hole searching from page cache for unwritten extents
xfs_seek_hole() refinement with hole searching from page cache for unwritten extent.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-24 13:57:10 -05:00
Jeff Liu
52f1acc8b5 xfs: xfs_seek_data() refinement with unwritten extents check up from page cache
xfs_seek_data() refinement with unwritten extents check up from page cache.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-24 13:56:29 -05:00
Jeff Liu
d126d43f63 xfs: Introduce a helper routine to probe data or hole offset from page cache
Introduce helpers to probe data or hole offset from page cache.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-24 13:55:09 -05:00
Jeff Liu
834ab12228 xfs: Remove type argument from xfs_seek_data()/xfs_seek_hole()
The type is already indicated by the function naming explicitly, so this argument
can be omitted from those calls.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-24 13:48:05 -05:00
Carlos Maiolino
e599b3253c xfs: fix race while discarding buffers [V4]
While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
buffers from lru list), there is a possibility to have xfs_buf_stale() racing
with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
it.

This happens because xfs_buftarg_shrink() handle the dispose list without
locking and the test condition in xfs_buf_stale() checks for the buffer being in
*any* list:

if (!list_empty(&bp->b_lru))

If the buffer happens to be on dispose list, this causes the buffer counter of
lru list (btp->bt_lru_nr) to be decremented twice (once in xfs_buftarg_shrink()
and another in xfs_buf_stale()) causing a wrong account usage of the lru list.

This may cause xfs_buftarg_shrink() to return a wrong value to the memory
shrinker shrink_slab(), and such account error may also cause an underflowed
value to be returned; since the counter is lower than the current number of
items in the lru list, a decrement may happen when the counter is 0, causing
an underflow on the counter.

The fix uses a new flag field (and a new buffer flag) to serialize buffer
handling during the shrink process. The new flag field has been designed to use
btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.

dchinner, sandeen, aquini and aris also deserve credits for this.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-24 13:46:10 -05:00
Linus Torvalds
62688e5b64 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF, ext3 & reiserfs fixes from Jan Kara:
 "A couple of fixes (udf, reiserfs, ext3) that accumulated over my
  vacation."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: fix retun value on error path in udf_load_logicalvol
  jbd: don't write superblock when unmounting an ro filesystem
  reiserfs: fix deadlocks with quotas
  quota: Move down dqptr_sem read after initializing default warn[] type at __dquot_alloc_space().
  UDF: During mount free lvid_bh before rescanning with different blocksize
  udf: fix udf_setsize() for file data in ICB
2012-08-23 21:56:22 -07:00
Linus Torvalds
f4673d6f18 UBIFS fixes for 3.6:
1. Fix crash on error which prevents emulated power-cut testing.
 2. Fix log reply regression introduced in 3.6-rc1.
 3. Fix UBIFS complaints about too small debug buffer size which.
 4. Fix error message spelling, and remove incorrect commentary.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQNdHYAAoJECmIfjd9wqK0t+cQANEkF1Xgs8cd/5U++tQWCGyu
 EInLm+416ytTy4Vg2bLwlJ4OnUhgtTSTeKovtp3yoSKytxB4vf50qI5P/gInXSxo
 mI1KkvHZLraFzFPGtnbXGCg0YS8xZLIhw2p8VDuNFcxZIEiuAGZ2qMC9tPzzzle6
 em/yDIumR//oioFHOqVNu6wKE3do9QY5fo0meqE6QTliKppSDhWcO9MCfodkFOCp
 WtlD2dfIQv3SdaZW+X8skbqMFQ9KJRAzk59hl1Cm2jlXEHEHYmvGwqTCSOk8W/0c
 3ZSgUVW1YKBYq25yA3Xdz6aDCgRT6LYsi/EH0VNNMlt/H9RfcCNTGe96ioTcgsOT
 FroLZZUK9PiQYmqUqnk8eqmUyq888vKZY3Sbxt5/bbZhSoK6aYzNgfLlkZNfnHDL
 unzpyPqYq9VahuTgY5ravDclPKCc2p7xeWJYeOE9JtUY6/fVKJh9qERrtVoNP+nD
 LjS89G/tNBn9ZPk8KyXgikJ4JZ7sw30vySmBdGtx+ckzXqKTsDtLznvdSU3/XiJM
 zdqYV3gulgoJtIPXFVuI3i3GeKyXuJlPpjNimuFn+9mNElZfrvs6RcGbQ3Df/3Mc
 omfTrRxS+vRadGC9MZoIMCV27p7+cEynbLLVXtvPebwqiO1tV1fCspdJGjJmkbvL
 14hi7hTYvYCtMR+m8i2+
 =VvYb
 -----END PGP SIGNATURE-----

Merge tag 'upstream-3.6-rc3' of git://git.infradead.org/linux-ubifs

Pull UBIFS fixes from Artem Bityutskiy:
 - Fix crash on error which prevents emulated power-cut testing.
 - Fix log reply regression introduced in 3.6-rc1.
 - Fix UBIFS complaints about too small debug buffer size which.
 - Fix error message spelling, and remove incorrect commentary.

* tag 'upstream-3.6-rc3' of git://git.infradead.org/linux-ubifs:
  UBIFS: fix error messages spelling
  UBIFS: fix complaints about too small debug buffer size
  UBIFS: fix replay regression
  UBIFS: fix crash on error path
  UBIFS: remove stale commentary
2012-08-23 21:50:40 -07:00
Tomas Racek
a672e1be30 xfs: check for possible overflow in xfs_ioc_trim
If range.start or range.minlen is bigger than filesystem size, return
invalid value error. This fixes possible overflow in BTOBB macro when
passed value was nearly ULLONG_MAX.

Signed-off-by: Tomas Racek <tracek@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-23 14:48:44 -05:00
Christoph Hellwig
7612903099 xfs: unlock the AGI buffer when looping in xfs_dialloc
Also update some commens in the area to make the code easier to read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-23 14:48:32 -05:00
Dave Chinner
0b9e3f6d84 xfs: fix uninitialised variable in xfs_rtbuf_get()
Results in this assert failure in generic/090:

XFS: Assertion failed: *nmap >= 1, file: fs/xfs/xfs_bmap.c, line: 4363
.....
Call Trace:
 [<ffffffff814680db>] xfs_bmapi_read+0x6b/0x370
 [<ffffffff814b64b2>] xfs_rtbuf_get+0x42/0x130
 [<ffffffff814b6f09>] xfs_rtget_summary+0x89/0x120
 [<ffffffff814b7bfe>] xfs_rtallocate_extent_size+0xce/0x340
 [<ffffffff814b89f0>] xfs_rtallocate_extent+0x240/0x290
 [<ffffffff81462c1a>] xfs_bmap_rtalloc+0x1ba/0x340
 [<ffffffff81463a65>] xfs_bmap_alloc+0x35/0x40
 [<ffffffff8146f111>] xfs_bmapi_allocate+0xf1/0x350
 [<ffffffff8146f9de>] xfs_bmapi_write+0x66e/0xa60
 [<ffffffff8144538a>] xfs_iomap_write_direct+0x22a/0x3f0
 [<ffffffff8143707b>] __xfs_get_blocks+0x38b/0x5d0
 [<ffffffff814372d4>] xfs_get_blocks_direct+0x14/0x20
 [<ffffffff811b0081>] do_blockdev_direct_IO+0xf71/0x1eb0
 [<ffffffff811b1015>] __blockdev_direct_IO+0x55/0x60
 [<ffffffff814355ca>] xfs_vm_direct_IO+0x11a/0x1e0
 [<ffffffff8112d617>] generic_file_direct_write+0xd7/0x1b0
 [<ffffffff8143e16c>] xfs_file_dio_aio_write+0x13c/0x320
 [<ffffffff8143e6f2>] xfs_file_aio_write+0x1c2/0x1d0
 [<ffffffff81174a07>] do_sync_write+0xa7/0xe0
 [<ffffffff81175288>] vfs_write+0xa8/0x160
 [<ffffffff81175702>] sys_pwrite64+0x92/0xb0
 [<ffffffff81b68f69>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-08-23 14:48:16 -05:00