Commit Graph

399799 Commits

Author SHA1 Message Date
Linus Torvalds
68cf8d0c72 Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "After merge window, no new stuff this time only a collection of neatly
  confined and simple fixes"

* 'for-3.12/core' of git://git.kernel.dk/linux-block:
  cfq: explicitly use 64bit divide operation for 64bit arguments
  block: Add nr_bios to block_rq_remap tracepoint
  If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
  bio-integrity: Fix use of bs->bio_integrity_pool after free
  blkcg: relocate root_blkg setting and clearing
  block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
  block: trace all devices plug operation
2013-09-22 15:00:11 -07:00
Linus Torvalds
0fbf2cc983 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "These are mostly bug fixes and a two small performance fixes.  The
  most important of the bunch are Josef's fix for a snapshotting
  regression and Mark's update to fix compile problems on arm"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
  Btrfs: create the uuid tree on remount rw
  btrfs: change extent-same to copy entire argument struct
  Btrfs: dir_inode_operations should use btrfs_update_time also
  btrfs: Add btrfs: prefix to kernel log output
  btrfs: refuse to remount read-write after abort
  Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
  Btrfs: don't leak transaction in btrfs_sync_file()
  Btrfs: add the missing mutex unlock in write_all_supers()
  Btrfs: iput inode on allocation failure
  Btrfs: remove space_info->reservation_progress
  Btrfs: kill delay_iput arg to the wait_ordered functions
  Btrfs: fix worst case calculator for space usage
  Revert "Btrfs: rework the overcommit logic to be based on the total size"
  Btrfs: improve replacing nocow extents
  Btrfs: drop dir i_size when adding new names on replay
  Btrfs: replay dir_index items before other items
  Btrfs: check roots last log commit when checking if an inode has been logged
  Btrfs: actually log directory we are fsync()'ing
  Btrfs: actually limit the size of delalloc range
  Btrfs: allocate the free space by the existed max extent size when ENOSPC
  ...
2013-09-22 14:58:49 -07:00
Anatol Pomozov
f3cff25f05 cfq: explicitly use 64bit divide operation for 64bit arguments
'samples' is 64bit operant, but do_div() second parameter is 32.
do_div silently truncates high 32 bits and calculated result
is invalid.

In case if low 32bit of 'samples' are zeros then do_div() produces
kernel crash.

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-09-22 12:43:47 -06:00
Linus Torvalds
c43a3855f4 NFS client bugfix for 3.12
- Fix a regression due to incorrect sharing of gss auth caches
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJSPev8AAoJEGcL54qWCgDy+2UP/3ZyJbL8qIpbgdOdlCIFYHZ7
 +8Z87XH31i+9hzKejSnmTURdRgWl7f0ehEOMG4soPF/4vpc1Ji03Xo0Iunq6AR3R
 0JiKSPHJR+j1IiTdQR+HL127+ymUEECsWKubm0ZYOgphxhGy2e95sbU3C7w3wRry
 kIul7+oVdmp6bXUbKpGqRx3SiT5H0YunGO0dBD7SWHJP4cQIPVNKd1ErRF9EAMqc
 MhOTdy04hoYbL4AdZH95MGW8/l6t+djO8DRwI89Gfw1g2ybqZjbU72Ur+FotU09H
 dQmyFuoiyRazO+VEInfvngYdtZ3w3ZfBqxQdq7rEhbrDnH0tLi+e49VjUFeNavr+
 c+xeC169hzggplARaeCtMQkvulV5ucI6pQJyVZiOqiIpiXJmwAlhZzkuYmqdfISx
 uZy43dgD64APoOeDcGmvqCnPfhl2gtGfEO36hGZVru4sZ5YaomgqA9gFUtkNnP3S
 YUQovg/g8zCZ4F35AHFvYaRmbBUbqjhzEIamX4gAotd71+zFevrX6R00oFWluhJD
 8WUdp4GzfSlIE9Ns6Ef9jIEba8M/1l9Vwzaa3MpbdtaDRSBVNhuz7Tm+3K/KeE8e
 QL8zqyQbgVihPQyBpMl1ukxnTwrTvEkVDPQtsP+5J9ArasYjU5kTgjVAzmeTgOdR
 3YxnAh7LMTKAoyItXta9
 =44+y
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfix from Trond Myklebust:
 "Fix a regression due to incorrect sharing of gss auth caches"

* tag 'nfs-for-3.12-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  RPCSEC_GSS: fix crash on destroying gss auth
2013-09-21 15:59:41 -07:00
Jun'ichi Nomura
75afb35299 block: Add nr_bios to block_rq_remap tracepoint
Adding the number of bios in a remapped request to 'block_rq_remap'
tracepoint.

Request remapper clones bios in a request to track the completion
status of each bio. So the number of bios can be useful information
for investigation.

Related discussions:
  http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
  http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-09-21 13:57:47 -06:00
Josef Bacik
94aebfb2e7 Btrfs: create the uuid tree on remount rw
Users have been complaining of the uuid tree stuff warning that there is no uuid
root when trying to do snapshot operations.  This is because if you mount -o ro
we will not create the uuid tree.  But then if you mount -o rw,remount we will
still not create it and then any subsequent snapshot/subvol operations you try
to do will fail gloriously.  Fix this by creating the uuid_root on remount rw if
it was not already there.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:50:43 -04:00
Mark Fasheh
cbf8b8ca3e btrfs: change extent-same to copy entire argument struct
btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data
back to it's argument struct. Unfortunately, not all architectures provide
__put_user_unaligned(), so compiles break on them if btrfs is selected.

Instead, just copy the whole struct in / out at the start and end of
operations, respectively.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:31 -04:00
Guangyu Sun
93fd63c2f0 Btrfs: dir_inode_operations should use btrfs_update_time also
Commit 2bc5565286 (Btrfs: don't update atime on
RO subvolumes) ensures that the access time of an inode is not updated when
the inode lives in a read-only subvolume.
However, if a directory on a read-only subvolume is accessed, the atime is
updated. This results in a write operation to a read-only subvolume. I
believe that access times should never be updated on read-only subvolumes.

To reproduce:

 # mkfs.btrfs -f /dev/dm-3
 (...)
 # mount /dev/dm-3 /mnt
 # btrfs subvol create /mnt/sub
 	Create subvolume '/mnt/sub'
 # mkdir /mnt/sub/dir
 # echo "abc" > /mnt/sub/dir/file
 # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
 	Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
 # stat /mnt/rosnap/dir
 	File: `/mnt/rosnap/dir'
 	Size: 8         Blocks: 0          IO Block: 4096   directory
 Device: 16h/22d    Inode: 257         Links: 1
 Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
 	Access: 2013-09-11 07:21:49.389157126 -0400
 	Modify: 2013-09-11 07:22:02.330156079 -0400
 	Change: 2013-09-11 07:22:02.330156079 -0400
 # ls /mnt/rosnap/dir
 	file
 # stat /mnt/rosnap/dir
 	File: `/mnt/rosnap/dir'
 	Size: 8         Blocks: 0          IO Block: 4096   directory
 Device: 16h/22d    Inode: 257         Links: 1
 Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
 	Access: 2013-09-11 07:22:56.797151670 -0400
 	Modify: 2013-09-11 07:22:02.330156079 -0400
 	Change: 2013-09-11 07:22:02.330156079 -0400

Reported-by: Koen De Wit <koen.de.wit@oracle.com>
Signed-off-by: Guangyu Sun <guangyu.sun@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:30 -04:00
Frank Holton
5138cccf34 btrfs: Add btrfs: prefix to kernel log output
The kernel log entries for device label %s and device fsid %pU
are missing the btrfs: prefix. Add those here.

Signed-off-by: Frank Holton <fholton@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:30 -04:00
David Sterba
6ef3de9c92 btrfs: refuse to remount read-write after abort
It's still possible to flip the filesystem into RW mode after it's
remounted RO due to an abort. There are lots of places that check for
the superblock error bit and will not write data, but we should not let
the filesystem appear read-write.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:30 -04:00
chandan
1cecf579d1 Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default
subvolume by passing a subvolume id of 0.

Signed-off-by: chandan <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:29 -04:00
Filipe David Borba Manana
a0634be562 Btrfs: don't leak transaction in btrfs_sync_file()
In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns
a negative error (for e.g. -ENOMEM via btrfs_log_inode()), we would
return without ending/freeing the transaction.

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:29 -04:00
Stefan Behrens
a724b43690 Btrfs: add the missing mutex unlock in write_all_supers()
The BUG() was replaced by btrfs_error() and return -EIO with the
patch "get rid of one BUG() in write_all_supers()", but the missing
mutex_unlock() was overlooked.

The 0-DAY kernel build service from Intel reported the missing
unlock which was found by the coccinelle tool:

    fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:28 -04:00
Josef Bacik
f4ab9ea706 Btrfs: iput inode on allocation failure
We don't do the iput when we fail to allocate our delayed delalloc work in
__start_delalloc_inodes, fix this.

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:28 -04:00
Josef Bacik
363e4d354e Btrfs: remove space_info->reservation_progress
This isn't used for anything anymore, just remove it.

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:27 -04:00
Josef Bacik
f0de181c9b Btrfs: kill delay_iput arg to the wait_ordered functions
This is a left over of how we used to wait for ordered extents, which was to
grab the inode and then run filemap flush on it.  However if we have an ordered
extent then we already are holding a ref on the inode, and we just use
btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
the inode to start work on the ordered extent.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:27 -04:00
Josef Bacik
c4fbb4300a Btrfs: fix worst case calculator for space usage
Forever ago I made the worst case calculator say that we could potentially split
into 3 blocks for every level on the way down, which isn't right.  If we split
we're only going to get two new blocks, the one we originally cow'ed and the new
one we're going to split.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:27 -04:00
Josef Bacik
14575aef42 Revert "Btrfs: rework the overcommit logic to be based on the total size"
This reverts commit 70afa3998c.  It is causing
performance issues and wasn't actually correct.  There were problems with the
way we flushed delalloc and that was the real cause of the early enospc.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:26 -04:00
Josef Bacik
652f25a292 Btrfs: improve replacing nocow extents
Various people have hit a deadlock when running btrfs/011.  This is because when
replacing nocow extents we will take the i_mutex to make sure nobody messes with
the file while we are replacing the extent.  The problem is we are already
holding a transaction open, which is a locking inversion, so instead we need to
save these inodes we find and then process them outside of the transaction.

Further we can't just lock the inode and assume we are good to go.  We need to
lock the extent range and then read back the extent cache for the inode to make
sure the extent really still points at the physical block we want.  If it
doesn't we don't have to copy it.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:26 -04:00
Josef Bacik
d555438b6e Btrfs: drop dir i_size when adding new names on replay
So if we have dir_index items in the log that means we also have the inode item
as well, which means that the inode's i_size is correct.  However when we
process dir_index'es we call btrfs_add_link() which will increase the
directory's i_size for the new entry.  To fix this we need to just set the dir
items i_size to 0, and then as we find dir_index items we adjust the i_size.
btrfs_add_link() will do it for new entries, and if the entry already exists we
can just add the name_len to the i_size ourselves.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:25 -04:00
Josef Bacik
dd8e721773 Btrfs: replay dir_index items before other items
A user reported a bug where his log would not replay because he was getting
-EEXIST back.  This was because he had a file moved into a directory that was
logged.  What happens is the file had a lower inode number, and so it is
processed first when replaying the log, and so we add the inode ref in for the
directory it was moved to.  But then we process the directories DIR_INDEX item
and try to add the inode ref for that inode and it fails because we already
added it when we replayed the inode.  To solve this problem we need to just
process any DIR_INDEX items we have in the log first so this all is taken care
of, and then we can replay the rest of the items.  With this patch my reproducer
can remount the file system properly instead of erroring out.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:25 -04:00
Josef Bacik
a5874ce6ce Btrfs: check roots last log commit when checking if an inode has been logged
Liu introduced a local copy of the last log commit for an inode to make sure we
actually log an inode even if a log commit has already taken place.  In order to
make sure we didn't relog the same inode multiple times he set this local copy
to the current trans when we log the inode, because usually we log the inode and
then sync the log.  The exception to this is during rename, we will relog an
inode if the name changed and it is already in the log.  The problem with this
is then we go to sync the inode, and our check to see if the inode has already
been logged is tripped and we don't sync the log.  To fix this we need to _also_
check against the roots last log commit, because it could be less than what is
in our local copy of the log commit.  This fixes a bug where we rename a file
into a directory and then fsync the directory and then on remount the directory
is no longer there.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:24 -04:00
Josef Bacik
de2b530bfb Btrfs: actually log directory we are fsync()'ing
If you just create a directory and then fsync that directory and then pull the
power plug you will come back up and the directory will not be there.  That is
because we won't actually create directories if we've logged files inside of
them since they will be created on replay, but in this check we will set our
logged_trans of our current directory if it happens to be a directory, making us
think it doesn't need to be logged.  Fix the logic to only do this to parent
directories.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:24 -04:00
Josef Bacik
573aecafca Btrfs: actually limit the size of delalloc range
So forever we have had this thing to limit the amount of delalloc pages we'll
setup to be written out to 128mb.  This is because we have to lock all the pages
in this range, so anything above this gets a bit unweildly, and also without a
limit we'll happily allocate gigantic chunks of disk space.  Turns out our check
for this wasn't quite right, we wouldn't actually limit the chunk we wanted to
write out, we'd just stop looking for more space after we went over the limit.
So if you do a giant 20gb dd on my box with lots of ram I could get 2gig
extents.  This is fine normally, except when you go to relocate these extents
and we can't find enough space to relocate these moster extents, since we have
to be able to allocate exactly the same sized extent to move it around.  So fix
this by actually enforcing the limit.  With this patch I'm no longer seeing
giant 1.5gb extents.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:24 -04:00
Miao Xie
a482039889 Btrfs: allocate the free space by the existed max extent size when ENOSPC
By the current code, if the requested size is very large, and all the extents
in the free space cache are small, we will waste lots of the cpu time to cut
the requested size in half and search the cache again and again until it gets
down to the size the allocator can return. In fact, we can know the max extent
size in the cache after the first search, so we needn't cut the size in half
repeatedly, and just use the max extent size directly. This way can save
lots of cpu time and make the performance grow up when there are only fragments
in the free space cache.

According to my test, if there are only 4KB free space extents in the fs,
and the total size of those extents are 256MB, we can reduce the execute
time of the following test from 5.4s to 1.4s.
  dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync

Changelog v2 -> v3:
- fix the problem that we skip the block group with the space which is
  less than we need.

Changelog v1 -> v2:
- address the problem that we return a wrong start position when searching
  the free space in a bitmap.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 11:05:23 -04:00
David Sterba
13fd8da98f btrfs: add lockdep and tracing annotations for uuid tree
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 10:58:56 -04:00
Stefan Behrens
79556c3d88 btrfs: show compiled-in config features at module load time
We want to know if there are debugging features compiled in, this may
affect performance. The message is printed before the sanity checks.

(This commit message is a copy of David Sterba's commit message when
he introduced btrfs_print_info()).

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 10:58:56 -04:00
Filipe David Borba Manana
cef2193729 Btrfs: more efficient inode tree replace operation
Instead of removing the current inode from the red black tree
and then add the new one, just use the red black tree replace
operation, which is more efficient.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 10:58:55 -04:00
Ilya Dryomov
55e50e458e Btrfs: do not add replace target to the alloc_list
If replace was suspended by the umount, replace target device is added
to the fs_devices->alloc_list during a later mount.  This is obviously
wrong.  ->is_tgtdev_for_dev_replace is supposed to guard against that,
but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized
*after* everything is opened and fs_devices lists are populated.  Fix
this by checking the devid instead: for replace targets it's always
equal to BTRFS_DEV_REPLACE_DEVID.

Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 10:58:55 -04:00
Josef Bacik
83d4cfd4da Btrfs: fixup error handling in btrfs_reloc_cow
If we failed to actually allocate the correct size of the extent to relocate we
will end up in an infinite loop because we won't return an error, we'll just
move on to the next extent.  So fix this up by returning an error, and then fix
all the callers to return an error up the stack rather than BUG_ON()'ing.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2013-09-21 10:58:54 -04:00
Chris Mason
07f0e62e7f Linux 3.11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQEcBAABAgAGBQJSJPkeAAoJEHm+PkMAQRiGVWMH/jo5f01Ra7G4/CYS59K+AlBQ
 /oWL3W81r5MORlsMxwUwGtJ3sZ7UulKwiDrluWeOkz2+/9SmoHoUfkpbByq1bSIV
 y0eqhmjtkHQZz5radJIHeyz1gJIICBIgAM0l45j8SpK4n9EXRcjLSZjdjAkPzxZp
 qZpfxKhVSTu79m96bud7F+HrboHDQEyhD9zqdSi4xPQNnOmTc7K3tvui9AB3rMbV
 ablM3C+LqBYjZx+pKS/rOdfATxZvtU392HU53XTALt6VD1e8alMmhmpe0I9Zxvjv
 scsB6hfRkevfe7VaK3aVoDnQnLKd61yxs+/XdzTtkWPbVGp+kiuFUdDv/5y2r1g=
 =7Xf6
 -----END PGP SIGNATURE-----

Merge tag 'v3.11' into for-linus

Linux 3.11
2013-09-21 10:44:55 -04:00
Linus Torvalds
2457aaf73a ACPI and power management fixes for 3.12-rc2
1) Four fixes for cpufreq regressions introduced by the changes that
     removed Device Tree parsing for CPU device nodes from cpufreq
     drivers from Sudeep KarkadaNagesha.
 
  2) Two fixes for recent cpufreq regressions introduced by changes
     related to the preservation of sysfs attributes over system
     suspend/resume cycles from Viresh Kumar.
 
  3) Fix for ACPI-based wakeup signaling in the PCI subsystem that
     fails to stop PME polling for devices put into the D3cold power
     state from Rafael J Wysocki.
 
  4) Fix for bad interactions between cpufreq and udev on systems
     supporting intel_pstate where acpi-cpufreq is available as well
     from Yinghai Lu.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJSPJvhAAoJEKhOf7ml8uNst9oQAJ0E5lcRdqC3DhEU7eXoa8Ty
 BpSI1u9uEGTmzh6jmYLNp66p3vtl4J6Lu+rtZOAHylRj/W8DY0AIusiF3HYQEwnR
 d8fjw2W2JmeKK6rXXdNGfcNwi+O67mmkcKJ1PuEm392FYfVKnPfoYWhxnFEcLgD1
 yK3r/8gkoSLnMMcmqUy8q/f3m69fxEEXICzN+IMlFD9bTs91DQ52vBEuom1Bmly+
 1k/HjNlBUoN+7GV0TweSlh22JHtFAk+9kzTmm2oIHsSdAfQp7at7cDgDJPdFb3df
 ANS+6s6F+vCgYn/7rBN18Z5jZx9SvRMhEoINfho7KoxaYuma4x5CFS0gyT1o9TYa
 BSEReW+LTOo2VN2qCHQcAvd//idU3DhJ4vccvnfL6p/gZ14rIkG79OGZlD4AoAXx
 B/DkR6K7TIfxbB41mVHaXzaW8RwnNqvTMN0gELSCu6rixKhOBnReVi7a5GIGgu/j
 TbgMlmRSHnfYEMIYZz8X/WsVsiUL9Z5bcRl6GpTQgqv4gjbbC8X9i1B7gNE1952Y
 IYAZjs/SdvRqpcUWbpRNogFuUWIoqhd7DGgcxuKrkXDPeo3IdP10foDm5Cmh8FWM
 dxigGhuoipvc8DdQaH8xoJGoz+Q7WUSICirNL+UAKQCKzzRdE9p9DMAPunFtF6WP
 yOGZgcfyUYVdKrCt/HEb
 =8EeE
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:

 1) Four fixes for cpufreq regressions introduced by the changes that
    removed Device Tree parsing for CPU device nodes from cpufreq
    drivers from Sudeep KarkadaNagesha.

 2) Two fixes for recent cpufreq regressions introduced by changes
    related to the preservation of sysfs attributes over system
    suspend/resume cycles from Viresh Kumar.

 3) Fix for ACPI-based wakeup signaling in the PCI subsystem that
    fails to stop PME polling for devices put into the D3cold power
    state from Rafael J Wysocki.

 4) Fix for bad interactions between cpufreq and udev on systems
    supporting intel_pstate where acpi-cpufreq is available as well
    from Yinghai Lu.

* tag 'pm+acpi-3.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: return EEXIST instead of EBUSY for second registering
  PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeup
  ARM: shmobile: change dev_id to cpu0 while registering cpu clock
  ARM: i.MX: change dev_id to cpu0 while registering cpu clock
  cpufreq: imx6q-cpufreq: assign cpu_dev correctly to cpu0 device
  cpufreq: cpufreq-cpu0: assign cpu_dev correctly to cpu0 device
  cpufreq: unlock correct rwsem while updating policy->cpu
  cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()
2013-09-20 15:17:14 -07:00
Linus Torvalds
d45004f994 vhost: minor changes on top of 3.12-rc1
This fixes module loading for vhost-scsi, and tweaks locking in vhost core
 a bit. Both of these are not exactly release blockers but it's early
 in the cycle so I think it's a good idea to apply them now.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSOWVbAAoJECgfDbjSjVRp10UIALB3nkEdmhC011ld+7xgVxMt
 hB5gN1Uw3nhZOxOd1t+udHERSuZWw2CjkGRetpLHPh22bMHpHIb++Y0FvNGG9kIY
 RRSrBuFupPTCpLlsI8bkCx1uqO8VpZiKwEh1h9SxmqWmuGIJgno1ZnVhMx+fTwmn
 Av4ERtc3LrzR4RgaZqYLBf/6Ed+ElZzwi6xNvwV6rrN9oKZR2pRJSSvkJkDQzwN7
 lRdHZJrtwd5KyMoUdPmSoafwCZnrjGIwYQpXq3jpvePAYF1Ot6p/+r/M1BaEHR7N
 eUfMp4ItCXZubrUeiXuqlX/dPCq+DtQePoO86OyviVS7URlHEpAkOu2hPpi8m5M=
 =I209
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost updates from Michael Tsirkin:
 "vhost: minor changes on top of 3.12-rc1

  This fixes module loading for vhost-scsi, and tweaks locking in vhost
  core a bit.  Both of these are not exactly release blockers but it's
  early in the cycle so I think it's a good idea to apply them now"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost-scsi: whitespace tweak
  vhost/scsi: use vmalloc for order-10 allocation
  vhost: wake up worker outside spin_lock
2013-09-20 15:16:15 -07:00
David Howells
509bf24d18 CacheFiles: Don't try to dump the index key if the cookie has been cleared
Don't try to dump the index key that distinguishes an object if netfs
data in the cookie the object refers to has been cleared (ie.  the
cookie has passed most of the way through
__fscache_relinquish_cookie()).

Since the netfs holds the index key, we can't get at it once the ->def
and ->netfs_data pointers have been cleared - and a NULL pointer
exception will ensue, usually just after a:

	CacheFiles: Error: Unexpected object collision

error is reported.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-20 15:15:43 -07:00
Josh Boyer
607566aecc CacheFiles: Fix memory leak in cachefiles_check_auxdata error paths
In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if
we determine there's an error or that the data is stale.

Further, assigning the output of vfs_getxattr() to auxbuf->len gives
problems with checking for errors as auxbuf->len is a u16.  We don't
actually need to set auxbuf->len, so keep the length in a variable for
now.  We shouldn't need to check the upper limit of the buffer as an
overflow there should be indicated by -ERANGE.

While we're at it, fscache_check_aux() returns an enum value, not an
int, so assign it to an appropriately typed variable rather than to ret.

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-20 15:15:42 -07:00
Will Deacon
8f4c344696 lockref: use cmpxchg64 explicitly for lockless updates
The cmpxchg() function tends not to support 64-bit arguments on 32-bit
architectures.  This could be either due to use of unsigned long
arguments (like on ARM) or lack of instruction support (cmpxchgq on
x86).  However, these architectures may implement a specific cmpxchg64()
function to provide 64-bit cmpxchg support instead.

Since the lockref code requires a 64-bit cmpxchg and relies on the
architecture selecting ARCH_USE_CMPXCHG_LOCKREF, move to using cmpxchg64
instead of cmpxchg and allow 32-bit architectures to make use of the
lockless lockref implementation.

Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-20 11:04:28 -05:00
Rafael J. Wysocki
d831a00510 Merge branch 'pm-cpufreq'
* pm-cpufreq:
  cpufreq: return EEXIST instead of EBUSY for second registering
  ARM: shmobile: change dev_id to cpu0 while registering cpu clock
  ARM: i.MX: change dev_id to cpu0 while registering cpu clock
  cpufreq: imx6q-cpufreq: assign cpu_dev correctly to cpu0 device
  cpufreq: cpufreq-cpu0: assign cpu_dev correctly to cpu0 device
  cpufreq: unlock correct rwsem while updating policy->cpu
  cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()
2013-09-20 15:40:41 +02:00
Rafael J. Wysocki
09359c8319 Merge branch 'acpi-pci'
* acpi-pci:
  PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeup
2013-09-20 15:40:30 +02:00
Linus Torvalds
dcb30e6592 - Compat register fault reporting fix
- Documentation clarification on tagged pointers
 - hwcap widened to 64-bit (user space already reading it as 64-bit)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJSPEd3AAoJEGvWsS0AyF7xHFwQAIZZq6qbhEuojbrRxRSU1/Il
 4VuPIJIdFW+T09uzE9d9ER1LJkQlG9+RkffOZWZaqQ+kVpwtH8YImWpvKBA+JHc+
 IgwTOFEQ8c6JJS/g1XzJdm/0ykR/ZzOhavqbgWDpuJsjsw4aXbxTlWoE34/ZlVeo
 zn4QiirbzkUEsMNlV/Di6EdVkwdJ8WTvrFWxpyR2teSokgTKNACv2fwdxYz+ggS+
 +dwoqcQD122SogT72ti5LriGRucOynH1gjHETRTEfXkdAeCdMgjJ6jnVRLJEX9Je
 Qv7Q9YHI8eCFpv4rGKvSJ7GtYJWY3Vsjp/t1dSmXtw5+ctFQgKqpT5tyG4yHC1DU
 huR2/Ui5RuEYHHiMMBCFp9JOvceab9lBFbTrsLTJto6kRg8E3nkEAdVNcgb5MLRE
 jcgwquK4HfF1JW3+l9rKClKz7fo3eVoA/cR92i5VdBjwzoL6pqKSk3pbPro+557k
 q/gbQhiggX4kTcv16tfVvzcfWwi7xRRJyFMm1W/VqniJa+gkEptpzfdtvdZOxKCx
 bUDp7LRwejRAMPTVp8MJZY4NvA68jJucRuenYDKZ6UVN5LQl0rcCPTvfOtafy44M
 CwArqPpP9/wUInqYQhjBLvp0yMSndthhatoCjLUdee+YkhMbvLqAzOyu2Vp2CVqf
 ib5Roul1AuFWqWFRqJrN
 =OICB
 -----END PGP SIGNATURE-----

Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull ARM64 fixes from Catalin Marinas:
 - Compat register fault reporting fix
 - Documentation clarification on tagged pointers
 - hwcap widened to 64-bit (user space already reading it as 64-bit)

* tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: Widen hwcap to be 64 bit
  arm64: Correctly report LR and SP for compat tasks
  arm64: documentation: tighten up tagged pointer documentation
  arm64: Make do_bad_area() function static
2013-09-20 08:18:51 -05:00
Steve Capper
25804e6a96 arm64: Widen hwcap to be 64 bit
Under arm64 elf_hwcap is a 32 bit quantity, but it is stored in
a 64 bit auxiliary ELF field and glibc reads hwcap as 64 bit.

This patch widens elf_hwcap to be 64 bit.

Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-09-20 09:56:07 +01:00
Catalin Marinas
6ca68e8026 arm64: Correctly report LR and SP for compat tasks
When a task crashes and we print debugging information, ensure that
compat tasks show the actual AArch32 LR and SP registers rather than the
AArch64 ones.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-09-20 09:56:07 +01:00
Will Deacon
374ed9d18e arm64: documentation: tighten up tagged pointer documentation
Commit d50240a5f6 ("arm64: mm: permit use of tagged pointers at EL0")
added support for tagged pointers in userspace, but the corresponding
update to Documentation/ contained some imprecise statements.

This patch fixes up some minor ambiguities in the text, hopefully making
it more clear about exactly what the kernel expects from user virtual
addresses.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-09-20 09:56:06 +01:00
Catalin Marinas
59f67e16e6 arm64: Make do_bad_area() function static
This function is only called from arch/arm64/mm/fault.c.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-09-20 09:56:05 +01:00
Linus Torvalds
7b9e3a6ac0 ARM: SoC fixes for 3.12-rc
A set of fixes for ARM platforms for 3.12. Among them:
 
 - A fix for build breakage in the MTD subsystem for some PXA devices.
   David Woodhouse has this patch in his for-next branch but has not been
   responding to our requests to send it up so here it is.
   I should have amended the commit message to describe the build failure for
   CONFIG_OF=n setups, but forgot and now it's down in the stack of commits.
 
 - Added device-tree for the BeagleBone Black. Turns out people have been
   using the older "regualar" bone DT for the newer boards, and there's
   risk of damaging hardware that way.
 
 - Misc DT and regular fixes for OMAP.
 
 - Fix to make the ST-Ericsson "snowball" boards boot with
   multi_v7_defconfig, and enable one of the ST-E reference boards on the
   same config.
 
 - Kconfig cleanup for u300 to hide submenus when the platform isn't
   enabled.
 
 - Enable ARM_ATAG_DTB_COMPAT to let firmware override command
   line when booting with an appended devicetree on non-DT-enabled
   firmware (needed to boot snowball).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJSO1O3AAoJEIwa5zzehBx3oEIP/A9emXKxNUOnnC47VkVHEMAl
 F26Q3SHkDZK4lKmvnfPGv4zTtk6E8zwZKdcQ4Sb/efLQqih8w0GG5auPbehn4shb
 WduDtsqhxTvNv1TDmv28PogRdEF9oqAGWPT91P6N/sCaehjmW+LRZO8JU0oS+t15
 nhqSHh53Nr5CtDAjIjiIuizOsF5o67QQz8ia7lOUW12P0c7RRPhJhV5G+gbKTUHE
 u7o0SDL/TJid+kWNvqNj57YhwJSJPeHUVkItxlZDEjhRCNNFU3JhmX/R0V9l1RrL
 Kry8kz0lWDjV91nl3ZUKA0+DBNOvN8uhIcy9QpG24u4hUnJrQvHjuMwoGOKp9kBh
 pohizIWRGlOPGqV2Fy75GASUAGQk1ARixHV007hiNwQETmeMiYX5y9prN97Hc7Jk
 +I+vTomsONb+Ielix420aaCUE0trunTm+BgZiAcYs995bzM5TbzBaB+K2DBkk8b5
 vqDQM8/PnUPXK6lOnjIirrYMpRzBkLbpSwSX2H+66G1exS1lgI6rIsSvjh9xP9BD
 r+9KSc7028CWhxdtZCw0cQFIa6a+HqIKMFS5yHK3TmbwX+BwHryGyMLoHc+VtN1Q
 LAmEsW/qPRelhhoBVgGo2i6eMDcMxj5ae7ovFBhy9cpskOsZpHXErMl92JBP5BBn
 GDIYMkee17bf0eFMEItZ
 =I14p
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A set of fixes for ARM platforms for 3.12.  Among them:

   - A fix for build breakage in the MTD subsystem for some PXA devices.
     David Woodhouse has this patch in his for-next branch but has not
     been responding to our requests to send it up so here it is.  I
     should have amended the commit message to describe the build
     failure for CONFIG_OF=n setups, but forgot and now it's down in the
     stack of commits.

   - Added device-tree for the BeagleBone Black.  Turns out people have
     been using the older "regualar" bone DT for the newer boards, and
     there's risk of damaging hardware that way.

   - Misc DT and regular fixes for OMAP.

   - Fix to make the ST-Ericsson "snowball" boards boot with
     multi_v7_defconfig, and enable one of the ST-E reference boards on
     the same config.

   - Kconfig cleanup for u300 to hide submenus when the platform isn't
     enabled.

   - Enable ARM_ATAG_DTB_COMPAT to let firmware override command line
     when booting with an appended devicetree on non-DT-enabled firmware
     (needed to boot snowball)"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (26 commits)
  ARM: multi_v7: add HREFv60 to multi_v7 defconfig
  ARM: OMAP2+: mux: fix trivial typo in name
  ARM: OMAP4 SMP: Corrected a typo fucntions to functions
  ARM: OMAP4: cpuidle: fix: call cpu_cluster_pm_exit conditionally
  mailbox: remove unnecessary platform_set_drvdata()
  ARM: mach-omap2: gpmc: Fix warning when CONFIG_ARM_LPAE=y
  ARM: OMAP: fix return value check in omap_device_build_from_dt()
  ARM: OMAP4: Fix clock_get error for GPMC during boot
  ARM: sa1100: collie.c: fall back to jedec_probe flash detection
  ARM: u300: hide submenus
  ARM: dts: igep00x0: Add pinmux configuration for MCBSP2
  ARM: dts: Fix muxing and regulator for wl12xx on the SDIO bus for blaze
  ARM: dts: Fix muxing and regulator for wl12xx on the SDIO bus for pandaboard
  mtd: nand: pxa3xx: Remove unneeded ifdef CONFIG_OF
  ARM: multi_v7_defconfig: enable ARM_ATAG_DTB_COMPAT
  ARM: ux500: disable outer cache debug
  ARM: dts: OMAP5: fix ocp2scp DTS data
  ARM: dts: OMAP5: fix reg property size
  ARM: dts: am335x-bone*: add DT for BeagleBone Black
  ARM: dts: omap3-beagle-xm: fix string error in compatible property
  ...
2013-09-19 18:49:08 -05:00
Yinghai Lu
4dea5806d3 cpufreq: return EEXIST instead of EBUSY for second registering
On systems that support intel_pstate, acpi_cpufreq fails to load, and
udev keeps trying until trace gets filled up and kernel crashes.

The root cause is driver return ret from cpufreq_register_driver(),
because when some other driver takes over before, it will return
EBUSY and then udev will keep trying ...

cpufreq_register_driver() should return EEXIST instead so that the
system can boot without appending intel_pstate=disable and still use
intel_pstate.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-20 00:37:10 +02:00
Rafael J. Wysocki
834145156b PCI / ACPI / PM: Clear pme_poll for devices in D3cold on wakeup
Commit 448bd85 (PCI/PM: add PCIe runtime D3cold support) added a
piece of code to pci_acpi_wake_dev() causing that function to behave
in a special way for devices in D3cold (so that their configuration
registers are not accessed before those devices are resumed).
However, it didn't take the clearing of the pme_poll flag into
account.  That has to be done for all devices, even if they are in
D3cold, or pci_pme_list_scan() will not know that wakeup has been
signaled for the device and will poll its PME Status bit
unnecessarily.

Fix the problem by moving the clearing of the pme_poll flag in
pci_acpi_wake_dev() before the code introduced by commit 448bd85.

Reported-and-tested-by: David E. Box <david.e.box@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: 3.6+ <stable@vger.kernel.org> # 3.6+
2013-09-20 00:24:43 +02:00
Linus Torvalds
b75ff5e84b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If the local_df boolean is set on an SKB we have to allocate a
    unique ID even if IP_DF is set in the ipv4 headers, from Ansis
    Atteka.

 2) Some fixups for the new chipset support that went into the sfc
    driver, from Ben Hutchings.

 3) Because SCTP bypasses a good chunk of, and actually duplicates, the
    logic of the ipv6 output path, some IPSEC things don't get done
    properly.  Integrate SCTP better into the ipv6 output path so that
    these problems are fixed and such issues don't get missed in the
    future either.  From Daniel Borkmann.

 4) Fix skge regressions added by the DMA mapping error return checking
    added in v3.10, from Mikulas Patocka.

 5) Kill some more IRQF_DISABLED references, from Michael Opdenacker.

 6) Fix races and deadlocks in the bridging code, from Hong Zhiguo.

 7) Fix error handling in tun_set_iff(), in particular don't leak
    resources.  From Jason Wang.

 8) Prevent format-string injection into xen-netback driver, from Kees
    Cook.

 9) Fix regression added to netpoll ARP packet handling, in particular
    check for the right ETH_P_ARP protocol code.  From Sonic Zhang.

10) Try to deal with AMD IOMMU errors when using r8169 chips, from
    Francois Romieu.

11) Cure freezes due to recent changes in the rt2x00 wireless driver,
    from Stanislaw Gruszka.

12) Don't do SPI transfers (which can sleep) in interrupt context in
    cw1200 driver, from Solomon Peachy.

13) Fix LEDs handling bug in 5720 tg3 chips already handled for 5719.
    From Nithin Sujir.

14) Make xen_netbk_count_skb_slots() count the actual number of slots
    that will be used, taking into consideration packing and other
    issues that the transmit path will run into.  From David Vrabel.

15) Use the correct maximum age when calculating the bridge
    message_age_timer, from Chris Healy.

16) Get rid of memory leaks in mcs7780 IRDA driver, from Alexey
    Khoroshilov.

17) Netfilter conntrack extensions were converted to RCU but are not
    always freed properly using kfree_rcu().  Fix from Michal Kubecek.

18) VF reset recovery not being done correctly in qlcnic driver, from
    Manish Chopra.

19) Fix inverted test in ATM nicstar driver, from Andy Shevchenko.

20) Missing workqueue destroy in cxgb4 error handling, from Wei Yang.

21) Internal switch not initialized properly in bgmac driver, from Rafał
    Miłecki.

22) Netlink messages report wrong local and remote addresses in IPv6
    tunneling, from Ding Zhi.

23) ICMP redirects should not generate socket errors in DCCP and SCTP.
    We're still working out how this should be handled for RAW and UDP
    sockets.  From Daniel Borkmann and Duan Jiong.

24) We've had several bugs wherein the network namespace's loopback
    device gets accessed after it is free'd, NULL it out so that we can
    catch these problems more readily.  From Eric W Biederman.

25) Fix regression in TCP RTO calculations, from Neal Cardwell.

26) Fix too early free of xen-netback network device when VIFs still
    exist.  From Paul Durrant.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  netconsole: fix a deadlock with rtnl and netconsole's mutex
  netpoll: fix NULL pointer dereference in netpoll_cleanup
  skge: fix broken driver
  ip: generate unique IP identificator if local fragmentation is allowed
  ip: use ip_hdr() in __ip_make_skb() to retrieve IP header
  xen-netback: Don't destroy the netdev until the vif is shut down
  net:dccp: do not report ICMP redirects to user space
  cnic: Fix crash in cnic_bnx2x_service_kcq()
  bnx2x, cnic, bnx2i, bnx2fc: Fix bnx2i and bnx2fc regressions.
  vxlan: Avoid creating fdb entry with NULL destination
  tcp: fix RTO calculated from cached RTT
  drivers: net: phy: cicada.c: clears warning Use #include <linux/io.h> instead of <asm/io.h>
  net loopback: Set loopback_dev to NULL when freed
  batman-adv: set the TAG flag for the vid passed to BLA
  netfilter: nfnetlink_queue: use network skb for sequence adjustment
  net: sctp: rfc4443: do not report ICMP redirects to user space
  net: usb: cdc_ether: use usb.h macros whenever possible
  net: usb: cdc_ether: fix checkpatch errors and warnings
  net: usb: cdc_ether: Use wwan interface for Telit modules
  ip6_tunnels: raddr and laddr are inverted in nl msg
  ...
2013-09-19 13:57:28 -05:00
Nikolay Aleksandrov
c71380ff0b netconsole: fix a deadlock with rtnl and netconsole's mutex
This bug was introduced by commit
7a163bfb7c ("netconsole: avoid a crash with
multiple sysfs writers"). In store_enabled() we have the following
sequence: acquire nt->mutex then rtnl, but in the netconsole netdev
notifier we have rtnl then nt->mutex effectively leading to a deadlock.
The NULL pointer dereference that the above commit tries to fix is
actually due to another bug in netpoll_cleanup(). This is fixed by dropping
the mutex from the netdev notifier as it's already protected by rtnl.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19 14:15:53 -04:00
Nikolay Aleksandrov
d0fe8c888b netpoll: fix NULL pointer dereference in netpoll_cleanup
I've been hitting a NULL ptr deref while using netconsole because the
np->dev check and the pointer manipulation in netpoll_cleanup are done
without rtnl and the following sequence happens when having a netconsole
over a vlan and we remove the vlan while disabling the netconsole:
	CPU 1					CPU2
					removes vlan and calls the notifier
enters store_enabled(), calls
netdev_cleanup which checks np->dev
and then waits for rtnl
					executes the netconsole netdev
					release notifier making np->dev
					== NULL and releases rtnl
continues to dereference a member of
np->dev which at this point is == NULL

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19 14:15:53 -04:00
Mikulas Patocka
c194992cbe skge: fix broken driver
The patch 136d8f377e broke the skge driver.
Note this part of the patch:
+               if (skge_rx_setup(skge, e, nskb, skge->rx_buf_size) < 0) {
+                       dev_kfree_skb(nskb);
+                       goto resubmit;
+               }
+
                pci_unmap_single(skge->hw->pdev,
                                 dma_unmap_addr(e, mapaddr),
                                 dma_unmap_len(e, maplen),
                                 PCI_DMA_FROMDEVICE);
                skb = e->skb;
                prefetch(skb->data);
-               skge_rx_setup(skge, e, nskb, skge->rx_buf_size);

The function skge_rx_setup modifies e->skb to point to the new skb. Thus,
after this change, the new buffer, not the old, is returned to the
networking stack.

This bug is present in kernels 3.11, 3.11.1 and 3.12-rc1. The patch should
be queued for 3.11-stable.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Vasiliy Glazov <vascom2@gmail.com>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-19 14:15:15 -04:00