We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back
channel when we're done recovering the session.
Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session
draining deadlock)
Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: Changed order to start back-channel first. Minor code cleanup]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>=3.10]
Give them names that are a bit more consistent with the general
pNFS naming scheme.
- lo_seg_contained -> pnfs_lseg_range_contained
- lo_seg_intersecting -> pnfs_lseg_range_intersecting
- cmp_layout -> pnfs_lseg_range_cmp
- is_matching_lseg -> pnfs_lseg_range_match
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The other protocols don't use it, so make it local to NFSv4, and
remove the EXPORT.
Also ensure that we only compile in cache_lib.o if we're using
the legacy DNS resolver.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Make sure that NFSv4 SETCLIENTID does not parse the NETID as a
format string.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
State recovery currently relies on being able to find a valid
nfs_open_context in the inode->open_files list.
We therefore need to put the nfs_open_context on the list while
we're still protected by the sp->so_reclaim_seqcount in order
to avoid reboot races.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
All the callers have an open_context at this point, and since we always
need one in order to do state recovery, it makes sense to use it as the
basis for the nfs4_do_open() call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Use the EXCHGID4_FLAG_BIND_PRINC_STATEID exchange_id flag to enable
stateid protection. This means that if we create a stateid using a
particular principal, then we must use the same principal if we
want to change that state.
IOW: if we OPEN a file using a particular credential, then we have
to use the same credential in subsequent OPEN_DOWNGRADE, CLOSE,
or DELEGRETURN operations that use that stateid.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This is not strictly needed, since get_deviceinfo is not allowed to
return NFS4ERR_ACCESS or NFS4ERR_WRONG_CRED, but lets do it anyway
for consistency with other pNFS operations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We want to use the same credential for reclaim_complete as we used
for the exchange_id call.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need to use the same credential as was used for the layoutget
and/or layoutcommit operations.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Darrick J. Wong <darrick.wong@oracle.com> reports:
> I have a kvm-based testing setup that netboots VMs over NFS, the
> client end of which seems to have broken somehow in 3.10-rc1. The
> server's exports file looks like this:
>
> /storage/mtr/x64 192.168.122.0/24(ro,sync,no_root_squash,no_subtree_check)
>
> On the client end (inside the VM), the initrd runs the following
> command to try to mount the rootfs over NFS:
>
> # mount -o nolock -o ro -o retrans=10 192.168.122.1:/storage/mtr/x64/ /root
>
> (Note: This is the busybox mount command.)
>
> The mount fails with -EINVAL.
Commit 4580a92d44 "NFS: Use server-recommended security flavor by
default (NFSv3)" introduced a behavior regression for NFS mounts
done via a legacy binary mount(2) call.
Ensure that a default security flavor is specified for legacy binary
mount requests, since they do not invoke nfs_select_flavor() in the
kernel.
Busybox uses klibc's nfsmount command, which performs NFS mounts
using the legacy binary mount data format. /sbin/mount.nfs is not
affected by this regression.
Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We need to pass the full open mode flags to nfs_may_open() when doing
a delegated open.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
Commit 79d852bf "NFS: Retry SETCLIENTID with AUTH_SYS instead of
AUTH_NONE" did not take into account commit 23631227 "NFSv4: Fix the
fallback to AUTH_NULL if krb5i is not available".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
On a CB_RECALL the callback service thread flushes the inode using
filemap_flush prior to scheduling the state manager thread to return the
delegation. When pNFS is used and I/O has not yet gone to the data server
servicing the inode, a LAYOUTGET can preceed the I/O. Unlike the async
filemap_flush call, the LAYOUTGET must proceed to completion.
If the state manager starts to recover data while the inode flush is sending
the LAYOUTGET, a deadlock occurs as the callback service thread holds the
single callback session slot until the flushing is done which blocks the state
manager thread, and the state manager thread has set the session draining bit
which puts the inode flush LAYOUTGET RPC to sleep on the forechannel slot
table waitq.
Separate the draining of the back channel from the draining of the fore channel
by moving the NFS4_SESSION_DRAINING bit from session scope into the fore
and back slot tables. Drain the back channel first allowing the LAYOUTGET
call to proceed (and fail) so the callback service thread frees the callback
slot. Then proceed with draining the forechannel.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pull audit changes from Eric Paris:
"Al used to send pull requests every couple of years but he told me to
just start pushing them to you directly.
Our touching outside of core audit code is pretty straight forward. A
couple of interface changes which hit net/. A simple argument bug
calling audit functions in namei.c and the removal of some assembly
branch prediction code on ppc"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: fix message spacing printing auid
Revert "audit: move kaudit thread start from auditd registration to kaudit init"
audit: vfs: fix audit_inode call in O_CREAT case of do_last
audit: Make testing for a valid loginuid explicit.
audit: fix event coverage of AUDIT_ANOM_LINK
audit: use spin_lock in audit_receive_msg to process tty logging
audit: do not needlessly take a lock in tty_audit_exit
audit: do not needlessly take a spinlock in copy_signal
audit: add an option to control logging of passwords with pam_tty_audit
audit: use spin_lock_irqsave/restore in audit tty code
helper for some session id stuff
audit: use a consistent audit helper to log lsm information
audit: push loginuid and sessionid processing down
audit: stop pushing loginid, uid, sessionid as arguments
audit: remove the old depricated kernel interface
audit: make validity checking generic
audit: allow checking the type of audit message in the user filter
audit: fix build break when AUDIT_DEBUG == 2
audit: remove duplicate export of audit_enabled
Audit: do not print error when LSMs disabled
...
Pull nfsd fixes from Bruce Fields:
"Small fixes for two bugs and two warnings"
* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
SUNRPC: fix decoding of optional gss-proxy xdr fields
SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
Pull stray syscall bits from Al Viro:
"Several syscall-related commits that were missing from the original"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
unicore32: just use mmap_pgoff()...
unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
available by moving to the ablkcipher crypto API. The improvement is more
apparent on faster storage devices. There's no noticeable change when hardware
crypto is not available.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCgAGBQJRjJFjAAoJENaSAD2qAscKuRgQAJkefyLPNBb4plXr3R+zJ1aM
jkcqkUJWamCxlmqHg/n9LmlxpSZiiWSWJM8iiq8zQhPE0PVXVOVhqvzogT1xwv75
xfT9xTuRi1v7UFaSGHtj3WoO2nscQ0pjZW/0CHgd/PZjz9y0iJ/l6ueoWCOz5L2i
3oJvx/W407qM+MSogWKx79i1B2jILdBdQH/7PZ+UJS3jWEo3rMWBbfCwbYhd4pUG
oVc+qFglNs+3HLdHVUmHPCerCL9qYAJIJmDrvupSOQ6DwdFaV8IysTgSEdFtLcfC
8Z6DUOPzXnvA/+y+NCCCUxg1CrkgYkNrefLKAq18atFu63zIZIHZyJBTJ5Q6vXVF
o1H8UcIOg/liGa6lXGf5b4ENNKvB0qMYQgiSrL0/FVVima4zGqUWkQLno4kQl1zx
FHB5imQ7F/EMcow/nTN3YQYC3N/iYIFAIRxf35SiGGsNhO2sEyIqYlRSyq2MNrRl
pLWNbhnRuhBUqcbqZDxq1oZ7624Ui4jnHHx7rl6Y3gfm8Xa1ZmQeY6rOadSZaRd7
+ZqFZi1jiHz1c0tVUO/3DsqABIhbr9Ee03vracN8bVTGV0ZmO0qneugFB9esoeDf
UnrU0Im8ilHu3OHAyc+UkRZuzThL9bLZEivwICQ+JDUJ1zsLYSQZEaGIt8hA0iDy
8Bu3gtfX2WxD4ak/AkFv
=3EhP
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs update from Tyler Hicks:
"Improve performance when AES-NI (and most likely other crypto
accelerators) is available by moving to the ablkcipher crypto API.
The improvement is more apparent on faster storage devices.
There's no noticeable change when hardware crypto is not available"
* tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Use the ablkcipher crypto API
Pull m68knommu updates from Greg Ungerer:
"The bulk of the changes are generalizing the ColdFire v3 core support
and adding in 537x CPU support. Also a couple of other bug fixes, one
to fix a reintroduction of a past bug in the romfs filesystem nommu
support."
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: enable Timer on coldfire 532x
m68knommu: fix ColdFire 5373/5329 QSPI base address
m68knommu: add support for configuring a Freescale M5373EVB board
m68knommu: add support for the ColdFire 537x family of CPUs
m68knommu: make ColdFire M532x platform support more v3 generic
m68knommu: create and use a common M53xx ColdFire class of CPUs
m68k: remove unused asm/dbg.h
m68k: Set ColdFire ACR1 cache mode depending on kernel configuration
romfs: fix nommu map length to keep inside filesystem
m68k: clean up unused "config ROMVECSIZE"
Make the switch from the blkcipher kernel crypto interface to the
ablkcipher interface.
encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
interface but, from the eCryptfs standpoint, still treat the crypto
operation as a synchronous operation. They submit the async request and
then wait until the operation is finished before they return. Most of
the changes are contained inside those two functions.
Despite waiting for the completion of the crypto operation, the
ablkcipher interface provides performance increases in most cases when
used on AES-NI capable hardware.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Acked-by: Colin King <colin.king@canonical.com>
Reviewed-by: Zeev Zilberman <zeev@annapurnaLabs.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Thieu Le <thieule@google.com>
Cc: Li Wang <dragonylffly@163.com>
Cc: Jarkko Sakkinen <jarkko.sakkinen@iki.fi>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJRiosmAAoJEKurIx+X31iBQ/IP/2D+JHfGcMzZbRo+2moRyhp9
VYUrNOg+QA+Csm6ZO43GMB4JZtodI/tO12YLNGFbNh+YELdpox7XTqmG058+pABV
El1xt7i2Ro2/PxnWBRnnvWVHoWSsPI9R65aVNgz8C2QyDxG4wY9/ZYfcZQgEajJO
M4gyKx/d54WjKcD31OJBF5NGki2zZQcuBI8vWkjXZBPximNj+cJeC27VPnuNI2GC
p32p9Q+pvBv43bkf3EPEFGsd/ZKdczZ75SOzLXVqOEJhmsxfEKgleQbBd3hiRlwe
zwj8lCzjZS3xR9oCnvalzWgswFQMd9S1kQYbItdztIofU4Y9hP6wmsEzLofXku+0
FshRxAOCtW1jSgRGo/BiDNfRQm8w+l+rJGafZafK6cQOtUHLD+Kig8AhFBIY3Nt1
Wzsmjz5ERhMDImM2lVw4ypji5vWkQ50wst6sX5CQHiOdOEmX+HYJkpOXCHzZYNY2
IrAa/qq6EKbQCfUf7btbsFMDMDIYk66gy6R596sC1Onwy14ecxWerY1HO0BFg3b8
UUo9tGVHEBBLdogc1aErU6OBL2DLeO7TeURhFOu7UHo0PfJNtLyG2ekYOYjwgS+Q
RKNJz9PHKsQnBOdI+dK3iHG2uqkbv6rg0pH5oIglCptabqflN9SE0RpM5b2PSLT3
74fTPSlUDSuCcm1PXJ8O
=Zw0E
-----END PGP SIGNATURE-----
Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull trivial pstore update from Tony Luck:
"Couple of pstore cleanups"
It turns out that the kmemdup() conversion ends up being undone by the
fact that the memory block also needed the ecc information (see commit
bd08ec33b5: "pstore/ram: Restore ecc information block"), so all that
remains after merging is the error return code change.
* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
pstore/ram: fix error return code in ramoops_probe()
fs: pstore: Replaced calls to kmalloc and memcpy with kmemdup
Pull more vfs fixes from Al Viro:
"Regression fix from Geert + yet another open-coded kernel_read()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs: don't open-code kernel_read()
xtensa simdisk: Fix proc_create_data() conversion fallout
Pull btrfs update from Chris Mason:
"These are mostly fixes. The biggest exceptions are Josef's skinny
extents and Jan Schmidt's code to rebuild our quota indexes if they
get out of sync (or you enable quotas on an existing filesystem).
The skinny extents are off by default because they are a new variation
on the extent allocation tree format. btrfstune -x enables them, and
the new format makes the extent allocation tree about 30% smaller.
I rebased this a few days ago to rework Dave Sterba's crc checks on
the super block, but almost all of these go back to rc6, since I
though 3.9 was due any minute.
The biggest missing fix is the tracepoint bug that was hit late in
3.9. I ran into problems with that in overnight testing and I'm still
tracking it down. I'll definitely have that fixed for rc2."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
Btrfs: allow superblock mismatch from older mkfs
btrfs: enhance superblock checks
btrfs: fix misleading variable name for flags
btrfs: use unsigned long type for extent state bits
Btrfs: improve the loop of scrub_stripe
btrfs: read entire device info under lock
btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
btrfs: handle errors returned from get_tree_block_key
btrfs: make static code static & remove dead code
Btrfs: deal with errors in write_dev_supers
Btrfs: remove almost all of the BUG()'s from tree-log.c
Btrfs: deal with free space cache errors while replaying log
Btrfs: automatic rescan after "quota enable" command
Btrfs: rescan for qgroups
Btrfs: split btrfs_qgroup_account_ref into four functions
Btrfs: allocate new chunks if the space is not enough for global rsv
Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
btrfs: move leak debug code to functions
Btrfs: return free space in cow error path
Btrfs: set UUID in root_item for created trees
...
* add CONFIG_XFS_WARN, a step between zero debugging and CONFIG_XFS_DEBUG.
* fix attrmulti and attrlist to fall back to vmalloc when kmalloc fails.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRi/V+AAoJENaLyazVq6ZOgncP/i0p141SO7qAwf20ql5NMi5Q
KdhzWymyILIZ8GQoxsvoDASXb3D5I/f+8qX1WgrAG7/k7fahdGOYCtFTJ5YqYLvy
ocxNoKqkhmUrMBbx0BcoNB4rtqgwLuHMR9ihDGDJrJDsn1b6nZlr38VBngIMhcC3
rWS8hU6ZwEnGm9hcmNzMBLJpjCJRfwqRr9CmHbENP/LspV0ZTSgltIuFggBfjK6q
LK3FYdrlYfiX1md3c9zPpdVn8P3hxeM/3Jq+O3mLZ6hY4uq1+NBSKQbS5uwg1d6e
3ib5hxBDlRk//+S0mzJJ3DZ+Qqa2zHpZ/jNKsjdOBUoyqEdRC5tI9hx3F3la5VxP
4oktNP2rvhpziRqRny/EwZm5xHrGTrdP8Uwq2nO2nxevtZyVd+P73K/17sNgOeCU
8Xm6d3Usxw4FUTbiHjw0LFoJlM8yfEUSiTr1K8TmDP5G4phE6RsNnlbDLSUZ6kgh
6UodqGZtKqdXegYljtwysX75PELQcWWcrA5+7U2/yk+qKyEJ3HvMNIXHvW6MqGTL
69gxdrdD/Ff+83N+ktA3Ks31aj9n0LYpy6shxXg+YpHl6Mny3UiEpiUEsVc8jn4E
iJ4qVqQI73qLDCNLM9XShBpPz5N/aPzaFw7QPXbj3A9UH3wd5OmUKNieoL4t9ucH
sAreKLnndOuriYoTAWA9
=IPgF
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.10-rc1-2' of git://oss.sgi.com/xfs/xfs
Pull xfs update (#2) from Ben Myers:
- add CONFIG_XFS_WARN, a step between zero debugging and
CONFIG_XFS_DEBUG.
- fix attrmulti and attrlist to fall back to vmalloc when kmalloc
fails.
* tag 'for-linus-v3.10-rc1-2' of git://oss.sgi.com/xfs/xfs:
xfs: fallback to vmalloc for large buffers in xfs_compat_attrlist_by_handle
xfs: fallback to vmalloc for large buffers in xfs_attrlist_by_handle
xfs: introduce CONFIG_XFS_WARN
- Ensure that we match the 'sec=' mount flavour against the server list
- Fix the NFSv4 byte range locking in the presence of delegations
- Ensure that we conform to the NFSv4.1 spec w.r.t. freeing lock stateids
- Fix a pNFS data server connection race
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRit1yAAoJEGcL54qWCgDyD9EQAKgb37dXhGt7OXBRBP4EY/T8
xJZ2tmdDZ6etLFJVftqCv05hBvyfilPLK0E9zg/zW/kvkKxYQ/fykvpzBR/+Q7KF
quOmjDHLhDTXBnXzPg1HEoeTaXI2/a8CdjpxxEkthD4+FaKlyCXM+EFtA9orT9ZI
oM+aNaqEzTjoQyryTFMcHxAvsrqjnZBa0MT6Fh45HaLaijV7CdDWoj6gjy6Lc3Al
4wHeT8QrZTp/NfIN16uykFZjeWwul4N9upu+CI2V8ZDMEit6JDYX4sl5tB41PzYW
audDBcu0waSqoVQ2mJ5OHoYGZf0wopMUFaAst+tn0pQvwWUfTjD8XtO8uOgeMNoz
2S+XxUC2qhSMszwNBVSmwe2LtSAyHiw32Md4hqkLYDH2c7tk8bJPKDXZJACBzJS7
O1aMmOgWar8+nmzvmXFeU804SxBykV1V8UgtXWp5IwC36V0HAYnM5xtHwXBR7HWe
lnuVHVdux7ySeAyrs2aMdKk7SAw5OC//WW8qoEF5USDEIljeoBzA+IYu9n91Hg2b
ufnsyxumGJ6dZ0iU2nJVoLagRaZcm6kOhnxcegMpb9IH2+RLCQNef09lj2iklm2j
mJA4o2lkVEHOswg/NwKn/I4ho8tbNNb8v//S5KiqrYhiiqZhOzu3RRtFeZi91iac
P/g+hPzfuGnmwcoCEUSa
=5zpc
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull more NFS client bugfixes from Trond Myklebust:
- Ensure that we match the 'sec=' mount flavour against the server list
- Fix the NFSv4 byte range locking in the presence of delegations
- Ensure that we conform to the NFSv4.1 spec w.r.t. freeing lock
stateids
- Fix a pNFS data server connection race
* tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS4.1 Fix data server connection race
NFSv3: match sec= flavor against server list
NFSv4.1: Ensure that we free the lock stateid on the server
NFSv4: Convert nfs41_free_stateid to use an asynchronous RPC call
SUNRPC: Don't spam syslog with "Pseudoflavor not found" messages
NFSv4.x: Fix handling of partially delegated locks
This patch-set includes the following major enhancement patches.
o introduce a new gloabl lock scheme
o add tracepoints on several major functions
o fix the overall cleaning process focused on victim selection
o apply the block plugging to merge IOs as much as possible
o enhance management of free nids and its list
o enhance the readahead mode for node pages
o address several cretical deadlock conditions
o reduce lock_page calls
The other minor bug fixes and enhancements are as follows.
o calculation mistakes: overflow
o bio types: READ, READA, and READ_SYNC
o fix the recovery flow, data races, and null pointer errors
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz
8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb
wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF
GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s
XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L
iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF
51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ
gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt
ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK
d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b
RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ
AATl8b+cW/aTZ4l7WOPU
=Hqii
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes the following major enhancement patches.
- introduce a new gloabl lock scheme
- add tracepoints on several major functions
- fix the overall cleaning process focused on victim selection
- apply the block plugging to merge IOs as much as possible
- enhance management of free nids and its list
- enhance the readahead mode for node pages
- address several cretical deadlock conditions
- reduce lock_page calls
The other minor bug fixes and enhancements are as follows.
- calculation mistakes: overflow
- bio types: READ, READA, and READ_SYNC
- fix the recovery flow, data races, and null pointer errors"
* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: cover free_nid management with spin_lock
f2fs: optimize scan_nat_page()
f2fs: code cleanup for scan_nat_page() and build_free_nids()
f2fs: bugfix for alloc_nid_failed()
f2fs: recover when journal contains deleted files
f2fs: continue to mount after failing recovery
f2fs: avoid deadlock during evict after f2fs_gc
f2fs: modify the number of issued pages to merge IOs
f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry.
f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
f2fs: check truncation of mapping after lock_page
f2fs: enhance alloc_nid and build_free_nids flows
f2fs: add a tracepoint on f2fs_new_inode
f2fs: check nid == 0 in add_free_nid
f2fs: add REQ_META about metadata requests for submit
f2fs: give a chance to merge IOs by IO scheduler
f2fs: avoid frequent background GC
f2fs: add tracepoints to debug checkpoint request
f2fs: add tracepoints for write page operations
f2fs: add tracepoints to debug the block allocation
...
Unlike meta data server mounts which support multiple mount points to
the same server via struct nfs_server, data servers support a single connection.
Concurrent calls to setup the data server connection can race where the first
call allocates the nfs_client struct, and before the cache struct nfs_client
pointer can be set, a second call also tries to setup the connection, finds the
already allocated nfs_client, bumps the reference count, re-initializes the
session,etc. This results in a hanging data server session after umount.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Pull block core updates from Jens Axboe:
- Major bit is Kents prep work for immutable bio vecs.
- Stable candidate fix for a scheduling-while-atomic in the queue
bypass operation.
- Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
discard bios.
- Tejuns changes to convert the writeback thread pool to the generic
workqueue mechanism.
- Runtime PM framework, SCSI patches exists on top of these in James'
tree.
- A few random fixes.
* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
relay: move remove_buf_file inside relay_close_buf
partitions/efi.c: replace useless kzalloc's by kmalloc's
fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
block: fix max discard sectors limit
blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Documentation: cfq-iosched: update documentation help for cfq tunables
writeback: expose the bdi_wq workqueue
writeback: replace custom worker pool implementation with unbound workqueue
writeback: remove unused bdi_pending_list
aoe: Fix unitialized var usage
bio-integrity: Add explicit field for owner of bip_buf
block: Add an explicit bio flag for bios that own their bvec
block: Add bio_alloc_pages()
block: Convert some code to bio_for_each_segment_all()
block: Add bio_for_each_segment_all()
bounce: Refactor __blk_queue_bounce to not use bi_io_vec
raid1: use bio_copy_data()
pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
pktcdvd: use bio_copy_data()
block: Add bio_copy_data()
...
After build_free_nids() searches free nid candidates from nat pages and
current journal blocks, it checks all the candidates if they are allocated
so that the nat cache has its nid with an allocated block address.
In this procedure, previously we used
list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list).
But, this is not covered by free_nid_list_lock, resulting in null pointer bug.
This patch moves this checking routine inside add_free_nid() in order not to use
the spin_lock.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: fix handling the return value of add_free_nid()]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
This patch does two cleanups:
1. remove unused variable "fcnt" in build_free_nids().
2. make scan_nat_page() as void type and remove useless variable "fcnt".
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS
Since there is NOT nmi->free_nid_list_lock spinlock protection between
a sequential calling of alloc_nid() and alloc_nid_failed(), some other
threads may already add new free_nid to the free_nid_list during this
period.
We need to make sure nmi->fcnt is never > 2 * MAX_FREE_NIDS.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When recovering a journal file with fsync data for files that have
been deleted, don't bail out on recovery.
Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
When unable to roll forward the journal, we shouldn't bail out and
not mount, we should continue to attempt the mount. Bad recovery data
is likely unrecoverable at this point, and requiring the user to try
to mount again doesn't solve any issues.
Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
o Deadlock case #1
Thread 1:
- writeback_sb_inodes
- do_writepages
- f2fs_write_data_pages
- write_cache_pages
- f2fs_write_data_page
- f2fs_balance_fs
- wait mutex_lock(gc_mutex)
Thread 2:
- f2fs_balance_fs
- mutex_lock(gc_mutex)
- f2fs_gc
- f2fs_iget
- wait iget_locked(inode->i_lock)
Thread 3:
- do_unlinkat
- iput
- lock(inode->i_lock)
- evict
- inode_wait_for_writeback
o Deadlock case #2
Thread 1:
- __writeback_single_inode
: set I_SYNC
- do_writepages
- f2fs_write_data_page
- f2fs_balance_fs
- f2fs_gc
- iput
- evict
- inode_wait_for_writeback(I_SYNC)
In order to avoid this, even though iput is called with the zero-reference
count, we need to stop the eviction procedure if the inode is on writeback.
So this patch links f2fs_drop_inode which checks the I_SYNC flag.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Merge more incoming from Andrew Morton:
- Various fixes which were stalled or which I picked up recently
- A large rotorooting of the AIO code. Allegedly to improve
performance but I don't really have good performance numbers (I might
have lost the email) and I can't raise Kent today. I held this out
of 3.9 and we could give it another cycle if it's all too late/scary.
I ended up taking only the first two thirds of the AIO rotorooting. I
left the percpu parts and the batch completion for later. - Linus
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits)
aio: don't include aio.h in sched.h
aio: kill ki_retry
aio: kill ki_key
aio: give shared kioctx fields their own cachelines
aio: kill struct aio_ring_info
aio: kill batch allocation
aio: change reqs_active to include unreaped completions
aio: use cancellation list lazily
aio: use flush_dcache_page()
aio: make aio_read_evt() more efficient, convert to hrtimers
wait: add wait_event_hrtimeout()
aio: refcounting cleanup
aio: make aio_put_req() lockless
aio: do fget() after aio_get_req()
aio: dprintk() -> pr_debug()
aio: move private stuff out of aio.h
aio: add kiocb_cancel()
aio: kill return value of aio_complete()
char: add aio_{read,write} to /dev/{null,zero}
aio: remove retry-based AIO
...
Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
need this anymore - ki_retry was only called right after the kiocb was
initialized.
This also refactors and trims some duplicated code, as well as cleaning up
the refcounting/error handling a bit.
[akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
[akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ki_key wasn't actually used for anything previously - it was always 0.
Drop it to trim struct kiocb a bit.
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>