Commit Graph

1739 Commits

Author SHA1 Message Date
Stanislav Kinsbursky
541e864f00 nfsd: simplify service shutdown
Function nfsd_shutdown is called from two places: nfsd_last_thread (when last
kernel thread is exiting) and nfsd_svc (in case of kthreads starting error).
When calling from nfsd_svc(), we can be sure that per-net resources are
allocated, so we don't need to check per-net nfsd_net_up boolean flag.
This allows us to remove nfsd_shutdown function at all and move check for
per-net nfsd_net_up boolean flag to nfsd_last_thread.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:42 -05:00
Stanislav Kinsbursky
4539f14981 nfsd: replace boolean nfsd_up flag by users counter
Since we have generic NFSd resurces, we have to introduce some way how to
allocate and destroy those resources on first per-net NFSd start and on
last per-net NFSd stop respectively.
This patch replaces global boolean nfsd_up flag (which is unused now) by users
counter and use it to determine either we need to allocate generic resources
or destroy them.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:41 -05:00
Stanislav Kinsbursky
903d9bf0ed nfsd: simplify NFSv4 state init and shutdown
This patch moves nfsd_startup_generic() and nfsd_shutdown_generic()
calls to nfsd_startup_net() and nfsd_shutdown_net() respectively, which
allows us to call nfsd_startup_net() instead of nfsd_startup() and makes
the code look clearer.  It also modifies nfsd_svc() and nfsd_shutdown()
to check nn->nfsd_net_up instead of global nfsd_up.  The latter is now
used only for generic resources shutdown and is currently useless.  It
will replaced by NFSd users counter later in this series.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:40 -05:00
Stanislav Kinsbursky
bda9cac1db nfsd: introduce helpers for generic resources init and shutdown
NFSd have per-net resources and resources, used globally.
Let's move generic resources init and shutdown to separated functions since
they are going to be allocated on first NFSd service start and destroyed after
last NFSd service shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:39 -05:00
Stanislav Kinsbursky
9dd9845f08 nfsd: make NFSd service structure allocated per net
This patch makes main step in NFSd containerisation.

There could be different approaches to how to make NFSd able to handle
incoming RPC request from different network namespaces.  The two main
options are:

1) Share NFSd kthreads betwween all network namespaces.
2) Create separated pool of threads for each namespace.

While first approach looks more flexible, second one is simpler and
non-racy.  This patch implements the second option.

To make it possible to allocate separate pools of threads, we have to
make it possible to allocate separate NFSd service structures per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:39 -05:00
Stanislav Kinsbursky
b9c0ef8571 nfsd: make NFSd service boot time per-net
This is simple: an NFSd service can be started at different times in
different network environments. So, its "boot time" has to be assigned
per net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:38 -05:00
Stanislav Kinsbursky
2c2fe2909e nfsd: per-net NFSd up flag introduced
This patch introduces introduces per-net "nfsd_net_up" boolean flag, which has
the same purpose as general "nfsd_up" flag - skip init or shutdown of per-net
resources in case of they are inited on shutted down respectively.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:37 -05:00
Stanislav Kinsbursky
6ff50b3dea nfsd: move per-net startup code to separated function
NFSd resources are partially per-net and partially globally used.
This patch splits resources init and shutdown and moves per-net code to
separated functions.
Generic and per-net init and shutdown are called sequentially for a while.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:36 -05:00
Stanislav Kinsbursky
081603520b nfsd: pass net to __write_ports() and down
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:36 -05:00
Stanislav Kinsbursky
3938a0d5eb nfsd: pass net to nfsd_set_nrthreads()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:35 -05:00
Stanislav Kinsbursky
d41a9417cd nfsd: pass net to nfsd_svc()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:34 -05:00
Stanislav Kinsbursky
6777436b0f nfsd: pass net to nfsd_create_serv()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:34 -05:00
Stanislav Kinsbursky
db42d1a76a nfsd: pass net to nfsd_startup() and nfsd_shutdown()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:33 -05:00
Stanislav Kinsbursky
db6e182c17 nfsd: pass net to nfsd_init_socks()
Precursor patch. Hard-coded "init_net" will be replaced by proper one in
future.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:32 -05:00
Stanislav Kinsbursky
f7fb86c6e6 nfsd: use "init_net" for portmapper
There could be a situation, when NFSd was started in one network namespace, but
stopped in another one.
This will trigger kernel panic, because RPCBIND client is stored on per-net
NFSd data, and will be NULL on NFSd shutdown.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:32 -05:00
Neil Brown
7007c90fb9 nfsd: avoid permission checks on EXCLUSIVE_CREATE replay
With NFSv4, if we create a file then open it we explicit avoid checking
the permissions on the file during the open because the fact that we
created it ensures we should be allow to open it (the create and the
open should appear to be a single operation).

However if the reply to an EXCLUSIVE create gets lots and the client
resends the create, the current code will perform the permission check -
because it doesn't realise that it did the open already..

This patch should fix this.

Note that I haven't actually seen this cause a problem.  I was just
looking at the code trying to figure out a different EXCLUSIVE open
related issue, and this looked wrong.

(Fix confirmed with pynfs 4.0 test OPEN4--bfields)

Cc: stable@kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
[bfields: use OWNER_OVERRIDE and update for 4.1]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:31 -05:00
Stanislav Kinsbursky
9a9c6478a8 nfsd: make NFSv4 recovery client tracking options per net
Pointer to client tracking operations - client_tracking_ops - have to be
containerized, because different environment can support different trackers
(for example, legacy tracker currently is not suported in container).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-10 16:25:30 -05:00
J. Bruce Fields
9b2ef62b15 nfsd4: lockt, release_lockowner should renew clients
Fix nfsd4_lockt and release_lockowner to lookup the referenced client,
so that it can renew it, or correctly return "expired", as appropriate.

Also share some code while we're here.

Reported-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-04 07:51:12 -05:00
Bryan Schumaker
6c1e82a4b7 NFSD: Forget state for a specific client
Write the client's ip address to any state file and all appropriate
state for that client will be forgotten.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:03 -05:00
Bryan Schumaker
d7cc431edd NFSD: Add a custom file operations structure for fault injection
Controlling the read and write functions allows me to add in "forget
client w.x.y.z", since we won't be limited to reading and writing only
u64 values.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:02 -05:00
Bryan Schumaker
184c18471f NFSD: Reading a fault injection file prints a state count
I also log basic information that I can figure out about the type of
state (such as number of locks for each client IP address).  This can be
useful for checking that state was actually dropped and later for
checking if the client was able to recover.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:01 -05:00
Bryan Schumaker
8ce54e0d82 NFSD: Fault injection operations take a per-client forget function
The eventual goal is to forget state based on ip address, so it makes
sense to call this function in a for-each-client loop until the correct
amount of state is forgotten.  I also use this patch as an opportunity
to rename the forget function from "func()" to "forget()".

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:59:00 -05:00
Bryan Schumaker
269de30f10 NFSD: Clean up forgetting and recalling delegations
Once I have a client, I can easily use its delegation list rather than
searching the file hash table for delegations to remove.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:59 -05:00
Bryan Schumaker
4dbdbda84f NFSD: Clean up forgetting openowners
Using "forget_n_state()" forces me to implement the code needed to
forget a specific client's openowners.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:58 -05:00
Bryan Schumaker
fc29171f5b NFSD: Clean up forgetting locks
I use the new "forget_n_state()" function to iterate through each client
first when searching for locks.  This may slow down forgetting locks a
little bit, but it implements most of the code needed to forget a
specified client's locks.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:56 -05:00
Bryan Schumaker
44e34da60b NFSD: Clean up forgetting clients
I added in a generic for-each loop that takes a pass over the client_lru
list for the current net namespace and calls some function.  The next few
patches will update other operations to use this function as well.  A value
of 0 still means "forget everything that is found".

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:55 -05:00
Bryan Schumaker
043958395a NFSD: Lock state before calling fault injection function
Each function touches state in some way, so getting the lock earlier
can help simplify code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:58:54 -05:00
J. Bruce Fields
e5f9570319 nfsd4: discard some unused nfsd4_verify xdr code
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-12-03 09:43:51 -05:00
Bryan Schumaker
f3c7521fe5 NFSD: Fold fault_inject.h into state.h
There were only a small number of functions in this file and since they
all affect stored state I think it makes sense to put them in state.h
instead.  I also dropped most static inline declarations since there are
no callers when fault injection is not enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 13:01:02 -05:00
Stanislav Kinsbursky
5284b44e43 nfsd: make NFSv4 grace time per net
Grace time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:39:47 -05:00
Stanislav Kinsbursky
3d7337115d nfsd: make NFSv4 lease time per net
Lease time is a part of NFSv4 state engine, which is constructed per network
namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:39:46 -05:00
Stanislav Kinsbursky
864aee5c6f nfsd: remove redundant declarations
This is a cleanup patch. Functions nfsd_pool_stats_open() and
nfsd_pool_stats_release() are declared in fs/nfsd/nfsd.h.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:55 -05:00
Stanislav Kinsbursky
f141f79d70 nfsd: recovery - make in_grace per net
Flag in_grace is a part of client tracking state, which is network namesapce
aware. So let'a replace global static variable with per-net one.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:54 -05:00
Stanislav Kinsbursky
3a0733692f nfsd: recovery - make rec_file per net
Opening and closing of this file is done in client tracking init and exit
operations.
Client tracking is done in network namespace context already. So let's make
this file opened and closed per network context - this will simlify it's
management.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:53 -05:00
Stanislav Kinsbursky
f252bc6806 nfsd: call state init and shutdown twice
Split NFSv4 state init and shutdown into two different calls: per-net one and
generic one.
Per-net cwinit/shutdown pair have to be called for any namespace, generic pair
- only once on NSFd kthreads start and shutdown respectively.

Refresh of diff-nfsd-call-state-init-twice

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:53 -05:00
Stanislav Kinsbursky
d85ed44305 nfsd: cleanup NFSd state start a bit
This patch renames nfs4_state_start_net() into nfs4_state_create_net(), where
get_net() now performed.
Also it introduces new nfs4_state_start_net(), which is now responsible for
state creation and initializing all per-net data and which is now called from
nfs4_state_start().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:52 -05:00
Stanislav Kinsbursky
4dce0ac906 nfsd: cleanup NFSd state shutdown a bit
This patch renames __nfs4_state_shutdown_net() into nfs4_state_shutdown_net(),
__nfs4_state_shutdown() into nfs4_state_shutdown_net() and moves all network
related shutdown operations to nfs4_state_shutdown_net().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:51 -05:00
Stanislav Kinsbursky
4e37a7c207 nfsd: make delegations shutdown network namespace aware
NFSv4 delegations are stored in global list. But they are nfs4_client
dependent, which is network namespace aware already.
State shutdown and laundromat are done per network namespace as well.
So, delegations unhash have to be done in network namespace context.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:50 -05:00
Stanislav Kinsbursky
c9a4962881 nfsd: make client_lock per net
This lock protects the client lru list and session hash table, which are
allocated per network namespace already.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:50 -05:00
Stanislav Kinsbursky
ec28e02ca5 nfsd4: remove state lock from nfs4_state_shutdown
Protection of __nfs4_state_shutdown() with nfs4_lock_state() looks redundant.

This function is called by the last NFSd thread on it's exit and state lock
protects actually two functions (del_recall_lru is protected by recall_lock):
1) nfsd4_client_tracking_exit
2) __nfs4_state_shutdown_net

"nfsd4_client_tracking_exit" doesn't require state lock protection, because it's
state can be modified only by tracker callbacks.
Here a re they:
1) create: is called only from nfsd4_proc_compound.
2) remove: is called from either nfsd4_proc_compound or nfs4_laundromat.
3) check: is called only from nfsd4_proc_compound.
4) grace_done; called only from nfs4_laundromat.

nfsd4_proc_compound is called onll by NFSd kthread, which is exiting right
now.
nfs4_laundromat is called by laundry_wq. But laundromat_work was canceled
already.

"__nfs4_state_shutdown_net" also doesn't require state lock protection,
because all NFSd kthreads are dead, and no race can happen with NFSd start,
because "nfsd_up" flag is still set.
Moreover, all Nfsd shutdown is protected with global nfsd_mutex.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:49 -05:00
J. Bruce Fields
dba88ba55a nfsd4: remove state lock from nfsd4_load_reboot_recovery_data
That function is only called under nfsd_mutex: we know that because the
only caller is nfsd_svc, via

        nfsd_svc
          nfsd_startup
            nfs4_state_start
              nfsd4_client_tracking_init
                client_tracking_ops->init == nfsd4_load_reboot_recovery_data

The shared state accessed here includes:

        - user_recovery_dirname: used here, modified only by
          nfs4_reset_recoverydir, which can be verified to only be
          called under nfsd_mutex.
        - filesystem state, protected by i_mutex (handwaving slightly
	  here)
        - rec_file, reclaim_str_hashtbl, reclaim_str_hashtbl_size: other
          than here, used only from code called from nfsd or laundromat
          threads, both of which should be started only after this runs
          (see nfsd_svc) and stopped before this could run again (see
          nfsd_shutdown, called from nfsd_last_thread).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-28 10:13:48 -05:00
J. Bruce Fields
a36b1725b3 nfsd4: return badname, not inval, on "." or "..", or "/"
The spec requires badname, not inval, in these cases.

Some callers want us to return enoent, but I can see no justification
for that.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-27 16:41:48 -05:00
J. Bruce Fields
063b0fb9fa nfsd4: downgrade some fs/nfsd/nfs4state.c BUG's
Linus has pointed out that indiscriminate use of BUG's can make it
harder to diagnose bugs because they can bring a machine down, often
before we manage to get any useful debugging information to the logs.
(Consider, for example, a BUG() that fires in a workqueue, or while
holding a spinlock).

Most of these BUG's won't do much more than kill an nfsd thread, but it
would still probably be safer to get out the warning without dying.

There's still more of this to do in nfsd/.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:16 -05:00
J. Bruce Fields
ffe1137ba7 nfsd4: delay filling in write iovec array till after xdr decoding
Our server rejects compounds containing more than one write operation.
It's unclear whether this is really permitted by the spec; with 4.0,
it's possibly OK, with 4.1 (which has clearer limits on compound
parameters), it's probably not OK.  No client that we're aware of has
ever done this, but in theory it could be useful.

The source of the limitation: we need an array of iovecs to pass to the
write operation.  In the worst case that array of iovecs could have
hundreds of elements (the maximum rwsize divided by the page size), so
it's too big to put on the stack, or in each compound op.  So we instead
keep a single such array in the compound argument.

We fill in that array at the time we decode the xdr operation.

But we decode every op in the compound before executing any of them.  So
once we've used that array we can't decode another write.

If we instead delay filling in that array till the time we actually
perform the write, we can reuse it.

Another option might be to switch to decoding compound ops one at a
time.  I considered doing that, but it has a number of other side
effects, and I'd rather fix just this one problem for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:15 -05:00
J. Bruce Fields
70cc7f75b1 nfsd4: move more write parameters into xdr argument
In preparation for moving some of this elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:14 -05:00
J. Bruce Fields
5a80a54d21 nfsd4: reorganize write decoding
In preparation for moving some of it elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:14 -05:00
J. Bruce Fields
8a61b18c9b nfsd4: simplify reading of opnum
The comment here is totally bogus:
	- OP_WRITE + 1 is RELEASE_LOCKOWNER.  Maybe there was some older
	  version of the spec in which that served as a sort of
	  OP_ILLEGAL?  No idea, but it's clearly wrong now.
	- In any case, I can't see that the spec says anything about
	  what to do if the client sends us less ops than promised.
	  It's clearly nutty client behavior, and we should do
	  whatever's easiest: returning an xdr error (even though it
	  won't be consistent with the error on the last op returned)
	  seems fine to me.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:13 -05:00
J. Bruce Fields
447bfcc936 nfsd4: no, we're not going to check tags for utf8
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:08:12 -05:00
J. Bruce Fields
57d276d71a nfsd: fix v4 reply caching
Very embarassing: 1091006c5e "nfsd: turn
on reply cache for NFSv4" missed a line, effectively leaving the reply
cache off in the v4 case.  I thought I'd tested that, but I guess not.

This time, wrote a pynfs test to confirm it works.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-26 09:05:19 -05:00
Stanislav Kinsbursky
0912128149 nfsd: make laundromat network namespace aware
This patch moves laundromat_work to nfsd per-net context, thus allowing to run
multiple laundries.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:51 -05:00
Stanislav Kinsbursky
12760c6685 nfsd: pass nfsd_net instead of net to grace enders
Passing net context looks as overkill.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:50 -05:00
Stanislav Kinsbursky
3320fef19b nfsd: use service net instead of hard-coded init_net
This patch replaces init_net by SVC_NET(), where possible and also passes
proper context to nested functions where required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:50 -05:00
Stanislav Kinsbursky
73758fed71 nfsd: make close_lru list per net
This list holds nfs4 clients (open) stateowner queue for last close replay,
which are network namespace aware. So let's make this list per network
namespace too.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:49 -05:00
Stanislav Kinsbursky
5ed58bb243 nfsd: make client_lru list per net
This list holds nfs4 clients queue for lease renewal, which are network
namespace aware. So let's make this list per network namespace too.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:48 -05:00
Stanislav Kinsbursky
1872de0e81 nfsd: make sessionid_hashtbl allocated per net
This hash holds established sessions state and closely associated with
nfs4_clients info, which are network namespace aware. So let's make it
allocated per network namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:47 -05:00
Stanislav Kinsbursky
20e9e2bc98 nfsd: make lockowner_ino_hashtbl allocated per net
This hash holds file lock owners and closely associated with nfs4_clients info,
which are network namespace aware. So let's make it allocated per network
namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:47 -05:00
Stanislav Kinsbursky
9b53113740 nfsd: make ownerstr_hashtbl allocated per net
This hash holds open owner state and closely associated with nfs4_clients
info, which are network namespace aware. So let's make it allocated per
network namespace too.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:46 -05:00
Stanislav Kinsbursky
a99454aa4f nfsd: make unconf_name_tree per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:45 -05:00
Stanislav Kinsbursky
0a7ec37727 nfsd: make unconf_id_hashtbl allocated per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:45 -05:00
Stanislav Kinsbursky
382a62e76c nfsd: make conf_name_tree per net
This tree holds nfs4_clients info, which are network namespace aware.
So let's make it per network namespace.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:44 -05:00
Stanislav Kinsbursky
8daae4dc0d nfsd: make conf_id_hashtbl allocated per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash can be allocated in per-net operations. But it looks
better to allocate it on nfsd state start and thus don't waste resources
if server is not running.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:43 -05:00
Stanislav Kinsbursky
52e19c09a1 nfsd: make reclaim_str_hashtbl allocated per net
This hash holds nfs4_clients info, which are network namespace aware.
So let's make it allocated per network namespace.

Note: this hash is used only by legacy tracker. So let's allocate hash in
tracker init.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:43 -05:00
Stanislav Kinsbursky
c212cecfa2 nfsd: make nfs4_client network namespace dependent
And use it's net where possible.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:42 -05:00
Stanislav Kinsbursky
7f2210fa6b nfsd: use service net instead of hard-coded net where possible
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-15 07:40:41 -05:00
Fengguang Wu
2b4cf668a7 nfsd4: get_backchannel_cred should be static
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-14 11:23:00 -05:00
Fengguang Wu
135ae8270d nfsd4: init_session should be declared static
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-14 11:23:00 -05:00
Jeff Layton
7e4f015d81 nfsd: release the legacy reclaimable clients list in grace_done
The current code holds on to this list until nfsd is shut down, but it's
never touched once the grace period ends. Release that memory back into
the wild when the grace period ends.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:12 -05:00
Jeff Layton
2216d449a9 nfsd: get rid of cl_recdir field
Remove the cl_recdir field from the nfs4_client struct. Instead, just
compute it on the fly when and if it's needed, which is now only when
the legacy client tracking code is in effect.

The error handling in the legacy client tracker is also changed to
handle the case where md5 is unavailable. In that case, we'll warn
the admin with a KERN_ERR message and disable the client tracking.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton
ac55fdc408 nfsd: move the confirmed and unconfirmed hlists to a rbtree
The current code requires that we md5 hash the name in order to store
the client in the confirmed and unconfirmed trees. Change it instead
to store the clients in a pair of rbtrees, and simply compare the
cl_names directly instead of hashing them. This also necessitates that
we add a new flag to the clp->cl_flags field to indicate which tree
the client is currently in.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton
0ce0c2b5d2 nfsd: don't search for client by hash on legacy reboot recovery gracedone
When nfsd starts, the legacy reboot recovery code creates a tracking
struct for each directory in the v4recoverydir. When the grace period
ends, it basically does a "readdir" on the directory again, and matches
each dentry in there to an existing client id to see if it should be
removed or not. If the matching client doesn't exist, or hasn't
reclaimed its state then it will remove that dentry.

This is pretty inefficient since it involves doing a lot of hash-bucket
searching. It also means that we have to keep relying on being able to
search for a nfs4_client by md5 hashed cl_recdir name.

Instead, add a pointer to the nfs4_client that indicates the association
between the nfs4_client_reclaim and nfs4_client. When a reclaim operation
comes in, we set the pointer to make that association. On gracedone, the
legacy client tracker will keep the recdir around iff:

1/ there is a reclaim record for the directory

...and...

2/ there's an association between the reclaim record and a client record
-- that is, a create or check operation was performed on the client that
matches that directory.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton
772a9bbbb5 nfsd: make nfs4_client_to_reclaim return a pointer to the reclaim record
Later callers will need to make changes to the record.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton
ce30e5392f nfsd: break out reclaim record removal into separate function
We'll need to be able to call this from nfs4recover.c eventually.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton
278c931cb0 nfsd: have nfsd4_find_reclaim_client take a char * argument
Currently, it takes a client pointer, but later we're going to need to
search for these records without knowing whether a matching client even
exists.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:11 -05:00
Jeff Layton
8b0554e9a2 nfsd: warn about impending removal of nfsdcld upcall
Let's shoot for removing the nfsdcld upcall in 3.10. Most likely,
no one is actually using it so I don't expect this warning to
fire often (except maybe on misconfigured systems).

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton
f3aa7e24c9 nfsd: pass info about the legacy recoverydir in environment variables
The usermodehelper upcall program can then decide to use this info as
a (one-way) transition mechanism to the new scheme. When a "check"
upcall occurs and the client doesn't exist in the database, we can
look to see whether the directory exists. If it does, then we'd add
the client to the database, remove the legacy recdir, and return
success to the kernel to allow the recovery to proceed.

For gracedone, we simply pass the v4recovery "topdir" so that the
upcall can clean it out prior to returning to the kernel.

A module parm is also added to disable the legacy conversion if
the admin chooses.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton
2d77bf0a55 nfsd: change heuristic for selecting the client_tracking_ops
First, try to use the new usermodehelper upcall. It should succeed or
fail quickly, so there's little cost to doing so.

If it fails, and the legacy tracking dir exists, use that. If it
doesn't exist then fall back to using nfsdcld.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton
2873d2147e nfsd: add a usermodehelper upcall for NFSv4 client ID tracking
Add a new client tracker upcall type that uses call_usermodehelper to
call out to a program. This seems to be the preferred method of
calling out to usermode these days for seldom-called upcalls. It's
simple and doesn't require a running daemon, so it should "just work"
as long as the binary is installed.

The client tracking exit operation is also changed to check for a
NULL pointer before running. The UMH upcall doesn't need to do anything
at module teardown time.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-12 18:55:10 -05:00
Jeff Layton
a0af710a65 nfsd: remove unused argument to nfs4_has_reclaimed_state
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-10 14:56:54 -05:00
Jeff Layton
698d8d875a nfsd: fix error handling in nfsd4_remove_clid_dir
If the credential save fails, then we'll leak our mnt_want_write_file
reference.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-10 14:52:03 -05:00
J. Bruce Fields
12fc3e92d4 nfsd4: backchannel should use client-provided security flavor
For now this only adds support for AUTH_NULL.  (Previously we assumed
AUTH_UNIX.)  We'll also need AUTH_GSS, which is trickier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:40:05 -05:00
J. Bruce Fields
57725155dc nfsd4: common helper to initialize callback work
I've found it confusing having the only references to
nfsd4_do_callback_rpc() in a different file.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:40:04 -05:00
J. Bruce Fields
cb73a9f464 nfsd4: implement backchannel_ctl operation
This operation is mandatory for servers to implement.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:39:58 -05:00
J. Bruce Fields
c6bb3ca27d nfsd4: use callback security parameters in create_session
We're currently ignoring the callback security parameters specified in
create_session, and just assuming the client wants auth_sys, because
that's all the current linux client happens to care about.  But this
could cause us callbacks to fail to a client that wanted something
different.

For now, all we're doing is no longer ignoring the uid and gid passed in
the auth_sys case.  Further patches will add support for auth_null and
gss (and possibly use more of the auth_sys information; the spec wants
us to use exactly the credential we're passed, though it's hard to
imagine why a client would care).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:35 -05:00
J. Bruce Fields
acb2887e04 nfsd4: clean up callback security parsing
Move the callback parsing into a separate function.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:35 -05:00
J. Bruce Fields
face15025f nfsd: use vfs_fsync_range(), not O_SYNC, for stable writes
NFSv4 shares the same struct file across multiple writes.  (And we'd
like NFSv2 and NFSv3 to do that as well some day.)

So setting O_SYNC on the struct file as a way to request a synchronous
write doesn't work.

Instead, do a vfs_fsync_range() in that case.

Reported-by: Peter Staubach <pstaubach@exagrid.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:34 -05:00
J. Bruce Fields
fae5096ad2 nfsd: assume writeable exportabled filesystems have f_sync
I don't really see how you could claim to support nfsd and not support
fsync somehow.

And in practice a quick look through the exportable filesystems suggests
the only ones without an ->fsync are read-only (efs, isofs, squashfs) or
in-memory (shmem).

Also, performing a write and then returning an error if the sync fails
(as we would do here in the wgather case) seems unhelpful to clients.

Also remove an incorrect comment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:33 -05:00
J. Bruce Fields
7fa10cd12d nfsd4: don't BUG in delegation break callback
These conditions would indeed indicate bugs in the code, but if we want
to hear about them we're likely better off warning and returning than
immediately dying while holding file_lock_lock.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:33 -05:00
J. Bruce Fields
7c1f8b65af nfsd4: remove unused init_session return
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:31 -05:00
J. Bruce Fields
ae7095a7c4 nfsd4: helper function for getting mounted_on ino
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:31:31 -05:00
Yanchuan Nian
3c40794b2d nfs: fix wrong object type in lockowner_slab
The object type in the cache of lockowner_slab is wrong, and it is
better to fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:30:57 -05:00
Wei Yongjun
01f6c8fd94 nfsd4: remove unused variable in nfsd4_delegreturn()
The variable inode is initialized but never used
otherwise, so remove the unused variable.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:22:31 -05:00
Namjae Jeon
216b6cbdcb exportfs: add FILEID_INVALID to indicate invalid fid_type
This commit adds FILEID_INVALID = 0xff in fid_type to
indicate invalid fid_type

It avoids using magic number 255

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-11-07 19:22:30 -05:00
J. Bruce Fields
f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
Linus Torvalds
aab174f0df Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs update from Al Viro:

 - big one - consolidation of descriptor-related logics; almost all of
   that is moved to fs/file.c

   (BTW, I'm seriously tempted to rename the result to fd.c.  As it is,
   we have a situation when file_table.c is about handling of struct
   file and file.c is about handling of descriptor tables; the reasons
   are historical - file_table.c used to be about a static array of
   struct file we used to have way back).

   A lot of stray ends got cleaned up and converted to saner primitives,
   disgusting mess in android/binder.c is still disgusting, but at least
   doesn't poke so much in descriptor table guts anymore.  A bunch of
   relatively minor races got fixed in process, plus an ext4 struct file
   leak.

 - related thing - fget_light() partially unuglified; see fdget() in
   there (and yes, it generates the code as good as we used to have).

 - also related - bits of Cyrill's procfs stuff that got entangled into
   that work; _not_ all of it, just the initial move to fs/proc/fd.c and
   switch of fdinfo to seq_file.

 - Alex's fs/coredump.c spiltoff - the same story, had been easier to
   take that commit than mess with conflicts.  The rest is a separate
   pile, this was just a mechanical code movement.

 - a few misc patches all over the place.  Not all for this cycle,
   there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
  MAX_LFS_FILESIZE should be a loff_t
  compat: fs: Generic compat_sys_sendfile implementation
  fs: push rcu_barrier() from deactivate_locked_super() to filesystems
  btrfs: reada_extent doesn't need kref for refcount
  coredump: move core dump functionality into its own file
  coredump: prevent double-free on an error path in core dumper
  usb/gadget: fix misannotations
  fcntl: fix misannotations
  ceph: don't abuse d_delete() on failure exits
  hypfs: ->d_parent is never NULL or negative
  vfs: delete surplus inode NULL check
  switch simple cases of fget_light to fdget
  new helpers: fdget()/fdput()
  switch o2hb_region_dev_write() to fget_light()
  proc_map_files_readdir(): don't bother with grabbing files
  make get_file() return its argument
  vhost_set_vring(): turn pollstart/pollstop into bool
  switch prctl_set_mm_exe_file() to fget_light()
  switch xfs_find_handle() to fget_light()
  switch xfs_swapext() to fget_light()
  ...
2012-10-02 20:25:04 -07:00
Linus Torvalds
437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
J. Bruce Fields
0d22f68f02 nfsd4: don't allow reclaims of expired clients
When a confirmed client expires, we normally also need to expire any
stable storage record which would allow that client to reclaim state on
the next boot.  We forgot to do this in some cases.  (For example, in
destroy_clientid, and in the cases in exchange_id and create_session
that destroy and existing confirmed client.)

But in most other cases, there's really no harm to calling
nfsd4_client_record_remove(), because it is a no-op in the case the
client doesn't have an existing

The single exception is destroying a client on shutdown, when we want to
keep the stable storage records so we can recognize which clients will
be allowed to reclaim when we come back up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:04 -04:00
J. Bruce Fields
6a3b156342 nfsd4: remove redundant callback probe
Both nfsd4_init_conn and alloc_init_session are probing the callback
channel, harmless but pointless.

Also, nfsd4_init_conn should probably be probing in the "unknown" case
as well.  In fact I don't see any harm to just doing it unconditionally
when we get a new backchannel connection.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:03 -04:00
J. Bruce Fields
8f9d3d3b7c nfsd4: expire old client earlier
Before we had to delay expiring a client till we'd found out whether the
session and connection allocations would succeed.  That's no longer
necessary.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:03 -04:00
J. Bruce Fields
81f0b2a496 nfsd4: separate session allocation and initialization
This will allow some further simplification.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:02 -04:00
J. Bruce Fields
a827bcb242 nfsd4: clean up session allocation
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:01 -04:00
J. Bruce Fields
1377b69e68 nfsd4: minor free_session cleanup
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:00 -04:00
J. Bruce Fields
e1ff371f9d nfsd4: new_conn_from_crses should only allocate
Do the initialization in the caller, and clarify that the only failure
ever possible here was due to allocation.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:40:00 -04:00
J. Bruce Fields
3ba6367124 nfsd4: separate connection allocation and initialization
It'll be useful to have connection allocation and initialization as
separate functions.

Also, note we'd been ignoring the alloc_conn error return in
bind_conn_to_session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:39:59 -04:00
J. Bruce Fields
4973050148 nfsd4: reject bad forechannel attrs earlier
This could simplify the logic a little later.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:39:58 -04:00
J. Bruce Fields
d15c077e44 nfsd4: enforce per-client sessions/no-sessions distinction
Something like creating a client with setclientid and then trying to
confirm it with create_session may not crash the server, but I'm not
completely positive of that, and in any case it's obviously bad client
behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:39:58 -04:00
J. Bruce Fields
c116a0af76 nfsd4: set cl_minorversion at create time
And remove some mostly obsolete comments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:39:57 -04:00
J. Bruce Fields
68eb35081e nfsd4: don't pin clientids to pseudoflavors
I added cr_flavor to the data compared in same_creds without any
justification, in d5497fc693 "nfsd4: move
rq_flavor into svc_cred".

Recent client changes then started making

	mount -osec=krb5 server:/export /mnt/
	echo "hello" >/mnt/TMP
	umount /mnt/
	mount -osec=krb5i server:/export /mnt/
	echo "hello" >/mnt/TMP

to fail due to a clid_inuse on the second open.

Mounting sequentially like this with different flavors probably isn't
that common outside artificial tests.  Also, the real bug here may be
that the server isn't just destroying the former clientid in this case
(because it isn't good enough at recognizing when the old state is
gone).  But it prompted some discussion and a look back at the spec, and
I think the check was probably wrong.  Fix and document.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-01 17:39:14 -04:00
Al Viro
cb0942b812 make get_file() return its argument
simplifies a bunch of callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-09-26 21:10:25 -04:00
J. Bruce Fields
6e67b5d184 nfsd4: fix bind_conn_to_session xdr comment
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-25 13:26:42 -04:00
Eric W. Biederman
5f3a4a28ec userns: Pass a userns parameter into posix_acl_to_xattr and posix_acl_from_xattr
- Pass the user namespace the uid and gid values in the xattr are stored
   in into posix_acl_from_xattr.

 - Pass the user namespace kuid and kgid values should be converted into
   when storing uid and gid values in an xattr in posix_acl_to_xattr.

- Modify all callers of posix_acl_from_xattr and posix_acl_to_xattr to
  pass in &init_user_ns.

In the short term this change is not strictly needed but it makes the
code clearer.  In the longer term this change is necessary to be able to
mount filesystems outside of the initial user namespace that natively
store posix acls in the linux xattr format.

Cc: Theodore Tso <tytso@mit.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-09-18 01:01:35 -07:00
J. Bruce Fields
fac7a17b5f nfsd4: cast readlink() bug argument
As we already do in readv, writev.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 17:46:19 -04:00
Malahal Naineni
9959ba0c24 NFSD: pass null terminated buf to kstrtouint()
The 'buf' is prepared with null termination with intention of using it for
this purpose, but 'name' is passed instead!

Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 17:46:19 -04:00
Namjae Jeon
8c8651b8e2 nfsd: remove duplicate init in nfsd4_cb_recall
remove duplicate init in nfsd4_cb_recall

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 17:46:18 -04:00
J. Bruce Fields
ef79859e04 nfsd4: eliminate redundant nfs4_free_stateid
Somehow we ended up with identical functions "nfs4_free_stateid" and
"free_generic_stateid".

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 17:46:17 -04:00
Julia Lawall
92566e287d fs/nfsd/nfs4idmap.c: adjust inconsistent IS_ERR and PTR_ERR
Change the call to PTR_ERR to access the value just tested by IS_ERR.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e,e1;
@@

(
if (IS_ERR(e)) { ... PTR_ERR(e) ... }
|
if (IS_ERR(e=e1)) { ... PTR_ERR(e) ... }
|
*if (IS_ERR(e))
 { ...
*  PTR_ERR(e1)
   ... }
)
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 17:46:17 -04:00
J. Bruce Fields
eccf50c129 nfsd: remove unused listener-removal interfaces
You can use nfsd/portlist to give nfsd additional sockets to listen on.
In theory you can also remove listening sockets this way.  But nobody's
ever done that as far as I can tell.

Also this was partially broken in 2.6.25, by
a217813f90 "knfsd: Support adding
transports by writing portlist file".

(Note that we decide whether to take the "delfd" case by checking for a
digit--but what's actually expected in that case is something made by
svc_one_sock_name(), which won't begin with a digit.)

So, let's just rip out this stuff.

Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 10:55:19 -04:00
J. Bruce Fields
cf9182e90b nfsd4: fix nfs4 stateid leak
Processes that open and close multiple files may end up setting this
oo_last_closed_stid without freeing what was previously pointed to.
This can result in a major leak, visible for example by watching the
nfsd4_stateids line of /proc/slabinfo.

Reported-by: Cyril B. <cbay@excellency.fr>
Tested-by: Cyril B. <cbay@excellency.fr>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-09-10 10:55:14 -04:00
J. Bruce Fields
5b444cc9a4 svcrpc: remove handling of unknown errors from svc_recv
svc_recv() returns only -EINTR or -EAGAIN.  If we really want to worry
about the case where it has a bug that causes it to return something
else, we could stick a WARN() in svc_recv.  But it's silly to require
every caller to have all this boilerplate to handle that case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:42:00 -04:00
J. Bruce Fields
a10fded18e nfsd: allow configuring nfsd to listen on 5-digit ports
Note a 16-bit value can require up to 5 digits.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:50 -04:00
J. Bruce Fields
38af2cabb6 nfsd: remove redundant "port" argument
"port" in all these functions is always NFS_PORT.

nfsd can already be run on a nonstandard port using the "nfsd/portlist"
interface.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 17:07:49 -04:00
Jeff Layton
21179d81f1 knfsd: don't allocate file_locks on the stack
struct file_lock is pretty large and really ought not live on the stack.
On my x86_64 machine, they're almost 200 bytes each.

    (gdb) p sizeof(struct file_lock)
    $1 = 192

...allocate them dynamically instead.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:39 -04:00
Jeff Layton
5592a3f397 knfsd: remove bogus BUG_ON() call from nfsd4_locku
The code checks for a NULL filp and handles it gracefully just before
this BUG_ON.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 14:08:38 -04:00
J. Bruce Fields
da5c80a935 nfsd4: nfsd_process_n_delegations should be static
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-21 13:59:39 -04:00
Bryan Schumaker
24ff99c6fe NFSD: Swap the struct nfs4_operation getter and setter
stateid_setter should be matched to op_set_currentstateid, rather than
op_get_currentstateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:53:25 -04:00
J. Bruce Fields
95c7a20aeb nfsd: do_nfsd_create verf argument is a u32
The types here are actually a bit of a mess.  For now cast as we do in
the v4 case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:49 -04:00
J. Bruce Fields
87f26f9b08 nfsd4: declare nfs4_recoverydir properly
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:49 -04:00
J. Bruce Fields
9c0b0ff799 nfsd4: nfsaclsvc_encode_voidres static
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:49 -04:00
Jeff Layton
1696c47ce2 nfsd: trivial comment updates
locks.c doesn't use the BKL anymore and there is no fi_perfile field.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:42 -04:00
J. Bruce Fields
39307655a1 nfsd4: fix security flavor of NFSv4.0 callback
Commit d5497fc693 "nfsd4: move rq_flavor
into svc_cred" forgot to remove cl_flavor from the client, leaving two
places (cl_flavor and cl_cred.cr_flavor) for the flavor to be stored.
After that patch, the latter was the one that was updated, but the
former was the one that the callback used.

Symptoms were a long delay on utime().  This is because the utime()
generated a setattr which recalled a delegation, but the cb_recall was
ignored by the client because it had the wrong security flavor.

Cc: stable@vger.kernel.org
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:36 -04:00
Linus Torvalds
a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Linus Torvalds
08843b79fb Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J. Bruce Fields:
 "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
  The one large piece is Stanislav's work to containerize the server's
  grace period--but that in itself is just one more step in a
  not-yet-complete project to allow fully containerized nfs service.

  There are a number of outstanding delegation, container, v4 state, and
  gss patches that aren't quite ready yet; 3.7 may be wilder."

* 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
  NFSd: make boot_time variable per network namespace
  NFSd: make grace end flag per network namespace
  Lockd: move grace period management from lockd() to per-net functions
  LockD: pass actual network namespace to grace period management functions
  LockD: manage grace list per network namespace
  SUNRPC: service request network namespace helper introduced
  NFSd: make nfsd4_manager allocated per network namespace context.
  LockD: make lockd manager allocated per network namespace
  LockD: manage grace period per network namespace
  Lockd: add more debug to host shutdown functions
  Lockd: host complaining function introduced
  LockD: manage used host count per networks namespace
  LockD: manage garbage collection timeout per networks namespace
  LockD: make garbage collector network namespace aware.
  LockD: mark host per network namespace on garbage collect
  nfsd4: fix missing fault_inject.h include
  locks: move lease-specific code out of locks_delete_lock
  locks: prevent side-effects of locks_release_private before file_lock is initialized
  NFSd: set nfsd_serv to NULL after service destruction
  NFSd: introduce nfsd_destroy() helper
  ...
2012-07-31 14:42:28 -07:00
Jan Kara
4a55c1017b nfsd: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: linux-nfs@vger.kernel.org
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:51 +04:00
Stanislav Kinsbursky
2c142baa7b NFSd: make boot_time variable per network namespace
NFSd's boot_time represents grace period start point in time.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
a51c84ed50 NFSd: make grace end flag per network namespace
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
5ccb0066f2 LockD: pass actual network namespace to grace period management functions
Passed network namespace replaced hard-coded init_net

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:22 -04:00
Stanislav Kinsbursky
9695c7057f SUNRPC: service request network namespace helper introduced
This is a cleanup patch - makes code looks simplier.
It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
Stanislav Kinsbursky
5e1533c788 NFSd: make nfsd4_manager allocated per network namespace context.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:49:21 -04:00
J. Bruce Fields
99dbb8fe09 nfsd4: fix missing fault_inject.h include
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-27 16:30:12 -04:00
Stanislav Kinsbursky
57c8b13e3c NFSd: set nfsd_serv to NULL after service destruction
In nfsd_destroy():

	if (destroy)
		svc_shutdown_net(nfsd_serv, net);
	svc_destroy(nfsd_server);

svc_shutdown_net(nfsd_serv, net) calls nfsd_last_thread(), which sets
nfsd_serv to NULL, causing a NULL dereference on the following line.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:21:31 -04:00
Stanislav Kinsbursky
19f7e2ca44 NFSd: introduce nfsd_destroy() helper
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:21:30 -04:00
J. Bruce Fields
a007c4c3e9 nfsd: add get_uint for u32's
I don't think there's a practical difference for the range of values
these interfaces should see, but it would be safer to be unambiguous.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:18:27 -04:00
Stanislav Kinsbursky
a6d88f293e NFSd: fix locking in nfsd_forget_delegations()
This patch adds recall_lock hold to nfsd_forget_delegations() to protect
nfsd_process_n_delegations() call.
Also, looks like it would be better to collect delegations to some local
on-stack list, and then unhash collected list. This split allows to
simplify locking, because delegation traversing is protected by recall_lock,
when delegation unhash is protected by client_mutex.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:18:27 -04:00
Vivek Trivedi
5559b50acd nfsd4: fix cr_principal comparison check in same_creds
This fixes a wrong check for same cr_principal in same_creds

Introduced by 8fbba96e5b "nfsd4: stricter
cred comparison for setclientid/exchange_id".

Cc: stable@vger.kernel.org
Signed-off-by: Vivek Trivedi <vtrivedi018@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-25 09:05:30 -04:00
Al Viro
765927b2d5 switch dentry_open() to struct path, make it grab references itself
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-23 00:01:29 +04:00
Al Viro
312b63fba9 don't pass nameidata * to vfs_create()
all we want is a boolean flag, same as the method gets now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14 16:34:50 +04:00
J. Bruce Fields
7f2e7dc0fd nfsd: share some function prototypes
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-10 16:41:35 -04:00
J. Bruce Fields
d91d0b5690 nfsd: allow owner_override only for regular files
We normally allow the owner of a file to override permissions checks on
IO operations, since:
	- the client will take responsibility for doing an access check
	  on open;
	- the permission checks offer no protection against malicious
	  clients--if they can authenticate as the file's owner then
	  they can always just change its permissions;
	- checking permission on each IO operation breaks the usual
	  posix rule that permission is checked only on open.

However, we've never allowed the owner to override permissions on
readdir operations, even though the above logic would also apply to
directories.  I've never heard of this causing a problem, probably
because a) simultaneously opening and creating a directory (with
restricted mode) isn't possible, and b) opening a directory, then
chmod'ing it, is rare.

Our disallowal of owner-override on directories appears to be an
accident, though--the readdir itself succeeds, and then we fail just
because lookup_one_len() calls in our filldir methods fail.

I'm not sure what the easiest fix for that would be.  For now, just make
this behavior obvious by denying the override right at the start.

This also fixes some odd v4 behavior: with the rdattr_error attribute
requested, it would perform the readdir but return an ACCES error with
each entry.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-10 16:41:35 -04:00
J. Bruce Fields
74dbafaf5d nfsd4: release openowners on free in >=4.1 case
We don't need to keep openowners around in the >=4.1 case, because they
aren't needed to handle CLOSE replays any more (that's a problem for
sessions).  And doing so causes unexpected failures on a subsequent
destroy_clientid to fail.

We probably also need something comparable for lock owners on last
unlock.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-10 16:41:34 -04:00
J. Bruce Fields
2930d381d2 nfsd4: our filesystems are normally case sensitive
Actually, xfs and jfs can optionally be case insensitive; we'll handle
that case in later patches.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-10 15:20:57 -04:00
J. Bruce Fields
4af825041b nfsd4: process_open2 cleanup
Note we can simplify the error handling a little by doing the truncate
earlier.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:43 -04:00
J. Bruce Fields
e1aaa8916f nfsd4: nfsd4_lock() cleanup
Share a little common logic.  And note the comments here are a little
out of date (e.g. we don't always create new state in the "new" case any
more.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:42 -04:00
J. Bruce Fields
9068bed1a3 nfsd4: remove unnecessary comment
For the most part readers of cl_cb_state only need a value that is
"eventually" right.  And the value is set only either 1) in response to
some change of state, in which case it's set to UNKNOWN and then a
callback rpc is sent to probe the real state, or b) in the handling of a
response to such a callback.  UNKNOWN is therefore always a "temporary"
state, and for the other states we're happy to accept last writer wins.

So I think we're OK here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:41 -04:00
Chuck Lever
7df302f75e NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID
According to RFC 5661, the TEST_STATEID operation is not allowed to
return NFS4ERR_STALE_STATEID.  In addition, RFC 5661 says:

15.1.16.5.  NFS4ERR_STALE_STATEID (Error Code 10023)

   A stateid generated by an earlier server instance was used.  This
   error is moot in NFSv4.1 because all operations that take a stateid
   MUST be preceded by the SEQUENCE operation, and the earlier server
   instance is detected by the session infrastructure that supports
   SEQUENCE.

I triggered NFS4ERR_STALE_STATEID while testing the Linux client's
NOGRACE recovery.  Bruce suggested an additional test that could be
useful to client developers.

Lastly, RFC 5661, section 18.48.3 has this:

 o  Special stateids are always considered invalid (they result in the
    error code NFS4ERR_BAD_STATEID).

An explicit check is made for those state IDs to avoid printk noise.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:40 -04:00
Weston Andros Adamson
2411967305 nfsd: probe the back channel on new connections
Initiate a CB probe when a new connection with the correct direction is added
to a session (IFF backchannel is marked as down).  Without this a
BIND_CONN_TO_SESSION has no effect on the internal backchannel state, which
causes the server to reply to every SEQUENCE op with the
SEQ4_STATUS_CB_PATH_DOWN flag set until DESTROY_SESSION.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-20 08:59:39 -04:00
J. Bruce Fields
bc2df47a40 nfsd4: BUG_ON(!is_spin_locked()) no good on UP kernels
Most frequent symptom was a BUG triggering in expire_client, with the
server locking up shortly thereafter.

Introduced by 508dc6e110 "nfsd41:
free_session/free_client must be called under the client_lock".

Cc: stable@kernel.org
Cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-06-14 13:54:08 -04:00
Linus Torvalds
419f431949 Merge branch 'for-3.5' of git://linux-nfs.org/~bfields/linux
Pull the rest of the nfsd commits from Bruce Fields:
 "... and then I cherry-picked the remainder of the patches from the
  head of my previous branch"

This is the rest of the original nfsd branch, rebased without the
delegation stuff that I thought really needed to be redone.

I don't like rebasing things like this in general, but in this situation
this was the lesser of two evils.

* 'for-3.5' of git://linux-nfs.org/~bfields/linux: (50 commits)
  nfsd4: fix, consolidate client_has_state
  nfsd4: don't remove rebooted client record until confirmation
  nfsd4: remove some dprintk's and a comment
  nfsd4: return "real" sequence id in confirmed case
  nfsd4: fix exchange_id to return confirm flag
  nfsd4: clarify that renewing expired client is a bug
  nfsd4: simpler ordering of setclientid_confirm checks
  nfsd4: setclientid: remove pointless assignment
  nfsd4: fix error return in non-matching-creds case
  nfsd4: fix setclientid_confirm same_cred check
  nfsd4: merge 3 setclientid cases to 2
  nfsd4: pull out common code from setclientid cases
  nfsd4: merge last two setclientid cases
  nfsd4: setclientid/confirm comment cleanup
  nfsd4: setclientid remove unnecessary terms from a logical expression
  nfsd4: move rq_flavor into svc_cred
  nfsd4: stricter cred comparison for setclientid/exchange_id
  nfsd4: move principal name into svc_cred
  nfsd4: allow removing clients not holding state
  nfsd4: rearrange exchange_id logic to simplify
  ...
2012-06-01 08:32:58 -07:00
Linus Torvalds
a00b6151a2 Merge branch 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux
Pull nfsd update from Bruce Fields.

* 'for-3.5-take-2' of git://linux-nfs.org/~bfields/linux: (23 commits)
  nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek
  SUNRPC: split upcall function to extract reusable parts
  nfsd: allocate id-to-name and name-to-id caches in per-net operations.
  nfsd: make name-to-id cache allocated per network namespace context
  nfsd: make id-to-name cache allocated per network namespace context
  nfsd: pass network context to idmap init/exit functions
  nfsd: allocate export and expkey caches in per-net operations.
  nfsd: make expkey cache allocated per network namespace context
  nfsd: make export cache allocated per network namespace context
  nfsd: pass pointer to export cache down to stack wherever possible.
  nfsd: pass network context to export caches init/shutdown routines
  Lockd: pass network namespace to creation and destruction routines
  NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches
  nfsd: pass pointer to expkey cache down to stack wherever possible.
  nfsd: use hash table from cache detail in nfsd export seq ops
  nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops
  nfsd: use exp_put() for svc_export_cache put
  nfsd: use cache detail pointer from svc_export structure on cache put
  nfsd: add link to owner cache detail to svc_export structure
  nfsd: use passed cache_detail pointer expkey_parse()
  ...
2012-05-31 18:18:11 -07:00
J. Bruce Fields
6eccece90b nfsd4: fix, consolidate client_has_state
Whoops: first, I reimplemented the already-existing has_resources
without noticing; second, I got the test backwards.  I did pick a better
name, though.  Combine the two....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:39 -04:00
J. Bruce Fields
b9831b59f3 nfsd4: don't remove rebooted client record until confirmation
In the NFSv4.1 client-reboot case we're currently removing the client's
previous state in exchange_id.  That's wrong--we should be waiting till
the confirming create_session.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:34 -04:00
J. Bruce Fields
32f16b3823 nfsd4: remove some dprintk's and a comment
The comment is redundant, and if we really want dprintk's here they'd
probably be better in the common (check-slot_seqid) code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:31 -04:00
J. Bruce Fields
778df3f0fe nfsd4: return "real" sequence id in confirmed case
The client should ignore the returned sequence_id in the case where the
CONFIRMED flag is set on an exchange_id reply--and in the unconfirmed
case "1" is always the right response.  So it shouldn't actually matter
what we return here.

We could continue returning 1 just to catch clients ignoring the spec
here, but I'd rather be generous.  Other things equal, returning the
existing sequence_id seems more informative.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:27 -04:00
J. Bruce Fields
0f1ba0ef21 nfsd4: fix exchange_id to return confirm flag
Otherwise nfsd4_set_ex_flags writes over the return flags.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:21 -04:00
J. Bruce Fields
7447758be7 nfsd4: clarify that renewing expired client is a bug
This can't happen:
	- cl_time is zeroed only by unhash_client_locked, which is only
	  ever called under both the state lock and the client lock.
	- every caller of renew_client() should have looked up a
	  (non-expired) client and then called renew_client() all
	  without dropping the state lock.
	- the only other caller of renew_client_locked() is
	  release_session_client(), which first checks under the
	  client_lock that the cl_time is nonzero.

So make it clear that this is a bug, not something we handle.  I can't
quite bring myself to make this a BUG(), though, as there are a lot of
renew_client() callers, and returning here is probably safer than a
BUG().

We'll consider making it a BUG() after some more cleanup.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:14 -04:00
J. Bruce Fields
90d700b779 nfsd4: simpler ordering of setclientid_confirm checks
The cases here divide into two main categories:

	- if there's an uncomfirmed record with a matching verifier,
	  then this is a "normal", succesful case: we're either creating
	  a new client, or updating an existing one.
	- otherwise, this is a weird case: a replay, or a server reboot.

Reordering to reflect that makes the code a bit more concise and the
logic a lot easier to understand.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:07 -04:00
J. Bruce Fields
f3d03b9202 nfsd4: setclientid: remove pointless assignment
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:06 -04:00
J. Bruce Fields
8695b90ac3 nfsd4: fix error return in non-matching-creds case
Note CLID_INUSE is for the case where two clients are trying to use the
same client-provided long-form client identifiers.  But what we're
looking at here is the server-returned shorthand client id--if those
clash there's a bug somewhere.

Fix the error return, pull the check out into common code, and do the
check unconditionally in all cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:04 -04:00
J. Bruce Fields
788c1eba50 nfsd4: fix setclientid_confirm same_cred check
New clients are created only by nfsd4_setclientid(), which always gives
any new client a unique clientid.  The only exception is in the
"callback update" case, in which case it may create an unconfirmed
client with the same clientid as a confirmed client.  In that case it
also checks that the confirmed client has the same credential.

Therefore, it is pointless for setclientid_confirm to check whether a
confirmed and unconfirmed client with the same clientid have matching
credentials--they're guaranteed to.

Instead, it should be checking whether the credential on the
setclientid_confirm matches either of those.  Otherwise, it could be
anyone sending the setclientid_confirm.  Granted, I can't see why anyone
would, but still it's probalby safer to check.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:03 -04:00
J. Bruce Fields
34b232bb37 nfsd4: merge 3 setclientid cases to 2
Boy, is this simpler.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:02 -04:00
J. Bruce Fields
8f9307119d nfsd4: pull out common code from setclientid cases
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:01 -04:00
J. Bruce Fields
ad72aae5ad nfsd4: merge last two setclientid cases
The code here is mostly the same.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:30:00 -04:00
J. Bruce Fields
63db46328a nfsd4: setclientid/confirm comment cleanup
Be a little more concise.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:59 -04:00
J. Bruce Fields
e98479b8d6 nfsd4: setclientid remove unnecessary terms from a logical expression
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:58 -04:00
J. Bruce Fields
d5497fc693 nfsd4: move rq_flavor into svc_cred
Move the rq_flavor into struct svc_cred, and use it in setclientid and
exchange_id comparisons as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:58 -04:00
J. Bruce Fields
8fbba96e5b nfsd4: stricter cred comparison for setclientid/exchange_id
The typical setclientid or exchange_id will probably be performed with a
credential that maps to either root or nobody, so comparing just uid's
is unlikely to be useful.  So, use everything else we can get our hands
on.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:57 -04:00
J. Bruce Fields
03a4e1f6dd nfsd4: move principal name into svc_cred
Instead of keeping the principal name associated with a request in a
structure that's private to auth_gss and using an accessor function,
move it to svc_cred.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
J. Bruce Fields
631fc9ea05 nfsd4: allow removing clients not holding state
RFC 5661 actually says we should allow an exchange_id to remove a
matching client, even if the exchange_id comes from a different
principal, *if* the victim client lacks any state.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:55 -04:00
J. Bruce Fields
136e658d62 nfsd4: rearrange exchange_id logic to simplify
Minor cleanup: it's simpler to have separate code paths for the update
and non-update cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:54 -04:00
J. Bruce Fields
2dbb269dfe nfsd4: exchange_id cleanup: comments
Make these comments a bit more concise and uniform.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:53 -04:00
J. Bruce Fields
83e08fd46c nfsd4: exchange_id cleanup: local shorthands for repeated tests
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:52 -04:00
J. Bruce Fields
1a308118c2 nfsd4: allow an EXCHANGE_ID to kill a 4.0 client
Following rfc 5661 section 2.4.1, we can permit a 4.1 client to remove
an established 4.0 client's state.

(But we don't allow updates.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:52 -04:00
J. Bruce Fields
ea236d0704 nfsd4: exchange_id: check creds before killing confirmed client
We mustn't allow a client to destroy another client with established
state unless it has the right credential.

And some minor cleanup.

(Note: our comparison of credentials is actually pretty bogus currently;
that will need to be fixed in another patch.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:51 -04:00
J. Bruce Fields
2786cc3a05 nfsd4: exchange_id error cleanup
There's no point to the dprintk here as the main proc_compound loop
already does this.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:50 -04:00
J. Bruce Fields
11ae681052 nfsd4: exchange_id has a pointless copy
We just verified above that these two verifiers are already the same.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:50 -04:00
Weston Andros Adamson
5fb35a3a9b nfsd: return 0 on reads of fault injection files
debugfs read operations were returning the contents of an uninitialized u64.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:48 -04:00
Jeff Layton
ce0fc43c5a nfsd: wrap all accesses to st_deny_bmap
Handle the st_deny_bmap in a similar fashion to the st_access_bmap. Add
accessor functions and use those instead of bare bitops.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:48 -04:00
Jeff Layton
82c5ff1b14 nfsd: wrap accesses to st_access_bmap
Currently, we do this for the most part with "bare" bitops, but
eventually we'll need to expand the share mode code to handle access
and deny modes on other nodes.

In order to facilitate that code in the future, move to some generic
accessor functions. For now, these are mostly static inlines, but
eventually we'll want to move these to "real" functions that are
able to handle multi-node configurations or have a way to "swap in"
new operations to be done in lieu of or in conjunction with these
atomic bitops.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:47 -04:00
Jeff Layton
3a3286147f nfsd: make test_share a bool return
All of the callers treat the return that way already.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:46 -04:00
Jeff Layton
5ae037e599 nfsd: consolidate set_access and set_deny
These functions are identical. Also, rename them to bmap_to_share_mode
to better reflect what they do, and have them just return the result
instead of passing in a pointer to the storage location.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:46 -04:00
Chuck Lever
f07ea10dc8 NFSD: SETCLIENTID_CONFIRM returns NFS4ERR_CLID_INUSE too often
According to RFC 3530bis, the only items SETCLIENTID_CONFIRM processing
should be concerned with is the clientid, clientid verifier, and
principal.  The client's IP address is not supposed to be interesting.

And, NFS4ERR_CLID_INUSE is meant only for principal mismatches.

I triggered this logic with a prototype UCS client -- one that
uses the same nfs_client_id4 string for all servers.  The client
mounted our server via its IPv4, then via its IPv6 address.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:45 -04:00
Stanislav Kinsbursky
786185b5f8 SUNRPC: move per-net operations from svc_destroy()
The idea is to separate service destruction and per-net operations,
because these are two different things and the mix looks ugly.

Notes:

1) For NFS server this patch looks ugly (sorry for that). But these
place will be rewritten soon during NFSd containerization.

2) LockD per-net counter increase int lockd_up() was moved prior to
make_socks() to make lockd_down_net() call safe in case of error.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:40 -04:00
Stanislav Kinsbursky
9793f7c889 SUNRPC: new svc_bind() routine introduced
This new routine is responsible for service registration in a specified
network context.

The idea is to separate service creation from per-net operations.

Note also: since registering service with svc_bind() can fail, the
service will be destroyed and during destruction it will try to
unregister itself from rpcbind. In this case unregistration has to be
skipped.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:39 -04:00
Weston Andros Adamson
e7a0444aef nfsd: add IPv6 addr escaping to fs_location hosts
The fs_location->hosts list is split on colons, but this doesn't work when
IPv6 addresses are used (they contain colons).
This patch adds the function nfsd4_encode_components_esc() to
allow the caller to specify escape characters when splitting on 'sep'.
In order to fix referrals, this patch must be used with the mountd patch
that similarly fixes IPv6 [] escaping.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:38 -04:00
J. Bruce Fields
45eaa1c1a1 nfsd4: fix change attribute endianness
Though actually this doesn't matter much, as NFSv4.0 clients are
required to treat the change attribute as opaque.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:38 -04:00
J. Bruce Fields
d1829b3824 nfsd4: fix free_stateid return endianness
Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:37 -04:00
J. Bruce Fields
57b7b43b40 nfsd4: int/__be32 fixes
In each of these cases there's a simple unambiguous correct choice, and
no actual bug.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:37 -04:00
J. Bruce Fields
bc1b542be9 nfsd4: preserve __user annotation on cld downcall msg
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:36 -04:00
J. Bruce Fields
2355c59644 nfsd4: fix missing "static"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:35 -04:00
J. Bruce Fields
bfa4b36525 nfsd: state.c should include current_stateid.h
OK, admittedly I'm mainly just trying to shut sparse up.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-05-31 20:29:35 -04:00
Linus Torvalds
644473e9c6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
 "This is a course correction for the user namespace, so that we can
  reach an inexpensive, maintainable, and reasonably complete
  implementation.

  Highlights:
   - Config guards make it impossible to enable the user namespace and
     code that has not been converted to be user namespace safe.

   - Use of the new kuid_t type ensures the if you somehow get past the
     config guards the kernel will encounter type errors if you enable
     user namespaces and attempt to compile in code whose permission
     checks have not been updated to be user namespace safe.

   - All uids from child user namespaces are mapped into the initial
     user namespace before they are processed.  Removing the need to add
     an additional check to see if the user namespace of the compared
     uids remains the same.

   - With the user namespaces compiled out the performance is as good or
     better than it is today.

   - For most operations absolutely nothing changes performance or
     operationally with the user namespace enabled.

   - The worst case performance I could come up with was timing 1
     billion cache cold stat operations with the user namespace code
     enabled.  This went from 156s to 164s on my laptop (or 156ns to
     164ns per stat operation).

   - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
     Most uid/gid setting system calls treat these value specially
     anyway so attempting to use -1 as a uid would likely cause
     entertaining failures in userspace.

   - If setuid is called with a uid that can not be mapped setuid fails.
     I have looked at sendmail, login, ssh and every other program I
     could think of that would call setuid and they all check for and
     handle the case where setuid fails.

   - If stat or a similar system call is called from a context in which
     we can not map a uid we lie and return overflowuid.  The LFS
     experience suggests not lying and returning an error code might be
     better, but the historical precedent with uids is different and I
     can not think of anything that would break by lying about a uid we
     can't map.

   - Capabilities are localized to the current user namespace making it
     safe to give the initial user in a user namespace all capabilities.

  My git tree covers all of the modifications needed to convert the core
  kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  userns:  Silence silly gcc warning.
  cred: use correct cred accessor with regards to rcu read lock
  userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
  userns: Convert cgroup permission checks to use uid_eq
  userns: Convert tmpfs to use kuid and kgid where appropriate
  userns: Convert sysfs to use kgid/kuid where appropriate
  userns: Convert sysctl permission checks to use kuid and kgids.
  userns: Convert proc to use kuid/kgid where appropriate
  userns: Convert ext4 to user kuid/kgid where appropriate
  userns: Convert ext3 to use kuid/kgid where appropriate
  userns: Convert ext2 to use kuid/kgid where appropriate.
  userns: Convert devpts to use kuid/kgid where appropriate
  userns: Convert binary formats to use kuid/kgid where appropriate
  userns: Add negative depends on entries to avoid building code that is userns unsafe
  userns: signal remove unnecessary map_cred_ns
  userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
  userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
  userns: Convert stat to return values mapped from kuids and kgids
  userns: Convert user specfied uids and gids in chown into kuids and kgid
  userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
  ...
2012-05-23 17:42:39 -07:00
Eric W. Biederman
ae2975bc34 userns: Convert group_info values from gid_t to kgid_t.
As a first step to converting struct cred to be all kuid_t and kgid_t
values convert the group values stored in group_info to always be
kgid_t values.   Unless user namespaces are used this change should
have no effect.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03 03:27:21 -07:00
Randy Dunlap
8a7dc4b04b nfsd: fix nfs4recover.c printk format warning
Fix printk format warnings -- both items are size_t,
so use %zu to print them.

fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t'
fs/nfsd/nfs4recover.c:580:3: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-30 12:28:48 -07:00
Jeff Layton
b108fe6b08 nfsd: trivial: use SEEK_SET instead of 0 in vfs_llseek
They're equivalent, but SEEK_SET is more informative...

Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-25 15:38:23 -04:00
Linus Torvalds
c6f5c93098 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "One bugfix, and one minor header fix from Jeff Layton while we're
  here"

* 'for-3.4' of git://linux-nfs.org/~bfields/linux:
  nfsd: include cld.h in the headers_install target
  nfsd: don't fail unchecked creates of non-special files
2012-04-19 14:54:52 -07:00
Al Viro
efe39651f0 nfsd: fix compose_entry_fh() failure exits
Restore the original logics ("fail on mountpoints, negatives and in
case of fh_compose() failures").  Since commit 8177e (nfsd: clean up
readdirplus encoding) that got broken -
	rv = fh_compose(fhp, exp, dchild, &cd->fh);
	if (rv)
	       goto out;
	if (!dchild->d_inode)
		goto out;
	rv = 0;
out:
is equivalent to
	rv = fh_compose(fhp, exp, dchild, &cd->fh);
out:
and the second check has no effect whatsoever...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:02 -04:00
Al Viro
afcf6792af nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()
PTR_ERR(NULL) is going to be 0...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:01 -04:00
Al Viro
02f5fde5df nfsd: fix endianness breakage in TEST_STATEID handling
->ts_id_status gets nfs errno, i.e. it's already big-endian; no need
to apply htonl() to it.  Broken by commit 174568 (NFSD: Added TEST_STATEID
operation) last year...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:01 -04:00
Al Viro
04da6e9d63 nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails
nfsd_open() already returns an NFS error value; only vfs_test_lock()
result needs to be fed through nfserrno().  Broken by commit 55ef12
(nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT)
three years ago...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:01 -04:00
Al Viro
96f6f98501 nfsd: fix b0rken error value for setattr on read-only mount
..._want_write() returns -EROFS on failure, _not_ an NFS error value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-04-13 10:12:00 -04:00
Stanislav Kinsbursky
f69adb2fe2 nfsd: allocate id-to-name and name-to-id caches in per-net operations.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:12:11 -04:00
Stanislav Kinsbursky
9e75a4dee0 nfsd: make name-to-id cache allocated per network namespace context
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:12:10 -04:00
Stanislav Kinsbursky
c2e76ef5e0 nfsd: make id-to-name cache allocated per network namespace context
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:12:10 -04:00
Stanislav Kinsbursky
43ec1a20bf nfsd: pass network context to idmap init/exit functions
These functions will be called from per-net operations.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:12:10 -04:00
Stanislav Kinsbursky
5717e01284 nfsd: allocate export and expkey caches in per-net operations.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:11:47 -04:00
Stanislav Kinsbursky
e5f06f720e nfsd: make expkey cache allocated per network namespace context
This patch also changes svcauth_unix_purge() function: added network namespace
as a parameter and thus loop over all networks was replaced by only one call
for ip map cache purge.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:11:46 -04:00
Stanislav Kinsbursky
b3853e0ea1 nfsd: make export cache allocated per network namespace context
This patch also changes prototypes of nfsd_export_flush() and exp_rootfh():
network namespace parameter added.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:11:11 -04:00
Stanislav Kinsbursky
2a75cfa64e nfsd: pass pointer to export cache down to stack wherever possible.
This cache will be per-net soon. And it's easier to get the pointer to desired
per-net instance only once and then pass it down instead of discovering it in
every place were required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-12 09:10:19 -04:00
Stanislav Kinsbursky
b89109bef4 nfsd: pass network context to export caches init/shutdown routines
These functions will be called from per-net operations.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 18:01:33 -04:00
Stanislav Kinsbursky
e3f70eadb7 Lockd: pass network namespace to creation and destruction routines
v2: dereference of most probably already released nlm_host removed in
nlmclnt_done() and reclaimer().

These routines are called from locks reclaimer() kernel thread. This thread
works in "init_net" network context and currently relays on persence on lockd
thread and it's per-net resources. Thus lockd_up() and lockd_down() can't relay
on current network context. So let's pass corrent one into them.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:06 -04:00
Stanislav Kinsbursky
f890edbbef NFSd: remove hard-coded dereferences to name-to-id and id-to-name caches
These dereferences to global static caches are redundant. They also prevents
converting these caches into per-net ones. So this patch is cleanup + precursor
of patch set,a which will make them per-net.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:05 -04:00
Stanislav Kinsbursky
c89172e36e nfsd: pass pointer to expkey cache down to stack wherever possible.
This cache will be per-net soon. And it's easier to get the pointer to desired
per-net instance only once and then pass it down instead of discovering it in
every place were required.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:05 -04:00
Stanislav Kinsbursky
83e0ed700d nfsd: use hash table from cache detail in nfsd export seq ops
Hard-code is redundant and will prevent from making caches per net ns.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:04 -04:00
Stanislav Kinsbursky
f2c7ea10f9 nfsd: pass svc_export_cache pointer as private data to "exports" seq file ops
Global svc_export_cache cache is going to be replaced with per-net instance. So
prepare the ground for it.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:03 -04:00
Stanislav Kinsbursky
a09581f294 nfsd: use exp_put() for svc_export_cache put
This patch replaces cache_put() call for svc_export_cache by exp_put() call.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:02 -04:00
Stanislav Kinsbursky
db3a353263 nfsd: add link to owner cache detail to svc_export structure
Without info about owner cache datail it won't be able to find out, which
per-net cache detail have to be.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:01 -04:00
Stanislav Kinsbursky
d4bb527e9e nfsd: use passed cache_detail pointer expkey_parse()
Using of hard-coded svc_expkey_cache pointer in expkey_parse() looks redundant.
Moreover, global cache will be replaced with per-net instance soon.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:00 -04:00
Jeff Layton
33dcc481ed nfsd: don't use locks_in_grace to determine whether to call nfs4_grace_end
It's possible that lockd or another lock manager might still be on the
list after we call nfsd4_end_grace. If the laundromat thread runs
again at that point, then we could end up calling nfsd4_end_grace more
than once.

That's not only inefficient, but calling nfsd4_recdir_purge_old more
than once could be problematic. Fix this by adding a new global
"grace_ended" flag and use that to determine whether we've already
called nfsd4_grace_end.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:55:00 -04:00
Jeff Layton
03af42c59e nfsd: trivial: remove unused variable from nfsd4_lock
..."fp" is set but never used.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:54:58 -04:00
J. Bruce Fields
9dc4e6c4d1 nfsd: don't fail unchecked creates of non-special files
Allow a v3 unchecked open of a non-regular file succeed as if it were a
lookup; typically a client in such a case will want to fall back on a
local open, so succeeding and giving it the filehandle is more useful
than failing with nfserr_exist, which makes it appear that nothing at
all exists by that name.

Similarly for v4, on an open-create, return the same errors we would on
an attempt to open a non-regular file, instead of returning
nfserr_exist.

This fixes a problem found doing a v4 open of a symlink with
O_RDONLY|O_CREAT, which resulted in the current client returning EEXIST.

Thanks also to Trond for analysis.

Cc: stable@kernel.org
Reported-by: Orion Poplawski <orion@cora.nwra.com>
Tested-by: Orion Poplawski <orion@cora.nwra.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-04-11 17:49:52 -04:00
Linus Torvalds
71db34fc43 Merge branch 'for-3.4' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from Bruce Fields:

Highlights:
 - Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
   moving us closer to a complete 4.1 implementation.
 - Bernd Schubert fixed a long-standing problem with readdir cookies on
   ext2/3/4.
 - Jeff Layton performed a long-overdue overhaul of the server reboot
   recovery code which will allow us to deprecate the current code (a
   rather unusual user of the vfs), and give us some needed flexibility
   for further improvements.
 - Like the client, we now support numeric uid's and gid's in the
   auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.

Plus miscellaneous bugfixes and cleanup.

Thanks to everyone!

There are also some delegation fixes waiting on vfs review that I
suppose will have to wait for 3.5.  With that done I think we'll finally
turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
symbolic as it's been on by default in distro's for a while).

And the list of 4.1 todo's should be achievable for 3.5 as well:

   http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues

though we may still want a bit more experience with it before turning it
on by default.

* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
  nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
  nfsd4: use auth_unix unconditionally on backchannel
  nfsd: fix NULL pointer dereference in cld_pipe_downcall
  nfsd4: memory corruption in numeric_name_to_id()
  sunrpc: skip portmap calls on sessions backchannel
  nfsd4: allow numeric idmapping
  nfsd: don't allow legacy client tracker init for anything but init_net
  nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
  nfsd: add the infrastructure to handle the cld upcall
  nfsd: add a header describing upcall to nfsdcld
  nfsd: add a per-net-namespace struct for nfsd
  sunrpc: create nfsd dir in rpc_pipefs
  nfsd: add nfsd4_client_tracking_ops struct and a way to set it
  nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
  NFSD: Fix nfs4_verifier memory alignment
  NFSD: Fix warnings when NFSD_DEBUG is not defined
  nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
  nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
  ext4: return 32/64-bit dir name hash according to usage type
  fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
  ...
2012-03-29 14:53:25 -07:00
Jeff Layton
797a9d797f nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
Otherwise, we get a warning or error similar to this when building with
CONFIG_NFSD_V4 disabled:

    ERROR: "nfsd4_cld_block" [fs/nfsd/nfsd.ko] undefined!

Fix this by wrapping the calls to rpc_pipefs_notifier_register and
..._unregister in another function and providing no-op replacements
when CONFIG_NFSD_V4 is disabled.

Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-29 08:01:07 -04:00
J. Bruce Fields
4ca1f872cd nfsd4: use auth_unix unconditionally on backchannel
This isn't actually correct, but it works with the Linux client, and
agrees with the behavior we used to have before commit 80fc015bdf.

Later patches will implement the spec-mandated behavior (which is to use
the security parameters explicitly given by the client in create_session
or backchannel_ctl).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-28 19:14:36 -04:00
Jeff Layton
21f72c9f0a nfsd: fix NULL pointer dereference in cld_pipe_downcall
If we find that "cup" is NULL in this case, then we obviously don't
want to dereference it. What we really want to print in this case
is the xid that we copied off earlier.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-28 10:10:24 -04:00
Dan Carpenter
3af706135b nfsd4: memory corruption in numeric_name_to_id()
"id" is type is a uid_t (32 bits) but on 64 bit systems strict_strtoul()
modifies 64 bits of data.  We should use kstrtouint() instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-28 10:10:23 -04:00
J. Bruce Fields
e9541ce8ef nfsd4: allow numeric idmapping
Mimic the client side by providing a module parameter that turns off
idmapping in the auth_sys case, for backwards compatibility with NFSv2
and NFSv3.

Unlike in the client case, we don't have any way to negotiate, since the
client can return an error to us if it doesn't like the id that we
return to it in (for example) a getattr call.

However, it has always been possible for servers to return numeric id's,
and as far as we're aware clients have always been able to handle them.

Also, in the auth_sys case clients already need to have numeric id's the
same between client and server.

Therefore we believe it's safe to default this to on; but the module
parameter is available to return to previous behavior if this proves to
be a problem in some unexpected setup.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:48 -04:00
Jeff Layton
cc27e0d407 nfsd: don't allow legacy client tracker init for anything but init_net
This code isn't set up for containers, so don't allow it to be
used for anything but init_net.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:48 -04:00
Jeff Layton
813fd320c1 nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
In the event that rpc_pipefs isn't mounted when nfsd starts, we
must register a notifier to handle creating the dentry once it
is mounted, and to remove the dentry on unmount.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:48 -04:00
Jeff Layton
f3f8014862 nfsd: add the infrastructure to handle the cld upcall
...and add a mechanism for switching between the "legacy" tracker and
the new one. The decision is made by looking to see whether the
v4recoverydir exists. If it does, then the legacy client tracker is
used.

If it's not, then the kernel will create a "cld" pipe in rpc_pipefs.
That pipe is used to talk to a daemon for handling the upcall.

Most of the data structures for the new client tracker are handled on a
per-namespace basis, so this upcall should be essentially ready for
containerization. For now however, nfsd just starts it by calling the
initialization and exit functions for init_net.

I'm making the assumption that at some point in the future we'll be able
to determine the net namespace from the nfs4_client. Until then, this
patch hardcodes init_net in those places. I've sprinkled some "FIXME"
comments around that code to attempt to make it clear where we'll need
to fix that up later.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:48 -04:00
Jeff Layton
7ea34ac15e nfsd: add a per-net-namespace struct for nfsd
Eventually, we'll need this when nfsd gets containerized fully. For
now, create a struct on a per-net-namespace basis that will just hold
a pointer to the cld_net structure. That struct will hold all of the
per-net data that we need for the cld tracker.

Eventually we can add other pernet objects to struct nfsd_net.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:47 -04:00
Jeff Layton
2a4317c554 nfsd: add nfsd4_client_tracking_ops struct and a way to set it
Abstract out the mechanism that we use to track clients into a set of
client name tracking functions.

This gives us a mechanism to plug in a new set of client tracking
functions without disturbing the callers. It also gives us a way to
decide on what tracking scheme to use at runtime.

For now, this just looks like pointless abstraction, but later we'll
add a new alternate scheme for tracking clients on stable storage.

Note too that this patch anticipates the eventual containerization
of this code by passing in struct net pointers in places. No attempt
is made to containerize the legacy client tracker however.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:47 -04:00
Jeff Layton
a52d726bbd nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
We'll need a way to flag the nfs4_client as already being recorded on
stable storage so that we don't continually upcall. Currently, that's
recorded in the cl_firststate field of the client struct. Using an
entire u32 to store a flag is rather wasteful though.

The cl_cb_flags field is only using 2 bits right now, so repurpose that
to a generic flags field. Rename NFSD4_CLIENT_KILL to
NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
flags. Add a mask that we can use for existing checks that look to see
whether any flags are set, so that the new flags don't interfere.

Convert all references to cl_firstate to the NFSD4_CLIENT_STABLE flag,
and add a new NFSD4_CLIENT_RECLAIM_COMPLETE flag.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-26 11:49:47 -04:00
J. Bruce Fields
1df00640c9 Merge nfs containerization work from Trond's tree
The nfs containerization work is a prerequisite for Jeff Layton's reboot
recovery rework.
2012-03-26 11:48:54 -04:00
Linus Torvalds
f63d395d47 NFS client updates for Linux 3.4
New features include:
 - Add NFS client support for containers.
   This should enable most of the necessary functionality, including
   lockd support, and support for rpc.statd, NFSv4 idmapper and
   RPCSEC_GSS upcalls into the correct network namespace from
   which the mount system call was issued.
 - NFSv4 idmapper scalability improvements
   Base the idmapper cache on the keyring interface to allow concurrent
   access to idmapper entries. Start the process of migrating users from
   the single-threaded daemon-based approach to the multi-threaded
   request-key based approach.
 - NFSv4.1 implementation id.
   Allows the NFSv4.1 client and server to mutually identify each other
   for logging and debugging purposes.
 - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
   having to use the more counterintuitive 'vers=4,minorversion=1'.
 - SUNRPC tracepoints.
   Start the process of adding tracepoints in order to improve debugging
   of the RPC layer.
 - pNFS object layout support for autologin.
 
 Important bugfixes include:
 - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to fail
   to wake up all tasks when applied to priority waitqueues.
 - Ensure that we handle read delegations correctly, when we try to
   truncate a file.
 - A number of fixes for NFSv4 state manager loops (mostly to do with
   delegation recovery).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJPalZbAAoJEGcL54qWCgDyCi4P+QHcmzQhJO7HWx3Pzjs67bFT
 xMSYaKHGWS4AJKUBVl5OKBxUExfrMHBNbElV3IKUIwBlDx8RVtnwfptKSe146iki
 dn4TrRO5es8nmI4hRDcGMlzJDZq4y0Qg//qiUFmojiNW/Avw0ljfMoVUejJJ09FV
 oeDk4EGtcxkEyH+g48ZjYbyspRnG8qtD3atf70Z3lYE0ELdG/B5Dyzw1RDrA5p73
 xJX3lqy8p/4ROzw/dmNoxdAXOrr3Q4/T58Bvp/lUglPy/EHyPmWzFoH0MU0C/PFu
 5VnAl6QDbNCTcIw9FvJlX/mIyErpNG9eKzUskUc9L9SA+B+J/i4rIap4KATRN3nH
 7QhE5qUacPuJnvxml7MPmlQTuft3fkAQ7NhKIWrbRi1QS9FmJC5NxctIb8loqlFn
 yIXdKeLfMshB+NyuFS9uzStX7SmV3eMgVd+5ZxRjYxm+PKJLw2KXeudArL6M5mHK
 3QeKZpqwaYQ3RfaTNpvAp0doiXHCO5UbWfI0Pe8xQs/QcMCNReffqV2G4IJKFAu6
 WpoN2UDQC9LCBifLw2nS7kku8+ZVXLQU8OC1NVl3TG15xD9cNLXuk3/y5llPGq4O
 odo52uLFpJohbDaHMj5RTKOfchTQCm2iyuVmxZEeAySypMSiAXmW7COSKHs/HxI1
 VBm+EI00Pvmm5+fUjIlp
 =LuHE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates for Linux 3.4 from Trond Myklebust:
 "New features include:
   - Add NFS client support for containers.

     This should enable most of the necessary functionality, including
     lockd support, and support for rpc.statd, NFSv4 idmapper and
     RPCSEC_GSS upcalls into the correct network namespace from which
     the mount system call was issued.

   - NFSv4 idmapper scalability improvements

     Base the idmapper cache on the keyring interface to allow
     concurrent access to idmapper entries.  Start the process of
     migrating users from the single-threaded daemon-based approach to
     the multi-threaded request-key based approach.

   - NFSv4.1 implementation id.

     Allows the NFSv4.1 client and server to mutually identify each
     other for logging and debugging purposes.

   - Support the 'vers=4.1' mount option for mounting NFSv4.1 instead of
     having to use the more counterintuitive 'vers=4,minorversion=1'.

   - SUNRPC tracepoints.

     Start the process of adding tracepoints in order to improve
     debugging of the RPC layer.

   - pNFS object layout support for autologin.

  Important bugfixes include:

   - Fix a bug in rpc_wake_up/rpc_wake_up_status that caused them to
     fail to wake up all tasks when applied to priority waitqueues.

   - Ensure that we handle read delegations correctly, when we try to
     truncate a file.

   - A number of fixes for NFSv4 state manager loops (mostly to do with
     delegation recovery)."

* tag 'nfs-for-3.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (224 commits)
  NFS: fix sb->s_id in nfs debug prints
  xprtrdma: Remove assumption that each segment is <= PAGE_SIZE
  xprtrdma: The transport should not bug-check when a dup reply is received
  pnfs-obj: autologin: Add support for protocol autologin
  NFS: Remove nfs4_setup_sequence from generic rename code
  NFS: Remove nfs4_setup_sequence from generic unlink code
  NFS: Remove nfs4_setup_sequence from generic read code
  NFS: Remove nfs4_setup_sequence from generic write code
  NFS: Fix more NFS debug related build warnings
  SUNRPC/LOCKD: Fix build warnings when CONFIG_SUNRPC_DEBUG is undefined
  nfs: non void functions must return a value
  SUNRPC: Kill compiler warning when RPC_DEBUG is unset
  SUNRPC/NFS: Add Kbuild dependencies for NFS_DEBUG/RPC_DEBUG
  NFS: Use cond_resched_lock() to reduce latencies in the commit scans
  NFSv4: It is not safe to dereference lsp->ls_state in release_lockowner
  NFS: ncommit count is being double decremented
  SUNRPC: We must not use list_for_each_entry_safe() in rpc_wake_up()
  Try using machine credentials for RENEW calls
  NFSv4.1: Fix a few issues in filelayout_commit_pagelist
  NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code
  ...
2012-03-23 08:53:47 -07:00
Al Viro
88187398cc debugfs-related mode_t whack-a-mole
all of those should be umode_t...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:53 -04:00
Al Viro
68ac1234fb switch touch_atime to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-03-20 21:29:41 -04:00
Chuck Lever
ab4684d156 NFSD: Fix nfs4_verifier memory alignment
Clean up due to code review.

The nfs4_verifier's data field is not guaranteed to be u32-aligned.
Casting an array of chars to a u32 * is considered generally
hazardous.

We can fix most of this by using a __be32 array to generate the
verifier's contents and then byte-copying it into the verifier field.

However, there is one spot where there is a backwards compatibility
constraint: the do_nfsd_create() call expects a verifier which is
32-bit aligned.  Fix this spot by forcing the alignment of the create
verifier in the nfsd4_open args structure.

Also, sizeof(nfs4_verifer) is the size of the in-core verifier data
structure, but NFS4_VERIFIER_SIZE is the number of octets in an XDR'd
verifier.  The two are not interchangeable, even if they happen to
have the same value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-20 15:36:15 -04:00
Trond Myklebust
8f199b8262 NFSD: Fix warnings when NFSD_DEBUG is not defined
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-20 15:34:19 -04:00
J. Bruce Fields
62b9510cb3 nfsd: merge cookie collision fixes from ext4 tree
These changes fix readdir loops on ext4 filesystems with dir_index
turned on.  I'm pulling them from Ted's tree as I'd like to give them
some extra nfsd testing, and expect to be applying (potentially
conflicting) patches to the same code before the next merge window.

From the nfs-ext4-premerge branch of

	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-19 12:35:05 -04:00
Bernd Schubert
06effdbb49 nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.

NOTE: This patch got rather complex as Christoph asked to set the
filp->f_mode flag in the open call or immediatly after dentry_open()
in nfsd_open() to avoid races.
Personally I still do not see a reason for that and in my opinion
FMODE_32BITHASH/FMODE_64BITHASH flags could be set nfsd_readdir(), as it
follows directly after nfsd_open() without a chance of races.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields<bfields@redhat.com>
2012-03-18 22:44:50 -04:00
Bernd Schubert
999448a8c0 nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
Just rename this variable, as the next patch will add a flag and
'access' as variable name would not be correct any more.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields<bfields@redhat.com>
2012-03-18 22:44:49 -04:00
J. Bruce Fields
8546ee518c nfsd4: make sure set CB_PATH_DOWN sequence flag set
Make sure this is set whenever there is no callback channel.

If a client does not set up a callback channel at all, then it will get
this flag set from the very start.  That's OK, it can just ignore the
flag if it doesn't care.  If a client does care, I think it's better to
inform it of the problem as early as possible.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-03-09 17:05:01 -05:00