There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received. In
other cases it is the largest 'payload' that can be included in an NFS
message.
In either case, it is not possible for both the request and the reply to be
this large. At least one of the request and the reply can be at most one page
long, which fits nicely with NFS.
So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'. Max_payload is the size that the server requests. It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.
max_mesg is the largest single message that can be sent or received. It is
calculated as max_payload rounded up to a multiple of PAGE_SIZE, plus one
extra PAGE_SIZE for overhead. Only one of the request and reply may be
this size. The other must be at most one page.
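A minimal sketch of the max_mesg arithmetic described above, assuming a 4096-byte page. SKETCH_PAGE_SIZE, round_up_to() and compute_max_mesg() are hypothetical names chosen for illustration, not the kernel's code.

```c
#include <stddef.h>

/* Assumed page size for this sketch; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round x up to the next multiple of y (y > 0). */
static size_t round_up_to(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

/*
 * max_mesg: max_payload rounded up to a whole number of pages,
 * plus one extra page for header overhead.
 */
static size_t compute_max_mesg(size_t max_payload)
{
	return round_up_to(max_payload, SKETCH_PAGE_SIZE) + SKETCH_PAGE_SIZE;
}
```

With these assumptions a max_payload of 32768 gives a max_mesg of 36864, and a payload of 5000 bytes rounds up to two pages plus one for overhead, i.e. 12288.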
Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The rpc reply has multiple levels of error returns. The code here contributes
to the confusion by using "accept_statp" for a pointer to what the RFC (and
wireshark, etc.) refer to as the "reply_stat". (The confusion is compounded
by the fact that the RFC also has an "accept_stat" which follows the
reply_stat in the successful case.)
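To make the two status words concrete, here is a sketch that encodes the start of an accepted reply in the order the ONC RPC specification (RFC 1831, later RFC 5531) lays it out: xid, message type, reply_stat, an AUTH_NONE verifier, then accept_stat. The function name and the reply_statp out-parameter are hypothetical; the numeric enum values are the ones the RFC assigns.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>

enum rpc_msg_type   { RPC_CALL = 0, RPC_REPLY = 1 };
enum rpc_reply_stat { MSG_ACCEPTED = 0, MSG_DENIED = 1 };
enum rpc_accept_stat {
	SUCCESS = 0, PROG_UNAVAIL = 1, PROG_MISMATCH = 2,
	PROC_UNAVAIL = 3, GARBAGE_ARGS = 4, SYSTEM_ERR = 5
};

/*
 * Encode the fixed part of an accepted reply with an AUTH_NONE verifier.
 * Returns the number of 32-bit words written. *reply_statp is set to point
 * at the reply_stat word, which the old code called accept_statp.
 */
static size_t encode_accepted_reply(uint32_t *buf, uint32_t xid,
				    enum rpc_accept_stat astat,
				    uint32_t **reply_statp)
{
	size_t n = 0;

	buf[n++] = htonl(xid);
	buf[n++] = htonl(RPC_REPLY);
	*reply_statp = &buf[n];
	buf[n++] = htonl(MSG_ACCEPTED);   /* reply_stat */
	buf[n++] = htonl(0);              /* verifier flavor: AUTH_NONE */
	buf[n++] = htonl(0);              /* verifier body length */
	buf[n++] = htonl(astat);          /* accept_stat, present only when accepted */
	return n;
}
```

Keeping the pointer named after reply_stat makes it obvious that a denied reply would overwrite that word with MSG_DENIED and omit accept_stat entirely.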
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NFSACL patches introduced support for multiple RPC services listening on
the same transport. However, only the first of these services was registered
with portmapper. This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.
The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.
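The new default can be sketched as a walk over the programs sharing a transport, registering every one except those flagged as hidden. struct prog, its hidden flag and register_all() are hypothetical simplifications; the real registration goes to portmapper, whereas this sketch only records which program numbers would be registered.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified program descriptor; not the kernel's svc_program. */
struct prog {
	unsigned int number;
	bool hidden;            /* e.g. nfsacl: keep out of portmapper listings */
	struct prog *next;      /* next program listening on the same transport */
};

/*
 * Record the program numbers that would be registered, stopping if 'out'
 * fills up. Returns how many were recorded.
 */
static size_t register_all(const struct prog *list, unsigned int *out, size_t max)
{
	size_t n = 0;

	for (const struct prog *p = list; p != NULL; p = p->next) {
		if (p->hidden)
			continue;       /* retain the old behaviour for hidden programs */
		if (n == max)
			break;
		out[n++] = p->number;
	}
	return n;
}
```

Before the change only the head of the list was registered; here every visible program in the chain is.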
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The limit over UDP remains at 32K. Also, make some of the apparently
arbitrary sizing constants clearer.
The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp. This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the server's declared sv_bufsz.
Note that we don't actually increase sv_bufsz for nfs yet. That comes next.
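A sketch of the limit as a function of the request rather than a constant: first a per-protocol cap (32K for UDP, as stated above), then the server's declared buffer size. struct req_ctx, max_blksize() and the TCP cap value are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAXPAYLOAD_UDP (32 * 1024)     /* UDP limit stays at 32K */
#define SKETCH_MAXPAYLOAD_TCP (1024 * 1024)   /* hypothetical larger TCP limit */

/* Hypothetical, simplified request context. */
struct req_ctx {
	bool is_udp;            /* transport the request arrived on */
	size_t serv_bufsz;      /* the server's declared buffer size */
};

/* Largest block size allowed for this request: protocol cap, then server cap. */
static size_t max_blksize(const struct req_ctx *rq)
{
	size_t max = rq->is_udp ? SKETCH_MAXPAYLOAD_UDP : SKETCH_MAXPAYLOAD_TCP;

	if (rq->serv_bufsz < max)
		max = rq->serv_bufsz;
	return max;
}
```

Because the server's size only ever lowers the result, raising sv_bufsz later (the "comes next" step) widens TCP transfers without affecting the UDP cap.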
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256. This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.
struct svc_rqst contains two such arrays. However, there are never more
than RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.
The two arrays are for the pages holding the request, and the pages holding
the reply. Instead of two arrays, we can simply keep an index into where the
first reply page is.
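The single-array layout can be sketched as one page array plus an index marking where the reply pages begin. struct rqst_pages, its fields and the helper functions are hypothetical names, and SKETCH_MAXPAGES is an assumed small value.

```c
#include <stddef.h>

#define SKETCH_MAXPAGES 8   /* assumed; the commit plans to grow this to ~256 */

/* Hypothetical request: one page array shared by request and reply. */
struct rqst_pages {
	void *pages[SKETCH_MAXPAGES];
	size_t respages;    /* index of the first page holding the reply */
	size_t npages;      /* total pages in use (request + reply) */
};

/* Request pages are pages[0 .. respages-1]. */
static size_t request_page_count(const struct rqst_pages *r)
{
	return r->respages;
}

/* Reply pages are pages[respages .. npages-1]. */
static void **reply_pages(struct rqst_pages *r, size_t *count)
{
	*count = r->npages - r->respages;
	return &r->pages[r->respages];
}
```

Since request and reply together never exceed the array size, moving the index is all it takes to give one side more pages than the other.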
This patch also removes a number of small inline functions that probably
serve to obscure what is going on rather than clarify it, and open-codes the
needed functionality.
Also remove the 'rq_restailpage' variable as it is *always* 0. That is, if
the response 'xdr' structure has a non-empty tail, it is always in the same
page as the head.
check counters are initialised and incremented properly
check for consistent usage of ++ etc.
maybe extract some inlines for a common approach
general review
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Actually implement multiple pools. On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool. Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e. the NIC interrupt CPU). Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.
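The pool choice described above can be sketched as a mapping from the CPU that ran the socket bottom half to a pool index. enum pool_mode, pick_pool() and the cpu_to_node table are hypothetical stand-ins for the kernel's topology information.

```c
/* Hypothetical pool modes mirroring the description above. */
enum pool_mode { POOL_GLOBAL, POOL_PERCPU, POOL_PERNODE };

/*
 * Pick the pool for a socket whose bottom half ran on 'cpu'.
 * 'cpu_to_node' is an assumed lookup table of CPU -> NUMA node.
 */
static unsigned int pick_pool(enum pool_mode mode, unsigned int cpu,
			      const unsigned int *cpu_to_node)
{
	switch (mode) {
	case POOL_PERCPU:
		return cpu;                 /* one pool per CPU (SMP) */
	case POOL_PERNODE:
		return cpu_to_node[cpu];    /* one pool per NUMA node */
	case POOL_GLOBAL:
	default:
		return 0;                   /* single global pool */
	}
}
```

Threads bound to the chosen pool's CPUs then pick up work on the same CPU or node that took the NIC interrupt.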
This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.
Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv. Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code. Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.
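The per-pool versus global behaviour of setting the thread count can be sketched with counters standing in for real threads. struct sketch_serv, set_num_threads() and the round-robin pool choice are assumptions made for illustration, not the kernel's implementation.

```c
#include <stddef.h>

/* Hypothetical service: per-pool thread counts only (at most 4 pools). */
struct sketch_serv {
	unsigned int npools;
	int nthreads[4];
	unsigned int next_pool;    /* round-robin cursor for choosing pools */
};

static int total_threads(const struct sketch_serv *s)
{
	int t = 0;

	for (unsigned int i = 0; i < s->npools; i++)
		t += s->nthreads[i];
	return t;
}

/*
 * If 'pool' is negative, 'target' is the total across the service and pools
 * are chosen round-robin; otherwise only that pool is adjusted. Creating or
 * stopping a thread is reduced to incrementing or decrementing a counter.
 */
static void set_num_threads(struct sketch_serv *s, int pool, int target)
{
	int delta;

	if (target < 0)
		target = 0;
	delta = target - (pool < 0 ? total_threads(s) : s->nthreads[pool]);

	while (delta > 0) {
		unsigned int p = pool < 0 ? s->next_pool++ % s->npools
					  : (unsigned int)pool;
		s->nthreads[p]++;
		delta--;
	}
	while (delta < 0) {
		unsigned int p;

		if (pool >= 0) {
			p = (unsigned int)pool;
		} else {
			do {    /* skip pools that have no threads left */
				p = s->next_pool++ % s->npools;
			} while (s->nthreads[p] == 0);
		}
		s->nthreads[p]--;
		delta++;
	}
}
```

For example, asking for 5 threads globally on 2 empty pools yields 3 and 2.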
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv. The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.
The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
This is a major step towards making the NFS server NUMA-friendly.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Following are 11 patches from Greg Banks which combine to make knfsd more
NUMA-aware. They reduce accesses to 'global' data structures, and create some
data structures that can be node-local.
knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attached to the node that received
the interrupt.
The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.
Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == number of clients. Number of NUMA nodes == N/2.
N    Throughput (MiB/s)    CPU usage (%, max = N*100)
     Before    After       Before    After
---  ------    -----       ------    -----
4    312       435         350       228
6    500       656         501       418
8    562       804         690       589
This patch:
Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes. This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.
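The mark-and-sweep aging can be sketched as follows: each use of a connection clears its mark, and each timer sweep either marks an unmarked connection or closes one that stayed marked (unused) for a whole period and is not busy. struct temp_conn and both functions are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified temporary connection. */
struct temp_conn {
	bool old;      /* mark: set by a sweep, cleared whenever the socket is used */
	bool busy;     /* currently being processed by a thread */
	bool closed;
};

/* Called whenever a request arrives on the connection. */
static void conn_used(struct temp_conn *c)
{
	c->old = false;
}

/*
 * Periodic timer body. An unmarked connection is marked and left alone;
 * a connection still marked from the previous sweep (so unused for a full
 * period) is closed unless busy. Returns the number of connections closed.
 */
static size_t age_temp_conns(struct temp_conn *conns, size_t n)
{
	size_t closed = 0;

	for (size_t i = 0; i < n; i++) {
		struct temp_conn *c = &conns[i];

		if (c->closed)
			continue;
		if (!c->old) {
			c->old = true;
			continue;
		}
		if (c->busy)
			continue;
		c->closed = true;
		closed++;
	}
	return closed;
}
```

The hot receive path only has to clear a flag; all the list walking happens in the timer.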
[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.
[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
nfsd has some cleanup that it wants to do when the last thread exits, and
there will shortly be some more. So collect this all into one place and
define a callback for an rpc service to call when the service is about to be
destroyed.
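A minimal sketch of such a callback, assuming the service holds a function pointer that runs once when the last thread exits. struct cb_serv, thread_exit() and the example hook count_shutdown() are hypothetical; the hook merely counts calls where nfsd would do its cleanup.

```c
#include <stddef.h>

/* Hypothetical service with a per-service shutdown hook. */
struct cb_serv {
	int nthreads;
	void (*shutdown)(struct cb_serv *serv);   /* runs once, before destruction */
};

/* Called as each thread exits; runs the hook when the last one goes. */
static void thread_exit(struct cb_serv *serv)
{
	if (--serv->nthreads == 0 && serv->shutdown != NULL)
		serv->shutdown(serv);
}

/* Example hook standing in for nfsd's cleanup: it only counts invocations. */
static int shutdown_calls;

static void count_shutdown(struct cb_serv *serv)
{
	(void)serv;
	shutdown_calls++;
}
```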
[akpm@osdl.org: cleanups, build fix]
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
pure s/u32/__be32/
[AV: large part based on Alexey's patches]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
* add svc_getnl():
Take network-endian value from buffer, convert to host-endian
and return it.
* add svc_putnl():
Take host-endian value, convert to network-endian and put it
into a buffer.
* annotate svc_getu32()/svc_putu32() as dealing with network-endian.
* convert to svc_getnl(), svc_putnl().
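The two helpers can be sketched in user space with ntohl()/htonl() and a simplified buffer descriptor. struct sketch_kvec, getnl() and putnl() are hypothetical stand-ins for struct kvec, svc_getnl() and svc_putnl(); memcpy() is used to avoid unaligned access, and the caller is assumed to have ensured there is room (or data) for one more 32-bit word.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct kvec: base pointer and length. */
struct sketch_kvec {
	void *iov_base;
	size_t iov_len;
};

/* Take a network-endian word from the front of the buffer, return it host-endian. */
static uint32_t getnl(struct sketch_kvec *iov)
{
	uint32_t be;

	memcpy(&be, iov->iov_base, sizeof(be));
	iov->iov_base = (char *)iov->iov_base + sizeof(be);
	iov->iov_len -= sizeof(be);
	return ntohl(be);
}

/* Convert a host-endian value to network order and append it to the buffer. */
static void putnl(struct sketch_kvec *iov, uint32_t val)
{
	uint32_t be = htonl(val);

	memcpy((char *)iov->iov_base + iov->iov_len, &be, sizeof(be));
	iov->iov_len += sizeof(be);
}
```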
[AV: in large part it's a carved-up Alexey's patch]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Server-side implementation of rpcsec_gss privacy, which enables encryption of
the payload of every rpc request and response.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
case so that the wrapping code will get copies of the read data instead of
real page cache pages. This makes life simpler when we encrypt the response.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This removes more unneeded casts on the return value for kmalloc(),
sock_kmalloc(), and vmalloc().
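The pattern being removed, illustrated in user space with malloc() standing in for kmalloc(): in C a void * converts implicitly to any object pointer type, so the cast adds nothing. struct item and item_new() are hypothetical.

```c
#include <stdlib.h>

struct item {
	int value;
};

static struct item *item_new(int value)
{
	/* was: (struct item *) malloc(sizeof(struct item)) */
	struct item *it = malloc(sizeof(*it));

	if (it != NULL)
		it->value = value;
	return it;
}
```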
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of needless casting of kmalloc() return value in net/
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This changes some simple "if (x) BUG();" statements to "BUG_ON(x);"
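The transformation, sketched with user-space stand-ins for the kernel macros; SKETCH_BUG() here just reports and aborts, and checked_div() is a hypothetical caller.

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_BUG() \
	do { fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); abort(); } while (0)
#define SKETCH_BUG_ON(cond) \
	do { if (cond) SKETCH_BUG(); } while (0)

static int checked_div(int a, int b)
{
	/* before: if (b == 0) BUG(); */
	SKETCH_BUG_ON(b == 0);
	return a / b;
}
```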
Signed-off-by: Kris Katterjohn <kjak@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Jesper Juhl <jesper.juhl@gmail.com>
This is the net/ part of the big kfree cleanup patch.
Remove pointless checks for NULL prior to calling kfree() in net/.
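Like kfree(NULL), the C library's free(NULL) is defined to do nothing, so the guard can simply be dropped. A user-space sketch with a hypothetical struct conn_state:

```c
#include <stdlib.h>

struct conn_state {
	char *name;
	char *buf;
};

static void conn_state_release(struct conn_state *cs)
{
	/* before: if (cs->name) free(cs->name); if (cs->buf) free(cs->buf); */
	free(cs->name);
	free(cs->buf);
	cs->name = NULL;
	cs->buf = NULL;
}
```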
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
If an RPC socket is serving multiple programs, then the pg_authenticate of
the first program in the list is called, instead of pg_authenticate for the
program to be run.
This does not cause a problem with any programs in the current kernel, but
could confuse future code.
Also set pg_authenticate for nfsd_acl_program in case it ever gets used.
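The fix amounts to looking up the program that matches the request in the chain first and then calling that program's hook, rather than the hook of the list head. struct chain_prog, find_prog(), authenticate_request() and the example hooks are hypothetical; the hooks return distinct values only so the caller can tell which one ran.

```c
#include <stddef.h>

/* Hypothetical chained program list sharing one transport. */
struct chain_prog {
	unsigned int number;
	int (*authenticate)(void);     /* per-program authentication hook */
	struct chain_prog *next;
};

/* Example hooks returning distinct values. */
static int auth_nfs(void) { return 1; }
static int auth_acl(void) { return 2; }

/* Return the program in the chain matching 'prog', or NULL if none. */
static struct chain_prog *find_prog(struct chain_prog *head, unsigned int prog)
{
	for (struct chain_prog *p = head; p != NULL; p = p->next)
		if (p->number == prog)
			return p;
	return NULL;
}

/* Authenticate with the hook of the program being run, not the list head. */
static int authenticate_request(struct chain_prog *head, unsigned int prog)
{
	struct chain_prog *p = find_prog(head, prog);

	if (p == NULL || p->authenticate == NULL)
		return -1;             /* hypothetical "unavailable" result */
	return p->authenticate();
}
```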
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NFS and NFSACL programs run on the same RPC transport. This patch adds
support for this by converting svc_program into a chained list of programs
(server-side).
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix up call_header() so that it calls xdr_adjust_iovec().
Fix calculation of the scratch buffer length in xdr_init_encode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!