linux_dsm_epyc7002/fs/lockd
Calum Mackay 0e1c02e4e0 lockd: don't use interval-based rebinding over TCP
[ Upstream commit 9b82d88d5976e5f2b8015d58913654856576ace5 ]

NLM uses an interval-based rebinding, i.e. it clears the transport's
binding under certain conditions if more than 60 seconds have elapsed
since the connection was last bound.

This rebinding is not necessary for an autobind RPC client over a
connection-oriented protocol like TCP.

It can also cause problems: it is possible for nlm_bind_host() to clear
XPRT_BOUND whilst a connection worker is in the middle of trying to
reconnect, after XPRT_BOUND had already been checked in xprt_connect().

When the connection worker notices that XPRT_BOUND has been cleared
under it, in xs_tcp_finish_connecting(), that results in:

	xs_tcp_setup_socket: connect returned unhandled error -107

Worse, it's possible that the two can get into lockstep, resulting in
the same behaviour repeated indefinitely, with the above error every
300 seconds, without ever recovering, and the connection never being
established. This has been seen in practice, with a large number of NLM
client tasks, following a server restart.

The existing callers of nlm_bind_host & nlm_rebind_host should not need
to force the rebind, for TCP, so restrict the interval-based rebinding
to UDP only.

For TCP, we will still rebind when needed, e.g. on timeout and on
connection error (including closure): connection-related errors on an
existing connection, ECONNREFUSED when trying to connect, and
rpc_check_timeout() already unconditionally clear XPRT_BOUND.

To avoid having to add the fix, and explanation, to both nlm_bind_host()
and nlm_rebind_host(), remove the duplicate code from the former, and
have it call the latter.

Drop the dprintk, which adds no value over a trace.

Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Fixes: 35f5a422ce ("SUNRPC: new interface to force an RPC rebind")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:53:30 +01:00
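
The rule described in the commit message above (interval-based rebinding for UDP only, with the bind path delegating to the rebind path) can be sketched as a small userspace model. This is an illustration under stated assumptions, not the code in host.c: struct model_host, model_rebind_host(), model_bind_host() and MODEL_REBIND_INTERVAL are hypothetical names, time is a plain count of seconds rather than jiffies (so counter wraparound is ignored), and a bool stands in for the transport's XPRT_BOUND flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel's definitions. */
enum model_proto { MODEL_PROTO_TCP = 6, MODEL_PROTO_UDP = 17 };

/* Assumed interval, mirroring the 60 seconds named in the commit message. */
#define MODEL_REBIND_INTERVAL 60UL

struct model_host {
	enum model_proto proto;
	bool bound;                /* stands in for XPRT_BOUND */
	unsigned long next_rebind; /* earliest time an interval rebind may fire */
};

/*
 * Interval-based rebind: clear the binding only for UDP, and only once
 * the interval has elapsed. Returns true if the binding was cleared.
 */
static bool model_rebind_host(struct model_host *host, unsigned long now)
{
	assert(host != NULL);

	/* Connection-oriented transports are left alone by this path. */
	if (host->proto != MODEL_PROTO_UDP)
		return false;

	if (now < host->next_rebind)
		return false;

	host->bound = false;
	host->next_rebind = now + MODEL_REBIND_INTERVAL;
	return true;
}

/*
 * The bind path reuses the rebind path instead of duplicating the check,
 * so the protocol test lives in exactly one place.
 */
static void model_bind_host(struct model_host *host, unsigned long now)
{
	model_rebind_host(host, now);
	/* A real bind path would go on to look up or create the RPC client. */
}
```

In this model a TCP host keeps its binding however much time passes, so a concurrent reconnect never sees the flag cleared by the interval path, while a UDP host loses its binding at most once per interval.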
File        Last modified              Last commit
clnt4xdr.c  2019-02-13 11:53:45 -05:00  NFS: Remove print_overflow_msg()
clntlock.c  2019-05-21 10:50:45 +02:00  treewide: Add SPDX license identifier for missed files
clntproc.c  2019-07-03 17:52:09 -04:00  lockd: Make two symbols static
clntxdr.c   2019-02-13 11:53:45 -05:00  NFS: Remove print_overflow_msg()
host.c      2020-12-30 11:53:30 +01:00  lockd: don't use interval-based rebinding over TCP
Makefile    2017-11-02 11:10:55 +01:00  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
mon.c       2020-09-21 10:21:10 -04:00  Replace HTTP links with HTTPS ones: NFS, SUNRPC, and LOCKD clients
netns.h     2017-11-02 11:10:55 +01:00  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
procfs.c    2020-02-04 03:05:26 +00:00  proc: convert everything to "struct proc_ops"
procfs.h    2017-11-02 11:10:55 +01:00  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
svc4proc.c  2020-10-02 09:37:41 -04:00  lockd: Replace PROC() macro with open code
svc.c       2019-05-21 10:50:45 +02:00  treewide: Add SPDX license identifier for more missed files
svclock.c   2019-07-03 17:52:09 -04:00  lockd: Make two symbols static
svcproc.c   2020-10-02 09:37:41 -04:00  lockd: Replace PROC() macro with open code
svcshare.c  2017-11-02 11:10:55 +01:00  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
svcsubs.c   2019-07-03 17:52:08 -04:00  lockd: Convert NLM service fl_owner to nlm_lockowner
xdr4.c      2019-07-03 17:52:09 -04:00  lockd: Show pid of lockd for remote locks
xdr.c       2019-07-03 17:52:09 -04:00  lockd: Show pid of lockd for remote locks