linux_dsm_epyc7002/net/sunrpc
Weston Andros Adamson 6ff33b7dd0 sunrpc: Fix infinite loop in RPC state machine
When a task enters call_refreshresult with status 0 from call_refresh and
!rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
or max number of retries.

Instead of trying forever, make use of the retry path that other errors use.

This only seems to be possible when the crrefresh callback is gss_refresh_null,
which only happens when destroying the context.

To reproduce:

1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
   operations).

2) reboot - the client will be stuck and will need to be hard rebooted

BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
irq event stamp: 195724
hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
RSP: 0018:ffff880079003d18  EFLAGS: 00000246
RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
Stack:
 ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
Call Trace:
 [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffff81052974>] process_one_work+0x211/0x3a5
 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
 [<ffffffff81052eeb>] worker_thread+0x134/0x202
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff810584a0>] kthread+0xc9/0xd1
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00

And the output of "rpcdebug -m rpc -s all":

RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-05 15:29:26 -05:00
..
auth_gss sunrpc: replace sunrpc_net->gssd_running flag with a more reliable check 2013-12-06 13:06:31 -05:00
xprtrdma Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux 2013-07-17 13:43:55 -07:00
addr.c sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h 2013-02-05 09:41:14 -05:00
auth_generic.c sunrpc: Add missing kuids conversion for printing 2013-09-12 10:16:06 -04:00
auth_null.c SUNRPC refactor rpcauth_checkverf error returns 2013-09-03 15:25:09 -04:00
auth_unix.c SUNRPC refactor rpcauth_checkverf error returns 2013-09-03 15:25:09 -04:00
auth.c shrinker: convert remaining shrinkers to count/scan API 2013-09-10 18:56:32 -04:00
backchannel_rqst.c SUNRPC: remove two BUG_ON asserts 2012-11-04 14:43:41 -05:00
bc_svc.c SUNRPC: remove BUG_ON from bc_send 2012-11-04 14:43:41 -05:00
cache.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-07-14 11:42:26 -07:00
clnt.c sunrpc: Fix infinite loop in RPC state machine 2014-01-05 15:29:26 -05:00
Kconfig Merge branch 'rpcsec_gss-from_cel' into linux-next 2013-04-23 15:40:40 -04:00
Makefile SUNRPC: sunrpc should not explicitly depend on NFS config options 2011-07-15 09:12:23 -04:00
netns.h sunrpc: replace sunrpc_net->gssd_running flag with a more reliable check 2013-12-06 13:06:31 -05:00
rpc_pipe.c rpc_pipe: fix cleanup of dummy gssd directory when notification fails 2013-12-10 19:39:53 +02:00
rpcb_clnt.c SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister 2013-08-07 17:07:18 -04:00
sched.c SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints 2013-09-04 14:45:13 -04:00
socklib.c sunrpc: remove the second argument of k[un]map_atomic() 2012-03-20 21:48:28 +08:00
stats.c SUNRPC: Deprecate rpc_client->cl_protname 2013-08-30 09:19:34 -04:00
sunrpc_syms.c sunrpc: create a new dummy pipe for gssd to hold open 2013-12-06 13:06:30 -05:00
sunrpc.h SUNRPC: subscribe RPC clients to pipefs notifications 2012-01-31 18:20:25 -05:00
svc_xprt.c svcrpc: fix rpc server shutdown races 2013-02-17 10:53:51 -05:00
svc.c svcrpc: remove an unnecessary assignment 2013-11-13 15:31:22 -05:00
svcauth_unix.c svcrpc: fix failures to handle -1 uid's 2013-07-08 17:27:23 -04:00
svcauth.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
svcsock.c net: fix build errors if ipv6 is disabled 2013-10-09 13:04:03 -04:00
sysctl.c net: Convert uses of typedef ctl_table to struct ctl_table 2013-06-13 02:36:09 -07:00
timer.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
xdr.c SUNRPC: Fix memory corruption issue on 32-bit highmem systems 2013-08-28 15:43:43 -04:00
xprt.c SUNRPC: Ensure xprt_connect_status handles all potential connection errors 2013-12-31 13:42:22 -05:00
xprtsock.c SUNRPC: Add tracepoint for socket errors 2013-12-31 14:02:46 -05:00