linux_dsm_epyc7002/fs/nfsd
Chuck Lever f46c445b79 nfsd: Fix general protection fault in release_lock_stateid()
When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398be ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-11-01 15:24:43 -04:00
..
acl.h nfsd4: remove nfs4_acl_new 2014-07-08 17:14:27 -04:00
auth.c cred: simpler, 1D supplementary groups 2016-10-07 18:46:30 -07:00
auth.h
blocklayout.c fs: Replace current_fs_time() with current_time() 2016-09-27 21:06:22 -04:00
blocklayoutxdr.c Highlights: 2016-08-04 19:59:06 -04:00
blocklayoutxdr.h nfsd: add SCSI layout support 2016-03-18 11:42:53 -04:00
cache.h nfsd: Remove the cache_hash list 2014-08-17 12:00:12 -04:00
current_stateid.h
export.c nfsd: allow nfsd to advertise multiple layout types 2016-07-15 15:31:32 -04:00
export.h nfsd: allow nfsd to advertise multiple layout types 2016-07-15 15:31:32 -04:00
fault_inject.c nfsd: remove old fault injection infrastructure 2014-08-05 10:55:10 -04:00
flexfilelayout.c nfsd: don't set a FL_LAYOUT lease for flexfiles layouts 2016-09-16 16:15:52 -04:00
flexfilelayoutxdr.c nfsd: Add a super simple flex file server 2016-07-13 15:40:48 -04:00
flexfilelayoutxdr.h nfsd: Add a super simple flex file server 2016-07-13 15:40:48 -04:00
idmap.h nfsd: Remove duplicate define of IDMAP_NAMESZ/IDMAP_TYPE_xx 2015-07-20 14:58:46 -04:00
Kconfig xfs: abstract block export operations from nfsd layouts 2016-07-15 15:31:29 -04:00
lockd.c lockd: constify nlmsvc_binding structure 2016-01-07 10:10:50 -05:00
Makefile nfsd: Add a super simple flex file server 2016-07-13 15:40:48 -04:00
netns.h nfsd: move blocked lock handling under a dedicated spinlock 2016-10-24 16:51:21 -04:00
nfs2acl.c nfsd: check permissions when setting ACLs 2016-06-24 12:11:52 -04:00
nfs3acl.c nfsd: check permissions when setting ACLs 2016-06-24 12:11:52 -04:00
nfs3proc.c don't bother with ->d_inode->i_sb - it's always equal to ->d_sb 2016-04-10 17:11:51 -04:00
nfs3xdr.c A very quiet cycle for nfsd, mainly just an RDMA update from Chuck Lever. 2016-05-24 14:39:20 -07:00
nfs4acl.c nfsd: check permissions when setting ACLs 2016-06-24 12:11:52 -04:00
nfs4callback.c nfsd: plumb in a CB_NOTIFY_LOCK operation 2016-09-26 15:20:35 -04:00
nfs4idmap.c nfsd: Remove duplicate define of IDMAP_NAMESZ/IDMAP_TYPE_xx 2015-07-20 14:58:46 -04:00
nfs4layouts.c nfsd: don't set a FL_LAYOUT lease for flexfiles layouts 2016-09-16 16:15:52 -04:00
nfs4proc.c NFSD: Implement the COPY call 2016-10-07 14:54:25 -04:00
nfs4recover.c Various bugfixes, a RDMA update from Chuck Lever, and support for a new 2016-03-24 10:41:00 -07:00
nfs4state.c nfsd: Fix general protection fault in release_lock_stateid() 2016-11-01 15:24:43 -04:00
nfs4xdr.c NFSD: Implement the COPY call 2016-10-07 14:54:25 -04:00
nfscache.c nfsd: remove recurring workqueue job to clean DRC 2015-11-10 09:25:51 -05:00
nfsctl.c nfsd: randomize SETCLIENTID reply to help distinguish servers 2016-09-26 15:20:38 -04:00
nfsd.h nfsd: implement machine credential support for some operations 2016-07-13 15:32:47 -04:00
nfsfh.c nfsd: check d_can_lookup in fh_verify of directories 2016-08-04 17:11:48 -04:00
nfsfh.h wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
nfsproc.c Some RDMA work and some good bugfixes, and two new features that could 2016-10-13 21:04:42 -07:00
nfssvc.c NFSD: fix corruption in notifier registration 2016-09-26 14:17:45 -04:00
nfsxdr.c nfsd: Fix some indent inconsistancy 2016-07-13 15:53:41 -04:00
pnfs.h nfsd: don't set a FL_LAYOUT lease for flexfiles layouts 2016-09-16 16:15:52 -04:00
state.h nfsd: add a LRU list for blocked locks 2016-09-26 15:20:36 -04:00
stats.c drop redundant ->owner initializations 2016-05-29 19:08:00 -04:00
stats.h
trace.c nfsd: move include of state.h from trace.c to trace.h 2015-10-23 15:57:29 -04:00
trace.h nfsd: add new io class tracepoint 2016-01-14 17:32:51 -05:00
vfs.c NFSD: Implement the COPY call 2016-10-07 14:54:25 -04:00
vfs.h NFSD: Implement the COPY call 2016-10-07 14:54:25 -04:00
xdr3.h
xdr4.h NFSD: Implement the COPY call 2016-10-07 14:54:25 -04:00
xdr4cb.h nfsd: plumb in a CB_NOTIFY_LOCK operation 2016-09-26 15:20:35 -04:00
xdr.h