linux_dsm_epyc7002/fs/ocfs2
David Howells cdfbabfb2f net: Work around lockdep limitation in sockets that use sockets
Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_rcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-09 18:23:27 -08:00
..
cluster net: Work around lockdep limitation in sockets that use sockets 2017-03-09 18:23:27 -08:00
dlm sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
dlmfs sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
acl.c ocfs2: fix deadlock issue when taking inode lock at vfs entry points 2017-02-22 16:41:27 -08:00
acl.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00
alloc.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
alloc.h ocfs2: retry on ENOSPC if sufficient space in truncate log 2016-08-02 17:31:41 -04:00
aops.c fs: add i_blocksize() 2017-02-27 18:43:46 -08:00
aops.h ocfs2: clean up unused 'page' parameter in ocfs2_write_end_nolock() 2016-12-12 18:55:06 -08:00
blockcheck.c ocfs2: kill endianness abuses in blockcheck.c 2012-05-29 23:28:35 -04:00
blockcheck.h
buffer_head_io.c block,fs: untangle fs.h and blk_types.h 2016-11-01 09:43:26 -06:00
buffer_head_io.h
dcache.c VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dcache.h ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock 2014-04-03 16:20:55 -07:00
dir.c ocfs2: fix not enough credit panic 2016-11-11 08:12:37 -08:00
dir.h VFS: normal filesystems (and lustre): d_inode() annotations 2015-04-15 15:06:57 -04:00
dlmglue.c sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> 2017-03-02 08:42:32 +01:00
dlmglue.h ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-02-22 16:41:27 -08:00
export.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
export.h
extent_map.c ocfs2: neaten do_error, ocfs2_error and ocfs2_abort 2015-09-04 16:54:41 -07:00
extent_map.h
file.c statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
file.h statx: Add a system call to make enhanced file info available 2017-03-02 20:51:15 -05:00
filecheck.c ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
filecheck.h ocfs2: sysfile interfaces for online file check 2016-03-22 15:36:02 -07:00
heartbeat.c
heartbeat.h
inode.c ocfs2: replace CURRENT_TIME macro 2016-12-12 18:55:06 -08:00
inode.h ocfs2: convert inode refcount test to a helper 2016-12-10 12:39:45 -08:00
ioctl.c wrappers for ->i_mutex access 2016-01-22 18:04:28 -05:00
ioctl.h
journal.c ocfs2: use time64_t to represent orphan scan times 2016-12-12 18:55:06 -08:00
journal.h jbd2: add support for avoiding data writes during transaction commits 2016-04-24 00:56:07 -04:00
Kconfig
localalloc.c ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
localalloc.h ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters 2014-02-06 13:48:51 -08:00
locks.c ocfs2: fix flock panic issue 2015-12-29 17:45:49 -08:00
locks.h
Makefile ocfs2: disable BUG assertions in reading blocks 2016-06-24 17:23:52 -07:00
mmap.c mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf 2017-02-24 17:46:54 -08:00
mmap.h
move_extents.c ocfs2: convert inode refcount test to a helper 2016-12-10 12:39:45 -08:00
move_extents.h
namei.c ocfs2: replace CURRENT_TIME macro 2016-12-12 18:55:06 -08:00
namei.h ocfs2: do not include dio entry in case of orphan scan 2015-11-05 19:34:48 -08:00
ocfs1_fs_compat.h
ocfs2_fs.h ocfs2: fix comment in struct ocfs2_extended_slot 2016-05-19 19:12:14 -07:00
ocfs2_ioctl.h
ocfs2_lockid.h
ocfs2_lockingver.h
ocfs2_trace.h switch generic_file_splice_read() to use of ->read_iter() 2016-10-05 18:23:56 -04:00
ocfs2.h ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock 2017-02-22 16:41:27 -08:00
quota_global.c ocfs2: Protect periodic quota syncing with s_umount semaphore 2016-11-30 08:36:54 +01:00
quota_local.c ocfs2: Use s_umount for quota recovery protection 2016-11-30 08:37:21 +01:00
quota.h quota: constify qtree_fmt_operations structures 2016-01-04 10:58:35 +01:00
refcounttree.c vfs: fix isize/pos/len checks for reflink & dedupe 2016-12-22 23:00:23 -05:00
refcounttree.h ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features 2016-12-10 12:39:45 -08:00
reservations.c ocfs2: make resv_lock spinlock static 2015-02-10 14:30:29 -08:00
reservations.h
resize.c ocfs2: solve a problem of crossing the boundary in updating backups 2016-03-25 16:37:42 -07:00
resize.h
slot_map.c ocfs2: clean up an unneeded goto in ocfs2_put_slot() 2016-05-19 19:12:14 -07:00
slot_map.h
stack_o2cb.c ocfs2: avoid a pointless delay in o2cb_cluster_check() 2015-04-14 16:48:57 -07:00
stack_user.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
stackglue.c ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-10 18:31:54 -08:00
stackglue.h ocfs2: fix crash caused by stale lvb with fsdlm plugin 2017-01-10 18:31:54 -08:00
suballoc.c ocfs2: fix double unlock in case retry after free truncate log 2016-09-19 15:36:17 -07:00
suballoc.h ocfs2: rollback alloc_dinode counts when ocfs2_block_group_set_bits() failed 2014-04-03 16:20:56 -07:00
super.c sched/headers: Prepare for the reduction of <linux/sched.h>'s signal API dependency 2017-03-02 08:42:37 +01:00
super.h ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local 2016-03-25 16:37:42 -07:00
symlink.c vfs: remove ".readlink = generic_readlink" assignments 2016-12-09 16:45:04 +01:00
symlink.h ocfs: simplify symlink handling 2012-05-29 23:28:40 -04:00
sysfile.c ocfs2: avoid system inode ref confusion by adding mutex lock 2014-04-03 16:20:57 -07:00
sysfile.h
uptodate.c ocfs2: remove NULL assignments on static 2014-06-04 16:53:53 -07:00
uptodate.h
xattr.c ocfs2: convert inode refcount test to a helper 2016-12-10 12:39:45 -08:00
xattr.h ocfs2: fix posix_acl_create deadlock 2016-05-12 15:52:50 -07:00