linux_dsm_epyc7002/fs/fscache
Kiran Kumar Modukuri f29507ce66 fscache: Fix reference overput in fscache_attach_object() error handling
When a cookie is allocated that causes fscache_object structs to be
allocated, those objects are initialised with the cookie pointer, but
aren't blessed with a ref on that cookie unless the attachment is
successfully completed in fscache_attach_object().

If attachment fails because the parent object was dying or there was a
collision, fscache_attach_object() returns without incrementing the cookie
counter - but upon failure of this function, the object is released which
then puts the cookie, whether or not a ref was taken on the cookie.

Fix this by taking a ref on the cookie when it is assigned in
fscache_object_init(), even when we're creating a root object.


Analysis from Kiran Kumar:

This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277

fscache cookie ref count updated incorrectly during fscache object
allocation resulting in following Oops.

kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

[Cause]
Two threads are trying to do operate on a cookie and two objects.

(1) One thread tries to unmount the filesystem and in process goes over a
    huge list of objects marking them dead and deleting the objects.
    cookie->usage is also decremented in following path:

      nfs_fscache_release_super_cookie
       -> __fscache_relinquish_cookie
        ->__fscache_cookie_put
        ->BUG_ON(atomic_read(&cookie->usage) <= 0);

(2) A second thread tries to lookup an object for reading data in following
    path:

    fscache_alloc_object
    1) cachefiles_alloc_object
        -> fscache_object_init
           -> assign cookie, but usage not bumped.
    2) fscache_attach_object -> fails in cant_attach_object because the
         cookie's backing object or cookie's->parent object are going away
    3) fscache_put_object
        -> cachefiles_put_object
          ->fscache_object_destroy
            ->fscache_cookie_put
               ->BUG_ON(atomic_read(&cookie->usage) <= 0);

[NOTE from dhowells] It's unclear as to the circumstances in which (2) can
take place, given that thread (1) is in nfs_kill_super(), however a
conflicting NFS mount with slightly different parameters that creates a
different superblock would do it.  A backtrace from Kiran seems to show
that this is a possibility:

    kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
    ...
    RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
    Call Trace:
     __fscache_relinquish_cookie+0x87/0x120 [fscache]
     nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
     nfs_kill_super+0x29/0x40 [nfs]
     deactivate_locked_super+0x48/0x80
     deactivate_super+0x5c/0x60
     cleanup_mnt+0x3f/0x90
     __cleanup_mnt+0x12/0x20
     task_work_run+0x86/0xb0
     exit_to_usermode_loop+0xc2/0xd0
     syscall_return_slowpath+0x4e/0x60
     int_ret_from_sys_call+0x25/0x9f

[Fix] Bump up the cookie usage in fscache_object_init, when it is first
being assigned a cookie atomically such that the cookie is added and bumped
up if its refcount is not zero.  Remove the assignment in
fscache_attach_object().

[Testcase]
I have run ~100 hours of NFS stress tests and not seen this bug recur.

[Regression Potential]
 - Limited to fscache/cachefiles.

Fixes: ccc4fc3d11 ("FS-Cache: Implement the cookie management part of the netfs API")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2018-07-25 14:49:00 +01:00
..
cache.c fscache: Fix reference overput in fscache_attach_object() error handling 2018-07-25 14:49:00 +01:00
cookie.c fscache: Fix reference overput in fscache_attach_object() error handling 2018-07-25 14:49:00 +01:00
fsdef.c fscache: Pass object size in rather than calling back for it 2018-04-06 14:05:14 +01:00
histogram.c proc: introduce proc_create_seq{,_data} 2018-05-16 07:23:35 +02:00
internal.h proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
Kconfig fscache: drop references to slow-work 2010-07-22 22:58:58 +02:00
main.c fscache: Add tracepoints 2018-04-04 13:41:27 +01:00
Makefile License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
netfs.c fscache: Maintain a catalogue of allocated cookies 2018-04-06 14:05:14 +01:00
object-list.c fscache: Attach the index key and aux data to the cookie 2018-04-04 13:41:28 +01:00
object.c fscache: Fix reference overput in fscache_attach_object() error handling 2018-07-25 14:49:00 +01:00
operation.c fscache: Allow cancelled operations to be enqueued 2018-07-25 14:31:20 +01:00
page.c fscache: Pass object size in rather than calling back for it 2018-04-06 14:05:14 +01:00
proc.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
stats.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00