Commit Graph

61792 Commits

Author SHA1 Message Date
Linus Torvalds
043cf46825 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "The main changes in the timer code in this cycle were:

   - Clockevent updates:

      - timer-of framework cleanups. (Geert Uytterhoeven)

      - Use timer-of for the renesas-ostm and the device name to prevent
        name collision in case of multiple timers. (Geert Uytterhoeven)

      - Check if there is an error after calling of_clk_get in asm9260
        (Chuhong Yuan)

   - ABI fix: Zero out high order bits of nanoseconds on compat
     syscalls. This got broken a year ago, with apparently no side
     effects so far.

     Since the kernel would use random data otherwise I don't think we'd
     have other options but to fix the bug, even if there was a side
     effect to applications (Dmitry Safonov)

   - Optimize ns_to_timespec64() on 32-bit systems: move away from
     div_s64_rem() which can be slow, to div_u64_rem() which is faster
     (Arnd Bergmann)

   - Annotate KCSAN-reported false positive data races in
     hrtimer_is_queued() users by moving timer->state handling over to
     the READ_ONCE()/WRITE_ONCE() APIs. This documents these accesses
     (Eric Dumazet)

   - Misc cleanups and small fixes"

[ I undid the "ABI fix" and updated the comments instead. The reason
  there were apparently no side effects is that the fix was a no-op.

  The updated comment is to say _why_ it was a no-op.    - Linus ]

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Zero the upper 32-bits in __kernel_timespec on 32-bit
  time: Rename tsk->real_start_time to ->start_boottime
  hrtimer: Remove the comment about not used HRTIMER_SOFTIRQ
  time: Fix spelling mistake in comment
  time: Optimize ns_to_timespec64()
  hrtimer: Annotate lockless access to timer->state
  clocksource/drivers/asm9260: Add a check for of_clk_get
  clocksource/drivers/renesas-ostm: Use unique device name instead of ostm
  clocksource/drivers/renesas-ostm: Convert to timer_of
  clocksource/drivers/timer-of: Use unique device name instead of timer
  clocksource/drivers/timer-of: Convert last full_name to %pOF
2019-12-03 12:20:25 -08:00
Jens Axboe
87f80d623c io_uring: handle connect -EINPROGRESS like -EAGAIN
Right now we return it to userspace, which means the application has
to poll for the socket to be writeable. Let's just treat it like
-EAGAIN and have io_uring handle it internally, this makes it much
easier to use.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03 11:23:54 -07:00
Jackie Liu
8cdda87a44 io_uring: remove io_wq_current_is_worker
Since commit b18fdf71e0 ("io_uring: simplify io_req_link_next()"),
the io_wq_current_is_worker function is no longer needed, clean it
up.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03 07:04:32 -07:00
Jackie Liu
22efde5998 io_uring: remove parameter ctx of io_submit_state_start
Parameter ctx we have never used, clean it up.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03 07:04:32 -07:00
Jens Axboe
da8c969069 io_uring: mark us with IORING_FEAT_SUBMIT_STABLE
If this flag is set, applications can be certain that any data for
async offload has been consumed when the kernel has consumed the
SQE.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03 07:04:32 -07:00
Jens Axboe
f499a021ea io_uring: ensure async punted connect requests copy data
Just like commit f67676d160 for read/write requests, this one ensures
that the sockaddr data has been copied for IORING_OP_CONNECT if we need
to punt the request to async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03 07:04:30 -07:00
Jens Axboe
03b1230ca1 io_uring: ensure async punted sendmsg/recvmsg requests copy data
Just like commit f67676d160 for read/write requests, this one ensures
that the msghdr data is fully copied if we need to punt a recvmsg or
sendmsg system call to async context.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-03 07:03:35 -07:00
Jens Axboe
f67676d160 io_uring: ensure async punted read/write requests copy iovec
Currently we don't copy the iovecs when we punt to async context. This
can be problematic for applications that store the iovec on the stack,
as they often assume that it's safe to let the iovec go out of scope
as soon as IO submission has been called. This isn't always safe, as we
will re-copy the iovec once we're in async context.

Make this 100% safe by copying the iovec just once. With this change,
applications may safely store the iovec on the stack for all cases.

Reported-by: 李通洲 <carter.li@eoitek.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02 21:33:25 -07:00
Omar Sandoval
69ffe5960d xfs: don't check for AG deadlock for realtime files in bunmapi
Commit 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
in the wrong order. However, this check isn't applicable for realtime
files. In most cases, it just makes us do unnecessary commits. However,
without the fix from the previous commit ("xfs: fix realtime file data
space leak"), if the last and second-to-last extents also happen to have
different "AG numbers", then the break actually causes __xfs_bunmapi()
to return without making any progress, which sends
xfs_itruncate_extents_flags() into an infinite loop.

Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-02 17:58:51 -08:00
Omar Sandoval
0c4da70c83 xfs: fix realtime file data space leak
Realtime files in XFS allocate extents in rextsize units. However, the
written/unwritten state of those extents is still tracked in blocksize
units. Therefore, a realtime file can be split up into written and
unwritten extents that are not necessarily aligned to the realtime
extent size. __xfs_bunmapi() has some logic to handle these various
corner cases. Consider how it handles the following case:

1. The last extent is unwritten.
2. The last extent is smaller than the realtime extent size.
3. startblock of the last extent is not aligned to the realtime extent
   size, but startblock + blockcount is.

In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
to set the second-to-last extent to unwritten. This should merge the
last and second-to-last extents, so __xfs_bunmapi() moves on to the
second-to-last extent.

However, if the size of the last and second-to-last extents combined is
greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
merge the two extents. When that happens, __xfs_bunmapi() skips past the
last extent without unmapping it, thus leaking the space.

Fix it by only unwriting the minimum amount needed to align the last
extent to the realtime extent size, which is guaranteed to merge with
the last extent.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-02 17:58:50 -08:00
Jens Axboe
1a6b74fc87 io_uring: add general async offload context
Right now we just copy the sqe for async offload, but we want to store
more context across an async punt. In preparation for doing so, put the
sqe copy inside a structure that we can expand. With this pointer added,
we can get rid of REQ_F_FREE_SQE, as that is now indicated by whether
req->io is NULL or not.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02 18:49:33 -07:00
Eric Biggers
490547ca2d block: don't send uevent for empty disk when not invalidating
Commit 6917d06899 ("block: merge invalidate_partitions into
rescan_partitions") caused a regression where systemd-udevd spins
forever using max CPU starting at boot time.

It's caused by a behavior change where a KOBJ_CHANGE uevent is now sent
in a case where previously it wasn't.

Restore the old behavior.

Fixes: 6917d06899 ("block: merge invalidate_partitions into rescan_partitions")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02 18:49:30 -07:00
Jens Axboe
441cdbd544 io_uring: transform send/recvmsg() -ERESTARTSYS to -EINTR
We should never return -ERESTARTSYS to userspace, transform it into
-EINTR.

Cc: stable@vger.kernel.org # v5.3+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02 18:49:10 -07:00
Linus Torvalds
e3a251e366 This pull request contains mostly fixes for UBI, UBIFS and JFFS2:
UBI:
 - Fix a regression around producing a anchor PEB for fastmap.
   Due to a change in our locking fastmap was unable to produce
   fresh anchors an re-used the existing one a way to often.
 
 UBIFS:
 - Fixes for endianness. A few places blindly assumed little endian.
 - Fix for a memory leak in the orphan code.
 - Fix for a possible crash during a commit.
 - Revert a wrong bugfix.
 
 JFFS2:
 - Revert a bad bugfix in (false positive from a code checking
   tool).
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl3kG34WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wU70D/9K3QdG/fciafNRkVFFKnXVNXLv
 rxbfrRVCZeEo2Phm8daVQAvERWKi2Mw5jW1fNbOtDKvnPkC9+3vYmmHCienJ6VEt
 LhQ4Ey0MX8dQID1Caji49i5pPD3PBMX0m+YrxGt0uuzbr/nNWeiIUf/nyIvU95o7
 ZmmX7wWmKHUeLKC8TGpeolBVAf4xKqzp+U7NJxKpF5j6dn6hTi1J51GwsYw6aSXU
 lo5KvPWHq11ENWxJhtyWfO4MMyzTAtvbL1xEf0NRXX+aK4dUUyu0dcxzAqNzR6HU
 ER7gsIyEBKLZpgTqLpy6B9tWcnEjHHHN4tjxhD8gRGlOVBdXSlN+X45I8gjYOact
 Jkx1eF+T7E865YP+8vwoHo8eMoThxz54SnON8Dk6WaioAOGvf48w4gNnAilOb8Fa
 FZNk20pQdR2KyGc6PQ2I82vDYX1c2GmlGXOFLsEM8KxGWHjSCpQC1P946h17QSPe
 OxCptqu36czclm6zk76Ycw5/BWkbzViOcToVuQS61HLH7P/BzF1BnEM9rgwf+tD0
 4midOomxrZXmUrbC/HO4mARRdGtgonBHl1yJ8JjWhWEL9KPVLAhWJ4zHR8Zo84LJ
 BodZTzJPJ96igf+hLMAG/xFCnpGylXaz/G+W+5wUHnNJyKVsbhYIEmBscJYE60iM
 9/00Ce8r9lEpetUnKA==
 =LbMI
 -----END PGP SIGNATURE-----

Merge tag 'upstream-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull UBI/UBIFS/JFFS2 updates from Richard Weinberger:
 "This pull request contains mostly fixes for UBI, UBIFS and JFFS2:

  UBI:

   - Fix a regression around producing a anchor PEB for fastmap.

     Due to a change in our locking fastmap was unable to produce fresh
     anchors an re-used the existing one a way to often.

  UBIFS:

   - Fixes for endianness. A few places blindly assumed little endian.

   - Fix for a memory leak in the orphan code.

   - Fix for a possible crash during a commit.

   - Revert a wrong bugfix.

  JFFS2:

   - Revert a bad bugfix (false positive from a code checking tool)"

* tag 'upstream-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()"
  ubi: Fix producing anchor PEBs
  ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps
  ubifs: do_kill_orphans: Fix a memory leak bug
  Revert "ubifs: Fix memory leak bug in alloc_ubifs_info() error path"
  ubifs: Fix type of sup->hash_algo
  ubifs: Fixed missed le64_to_cpu() in journal
  ubifs: Force prandom result to __le32
  ubifs: Remove obsolete TODO from dfs_file_write()
  ubi: Fix warning static is not at beginning of declaration
  ubi: Print skip_check in ubi_dump_vol_info()
2019-12-02 17:06:34 -08:00
Steve French
9e8fae2597 smb3: remove unused flag passed into close functions
close was relayered to allow passing in an async flag which
is no longer needed in this path.  Remove the unneeded parameter
"flags" passed in on close.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-12-02 18:07:17 -06:00
Colin Ian King
a9f76cf827 cifs: remove redundant assignment to pointer pneg_ctxt
The pointer pneg_ctxt is being initialized with a value that is never
read and it is being updated later with a new value.  The assignment
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-02 16:55:08 -06:00
Linus Torvalds
97eeb4d9d7 New code for 5.5:
- Fill out the build string
 - Prevent inode fork extent count overflows
 - Refactor the allocator to reduce long tail latency
 - Rework incore log locking a little to reduce spinning
 - Break up the xfs_iomap_begin functions into smaller more cohesive
 parts
 - Fix allocation alignment being dropped too early when the allocation
 request is for more blocks than an AG is large
 - Other small cleanups
 - Clean up file buftarg retrieval helpers
 - Hoist the resvsp and unresvsp ioctls to the vfs
 - Remove the undocumented biosize mount option, since it has never been
   mentioned as existing or supported on linux
 - Clean up some of the mount option printing and parsing
 - Enhance attr leaf verifier to check block structure
 - Check dirent and attr names for invalid characters before passing them
 to the vfs
 - Refactor open-coded bmbt walking
 - Fix a few places where we return EIO instead of EFSCORRUPTED after
 failing metadata sanity checks
 - Fix a synchronization problem between fallocate and aio dio corrupting
 the file length
 - Clean up various loose ends in the iomap and bmap code
 - Convert to the new mount api
 - Make sure we always log something when returning EFSCORRUPTED
 - Fix some problems where long running scrub loops could trigger soft
 lockup warnings and/or fail to exit due to fatal signals pending
 - Fix various Coverity complaints
 - Remove most of the function pointers from the directory code to reduce
 indirection penalties
 - Ensure that dquots are attached to the inode when performing unwritten
 extent conversion after io
 - Deuglify incore projid and crtime types
 - Fix another AGI/AGF locking order deadlock when renaming
 - Clean up some quota typedefs
 - Remove the FSSETDM ioctls which haven't done anything in 20 years
 - Fix some memory leaks when mounting the log fails
 - Fix an underflow when updating an xattr leaf freemap
 - Remove some trivial wrappers
 - Report metadata corruption as an error, not a (potentially) fatal
 assertion
 - Clean up the dir/attr buffer mapping code
 - Allow fatal signals to kill scrub during parent pointer checks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl3fNjcACgkQ+H93GTRK
 tOv/8w//Y0Oa9Paiy8+iPTChs3/PqeKp307Fj5KONG52haMCakEJFT5+/wpkIAJw
 uUmKiPolwN1ivviIUmIS14ThTJ7NV1jq0G0h/0tC25i/3hoJrGWdzqYJMlvhlqgE
 taHrjCwPTDkhRJ0D5QCrkkHPU7lSdquO5TWxltaqYLhyLIt8SkklD6dN1dHWEPnk
 k0j3TL+VqVJDYyEj1bLwJ0QUb2C3J8ygWnlviF/WxsSeJtJpGoeLEaYXhhsUK0Dt
 aHg70OM6zzFzrJJAtJeBXpgaFsG/Pqbcw4wUWSxEMWjVSJwCSKLuZ5F+p6NcqoEj
 HeLQkaGePoO61YCInk2JKLHIyx7ohqMOt7+Dm0mdbe1pvcKwV9ZcdkqKa8L/Fm6v
 bUP6a2hEpsGy7vLnkYxwYACTLPbGX3uLw8MUr6ZpJ+SpfVLktU4ycpr8dCkJkp6a
 0qOpEeHsBDy74NkMOUa7Qrju7lJ2GiL70qqBwaPe+ubcUa3U/3WAsSekSzXgUwn8
 Fap4r8wn7cUbxymAvO06RlU8YymuulAlyjwdo9gOL/Su/5POldss6dy1YuUtyq19
 CD6NtkHqEUMsTc2cI+H65H44aEeckB1j0D2Grm2uMchAh0GcTSFVNF6jony++B8k
 s2sL2dEw9/9vr0uc1TSVF5ezxaONuyaCXdYXUkkdyq3iNvfpRCg=
 =aACq
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS updates from Darrick Wong:
 "For this release, we changed quite a few things.

  Highlights:

   - Fixed some long tail latency problems in the block allocator

   - Removed some long deprecated (and for the past several years no-op)
     mount options and ioctls

   - Strengthened the extended attribute and directory verifiers

   - Audited and fixed all the places where we could return EFSCORRUPTED
     without logging anything

   - Refactored the old SGI space allocation ioctls to make the
     equivalent fallocate calls

   - Fixed a race between fallocate and directio

   - Fixed an integer overflow when files have more than a few
     billion(!) extents

   - Fixed a longstanding bug where quota accounting could be incorrect
     when performing unwritten extent conversion on a freshly mounted fs

   - Fixed various complaints in scrub about soft lockups and
     unresponsiveness to signals

   - De-vtable'd the directory handling code, which should make it
     faster

   - Converted to the new mount api, for better or for worse

   - Cleaned up some memory leaks

  and quite a lot of other smaller fixes and cleanups.

  A more detailed summary:

   - Fill out the build string

   - Prevent inode fork extent count overflows

   - Refactor the allocator to reduce long tail latency

   - Rework incore log locking a little to reduce spinning

   - Break up the xfs_iomap_begin functions into smaller more cohesive
     parts

   - Fix allocation alignment being dropped too early when the
     allocation request is for more blocks than an AG is large

   - Other small cleanups

   - Clean up file buftarg retrieval helpers

   - Hoist the resvsp and unresvsp ioctls to the vfs

   - Remove the undocumented biosize mount option, since it has never
     been mentioned as existing or supported on linux

   - Clean up some of the mount option printing and parsing

   - Enhance attr leaf verifier to check block structure

   - Check dirent and attr names for invalid characters before passing
     them to the vfs

   - Refactor open-coded bmbt walking

   - Fix a few places where we return EIO instead of EFSCORRUPTED after
     failing metadata sanity checks

   - Fix a synchronization problem between fallocate and aio dio
     corrupting the file length

   - Clean up various loose ends in the iomap and bmap code

   - Convert to the new mount api

   - Make sure we always log something when returning EFSCORRUPTED

   - Fix some problems where long running scrub loops could trigger soft
     lockup warnings and/or fail to exit due to fatal signals pending

   - Fix various Coverity complaints

   - Remove most of the function pointers from the directory code to
     reduce indirection penalties

   - Ensure that dquots are attached to the inode when performing
     unwritten extent conversion after io

   - Deuglify incore projid and crtime types

   - Fix another AGI/AGF locking order deadlock when renaming

   - Clean up some quota typedefs

   - Remove the FSSETDM ioctls which haven't done anything in 20 years

   - Fix some memory leaks when mounting the log fails

   - Fix an underflow when updating an xattr leaf freemap

   - Remove some trivial wrappers

   - Report metadata corruption as an error, not a (potentially) fatal
     assertion

   - Clean up the dir/attr buffer mapping code

   - Allow fatal signals to kill scrub during parent pointer checks"

* tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (198 commits)
  xfs: allow parent directory scans to be interrupted with fatal signals
  xfs: remove the mappedbno argument to xfs_da_get_buf
  xfs: remove the mappedbno argument to xfs_da_read_buf
  xfs: split xfs_da3_node_read
  xfs: remove the mappedbno argument to xfs_dir3_leafn_read
  xfs: remove the mappedbno argument to xfs_dir3_leaf_read
  xfs: remove the mappedbno argument to xfs_attr3_leaf_read
  xfs: remove the mappedbno argument to xfs_da_reada_buf
  xfs: improve the xfs_dabuf_map calling conventions
  xfs: refactor xfs_dabuf_map
  xfs: simplify mappedbno handling in xfs_da_{get,read}_buf
  xfs: report corruption only as a regular error
  xfs: Remove kmem_zone_free() wrapper
  xfs: Remove kmem_zone_destroy() wrapper
  xfs: Remove slab init wrappers
  xfs: fix attr leaf header freemap.size underflow
  xfs: fix some memory leaks in log recovery
  xfs: fix another missing include
  xfs: remove XFS_IOC_FSSETDM and XFS_IOC_FSSETDM_BY_HANDLE
  xfs: remove duplicated include from xfs_dir2_data.c
  ...
2019-12-02 14:46:22 -08:00
Deepa Dinamani
69738cfdfa fs: cifs: Fix atime update check vs mtime
According to the comment in the code and commit log, some apps
expect atime >= mtime; but the introduced code results in
atime==mtime.  Fix the comparison to guard against atime<mtime.

Fixes: 9b9c5bea0b ("cifs: do not return atime less than mtime")
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: stfrench@microsoft.com
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-02 15:15:35 -06:00
Pavel Shilovsky
6f582b273e CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks
Currently when the client creates a cifsFileInfo structure for
a newly opened file, it allocates a list of byte-range locks
with a pointer to the new cfile and attaches this list to the
inode's lock list. The latter happens before initializing all
other fields, e.g. cfile->tlink. Thus a partially initialized
cifsFileInfo structure becomes available to other threads that
walk through the inode's lock list. One example of such a thread
may be an oplock break worker thread that tries to push all
cached byte-range locks. This causes NULL-pointer dereference
in smb2_push_mandatory_locks() when accessing cfile->tlink:

[598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
[598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
...
[598428.945834] Call Trace:
[598428.945870]  ? cifs_revalidate_mapping+0x45/0x90 [cifs]
[598428.945901]  cifs_oplock_break+0x13d/0x450 [cifs]
[598428.945909]  process_one_work+0x1db/0x380
[598428.945914]  worker_thread+0x4d/0x400
[598428.945921]  kthread+0x104/0x140
[598428.945925]  ? process_one_work+0x380/0x380
[598428.945931]  ? kthread_park+0x80/0x80
[598428.945937]  ret_from_fork+0x35/0x40

Fix this by reordering initialization steps of the cifsFileInfo
structure: initialize all the fields first and then add the new
byte-range lock list to the inode's lock list.

Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-12-02 15:15:00 -06:00
Linus Torvalds
937d6eefc7 Here's the main documentation changes for 5.5:
- Various kerneldoc script enhancements.
 
  - More RST conversions; those are slowing down as we run out of things to
    convert, but we're a ways from done still.
 
  - Dan's "maintainer profile entry" work landed at last.  Now we just need
    to get maintainers to fill in the profiles...
 
  - A reworking of the parallel build setup to work better with a variety of
    systems (and to not take over huge systems entirely in particular).
 
  - The MAINTAINERS file is now converted to RST during the build.
    Hopefully nobody ever tries to print this thing, or they will need to
    load a lot of paper.
 
  - A script and documentation making it easy for maintainers to add Link:
    tags at commit time.
 
 Also included is the removal of a bunch of spurious CR characters.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl3j5B0PHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YtBcH/jIN2cO8/0YW2rjVT+1G6ytSdFUKx5WJ/lpf
 5uBeCvuCeYhtCB6+BgnXvjykJ7jDW11/NJNjWqz/gsvD5l5FJK1rXarI/oz2Klyi
 kcPtDmBF/ki4wz9qXzEpa0vg8LXdjeys50S1vE75qCzxZoPP7YjuRbPnLrlIJukv
 JbDVi4p9kxgeHfRB4+BHOe5rFwA3mMmaxKNIX34Y+UUO2KZ0g/yUi1bAaQwQAdt+
 PsORmkVQ8Puh3K9xRIr7dYlcWBlBiPqzYdvDgTVxSjrxdK6wjYjSgVk2VjC5MBUN
 mTSTWgyfsIcD/76/s8tq7ZRl2fw+SkCSkFo79Rb/hJwDTb7Vnng=
 =LPBr
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.5a' of git://git.lwn.net/linux

Pull Documentation updates from Jonathan Corbet:
 "Here are the main documentation changes for 5.5:

   - Various kerneldoc script enhancements.

   - More RST conversions; those are slowing down as we run out of
     things to convert, but we're a ways from done still.

   - Dan's "maintainer profile entry" work landed at last. Now we just
     need to get maintainers to fill in the profiles...

   - A reworking of the parallel build setup to work better with a
     variety of systems (and to not take over huge systems entirely in
     particular).

   - The MAINTAINERS file is now converted to RST during the build.
     Hopefully nobody ever tries to print this thing, or they will need
     to load a lot of paper.

   - A script and documentation making it easy for maintainers to add
     Link: tags at commit time.

  Also included is the removal of a bunch of spurious CR characters"

* tag 'docs-5.5a' of git://git.lwn.net/linux: (91 commits)
  docs: remove a bunch of stray CRs
  docs: fix up the maintainer profile document
  libnvdimm, MAINTAINERS: Maintainer Entry Profile
  Maintainer Handbook: Maintainer Entry Profile
  MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile
  docs, parallelism: Rearrange how jobserver reservations are made
  docs, parallelism: Do not leak blocking mode to other readers
  docs, parallelism: Fix failure path and add comment
  Documentation: Remove bootmem_debug from kernel-parameters.txt
  Documentation: security: core.rst: fix warnings
  Documentation/process/howto/kokr: Update for 4.x -> 5.x versioning
  Documentation/translation: Use Korean for Korean translation title
  docs/memory-barriers.txt: Remove remaining references to mmiowb()
  docs/memory-barriers.txt/kokr: Update I/O section to be clearer about CPU vs thread
  docs/memory-barriers.txt/kokr: Fix style, spacing and grammar in I/O section
  Documentation/kokr: Kill all references to mmiowb()
  docs/memory-barriers.txt/kokr: Rewrite "KERNEL I/O BARRIER EFFECTS" section
  docs: Add initial documentation for devfreq
  Documentation: Document how to get links with git am
  docs: Add request_irq() documentation
  ...
2019-12-02 11:51:02 -08:00
Linus Torvalds
8328dd2f39 pstore bug fix
- add missing "static" (Ben Dooks)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl3dUowWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJpZXD/9OHY0SaBLvZYmApI7l6dV4eN1I
 OcyrC9jtjQDuSaTQcwZPIDFvJAtPM32wbhI6SJztaF+tuWhQzeOnRBaupfYoWS0e
 cbaUCG/mqrCbmGOjMxcqQbkAY0uyOuRUJItrX5WZ4YspcrMjdHi3sTj4ItHaaXvK
 SjSxhITM/SQlgnK/4oysC3sdMdDac9OEdtDomltYc2qxou4b4nPWuc4epTTHLXBQ
 b+LjZXQBbdAQN7dcgWB4lgzg1eUPqkkFNUDw2hgcqZN1RHKzXGwruE+E4PsgY2Fw
 F/j+cSLu/dExVPzEGYcMV3ZS09ix9LHmO95LOy2N2TPD9Ax1KJbRb58P+1a1SDwb
 kwzMxyJlcUmqKRBfTTnVW4+g3v5qvrmducGqdmJrhOoNxCtwvRID1jRsm/DGIhdk
 /ZJDpHl1lg7VECcwXjTP+0+FYNGJPrb4MePXAMXbi/2/Zt1FDtf2z+Ubo220YMJM
 s82YMn8yVuBrXGXxjL8KmGvvRVFKCtGabvvdMTqhmIcp938CxI6hVfSU/B4SKciO
 N8R6Wl78pDDibXipTRCC3MXUNBIZi9qWfXLLmcrx0JRCjgfwSL7UWE/xrPDSr+6u
 tIDvOD08Ym1BXTtvEZoqZKuEViW04nVVukXi/rQXeyipk6tg+HfONfi2RRTCDzV5
 DbMj9pJB9373N0+h5w==
 =JD6z
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore bug fix from Kees Cook:

 - add missing "static" (Ben Dooks)

* tag 'pstore-v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore: Make pstore_choose_compression() static
2019-12-02 11:26:40 -08:00
Jens Axboe
0b8c0ec7ee io_uring: use current task creds instead of allocating a new one
syzbot reports:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace f2e1a4307fbe2245 ]---
RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

which is caused by slab fault injection triggering a failure in
prepare_creds(). We don't actually need to create a copy of the creds
as we're not modifying it, we just need a reference on the current task
creds. This avoids the failure case as well, and propagates the const
throughout the stack.

Fixes: 181e448d87 ("io_uring: async workers should inherit the user creds")
Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-02 08:50:00 -07:00
Linus Torvalds
596cf45cbf Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "Incoming:

   - a small number of updates to scripts/, ocfs2 and fs/buffer.c

   - most of MM

  I still have quite a lot of material (mostly not MM) staged after
  linux-next due to -next dependencies. I'll send those across next week
  as the preprequisites get merged up"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits)
  mm/page_io.c: annotate refault stalls from swap_readpage
  mm/Kconfig: fix trivial help text punctuation
  mm/Kconfig: fix indentation
  mm/memory_hotplug.c: remove __online_page_set_limits()
  mm: fix typos in comments when calling __SetPageUptodate()
  mm: fix struct member name in function comments
  mm/shmem.c: cast the type of unmap_start to u64
  mm: shmem: use proper gfp flags for shmem_writepage()
  mm/shmem.c: make array 'values' static const, makes object smaller
  userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
  fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
  userfaultfd: wrap the common dst_vma check into an inlined function
  userfaultfd: remove unnecessary WARN_ON() in __mcopy_atomic_hugetlb()
  userfaultfd: use vma_pagesize for all huge page size calculation
  mm/madvise.c: use PAGE_ALIGN[ED] for range checking
  mm/madvise.c: replace with page_size() in madvise_inject_error()
  mm/mmap.c: make vma_merge() comment more easy to understand
  mm/hwpoison-inject: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops
  autonuma: reduce cache footprint when scanning page tables
  autonuma: fix watermark checking in migrate_balanced_pgdat()
  ...
2019-12-01 20:36:41 -08:00
Linus Torvalds
31764f1b6d for-linus-20191129
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3h2LsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvrOD/43g08VvkLexb6DvO8kTkyxPSUsPZml292H
 yowv1Rij9x/7Y9MTOmcnEAV9QofRPY9nkmEq1KR+iqGQnLZsKFLrDVxQSgkhU7HJ
 2wRE2DCPYTfIyIfS0TF2DdVjh3Q90tPKZbVkiOxvy9R+yS6hmTur0t731OERsMAh
 QgkY8zJP04XonXNwZSch9QYdpf9XGu+W83Pc0ecmootimJHVhFV8J9111dessod0
 h82Epbbbc8bg/CBWCA4fDm4EZ8doMHdAHYpw60WDXPoLF9zISvg5QG+pVjKrLkVx
 pqgTt5Kwd/EkqurRw3sH+8A6p0ACLrUiKffX2cXk8ScdTXMGD5stmdF5cMLSikFY
 yJgGHOTBHdjV4T2gB1TQ0rlnnb1VtHoJm5XvPviRaSQt7irzh4EWwhYiL7JlTAC3
 etCbzMPn8Sm+Ns+/zObuOmVTQvyE/+PngToxDRpgzDd4pSbMPR502iL3gvSbMDjw
 BCfGKFRkBYAB1SSjMwi+l9hiX2F+jTnHSvrm69HUz5K2zvWl4hsd2wClExuwoCQ+
 UFXULCJZFCKbAcgEVh2OQX9JVg7fUv5GhIymEPBWFDAODwtX1XJK6IxmQhRh8owV
 AxBFnNdpgRcBzyy+c+2cM4JOVcm8bV1s6eYP0UyV+EieD5OcBdq7GH5YBFzOztJM
 SLMbjQQ/7w==
 =hnf4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20191129' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "I wasn't going to send this one off so soon, but unfortunately one of
  the fixes from the previous pull broke the build on some archs. So I'm
  sending this sooner rather than later. This contains:

   - Add highmem.h include for io_uring, because of the kmap() additions
     from last round. For some reason the build bot didn't spot this
     even though it sat for days.

   - Three minor ';' removals

   - Add support for the Beurer CD-on-a-chip device

   - Make io_uring work on MMU-less archs"

* tag 'for-linus-20191129' of git://git.kernel.dk/linux-block:
  io_uring: fix missing kmap() declaration on powerpc
  ataflop: Remove unneeded semicolon
  block: sunvdc: Remove unneeded semicolon
  drbd: Remove unneeded semicolon
  io_uring: add mapping support for NOMMU archs
  sr_vendor: support Beurer GL50 evo CD-on-a-chip devices.
  cdrom: respect device capabilities during opening action
2019-12-01 18:26:56 -08:00
Linus Torvalds
ceb3074745 y2038: syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended
 for namespace cleaning: the kernel defines the traditional
 time_t, timeval and timespec types that often lead to y2038-unsafe
 code. Even though the unsafe usage is mostly gone from the kernel,
 having the types and associated functions around means that we
 can still grow new users, and that we may be missing conversions
 to safe types that actually matter.
 
 There are still a number of driver specific patches needed to
 get the last users of these types removed, those have been
 submitted to the respective maintainers.
 
 Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJd3D+wAAoJEJpsee/mABjZfdcQAJvl6e+4ddKoDMIVJqVCE25N
 meFRgA7S8jy6BefEVeUgI8TxK+amGO36szMBUEnZxSSxq9u+gd13m5bEK6Xq/ov7
 4KTAiA3Irm/W5FBTktu1zc5ROIra1Xj7jLdubf8wEC3viSXIXB3+68Y28iBN7D2O
 k9kSpwINC5lWeC8guZy2I+2yc4ywUEXao9nVh8C/J+FQtU02TcdLtZop9OhpAa8u
 U19VVH3WHkQI7ZfLvBTUiYK6tlYTiYCnpr8l6sm850CnVv1fzBW+DzmVhPJ6FdFd
 4m5staC0sQ6gVqtjVMBOtT5CdzREse6hpwbKo2GRWFroO5W9tljMOJJXHvv/f6kz
 DxrpUmj37JuRbqAbr8KDmQqPo6M2CRkxFxjol1yh5ER63u1xMwLm/PQITZIMDvPO
 jrFc2C2SdM2E9bKP/RMCVoKSoRwxCJ5IwJ2AF237rrU0sx/zB2xsrOGssx5CWEgc
 3bbk6tDQujJJubnCfgRy1tTxpLZOHEEKw8YhFLLbR2LCtA9pA/0rfLLad16cjA5e
 5jIHxfsFc23zgpzrJeB7kAF/9xgu1tlA5BotOs3VBE89LtWOA9nK5dbPXng6qlUe
 er3xLCfS38ovhUw6DusQpaYLuaYuLM7DKO4iav9kuTMcY9GkbPk7vDD3KPGh2goy
 hY5cSM8+kT1q/THLnUBH
 =Bdbv
 -----END PGP SIGNATURE-----

Merge tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 cleanups from Arnd Bergmann:
 "y2038 syscall implementation cleanups

  This is a series of cleanups for the y2038 work, mostly intended for
  namespace cleaning: the kernel defines the traditional time_t, timeval
  and timespec types that often lead to y2038-unsafe code. Even though
  the unsafe usage is mostly gone from the kernel, having the types and
  associated functions around means that we can still grow new users,
  and that we may be missing conversions to safe types that actually
  matter.

  There are still a number of driver specific patches needed to get the
  last users of these types removed, those have been submitted to the
  respective maintainers"

Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/

* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
  y2038: alarm: fix half-second cut-off
  y2038: ipc: fix x32 ABI breakage
  y2038: fix typo in powerpc vdso "LOPART"
  y2038: allow disabling time32 system calls
  y2038: itimer: change implementation to timespec64
  y2038: move itimer reset into itimer.c
  y2038: use compat_{get,set}_itimer on alpha
  y2038: itimer: compat handling to itimer.c
  y2038: time: avoid timespec usage in settimeofday()
  y2038: timerfd: Use timespec64 internally
  y2038: elfcore: Use __kernel_old_timeval for process times
  y2038: make ns_to_compat_timeval use __kernel_old_timeval
  y2038: socket: use __kernel_old_timespec instead of timespec
  y2038: socket: remove timespec reference in timestamping
  y2038: syscalls: change remaining timeval to __kernel_old_timeval
  y2038: rusage: use __kernel_old_timeval
  y2038: uapi: change __kernel_time_t to __kernel_old_time_t
  y2038: stat: avoid 'time_t' in 'struct stat'
  y2038: ipc: remove __kernel_time_t reference from headers
  y2038: vdso: powerpc: avoid timespec references
  ...
2019-12-01 14:00:59 -08:00
Linus Torvalds
0da522107e compat_ioctl: remove most of fs/compat_ioctl.c
As part of the cleanup of some remaining y2038 issues, I came to
 fs/compat_ioctl.c, which still has a couple of commands that need support
 for time64_t.
 
 In completely unrelated work, I spent time on cleaning up parts of this
 file in the past, moving things out into drivers instead.
 
 After Al Viro reviewed an earlier version of this series and did a lot
 more of that cleanup, I decided to try to completely eliminate the rest
 of it and move it all into drivers.
 
 This series incorporates some of Al's work and many patches of my own,
 but in the end stops short of actually removing the last part, which is
 the scsi ioctl handlers. I have patches for those as well, but they need
 more testing or possibly a rewrite.
 
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
 RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
 +ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
 lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
 6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
 dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
 VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
 YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
 m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
 TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
 e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
 bN65armTm7bFFR32Avnu
 =lgCl
 -----END PGP SIGNATURE-----

Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
 "As part of the cleanup of some remaining y2038 issues, I came to
  fs/compat_ioctl.c, which still has a couple of commands that need
  support for time64_t.

  In completely unrelated work, I spent time on cleaning up parts of
  this file in the past, moving things out into drivers instead.

  After Al Viro reviewed an earlier version of this series and did a lot
  more of that cleanup, I decided to try to completely eliminate the
  rest of it and move it all into drivers.

  This series incorporates some of Al's work and many patches of my own,
  but in the end stops short of actually removing the last part, which
  is the scsi ioctl handlers. I have patches for those as well, but they
  need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
  scsi: sd: enable compat ioctls for sed-opal
  pktcdvd: add compat_ioctl handler
  compat_ioctl: move SG_GET_REQUEST_TABLE handling
  compat_ioctl: ppp: move simple commands into ppp_generic.c
  compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
  compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
  compat_ioctl: unify copy-in of ppp filters
  tty: handle compat PPP ioctls
  compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
  compat_ioctl: handle SIOCOUTQNSD
  af_unix: add compat_ioctl support
  compat_ioctl: reimplement SG_IO handling
  compat_ioctl: move WDIOC handling into wdt drivers
  fs: compat_ioctl: move FITRIM emulation into file systems
  gfs2: add compat_ioctl support
  compat_ioctl: remove unused convert_in_user macro
  compat_ioctl: remove last RAID handling code
  compat_ioctl: remove /dev/raw ioctl translation
  compat_ioctl: remove PCI ioctl translation
  compat_ioctl: remove joystick ioctl translation
  ...
2019-12-01 13:46:15 -08:00
Mike Rapoport
3c1c24d91f userfaultfd: require CAP_SYS_PTRACE for UFFD_FEATURE_EVENT_FORK
A while ago Andy noticed
(http://lkml.kernel.org/r/CALCETrWY+5ynDct7eU_nDUqx=okQvjm=Y5wJvA4ahBja=CQXGw@mail.gmail.com)
that UFFD_FEATURE_EVENT_FORK used by an unprivileged user may have
security implications.

As the first step of the solution the following patch limits the availably
of UFFD_FEATURE_EVENT_FORK only for those having CAP_SYS_PTRACE.

The usage of CAP_SYS_PTRACE ensures compatibility with CRIU.

Yet, if there are other users of non-cooperative userfaultfd that run
without CAP_SYS_PTRACE, they would be broken :(

Current implementation of UFFD_FEATURE_EVENT_FORK modifies the file
descriptor table from the read() implementation of uffd, which may have
security implications for unprivileged use of the userfaultfd.

Limit availability of UFFD_FEATURE_EVENT_FORK only for callers that have
CAP_SYS_PTRACE.

Link: http://lkml.kernel.org/r/1572967777-8812-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Nick Kralevich <nnk@google.com>
Cc: Nosh Minwalla <nosh@google.com>
Cc: Pavel Emelyanov <ovzxemul@gmail.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:10 -08:00
Andrea Arcangeli
9d4678eb17 fs/userfaultfd.c: wp: clear VM_UFFD_MISSING or VM_UFFD_WP during userfaultfd_register()
If the registration is repeated without VM_UFFD_MISSING or VM_UFFD_WP they
need to be cleared.  Currently setting UFFDIO_REGISTER_MODE_WP returns
-EINVAL, so this patch is a noop until the UFFDIO_REGISTER_MODE_WP support
is applied.

Link: http://lkml.kernel.org/r/20191004232834.GP13922@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Wei Yang <richardw.yang@linux.intel.com>
Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:10 -08:00
Wei Yang
188b04a7d9 hugetlb: remove unused hstate in hugetlb_fault_mutex_hash()
The first parameter hstate in function hugetlb_fault_mutex_hash() is not
used anymore.

This patch removes it.

[akpm@linux-foundation.org: various build fixes]
[cai@lca.pw: fix a GCC compilation warning]
 Link: http://lkml.kernel.org/r/1570544108-32331-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20191005003302.785-1-richardw.yang@linux.intel.com
Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
Signed-off-by: Qian Cai <cai@lca.pw>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:08 -08:00
Piotr Sarna
1ab5b82f54 hugetlbfs: add O_TMPFILE support
With hugetlbfs, a common pattern for mapping anonymous huge pages is to
create a temporary file first.  Currently libraries like libhugetlbfs
and seastar create these with a standard mkstemp+unlink trick, but it
would be more robust to be able to simply pass the O_TMPFILE flag to
open().  O_TMPFILE is already supported by several file systems like
ext4 and xfs.  The implementation simply uses the existi= ng d_tmpfile
utility function to instantiate the dcache entry for the file.

Tested manually by successfully creating a temporary file by opening it
with (O_TMPFILE|O_RDWR) on mounted hugetlbfs and successfully mapping 2M
huge pages with it.  Without the patch, trying to open a file with
O_TMPFILE results in -ENOSUP.

Link: http://lkml.kernel.org/r/bc9383eff6e1374d79f3a92257ae829ba1e6ae60.1573285189.git.p.sarna@tlen.pl
Signed-off-by: Piotr Sarna <p.sarna@tlen.pl>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:08 -08:00
Mike Kravetz
8fc312b32b mm/hugetlbfs: fix error handling when setting up mounts
It is assumed that the hugetlbfs_vfsmount[] array will contain either a
valid vfsmount pointer or NULL for each hstate after initialization.
Changes made while converting to use fs_context broke this assumption.

While fixing the hugetlbfs_vfsmount issue, it was discovered that
init_hugetlbfs_fs never did correctly clean up when encountering a vfs
mount error.

It was found during code inspection.  A small memory allocation failure
would be the most likely cause of taking a error path with the bug.
This is unlikely to happen as this is early init code.

Link: http://lkml.kernel.org/r/94b6244d-2c24-e269-b12c-e3ba694b242d@oracle.com
Reported-by: Chengguang Xu <cgxu519@mykernel.net>
Fixes: 32021982a3 ("hugetlbfs: Convert to fs_context")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:08 -08:00
Mike Kravetz
552546366a hugetlbfs: hugetlb_fault_mutex_hash() cleanup
A new clang diagnostic (-Wsizeof-array-div) warns about the calculation
to determine the number of u32's in an array of unsigned longs.
Suppress warning by adding parentheses.

While looking at the above issue, noticed that the 'address' parameter
to hugetlb_fault_mutex_hash is no longer used.  So, remove it from the
definition and all callers.

No functional change.

Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Ilie Halip <ilie.halip@gmail.com>
Cc: David Bolvansky <david.bolvansky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 12:59:08 -08:00
Konstantin Khlebnikov
a92853b674 fs/direct-io.c: keep dio_warn_stale_pagecache() when CONFIG_BLOCK=n
This helper prints warning if direct I/O write failed to invalidate cache,
and set EIO at inode to warn usersapce about possible data corruption.

See also commit 5a9d929d6e ("iomap: report collisions between directio
and buffered writes to userspace").

Direct I/O is supported by non-disk filesystems, for example NFS.  Thus
generic code needs this even in kernel without CONFIG_BLOCK.

Link: http://lkml.kernel.org/r/157270038074.4812.7980855544557488880.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 06:29:18 -08:00
Ben Dooks
2b211dc04c fs/buffer.c: include internal.h for missing declarations
The declarations of __block_write_begin_int and guard_bio_eod are needed
from internal.h so include it to fix the following sparse warnings:

  fs/buffer.c:1930:5: warning: symbol '__block_write_begin_int' was not declared. Should it be static?
  fs/buffer.c:2994:6: warning: symbol 'guard_bio_eod' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/20191011170039.16100-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 06:29:17 -08:00
Saurav Girepunje
1d70667973 fs/buffer.c: fix use true/false for bool type
Use true/false for bool return type of has_bh_in_lru().

Link: http://lkml.kernel.org/r/20191029040529.GA7625@saurav
Signed-off-by: Saurav Girepunje <saurav.girepunje@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 06:29:17 -08:00
Ding Xiang
188c523e1c ocfs2: fix passing zero to 'PTR_ERR' warning
Fix a static code checker warning:
fs/ocfs2/acl.c:331
	ocfs2_acl_chmod() warn: passing zero to 'PTR_ERR'

Link: http://lkml.kernel.org/r/1dee278b-6c96-eec2-ce76-fe6e07c6e20f@linux.alibaba.com
Fixes: 5ee0fbd50f ("ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 06:29:17 -08:00
Linus Torvalds
3b805ca177 audit/stable-5.5 PR 20191126
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl3dbM4UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMagw/+MiOlIHFykjK0NGOyUbXR7AwjdrHz
 1Fgkh7VZCAHMfCcdlljIIkYe7P+ybfdK51E1QLBiTPGh353JJRAvrjFbDcIyT4kf
 u7AVddUeT0QQefFs39ZFWTV0mvJjDBfjFkmiL3cdY9ulx4ZX8V426qjyl8KIwTHe
 YkQF3pYMmO28G1SfZu298zTmrFRA10FezxCUbBRZTTE9FcD7mnGQRB7w/wY9t0H1
 ebIDgDA0EBh3oznGD3qxD63b5ULdSrImTzvSmEfhzZZsoZB4XGO5MOnLRqgBwFbT
 qfRSRfHVl+1JtDHCU43zOUlzScAsff5mvfVNdDMLAFtGGSjDGxgxKNyLwp8+5wmH
 GzvB99QQECvCN+gbaedm6adWBGzi7vpoCcgfqY0UPLYvCqsNFbZw4U/iu5ONKWy/
 cEGVUGzHf0pWofugDIJfGQuRt6iS2XT9Ode6+QMx8OiC3auZcluhTuJSxMQIhFZ1
 5XmoHOQddBwlalmIx8fsIHSo6xsAjNFwTOikEFVzUB+ECR2Urs8eHvQj+d94xz6e
 q9LrNkt/eVIDiI+PeP+UBD5IJlfmRSoyUd/mIDFMfmqMBubBSQl70Eyt3dUdt1Bz
 0PBs6xjYztpgk3Q7s35TMn8EvDcEksG0WEoFM5fEohYPLWQf8tJ5BbyYJJqc0sod
 ExlXu1lPfg9ppKc=
 =fc7R
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Audit is back for v5.5, albeit with only two patches:

   - Allow for the auditing of suspicious O_CREAT usage via the new
     AUDIT_ANOM_CREAT record.

   - Remove a redundant if-conditional check found during code analysis.
     It's a minor change, but when the pull request is only two patches
     long, you need filler in the pull request email"

[ Heh on the pull request filler. I wish more people tried to write
  better pull request messages, even if maybe it's not worth it for the
  trivial cases ;^)   - Linus ]

* tag 'audit-pr-20191126' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: remove redundant condition check in kauditd_thread()
  audit: Report suspicious O_CREAT usage
2019-11-30 17:01:48 -08:00
Linus Torvalds
6a965666b7 Pipework for general notification queue
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl3O0OoACgkQ+7dXa6fL
 C2tAwA//VH9Y81azemXFdflDF90sSH3TCASlKHVYHbBNAkH/QP5F00G4BEM4nNqH
 F3x7qcU9vzfGdumF1pc90Yt6XSYlsQEGF+xMyMw/VS2wKs40yv+b/doVbzOWbN9C
 NfrklgHeuuBk+JzU2llDisVqKRTLt4SmDpYu1ZdcchUQFZCCl3BpgdSEC+xXrHay
 +KlRPVNMSd2kXMCDuSWrr71lVNdCTdf3nNC5p1i780+VrgpIBIG/jmiNdCcd7PLH
 1aesPlr8UZY3+bmRtqe587fVRAhT2qA2xibKtyf9R0hrDtUKR4NSnpPmaeIjb26e
 LhVntcChhYxQqzy/T4ScTDNVjpSlwi6QMo5DwAwzNGf2nf/v5/CZ+vGYDVdXRFHj
 tgH1+8eDpHsi7jJp6E4cmZjiolsUx/ePDDTrQ4qbdDMO7fmIV6YQKFAMTLJepLBY
 qnJVqoBq3qn40zv6tVZmKgWiXQ65jEkBItZhEUmcQRBiSbBDPweIdEzx/mwzkX7U
 1gShGdut6YP4GX7BnOhkiQmzucS85mgkUfG43+mBfYXb+4zNTEjhhkqhEduz2SQP
 xnjHxEM+MTGCj3PozIpJxNKzMTEceYY7cAUdNEMDQcHog7OCnIdGBIc7BPnsN8yA
 CPzntwP4mmLfK3weq3PIGC6d9xfc9PpmiR9docxQOvE6sk2Ifeo=
 =FKC7
 -----END PGP SIGNATURE-----

Merge tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull pipe rework from David Howells:
 "This is my set of preparatory patches for building a general
  notification queue on top of pipes. It makes a number of significant
  changes:

   - It removes the nr_exclusive argument from __wake_up_sync_key() as
     this is always 1. This prepares for the next step:

   - Adds wake_up_interruptible_sync_poll_locked() so that poll can be
     woken up from a function that's holding the poll waitqueue
     spinlock.

   - Change the pipe buffer ring to be managed in terms of unbounded
     head and tail indices rather than bounded index and length. This
     means that reading the pipe only needs to modify one index, not
     two.

   - A selection of helper functions are provided to query the state of
     the pipe buffer, plus a couple to apply updates to the pipe
     indices.

   - The pipe ring is allowed to have kernel-reserved slots. This allows
     many notification messages to be spliced in by the kernel without
     allowing userspace to pin too many pages if it writes to the same
     pipe.

   - Advance the head and tail indices inside the pipe waitqueue lock
     and use wake_up_interruptible_sync_poll_locked() to poke poll
     without having to take the lock twice.

   - Rearrange pipe_write() to preallocate the buffer it is going to
     write into and then drop the spinlock. This allows kernel
     notifications to then be added the ring whilst it is filling the
     buffer it allocated. The read side is stalled because the pipe
     mutex is still held.

   - Don't wake up readers on a pipe if there was already data in it
     when we added more.

   - Don't wake up writers on a pipe if the ring wasn't full before we
     removed a buffer"

* tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  pipe: Remove sync on wake_ups
  pipe: Increase the writer-wakeup threshold to reduce context-switch count
  pipe: Check for ring full inside of the spinlock in pipe_write()
  pipe: Remove redundant wakeup from pipe_write()
  pipe: Rearrange sequence in pipe_write() to preallocate slot
  pipe: Conditionalise wakeup in pipe_read()
  pipe: Advance tail pointer inside of wait spinlock in pipe_read()
  pipe: Allow pipes to have kernel-reserved slots
  pipe: Use head and tail pointers for the ring, not cursor and length
  Add wake_up_interruptible_sync_poll_locked()
  Remove the nr_exclusive argument from __wake_up_sync_key()
  pipe: Reduce #inclusion of pipe_fs_i.h
2019-11-30 14:12:13 -08:00
NeilBrown
466e16f092 nfsd: check for EBUSY from vfs_rmdir/vfs_unink.
vfs_rmdir and vfs_unlink can return -EBUSY if the
target is a mountpoint.  This currently gets passed to
nfserrno() by nfsd_unlink(), and that results in a WARNing,
which is not user-friendly.

Possibly the best NFSv4 error is NFS4ERR_FILE_OPEN, because
there is a sense in which the object is currently in use
by some other task.  The Linux NFSv4 client will map this
back to EBUSY, which is an added benefit.

For NFSv3, the best we can do is probably NFS3ERR_ACCES, which isn't
true, but is not less true than the other options.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-11-30 14:59:52 -05:00
Trond Myklebust
a25e3726b3 nfsd: Ensure CLONE persists data and metadata changes to the target file
The NFSv4.2 CLONE operation has implicit persistence requirements on the
target file, since there is no protocol requirement that the client issue
a separate operation to persist data.
For that reason, we should call vfs_fsync_range() on the destination file
after a successful call to vfs_clone_file_range().

Fixes: ffa0160a10 ("nfsd: implement the NFSv4.2 CLONE operation")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2019-11-30 14:55:38 -05:00
Linus Torvalds
32ef955363 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl3hAmYACgkQnJ2qBz9k
 QNna+AgAs/6eWmQqEwij9+E761bELuMXIp8Ej9OxrBgE9Uls0VsMOsG1xfFoue+J
 sZme9XSUQ+STqPQzTi6FH4x1n5kL3VnZzjRDv9qpsRCfMzX1MbibczNvXuCCgGuP
 cKr3gPFX+GSDwQPEJX7tzF5zLZ1t1XOc4TD/H6v/X+2Nr/N1lnYeAaNcfFiGMyg1
 Pk9JK2/qUoqWASCx6zxHG/lfjmaSSIrOwaKFACNHH0/6Luc1M7Oee3M9T/jS2IEo
 8/WTS4o9CwOUFaEPemuUBv4yHyBpbJwsM+emwNKrNDSWbI5vP3ECOO7c4c+1/abS
 FcwPvkHyAQnpSgmeYlhfTjZfA/A6+w==
 =m7WG
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "Three fsnotify cleanups"

* tag 'fsnotify_for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: Add git tree reference to MAINTAINERS
  fsnotify/fdinfo: exportfs_encode_inode_fh() takes pointer as 4th argument
  fsnotify: move declaration of fsnotify_mark_connector_cachep to fsnotify.h
2019-11-30 11:34:33 -08:00
Linus Torvalds
b8072d5b3c \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl3hAFIACgkQnJ2qBz9k
 QNkV/gf+Kwn7xHg76YXd15lZYBzhgj/ABAYEEAAVY49OOCK5+XVmmAufHesMZ2lU
 Solt8PvbQ8d5786bWpaYXgrTU3JW37c6x1MDUPDLQ8goXWzx7pZWvD+Yup558rDa
 H1aoqvFKLgpeVVqkUdvvv2CDbgZyOgGlkDqWeS+c5pZd1NPFZzUAoU26slvQ5h4f
 t41mbavOIm5DChQ5UjwRNw+pb09GXaHrPBRJwa1XuJYJWAansBcQIsxiiqt/43Gn
 AzwUGrsz4vrPBk+Kcd0SGb8vinFVQr19gBFKFeN3rPFUEUn6T0FPBqaYeiNTNE37
 AqASYKlIuhcSf0Wdvx6vxwSHsFl5VA==
 =NGxV
 -----END PGP SIGNATURE-----

Merge tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext2, quota, reiserfs cleanups and fixes from Jan Kara:

 - Refactor the quota on/off kernel internal interfaces (mostly for
   ubifs quota support as ubifs does not want to have inodes holding
   quota information)

 - A few other small quota fixes and cleanups

 - Various small ext2 fixes and cleanups

 - Reiserfs xattr fix and one cleanup

* tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
  ext2: code cleanup for descriptor_loc()
  fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
  ext2: fix improper function comment
  ext2: code cleanup for ext2_try_to_allocate()
  ext2: skip unnecessary operations in ext2_try_to_allocate()
  ext2: Simplify initialization in ext2_try_to_allocate()
  ext2: code cleanup by calling ext2_group_last_block_no()
  ext2: introduce new helper ext2_group_last_block_no()
  reiserfs: replace open-coded atomic_dec_and_mutex_lock()
  ext2: check err when partial != NULL
  quota: Handle quotas without quota inodes in dquot_get_state()
  quota: Make dquot_disable() work without quota inodes
  quota: Drop dquot_enable()
  fs: Use dquot_load_quota_inode() from filesystems
  quota: Rename vfs_load_quota_inode() to dquot_load_quota_inode()
  quota: Simplify dquot_resume()
  quota: Factor out setup of quota inode
  quota: Check that quota is not dirty before release
  quota: fix livelock in dquot_writeback_dquots
  ext2: don't set *count in the case of failure in ext2_try_to_allocate()
  ...
2019-11-30 11:16:07 -08:00
Linus Torvalds
e2d73c302b Updates since last change:
- Introduce superblock checksum support;
 - Set iowait when waiting I/O for sync decompression path;
 - Several code cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIwEABYIADQWIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCXd/dwRYcZ2FveGlhbmcy
 NUBodWF3ZWkuY29tAAoJEDk3MdwfteYEjd4BANBpFIxoenCeu9A8XVG4TVIobyoz
 g/eci9QZr+rgPGl/AP93QFEAPmjwQi6iNQREwvXQ96EE9rMX0Legt3tFeHPRCw==
 =xdz3
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs updates from Gao Xiang:
 "No major kernel updates for this round since I'm fully diving into
  LZMA algorithm internals now to provide high CR XZ algorihm support.
  That needs more work and time for me to get a better compression time.

  Summary:

   - Introduce superblock checksum support

   - Set iowait when waiting I/O for sync decompression path

   - Several code cleanups"

* tag 'erofs-for-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: remove unnecessary output in erofs_show_options()
  erofs: drop all vle annotations for runtime names
  erofs: support superblock checksum
  erofs: set iowait for sync decompression
  erofs: clean up decompress queue stuffs
  erofs: get rid of __stagingpage_alloc helper
  erofs: remove dead code since managed cache is now built-in
  erofs: clean up collection handling routines
2019-11-30 11:13:33 -08:00
Linus Torvalds
21b26d2679 Various smb3 fixes (including 12 for stable) and also features (addition of multichannel support)
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCAAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl3fB2gACgkQiiy9cAdy
 T1G4rgwAwTVvUYOmUVxJt1OxJDAnVxLXOH08FmrSL08nacrUCb6fPRw0z0Eb65SB
 4/Yc3xIH+bJInNpACKZVY0ghPlDYC4u7AtLUZqm4+ruDVSjoWmF4s38WsNCrZywm
 uAWq4p+j5A62uGdgdSzVxdUJkSLGRTc+7ib+y/TZ2U4BLjb6AY2etUtOpnA6lQ4R
 UHvjf7leHw55pvPbNnpZ/2zT18KpPH8twCj0f3lH4yt8Akw0p/va6tRcfsmE2plh
 V3dxiPmS86otXFPyHpu9B9plrxJCcGuYUL+R8pPpbLD3MZMsdK7J+sGNdayOoLXl
 bpuoTvl3LThEwcg7bFeB/jHXGx95KPm3vh99lrZvbM9Snya+ueSfudrhmoqCEiSo
 5ri1tWH7GlPr9nEaxp/aYB2LZENpbYvwn0yxoUdgK+R9EN5F3utUwK54CgYPVz9Y
 1zp0LksafWZu7NrIPOuoq+/ATJ1WgbDYrhotk9fwaZoc2EsUfJd0I31XGECisjdh
 fcHNxbA5
 =EIrP
 -----END PGP SIGNATURE-----

Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs updates from Steve French:
 "Various smb3 fixes (including 12 for stable) and also features
  (addition of multichannel support)"

* tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (41 commits)
  CIFS: fix a white space issue in cifs_get_inode_info()
  cifs: update internal module version number
  cifs: Always update signing key of first channel
  cifs: Fix retrieval of DFS referrals in cifs_mount()
  cifs: Fix potential softlockups while refreshing DFS cache
  cifs: Fix lookup of root ses in DFS referral cache
  cifs: Fix use-after-free bug in cifs_reconnect()
  cifs: dump channel info in DebugData
  smb3: dump in_send and num_waiters stats counters by default
  cifs: try harder to open new channels
  CIFS: Properly process SMB3 lease breaks
  cifs: move cifsFileInfo_put logic into a work-queue
  cifs: try opening channels after mounting
  CIFS: refactor cifs_get_inode_info()
  cifs: switch servers depending on binding state
  cifs: add server param
  cifs: add multichannel mount options and data structs
  cifs: sort interface list by speed
  CIFS: Fix SMB2 oplock break processing
  cifs: don't use 'pre:' for MODULE_SOFTDEP
  ...
2019-11-30 11:10:39 -08:00
Linus Torvalds
8f45533e9d f2fs-for-5.5-rc1
In this round, we've introduced fairly small number of patches as below.
 
 Enhancement:
  - improve the in-place-update IO flow
  - allocate segment to guarantee no GC for pinned files
 
 Bug fix:
  - fix updatetime in lazytime mode
  - potential memory leak in f2fs_listxattr
  - record parent inode number in rename2 correctly
  - fix deadlock in f2fs_gc along with atomic writes
  - avoid needless data migration in GC
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl3e1XkACgkQQBSofoJI
 UNJ0GhAAhVIX4J91CLnVSh0ik1XCaI6h/dFeS6kbDd8oxzQm/qt64b59aZqgy7Rk
 iblGWfj8uPP5yO60pqb5uN4a0hybptVZSEldbhF0Xv0zUeVoT7C1ksTMrdUd1p7d
 YkO8G+V4QBBrtpKG1KKKEncrvcdx4n9QHxGsRh4z5vXZH7sEmH7+N8OE88MaPjdZ
 UWqYk0S0GoZBhPe7c8pQuD/PM+WJJH4Lewgw5kK21eAjOKI+yZKb+bY2tGjo5dA1
 nzYO72CRMV4VEKsnxTZ/LCB2kCXeexaGuiVPyHjCmgAh990cLjsCWIbJ8EJu7uAa
 vAo6/EMfgfPkPt5Y7uWGR4EeNT7AFhUoMuoQ9zdXzecY48D4Gz58o87Q+OFY3ipZ
 W2OSf92pEJyfumE5o8wN435gaRYUjjCo1SMoIQABNav411XrBVoRwjvkV3DyA6af
 Bs1bafz2hR/E1q0uoZvLWC5waiHy9605OkKMs/y8IRsn6yhRep/tv3KLk2Dz3fOO
 LxenhuVO9bQDCheEcH15qIljxTuyfTyUOa9UrFXOwn4mK61J8A/Gs+SiqW0y28oA
 feSw7cLPxK0OlYQgql24JfJN/Xt523WmCSfXfe7TCUDTDkBpmsdhFwHYZyCLzqt+
 FyBhf2DF/BGzKMT28oc7StO43mIvOc1Wk+jfJFW+hld5ncAJxCE=
 =qyrd
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've introduced fairly small number of patches as below.

  Enhancements:
   - improve the in-place-update IO flow
   - allocate segment to guarantee no GC for pinned files

  Bug fixes:
   - fix updatetime in lazytime mode
   - potential memory leak in f2fs_listxattr
   - record parent inode number in rename2 correctly
   - fix deadlock in f2fs_gc along with atomic writes
   - avoid needless data migration in GC"

* tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: stop GC when the victim becomes fully valid
  f2fs: expose main_blkaddr in sysfs
  f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
  f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
  f2fs: show f2fs instance in printk_ratelimited
  f2fs: fix potential overflow
  f2fs: fix to update dir's i_pino during cross_rename
  f2fs: support aligned pinned file
  f2fs: avoid kernel panic on corruption test
  f2fs: fix wrong description in document
  f2fs: cache global IPU bio
  f2fs: fix to avoid memory leakage in f2fs_listxattr
  f2fs: check total_segments from devices in raw_super
  f2fs: update multi-dev metadata in resize_fs
  f2fs: mark recovery flag correctly in read_raw_super_block()
  f2fs: fix to update time in lazytime mode
2019-11-30 11:02:30 -08:00
Linus Torvalds
4a55d362ff AFS development
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl3W9UYACgkQ+7dXa6fL
 C2uO1g//T0NvHmrZr6zHbOWlaKkGXRfhgCLENzuTCtcFt29NV+Jv8D/FNuT29d7f
 x8PzrZUBDYfpjP8OSHEivZgU5+a4jpI5pSd2mnGnT3523Oxbm2lIz/SXgpkwQdkM
 4OcK+voCP1MvHbmO7jyOfQm7SX/aOtF90GQrUog5JqwPD0ddSpk/swNNRKZ6xK8/
 cAq9/DHPfYHfLh8wc2FWmhdPA5Px+eKt9kGmMMpwsCulR4yMM7vBCIVNSccSjpYR
 QJzmFpi/TfOiRMc0oZnXLv6UUG+7hHeRfRosoQA8YGGsjwXfs2I7MfAziJx68tfu
 v7beYI+u7tc4EhgK9jXxbgPCJF6nFR/zrbzfC3Zlh7Hr3NCGGzH4gwDu+pn4lwfV
 bpUyA6VwmEt+CVzjj6LB5l4vyRSk+TtVQSwfQvCA7xqvdq0J0/fux6OQ1DwJonQe
 6dOpPUd4ip5tW28S9U7MzbbpMv1kP5hMoAu87gBdygUDhPojM7Ut0mFwZDMOunXg
 vW6oQwKyyFPeJhRcPMKxS7opWQz5889tDe8GPNI/1ociOYCcCkmwZaaG+hYAJ8+N
 8bK2OCi4yNGBa/qMPbwATY+TVKdYcceq/7THxcI3vZk8nVrda73UQHfwUXsjwUzf
 SNNcoMrsEV9BiPSG7gAkCf3VI2gXj+USm3w7SBSjZDM5qR/lRHg=
 =YjAm
 -----END PGP SIGNATURE-----

Merge tag 'afs-next-20191121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS updates from David Howells:
 "Minor cleanups and fix:

   - Minor fix to make some debugging statements display information
     from the correct iov_iter.

   - Rename some members and variables to make things more obvious or
     consistent.

   - Provide a helper to wrap increments of the usage count on the
     afs_read struct.

   - Use scnprintf() to print into a stack buffer rather than sprintf().

   - Remove some set but unused variables"

* tag 'afs-next-20191121' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Remove set but not used variable 'ret'
  afs: Remove set but not used variables 'before', 'after'
  afs: xattr: use scnprintf
  afs: Introduce an afs_get_read() refcount helper
  afs: Rename desc -> req in afs_fetch_data()
  afs: Switch the naming of call->iter and call->_iter
  afs: Use call->_iter not &call->iter in debugging statements
2019-11-30 10:57:22 -08:00
Linus Torvalds
50b8b3f85a This merge window saw the the following new featuers added to ext4:
* Direct I/O via iomap (required the iomap-for-next branch from Darrick
    as a prereq).
  * Support for using dioread-nolock where the block size < page size.
  * Support for encryption for file systems where the block size < page size.
  * Rework of journal credits handling so a revoke-heavy workload will
    not cause the journal to run out of space.
  * Replace bit-spinlocks with spinlocks in jbd2
 
 Also included were some bug fixes and cleanups, mostly to clean up
 corner cases from fuzzed file systems and error path handling.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAl3dHxoACgkQ8vlZVpUN
 gaMZswf5AbtQhTEJDXO7Pc1ull38GIGFgAv7uAth0TymLC3h1/FEYWW0crEPFsDr
 1Eei55UUVOYrMMUKQ4P7wlLX0cIh3XDPMWnRFuqBoV5/ZOsH/ZSbkY//TG2Xze/v
 9wXIH/RKQnzbRtXffJ1+DnvmXJk+HFm1R1gjl0nfyUXGrnlSfqJxhLSczyd6bJJq
 ehi/tso5UC/4EQsAIdWp7VWsAdaHcZ7ogHqDoy8dXpM1equ408iml7VlKr8R+Nr7
 5ANpCISXChSlLLYm0NYN5vhO8upF5uDxWLdCtxVPL5kFdM2m/ELjXw9h9C+78l7C
 EWJGlGlxvx07Px+e+bfStEsoixpWBg==
 =0eko
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "This merge window saw the the following new featuers added to ext4:

   - Direct I/O via iomap (required the iomap-for-next branch from
     Darrick as a prereq).

   - Support for using dioread-nolock where the block size < page size.

   - Support for encryption for file systems where the block size < page
     size.

   - Rework of journal credits handling so a revoke-heavy workload will
     not cause the journal to run out of space.

   - Replace bit-spinlocks with spinlocks in jbd2

  Also included were some bug fixes and cleanups, mostly to clean up
  corner cases from fuzzed file systems and error path handling"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (59 commits)
  ext4: work around deleting a file with i_nlink == 0 safely
  ext4: add more paranoia checking in ext4_expand_extra_isize handling
  jbd2: make jbd2_handle_buffer_credits() handle reserved handles
  ext4: fix a bug in ext4_wait_for_tail_page_commit
  ext4: bio_alloc with __GFP_DIRECT_RECLAIM never fails
  ext4: code cleanup for get_next_id
  ext4: fix leak of quota reservations
  ext4: remove unused variable warning in parse_options()
  ext4: Enable encryption for subpage-sized blocks
  fs/buffer.c: support fscrypt in block_read_full_page()
  ext4: Add error handling for io_end_vec struct allocation
  jbd2: Fine tune estimate of necessary descriptor blocks
  jbd2: Provide trace event for handle restarts
  ext4: Reserve revoke credits for freed blocks
  jbd2: Make credit checking more strict
  jbd2: Rename h_buffer_credits to h_total_credits
  jbd2: Reserve space for revoke descriptor blocks
  jbd2: Drop jbd2_space_needed()
  jbd2: Account descriptor blocks into t_outstanding_credits
  jbd2: Factor out common parts of stopping and restarting a handle
  ...
2019-11-30 10:53:02 -08:00
Linus Torvalds
f112a2fd1f New code for 5.5:
- Fix another place in the splice code where a pipe could ask a
 filesystem for a longer read than the pipe actually has free buffer
 space.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl3XKBsACgkQ+H93GTRK
 tOubhg/9FhWIX+Ijj0a1MQj3QDv0YbLPv2uTVLYyRK+1X5Xwc7u0OmmbWYrQQQUJ
 6aOaictBXXTIJtKqVcmXBXboUskKygnWKlVssKAI/BYR9ope/EMrMV5+4bz0mKt/
 ufcDXRiJX2hbL8GicmJxK/qmp3bEAG0n/e0+0LcuYynatYXJQ19eOUeritVZ3EYA
 Vhbgvrtuq8ws9SqvJ4Dm7fLsPT0AOHocor34tXS+gRq8KYIA2Ru7uDyE2uJJujF7
 qAlmZVvun+RcSBG0mYdk6GDgLGibn76t+d0xnqd/OzSeh+6bHYc1ni6FJD8kjVzf
 nGHFJfbEOCzDEees9ESBkgnH1TiCN4wCs7JQ4BictAPotBCtkbcNLvrO+b77zgwp
 DAu6uE/DiRmIu9zqfy8TyH1E78pW7qTpI9yFDUQLaItvAbbqYIaiRLr4Hjd2Hf3E
 MLYOSGZkKn/xZaqedk+rGwYJJmmvzCZblfrQ3hCtLGjmNQgfTRXidlgyicl1Ta12
 iG9UkCFdbkZmdW9zWyYmZrfxLB6QzOZpgbdUwyMk0Rfc1sKJKYoNE+hkt8fjmS71
 0hBPYSopDsYkGEH1719Nsfvyj0101McFklvEqS2hU9/rLyaF0Mdq5catVbzyBPhk
 RwnvXdFd/sFgbR+1xLiEm51aXAe/kmveNNQBqZkuBgXITt7oh7Q=
 =ripU
 -----END PGP SIGNATURE-----

Merge tag 'vfs-5.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull splice fix from Darrick Wong:
 "Fix another place in the splice code where a pipe could ask a
  filesystem for a longer read than the pipe actually has free buffer
  space"

* tag 'vfs-5.5-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  splice: only read in as much information as there is pipe buffer space
2019-11-30 10:48:24 -08:00
Linus Torvalds
3b266a52d8 New code for 5.5:
- Make iomap_dio_rw callers explicitly tell us if they want us to wait
 - Port the xfs writeback code to iomap to complete the buffered io
   library functions
 - Refactor the unshare code to share common pieces
 - Add support for performing copy on write with buffered writes
 - Other minor fixes
 - Fix unchecked return in iomap_bmap
 - Fix a type casting bug in a ternary statement in iomap_dio_bio_actor
 - Improve tracepoints for easier diagnostic ability
 - Fix pipe page leakage in directio reads
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl3YDqQACgkQ+H93GTRK
 tOsbPg/+KrPkhe60RnUO4PQbSLTRLsz5R7OK5ubfxsKlHdyy/8vu12yPdY03LcHn
 QoyQz+5gPjM58UvysIonlyRO0O30apl8UZ9PAINDMi7os6NP87illw4HtHL1cZjB
 JfIQKVJrLNtocZnAgVL74d6pUs7MH32SPw0r+/qbfz/JFcHdf/Sz8fSpb0sdS/oK
 QwT73TcaCcNa2C4twvhtO6+kMkzlTkJknqSZMqrthScqRRVeOnyGTjLdKqUamSqp
 uj0iAKm+bBpCcuMdcHd7EOgQyVGwCQKUndaLKojK/V1iqiK+3KsLnIoJj6HwlU27
 Q+pDoThv2V8m/Y940Gq0wzTNtkwdCirNaeKXwXX2ytlyPX5W45ZxgzGUQy4YYXGM
 ObHRmeJXdka6kH7yzWlPiZGdZixpagFLzFUHWjMXAD8Fb4YxNKM4FsJDY3K3uQi6
 y7EKV5O4q3qBW3ieL6l+wl9NkdcppSywRjRxhBZIhH4T2n7RMDLdBkNCiKZrWqIW
 1QcrIC1NvwbXPzSNaZ40dgjB6mJGTv+P9AJLSQpmlR1Y6txLsJ1AeVzucZOnXcCo
 TEiBfDFOZ9jvXUsaqGSdM99HxsePKn+VgEeTA6TFxTWJ9QcEgio1Pym+RkN6KrIU
 zsbJbgk9bp3YT940xKt90fAHWm/iKUWrrRyidVCK3kJd72e41UE=
 =FRl4
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap updates from Darrick Wong:
 "In this release, we hoisted as much of XFS' writeback code into iomap
  as was practicable, refactored the unshare file data function, added
  the ability to perform buffered io copy on write, and tweaked various
  parts of the directio implementation as needed to port ext4's directio
  code (that will be a separate pull).

  Summary:

   - Make iomap_dio_rw callers explicitly tell us if they want us to
     wait

   - Port the xfs writeback code to iomap to complete the buffered io
     library functions

   - Refactor the unshare code to share common pieces

   - Add support for performing copy on write with buffered writes

   - Other minor fixes

   - Fix unchecked return in iomap_bmap

   - Fix a type casting bug in a ternary statement in
     iomap_dio_bio_actor

   - Improve tracepoints for easier diagnostic ability

   - Fix pipe page leakage in directio reads"

* tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits)
  iomap: Fix pipe page leakage during splicing
  iomap: trace iomap_appply results
  iomap: fix return value of iomap_dio_bio_actor on 32bit systems
  iomap: iomap_bmap should check iomap_apply return value
  iomap: Fix overflow in iomap_page_mkwrite
  fs/iomap: remove redundant check in iomap_dio_rw()
  iomap: use a srcmap for a read-modify-write I/O
  iomap: renumber IOMAP_HOLE to 0
  iomap: use write_begin to read pages to unshare
  iomap: move the zeroing case out of iomap_read_page_sync
  iomap: ignore non-shared or non-data blocks in xfs_file_dirty
  iomap: always use AOP_FLAG_NOFS in iomap_write_begin
  iomap: remove the unused iomap argument to __iomap_write_end
  iomap: better document the IOMAP_F_* flags
  iomap: enhance writeback error message
  iomap: pass a struct page to iomap_finish_page_writeback
  iomap: cleanup iomap_ioend_compare
  iomap: move struct iomap_page out of iomap.h
  iomap: warn on inline maps in iomap_writepage_map
  iomap: lift the xfs writeback code to iomap
  ...
2019-11-30 10:44:49 -08:00
Jens Axboe
aa4c396775 io_uring: fix missing kmap() declaration on powerpc
Christophe reports that current master fails building on powerpc with
this error:

   CC      fs/io_uring.o
fs/io_uring.c: In function ‘loop_rw_iter’:
fs/io_uring.c:1628:21: error: implicit declaration of function ‘kmap’
[-Werror=implicit-function-declaration]
     iovec.iov_base = kmap(iter->bvec->bv_page)
                      ^
fs/io_uring.c:1628:19: warning: assignment makes pointer from integer
without a cast [-Wint-conversion]
     iovec.iov_base = kmap(iter->bvec->bv_page)
                    ^
fs/io_uring.c:1643:4: error: implicit declaration of function ‘kunmap’
[-Werror=implicit-function-declaration]
     kunmap(iter->bvec->bv_page);
     ^

which is caused by a missing highmem.h include. Fix it by including
it.

Fixes: 311ae9e159 ("io_uring: fix dead-hung for non-iter fixed rw")
Reported-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-29 10:14:00 -07:00
Joel Stanley
6e78c01fde Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()"
This reverts commit f2538f9993. The patch
stopped JFFS2 from being able to mount an existing filesystem with the
following errors:

 jffs2: error: (77) jffs2_build_inode_fragtree: Add node to tree failed -22
 jffs2: error: (77) jffs2_do_read_inode_internal: Failed to build final fragtree for inode #5377: error -22

Fixes: f2538f9993 ("jffs2: Fix possible null-pointer dereferences...")
Cc: stable@vger.kernel.org
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-11-29 11:29:58 +01:00
Linus Torvalds
05bd375b6b for-5.5/io_uring-post-20191128
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3f/C0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUjD/4t5ituIXmfPDulW4178LchRbAdC2FquGeU
 Okft4laeQsJ8ucTBm3RAziVDNnkeJ0mLy5JTMpdPwKW+1F/d8jPuDeTlmmRyKGyW
 vt2NlI9fKaOt8cU+LPp3MiGIamleZFOEPsRHU03HYASFcRKQwAswgjS2DZnR7oTK
 66XGB9CdwREh1mM63wERYX07+t+ylocTClcLQDCi99H4HI5ScsJXB5aSqpombycy
 XujZGlEAUT6WLp0HAwihhAnQIu/fEw+tXMWLvozphIP9xZGpk+a2nbd2WxTCdLZy
 MoksoCpl76RmYPuZ+TC1L+tHf2KqGHa/mo64L/6v15QS8hROw3ZTMdJI0mEC/90g
 5JwWuBqWOcRiMcV1Uwkz5OYL3KYQOWebS+FBbIc+2jqBlXJeN+1R/Ik5HP8xtGaE
 5p317/q09h32piZN0c+BmkXCdnAYZeBez0EjSXf6j3PW3LAqmLOjxc0yA2OmwFWj
 iudOfBOg48HtHkMaNrg2f+0THMP8zHqUVoRh96BKCSjnOXmbxGFt3ACHvb+VS0CX
 RmlVOcaq+SzL1EsT805Hb6N3x7mK2l9CoanGfLxYXQZlOzD6TcRdlBzs9JCgqiAs
 3jk5nAq9wtYvigAkVarMw/LAJNatEO4sy0AHyWJwAcfOsVJquNvhhokT8VxOcVZs
 pMxUhUYKZA==
 =HqiO
 -----END PGP SIGNATURE-----

Merge tag 'for-5.5/io_uring-post-20191128' of git://git.kernel.dk/linux-block

Pull more io_uring updates from Jens Axboe:
 "As mentioned in the first pull request, there was a later batch as
  well. This contains fixes to the stuff that already went in, cleanups,
  and a few later additions. In particular, this contains:

   - Cleanups/fixes/unification of the submission and completion path
     (Pavel,me)

   - Linked timeouts improvements (Pavel,me)

   - Error path fixes (me)

   - Fix lookup window where cancellations wouldn't work (me)

   - Improve DRAIN support (Pavel)

   - Fix backlog flushing -EBUSY on submit (me)

   - Add support for connect(2) (me)

   - Fix for non-iter based fixed IO (Pavel)

   - creds inheritance for async workers (me)

   - Disable cmsg/ancillary data for sendmsg/recvmsg (me)

   - Shrink io_kiocb to 3 cachelines (me)

   - NUMA fix for io-wq (Jann)"

* tag 'for-5.5/io_uring-post-20191128' of git://git.kernel.dk/linux-block: (42 commits)
  io_uring: make poll->wait dynamically allocated
  io-wq: shrink io_wq_work a bit
  io-wq: fix handling of NUMA node IDs
  io_uring: use kzalloc instead of kcalloc for single-element allocations
  io_uring: cleanup io_import_fixed()
  io_uring: inline struct sqe_submit
  io_uring: store timeout's sqe->off in proper place
  net: disallow ancillary data for __sys_{send,recv}msg_file()
  net: separate out the msghdr copy from ___sys_{send,recv}msg()
  io_uring: remove superfluous check for sqe->off in io_accept()
  io_uring: async workers should inherit the user creds
  io-wq: have io_wq_create() take a 'data' argument
  io_uring: fix dead-hung for non-iter fixed rw
  io_uring: add support for IORING_OP_CONNECT
  net: add __sys_connect_file() helper
  io_uring: only return -EBUSY for submit on non-flushed backlog
  io_uring: only !null ptr to io_issue_sqe()
  io_uring: simplify io_req_link_next()
  io_uring: pass only !null to io_req_find_next()
  io_uring: remove io_free_req_find_next()
  ...
2019-11-28 10:43:39 -08:00
Roman Penyaev
6c5c240e41 io_uring: add mapping support for NOMMU archs
That is a bit weird scenario but I find it interesting to run fio loads
using LKL linux, where MMU is disabled.  Probably other real archs which
run uClinux can also benefit from this patch.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-28 10:08:02 -07:00
David Howells
82995cc6c5 libceph, rbd, ceph: convert to use the new mount API
Convert the ceph filesystem to the new internal mount API as the old
one will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.

[ Numerous string handling, leak and regression fixes; rbd conversion
  was particularly broken and had to be redone almost from scratch. ]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-11-27 22:28:37 +01:00
Linus Torvalds
9a3d7fd275 Driver core patches for 5.5-rc1
Here is the "big" set of driver core patches for 5.5-rc1
 
 There's a few minor cleanups and fixes in here, but the majority of the
 patches in here fall into two buckets:
   - debugfs api cleanups and fixes
   - driver core device link support for boot dependancy issues
 
 The debugfs api cleanups are working to slowly refactor the debugfs apis
 so that it is even harder to use incorrectly.  That work has been
 happening for the past few kernel releases and will continue over time,
 it's a long-term project/goal
 
 The driver core device link support missed 5.4 by just a bit, so it's
 been sitting and baking for many months now.  It's from Saravana Kannan
 to help resolve the problems that DT-based systems have at boot time
 with dependancy graphs and kernel modules.  Turns out that no one has
 actually tried to build a generic arm64 kernel with loads of modules and
 have it "just work" for a variety of platforms (like a distro kernel)
 The big problem turned out to be a lack of depandancy information
 between different areas of DT entries, and the work here resolves that
 problem and now allows devices to boot properly, and quicker than a
 monolith kernel.
 
 All of these patches have been in linux-next for a long time with no
 reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXd6m6Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yntJQCcCqg6RQ7LTdHuZv1ETeefXlsfk00An1Jtean6
 42bWGx52bGFvAcpjWy8R
 =P7hq
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.5-rc1

  There's a few minor cleanups and fixes in here, but the majority of
  the patches in here fall into two buckets:

   - debugfs api cleanups and fixes

   - driver core device link support for boot dependancy issues

  The debugfs api cleanups are working to slowly refactor the debugfs
  apis so that it is even harder to use incorrectly. That work has been
  happening for the past few kernel releases and will continue over
  time, it's a long-term project/goal

  The driver core device link support missed 5.4 by just a bit, so it's
  been sitting and baking for many months now. It's from Saravana Kannan
  to help resolve the problems that DT-based systems have at boot time
  with dependancy graphs and kernel modules. Turns out that no one has
  actually tried to build a generic arm64 kernel with loads of modules
  and have it "just work" for a variety of platforms (like a distro
  kernel). The big problem turned out to be a lack of dependency
  information between different areas of DT entries, and the work here
  resolves that problem and now allows devices to boot properly, and
  quicker than a monolith kernel.

  All of these patches have been in linux-next for a long time with no
  reported issues"

* tag 'driver-core-5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (68 commits)
  tracing: Remove unnecessary DEBUG_FS dependency
  of: property: Add device link support for interrupt-parent, dmas and -gpio(s)
  debugfs: Fix !DEBUG_FS debugfs_create_automount
  of: property: Add device link support for "iommu-map"
  of: property: Fix the semantics of of_is_ancestor_of()
  i2c: of: Populate fwnode in of_i2c_get_board_info()
  drivers: base: Fix Kconfig indentation
  firmware_loader: Fix labels with comma for builtin firmware
  driver core: Allow device link operations inside sync_state()
  driver core: platform: Declare ret variable only once
  cpu-topology: declare parse_acpi_topology in <linux/arch_topology.h>
  crypto: hisilicon: no need to check return value of debugfs_create functions
  driver core: platform: use the correct callback type for bus_find_device
  firmware_class: make firmware caching configurable
  driver core: Clarify documentation for fwnode_operations.add_links()
  mailbox: tegra: Fix superfluous IRQ error message
  net: caif: Fix debugfs on 64-bit platforms
  mac80211: Use debugfs_create_xul() helper
  media: c8sectpfe: no need to check return value of debugfs_create functions
  of: property: Add device link support for iommus, mboxes and io-channels
  ...
2019-11-27 11:06:20 -08:00
Dan Carpenter via samba-technical
68464b88cc CIFS: fix a white space issue in cifs_get_inode_info()
We accidentally messed up the indenting on this if statement.

Fixes: 16c696a6c300 ("CIFS: refactor cifs_get_inode_info()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-11-27 11:31:49 -06:00
Darrick J. Wong
8feb4732ff xfs: allow parent directory scans to be interrupted with fatal signals
Allow a fatal signal to interrupt us when we're scanning a directory to
verify a parent pointer.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-11-27 08:23:14 -08:00
Krzysztof Kozlowski
8d66fcb748 fuse: fix Kconfig indentation
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
	$ sed -e 's/^        /\t/' -i */Kconfig

Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-11-27 09:35:20 +01:00
Miklos Szeredi
f1ebdeffc6 fuse: fix leak of fuse_io_priv
exit_aio() is sometimes stuck in wait_for_completion() after aio is issued
with direct IO and the task receives a signal.

The reason is failure to call ->ki_complete() due to a leaked reference to
fuse_io_priv.  This happens in fuse_async_req_send() if
fuse_simple_background() returns an error (e.g. -EINTR).

In this case the error value is propagated via io->err, so return success
to not confuse callers.

This issue is tracked as a virtio-fs issue:
https://gitlab.com/virtio-fs/qemu/issues/14

Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Fixes: 45ac96ed7c ("fuse: convert direct_io to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-11-27 09:33:49 +01:00
Linus Torvalds
168829ad09 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - A comprehensive rewrite of the robust/PI futex code's exit handling
     to fix various exit races. (Thomas Gleixner et al)

   - Rework the generic REFCOUNT_FULL implementation using
     atomic_fetch_* operations so that the performance impact of the
     cmpxchg() loops is mitigated for common refcount operations.

     With these performance improvements the generic implementation of
     refcount_t should be good enough for everybody - and this got
     confirmed by performance testing, so remove ARCH_HAS_REFCOUNT and
     REFCOUNT_FULL entirely, leaving the generic implementation enabled
     unconditionally. (Will Deacon)

   - Other misc changes, fixes, cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  lkdtm: Remove references to CONFIG_REFCOUNT_FULL
  locking/refcount: Remove unused 'refcount_error_report()' function
  locking/refcount: Consolidate implementations of refcount_t
  locking/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
  locking/refcount: Move saturation warnings out of line
  locking/refcount: Improve performance of generic REFCOUNT_FULL code
  locking/refcount: Move the bulk of the REFCOUNT_FULL implementation into the <linux/refcount.h> header
  locking/refcount: Remove unused refcount_*_checked() variants
  locking/refcount: Ensure integer operands are treated as signed
  locking/refcount: Define constants for saturation and max refcount values
  futex: Prevent exit livelock
  futex: Provide distinct return value when owner is exiting
  futex: Add mutex around futex exit
  futex: Provide state handling for exec() as well
  futex: Sanitize exit state handling
  futex: Mark the begin of futex exit explicitly
  futex: Set task::futex_state to DEAD right after handling futex exit
  futex: Split futex_mm_release() for exit/exec
  exit/exec: Seperate mm_release()
  futex: Replace PF_EXITPIDONE with a state
  ...
2019-11-26 16:02:40 -08:00
Linus Torvalds
1ae78780ed Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Dynamic tick (nohz) updates, perhaps most notably changes to force
     the tick on when needed due to lengthy in-kernel execution on CPUs
     on which RCU is waiting.

   - Linux-kernel memory consistency model updates.

   - Replace rcu_swap_protected() with rcu_prepace_pointer().

   - Torture-test updates.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  security/safesetid: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/sched: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/netfilter: Replace rcu_swap_protected() with rcu_replace_pointer()
  net/core: Replace rcu_swap_protected() with rcu_replace_pointer()
  bpf/cgroup: Replace rcu_swap_protected() with rcu_replace_pointer()
  fs/afs: Replace rcu_swap_protected() with rcu_replace_pointer()
  drivers/scsi: Replace rcu_swap_protected() with rcu_replace_pointer()
  drm/i915: Replace rcu_swap_protected() with rcu_replace_pointer()
  x86/kvm/pmu: Replace rcu_swap_protected() with rcu_replace_pointer()
  rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
  rcu: Suppress levelspread uninitialized messages
  rcu: Fix uninitialized variable in nocb_gp_wait()
  rcu: Update descriptions for rcu_future_grace_period tracepoint
  rcu: Update descriptions for rcu_nocb_wake tracepoint
  rcu: Remove obsolete descriptions for rcu_barrier tracepoint
  rcu: Ensure that ->rcu_urgent_qs is set before resched IPI
  workqueue: Convert for_each_wq to use built-in list check
  rcu: Several rcu_segcblist functions can be static
  rcu: Remove unused function hlist_bl_del_init_rcu()
  Documentation: Rename rcu_node_context_switch() to rcu_note_context_switch()
  ...
2019-11-26 15:42:43 -08:00
Linus Torvalds
77a05940ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest changes in this cycle were:

   - Make kcpustat vtime aware (Frederic Weisbecker)

   - Rework the CFS load_balance() logic (Vincent Guittot)

   - Misc cleanups, smaller enhancements, fixes.

  The load-balancing rework is the most intrusive change: it replaces
  the old heuristics that have become less meaningful after the
  introduction of the PELT metrics, with a grounds-up load-balancing
  algorithm.

  As such it's not really an iterative series, but replaces the old
  load-balancing logic with the new one. We hope there are no
  performance regressions left - but statistically it's highly probable
  that there *is* going to be some workload that is hurting from these
  chnages. If so then we'd prefer to have a look at that workload and
  fix its scheduling, instead of reverting the changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  rackmeter: Use vtime aware kcpustat accessor
  leds: Use all-in-one vtime aware kcpustat accessor
  cpufreq: Use vtime aware kcpustat accessors for user time
  procfs: Use all-in-one vtime aware kcpustat accessor
  sched/vtime: Bring up complete kcpustat accessor
  sched/cputime: Support other fields on kcpustat_field()
  sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util()
  sched/fair: Add comments for group_type and balancing at SD_NUMA level
  sched/fair: Fix rework of find_idlest_group()
  sched/uclamp: Fix overzealous type replacement
  sched/Kconfig: Fix spelling mistake in user-visible help text
  sched/core: Further clarify sched_class::set_next_task()
  sched/fair: Use mul_u32_u32()
  sched/core: Simplify sched_class::pick_next_task()
  sched/core: Optimize pick_next_task()
  sched/core: Make pick_next_task_idle() more consistent
  sched/fair: Better document newidle_balance()
  leds: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  ...
2019-11-26 15:23:14 -08:00
Jens Axboe
e944475e69 io_uring: make poll->wait dynamically allocated
In the quest to bring io_kiocb down to 3 cachelines, this one does
the trick. Make the wait_queue_entry for the poll command come out
of kmalloc instead of embedding it in struct io_poll_iocb, as the
latter is the largest member of io_kiocb. Once we trim this down a
bit, we're back at a healthy 192 bytes for struct io_kiocb.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Jens Axboe
6206f0e180 io-wq: shrink io_wq_work a bit
Currently we're using 40 bytes for the io_wq_work structure, and 16 of
those is the doubly link list node. We don't need doubly linked lists,
we always add to tail to keep things ordered, and any other use case
is list traversal with deletion. For the deletion case, we can easily
support any node deletion by keeping track of the previous entry.

This shrinks io_wq_work to 32 bytes, and subsequently io_kiock from
io_uring to 216 to 208 bytes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Jann Horn
3fc50ab559 io-wq: fix handling of NUMA node IDs
There are several things that can go wrong in the current code on NUMA
systems, especially if not all nodes are online all the time:

 - If the identifiers of the online nodes do not form a single contiguous
   block starting at zero, wq->wqes will be too small, and OOB memory
   accesses will occur e.g. in the loop in io_wq_create().
 - If a node comes online between the call to num_online_nodes() and the
   for_each_node() loop in io_wq_create(), an OOB write will occur.
 - If a node comes online between io_wq_create() and io_wq_enqueue(), a
   lookup is performed for an element that doesn't exist, and an OOB read
   will probably occur.

Fix it by:

 - using nr_node_ids instead of num_online_nodes() for the allocation size;
   nr_node_ids is calculated by setup_nr_node_ids() to be bigger than the
   highest node ID that could possibly come online at some point, even if
   those nodes' identifiers are not a contiguous block
 - creating workers for all possible CPUs, not just all online ones

This is basically what the normal workqueue code also does, as far as I can
tell.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Jann Horn
ad6e005ca6 io_uring: use kzalloc instead of kcalloc for single-element allocations
These allocations are single-element allocations, so don't use the array
allocation wrapper for them.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Pavel Begunkov
7d00916555 io_uring: cleanup io_import_fixed()
Clean io_import_fixed() call site and make it return proper type.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Pavel Begunkov
cf6fd4bd55 io_uring: inline struct sqe_submit
There is no point left in keeping struct sqe_submit. Inline it
into struct io_kiocb, so any req->submit.field is now just req->field

- moves initialisation of ring_file into io_get_req()
- removes duplicated req->sequence.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Pavel Begunkov
cc42e0ac17 io_uring: store timeout's sqe->off in proper place
Timeouts' sequence offset (i.e. sqe->off) is stored in
req->submit.sequence under a false name. Keep it in timeout.data
instead. The unused space for sequence will be reclaimed in the
following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-26 15:02:56 -07:00
Linus Torvalds
2be7d348fe Revert "vfs: properly and reliably lock f_pos in fdget_pos()"
This reverts commit 0be0ee7181.

I was hoping it would be benign to switch over entirely to FMODE_STREAM,
and we'd have just a couple of small fixups we'd need, but it looks like
we're not quite there yet.

While it worked fine on both my desktop and laptop, they are fairly
similar in other respects, and run mostly the same loads.  Kenneth
Crudup reports that it seems to break both his vmware installation and
the KDE upower service.  In both cases apparently leading to timeouts
due to waitinmg for the f_pos lock.

There are a number of character devices in particular that definitely
want stream-like behavior, but that currently don't get marked as
streams, and as a result get the exclusion between concurrent
read()/write() on the same file descriptor.  Which doesn't work well for
them.

The most obvious example if this is /dev/console and /dev/tty, which use
console_fops and tty_fops respectively (and ptmx_fops for the pty master
side).  It may be that it's just this that causes problems, but we
clearly weren't ready yet.

Because there's a number of other likely common cases that don't have
llseek implementations and would seem to act as stream devices:

  /dev/fuse		(fuse_dev_operations)
  /dev/mcelog		(mce_chrdev_ops)
  /dev/mei0		(mei_fops)
  /dev/net/tun		(tun_fops)
  /dev/nvme0		(nvme_dev_fops)
  /dev/tpm0		(tpm_fops)
  /proc/self/ns/mnt	(ns_file_operations)
  /dev/snd/pcm*		(snd_pcm_f_ops[])

and while some of these could be trivially automatically detected by the
vfs layer when the character device is opened by just noticing that they
have no read or write operations either, it often isn't that obvious.

Some character devices most definitely do use the file position, even if
they don't allow seeking: the firmware update code, for example, uses
simple_read_from_buffer() that does use f_pos, but doesn't allow seeking
back and forth.

We'll revisit this when there's a better way to detect the problem and
fix it (possibly with a coccinelle script to do more of the FMODE_STREAM
annotations).

Reported-by: Kenneth R. Crudup <kenny@panix.com>
Cc: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-26 11:34:06 -08:00
Johannes Thumshirn
88cfd30e18 iomap: remove unneeded variable in iomap_dio_rw()
The 'start' variable indicates the start of a filemap and is set to the
iocb's position, which we have already cached as 'pos', upon function
entry.

'pos' is used as a cursor indicating the current position and updated
later in iomap_dio_rw(), but not before the last use of 'start'.

Remove 'start' as it's synonym for 'pos' before we're entering the loop
calling iomapp_apply().

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-26 09:28:47 -08:00
Jan Kara
f550ee9b85 iomap: Do not create fake iter in iomap_dio_bio_actor()
iomap_dio_bio_actor() copies iter to a local variable and then limits it
to a file extent we have mapped. When IO is submitted,
iomap_dio_bio_actor() advances the original iter while the copied iter
is advanced inside bio_iov_iter_get_pages(). This logic is non-obvious
especially because both iters still point to same shared structures
(such as pipe info) so if iov_iter_advance() changes anything in the
shared structure, this scheme breaks. Let's just truncate and reexpand
the original iter as needed instead of playing games with copying iters
and keeping them in sync.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-26 09:28:47 -08:00
Linus Torvalds
436b2a8039 Printk changes for 5.5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl3bpjoACgkQUqAMR0iA
 lPJJDA/+IJT4YCRp2TwV2jvIs0QzvXZrzEsxgCLibLE85mYTJgoQBD3W1bH2eyjp
 T/9U0Zh5PGr/84cHd4qiMxzo+5Olz930weG59NcO4RJBSr671aRYs5tJqwaQAZDR
 wlwaob5S28vUmjPxKulvxv6V3FdI79ZE9xrCOCSTQvz4iCLsGOu+Dn/qtF64pImX
 M/EXzPMBrByiQ8RTM4Ege8JoBqiCZPDG9GR3KPXIXQwEeQgIoeYxwRYakxSmSzz8
 W8NduFCbWavg/yHhghHikMiyOZeQzAt+V9k9WjOBTle3TGJegRhvjgI7508q3tXe
 jQTMGATBOPkIgFaZz7eEn/iBa3jZUIIOzDY93RYBmd26aBvwKLOma/Vkg5oGYl0u
 ZK+CMe+/xXl7brQxQ6JNsQhbSTjT+746LvLJlCvPbbPK9R0HeKNhsdKpGY3ugnmz
 VAnOFIAvWUHO7qx+J+EnOo5iiPpcwXZj4AjrwVrs/x5zVhzwQ+4DSU6rbNn0O1Ak
 ELrBqCQkQzh5kqK93jgMHeWQ9EOUp1Lj6PJhTeVnOx2x8tCOi6iTQFFrfdUPlZ6K
 2DajgrFhti4LvwVsohZlzZuKRm5EuwReLRSOn7PU5qoSm5rcouqMkdlYG/viwyhf
 mTVzEfrfemrIQOqWmzPrWEXlMj2mq8oJm4JkC+jJ/+HsfK4UU8I=
 =QCEy
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow to print symbolic error names via new %pe modifier.

 - Use pr_warn() instead of the remaining pr_warning() calls. Fix
   formatting of the related lines.

 - Add VSPRINTF entry to MAINTAINERS.

* tag 'printk-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (32 commits)
  checkpatch: don't warn about new vsprintf pointer extension '%pe'
  MAINTAINERS: Add VSPRINTF
  tools lib api: Renaming pr_warning to pr_warn
  ASoC: samsung: Use pr_warn instead of pr_warning
  lib: cpu_rmap: Use pr_warn instead of pr_warning
  trace: Use pr_warn instead of pr_warning
  dma-debug: Use pr_warn instead of pr_warning
  vgacon: Use pr_warn instead of pr_warning
  fs: afs: Use pr_warn instead of pr_warning
  sh/intc: Use pr_warn instead of pr_warning
  scsi: Use pr_warn instead of pr_warning
  platform/x86: intel_oaktrail: Use pr_warn instead of pr_warning
  platform/x86: asus-laptop: Use pr_warn instead of pr_warning
  platform/x86: eeepc-laptop: Use pr_warn instead of pr_warning
  oprofile: Use pr_warn instead of pr_warning
  of: Use pr_warn instead of pr_warning
  macintosh: Use pr_warn instead of pr_warning
  idsn: Use pr_warn instead of pr_warning
  ide: Use pr_warn instead of pr_warning
  crypto: n2: Use pr_warn instead of pr_warning
  ...
2019-11-25 19:40:40 -08:00
Linus Torvalds
1b96a41b42 Merge branch 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "There are several notable changes here:

   - Single thread migrating itself has been optimized so that it
     doesn't need threadgroup rwsem anymore.

   - Freezer optimization to avoid unnecessary frozen state changes.

   - cgroup ID unification so that cgroup fs ino is the only unique ID
     used for the cgroup and can be used to directly look up live
     cgroups through filehandle interface on 64bit ino archs. On 32bit
     archs, cgroup fs ino is still the only ID in use but it is only
     unique when combined with gen.

   - selftest and other changes"

* 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (24 commits)
  writeback: fix -Wformat compilation warnings
  docs: cgroup: mm: Fix spelling of "list"
  cgroup: fix incorrect WARN_ON_ONCE() in cgroup_setup_root()
  cgroup: use cgrp->kn->id as the cgroup ID
  kernfs: use 64bit inos if ino_t is 64bit
  kernfs: implement custom exportfs ops and fid type
  kernfs: combine ino/id lookup functions into kernfs_find_and_get_node_by_id()
  kernfs: convert kernfs_node->id from union kernfs_node_id to u64
  kernfs: kernfs_find_and_get_node_by_ino() should only look up activated nodes
  kernfs: use dumber locking for kernfs_find_and_get_node_by_ino()
  netprio: use css ID instead of cgroup ID
  writeback: use ino_t for inodes in tracepoints
  kernfs: fix ino wrap-around detection
  kselftests: cgroup: Avoid the reuse of fd after it is deallocated
  cgroup: freezer: don't change task and cgroups status unnecessarily
  cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency
  cgroup: remove cgroup_enable_task_cg_lists() optimization
  cgroup: pids: use atomic64_t for pids->limit
  selftests: cgroup: Run test_core under interfering stress
  selftests: cgroup: Add task migration tests
  ...
2019-11-25 19:23:46 -08:00
Hrvoje Zeba
8042d6ce8c io_uring: remove superfluous check for sqe->off in io_accept()
This field contains a pointer to addrlen and checking to see if it's set
returns -EINVAL if the caller sets addr & addrlen pointers.

Fixes: 17f2fe35d0 ("io_uring: add support for IORING_OP_ACCEPT")
Signed-off-by: Hrvoje Zeba <zeba.hrvoje@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Jens Axboe
181e448d87 io_uring: async workers should inherit the user creds
If we don't inherit the original task creds, then we can confuse users
like fuse that pass creds in the request header. See link below on
identical aio issue.

Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#u
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Jens Axboe
576a347b7a io-wq: have io_wq_create() take a 'data' argument
We currently pass in 4 arguments outside of the bounded size. In
preparation for adding one more argument, let's bundle them up in
a struct to make it more readable.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Pavel Begunkov
311ae9e159 io_uring: fix dead-hung for non-iter fixed rw
Read/write requests to devices without implemented read/write_iter
using fixed buffers can cause general protection fault, which totally
hangs a machine.

io_import_fixed() initialises iov_iter with bvec, but loop_rw_iter()
accesses it as iovec, dereferencing random address.

kmap() page by page in this case

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Jens Axboe
f8e85cf255 io_uring: add support for IORING_OP_CONNECT
This allows an application to call connect() in an async fashion. Like
other opcodes, we first try a non-blocking connect, then punt to async
context if we have to.

Note that we can still return -EINPROGRESS, and in that case the caller
should use IORING_OP_POLL_ADD to do an async wait for completion of the
connect request (just like for regular connect(2), except we can do it
async here too).

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Jens Axboe
c4a2ed72c9 io_uring: only return -EBUSY for submit on non-flushed backlog
We return -EBUSY on submit when we have a CQ ring overflow backlog, but
that can be a bit problematic if the application is using pure userspace
poll of the CQ ring. For that case, if the ring briefly overflowed and
we have pending entries in the backlog, the submit flushes the backlog
successfully but still returns -EBUSY. If we're able to fully flush the
CQ ring backlog, let the submission proceed.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Pavel Begunkov
f9bd67f69a io_uring: only !null ptr to io_issue_sqe()
Pass only non-null @nxt to io_issue_sqe() and handle it at the caller's
side. And propagate it.

- kiocb_done() is only called from io_read() and io_write(), which are
only called from io_issue_sqe(), so it's @nxt != NULL

- io_put_req_find_next() is called either with explicitly non-null local
nxt, or from one of the functions in io_issue_sqe() switch (or their
callees).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Pavel Begunkov
b18fdf71e0 io_uring: simplify io_req_link_next()
"if (nxt)" is always true, as it was checked in the while's condition.
io_wq_current_is_worker() is unnecessary, as non-async callers don't
pass nxt, so io_queue_async_work() will be called for them anyway.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Pavel Begunkov
944e58bfed io_uring: pass only !null to io_req_find_next()
Make io_req_find_next() and io_req_link_next() to accept only non-null
nxt, and handle it in callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Pavel Begunkov
70cf9f3270 io_uring: remove io_free_req_find_next()
There is only one one-liner user of io_free_req_find_next(). Inline it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:11 -07:00
Pavel Begunkov
9835d6fafb io_uring: add likely/unlikely in io_get_sqring()
The number of SQEs to submit is specified by a user, so io_get_sqring()
in most of the cases succeeds. Hint compilers about that.

Checking ASM genereted by gcc 9.2.0 for x64, there is one branch
misprediction.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:10 -07:00
Pavel Begunkov
d732447fed io_uring: rename __io_submit_sqe()
__io_submit_sqe() is issuing requests, so call it as
such. Moreover, it ends by calling io_iopoll_req_issued().

Rename it and make terminology clearer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:10 -07:00
Jens Axboe
915967f69c io_uring: improve trace_io_uring_defer() trace point
We don't have shadow requests anymore, so get rid of the shadow
argument. Add the user_data argument, as that's often useful to easily
match up requests, instead of having to look at request pointers.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:10 -07:00
Pavel Begunkov
1b4a51b6d0 io_uring: drain next sqe instead of shadowing
There's an issue with the shadow drain logic in that we drop the
completion lock after deciding to defer a request, then re-grab it later
and assume that the state is still the same. In the mean time, someone
else completing a request could have found and issued it. This can cause
a stall in the queue, by having a shadow request inserted that nobody is
going to drain.

Additionally, if we fail allocating the shadow request, we simply ignore
the drain.

Instead of using a shadow request, defer the next request/link instead.
This also has the following advantages:

- removes semi-duplicated code
- doesn't allocate memory for shadows
- works better if only the head marked for drain
- doesn't need complex synchronisation

On the flip side, it removes the shadow->seq ==
last_drain_in_in_link->seq optimization. That shouldn't be a common
case, and can always be added back, if needed.

Fixes: 4fe2c96315 ("io_uring: add support for link with drain")
Cc: Jackie Liu <liuyun01@kylinos.cn>
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:10 -07:00
Jens Axboe
b76da70fc3 io_uring: close lookup gap for dependent next work
When we find new work to process within the work handler, we queue the
linked timeout before we have issued the new work. This can be
problematic for very short timeouts, as we have a window where the new
work isn't visible.

Allow the work handler to store a callback function for this in the work
item, and flag it with IO_WQ_WORK_CB if the caller has done so. If that
is set, then io-wq will call the callback when it has setup the new work
item.

Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:10 -07:00
Jens Axboe
4d7dd46297 io_uring: allow finding next link independent of req reference count
We currently try and start the next link when we put the request, and
only if we were going to free it. This means that the optimization to
continue executing requests from the same context often fails, as we're
not putting the final reference.

Add REQ_F_LINK_NEXT to keep track of this, and allow io_uring to find the
next request more efficiently.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Jens Axboe
eb065d301e io_uring: io_allocate_scq_urings() should return a sane state
We currently rely on the ring destroy on cleaning things up in case of
failure, but io_allocate_scq_urings() can leave things half initialized
if only parts of it fails.

Be nice and return with either everything setup in success, or return an
error with things nicely cleaned up.

Reported-by: syzbot+0d818c0d39399188f393@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Pavel Begunkov
bbad27b2f6 io_uring: Always REQ_F_FREE_SQE for allocated sqe
Always mark requests with allocated sqe and deallocate it in
__io_free_req(). It's easier to follow and doesn't add edge cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Jens Axboe
5d960724b0 io_uring: io_fail_links() should only consider first linked timeout
We currently clear the linked timeout field if we cancel such a timeout,
but we should only attempt to cancel if it's the first one we see.
Others should simply be freed like other requests, as they haven't
been started yet.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Pavel Begunkov
09fbb0a83e io_uring: Fix leaking linked timeouts
let have a dependant link: REQ -> LINK_TIMEOUT -> LINK_TIMEOUT

1. submission stage: submission references for REQ and LINK_TIMEOUT
are dropped. So, references respectively (1,1,2)

2. io_put(REQ) + FAIL_LINKS stage: calls io_fail_links(), which for all
linked timeouts will call cancel_timeout() and drop 1 reference.
So, references after: (0,0,1). That's a leak.

Make it treat only the first linked timeout as such, and pass others
through __io_double_put_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Pavel Begunkov
f70193d6d8 io_uring: remove redundant check
Pass any IORING_OP_LINK_TIMEOUT request further, where it will
eventually fail in io_issue_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Pavel Begunkov
d3b35796b1 io_uring: break links for failed defer
If io_req_defer() failed, it needs to cancel a dependant link.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:06 -07:00
Dan Carpenter
b2e9c7d64b io-wq: remove extra space characters
These lines are indented an extra space character.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:05 -07:00
Jens Axboe
b60fda6000 io-wq: wait for io_wq_create() to setup necessary workers
We currently have a race where if setup is really slow, we can be
calling io_wq_destroy() before we're done setting up. This will cause
the caller to get stuck waiting for the manager to set things up, but
the manager already exited.

Fix this by doing a sync setup of the manager. This also fixes the case
where if we failed creating workers, we'd also get stuck.

In practice this race window was really small, as we already wait for
the manager to start. Hence someone would have to call io_wq_destroy()
after the task has started, but before it started the first loop. The
reported test case forked tons of these, which is why it became an
issue.

Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
Fixes: 771b53d033 ("io-wq: small threadpool implementation for io_uring")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:05 -07:00
Jens Axboe
fba38c272a io_uring: request cancellations should break links
We currently don't explicitly break links if a request is cancelled, but
we should. Add explicitly link breakage for all types of request
cancellations that we support.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:05 -07:00
Jens Axboe
b0dd8a4126 io_uring: correct poll cancel and linked timeout expiration completion
Currently a poll request fills a completion entry of 0, even if it got
cancelled. This is odd, and it makes it harder to support with chains.
Ensure that it returns -ECANCELED in the completions events if it got
cancelled, and furthermore ensure that the linked timeout that triggered
it completes with -ETIME if we did indeed trigger the completions
through a timeout.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-25 19:56:05 -07:00