Go to file
Sagi Grimberg 9b942cb2d9 nvme-tcp: fix possible use-after-completion
commit 825619b09ad351894d2c6fb6705f5b3711d145c7 upstream.

Commit db5ad6b7f8 ("nvme-tcp: try to send request in queue_rq
context") added a second context that may perform a network send.
This means that now RX and TX are not serialized in nvme_tcp_io_work
and can run concurrently.

While there is correct mutual exclusion in the TX path (where
the send_mutex protect the queue socket send activity) RX activity,
and more specifically request completion may run concurrently.

This means we must guarantee that any mutation of the request state
related to its lifetime, bytes sent must not be accessed when a completion
may have possibly arrived back (and processed).

The race may trigger when a request completion arrives, processed
_and_ reused as a fresh new request, exactly in the (relatively short)
window between the last data payload sent and before the request iov_iter
is advanced.

Consider the following race:
1. 16K write request is queued
2. The nvme command and the data is sent to the controller (in-capsule
   or solicited by r2t)
3. After the last payload is sent but before the req.iter is advanced,
   the controller sends back a completion.
4. The completion is processed, the request is completed, and reused
   to transfer a new request (write or read)
5. The new request is queued, and the driver reset the request parameters
   (nvme_tcp_setup_cmd_pdu).
6. Now context in (2) resumes execution and advances the req.iter

==> use-after-completion as this is already a new request.

Fix this by making sure the request is not advanced after the last
data payload send, knowing that a completion may have arrived already.

An alternative solution would have been to delay the request completion
or state change waiting for reference counting on the TX path, but besides
adding atomic operations to the hot-path, it may present challenges in
multi-stage R2T scenarios where a r2t handler needs to be deferred to
an async execution.

Reported-by: Narayan Ayalasomayajula <narayan.ayalasomayajula@wdc.com>
Tested-by: Anil Mishra <anil.mishra@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-26 12:06:52 +02:00
arch powerpc: Fix early setup to make early_ioremap() work 2021-05-26 12:06:50 +02:00
block blk-mq: Swap two calls in blk_mq_exit_queue() 2021-05-19 10:13:14 +02:00
certs certs: Fix blacklist flag type confusion 2021-03-04 11:37:59 +01:00
crypto async_xor: increase src_offs when dropping destination page 2021-05-14 09:49:59 +02:00
Documentation tweewide: Fix most Shebang lines 2021-05-22 11:40:55 +02:00
drivers nvme-tcp: fix possible use-after-completion 2021-05-26 12:06:52 +02:00
fs cifs: fix memory leak in smb2_copychunk_range 2021-05-26 12:06:50 +02:00
include virtio_net: Do not pull payload in skb->head 2021-05-22 11:40:52 +02:00
init seccomp: Fix CONFIG tests for Seccomp_filters 2021-05-14 09:50:24 +02:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal 2021-05-26 12:06:50 +02:00
lib lib: stackdepot: turn depot_lock spinlock to raw_spinlock 2021-05-22 11:40:55 +02:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm/hugetlb: fix F_SEAL_FUTURE_WRITE 2021-05-19 10:13:11 +02:00
net ipv6: remove extra dev_hold() for fallback tunnels 2021-05-22 11:40:55 +02:00
samples samples/bpf: Fix broken tracex1 due to kprobe argument change 2021-05-19 10:12:57 +02:00
scripts scripts: switch explicitly to Python 3 2021-05-22 11:40:55 +02:00
security KEYS: trusted: Fix memory leak on object td 2021-05-19 10:12:50 +02:00
sound ALSA: hda/realtek: Add fixup for HP Spectre x360 15-df0xxx 2021-05-26 12:06:52 +02:00
tools tools/testing/selftests/exec: fix link error 2021-05-26 12:06:49 +02:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt kvm: exit halt polling on need_resched() as well 2021-05-19 10:13:12 +02:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore kbuild: generate Module.symvers only when vmlinux exists 2021-05-19 10:12:59 +02:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS f2fs: move ioctl interface definitions to separated file 2021-05-19 10:13:00 +02:00
Makefile Linux 5.10.39 2021-05-22 11:40:55 +02:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.