If the server tells us that out layoutreturn raced with another layout
update, then we must ensure that the new layout segments are not in use
before we resend with an updated layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
I found that injecting disconnects with v4.18-rc resulted in
random failures of the multi-threaded git regression test.
The root cause appears to be that, after a reconnect, the
RPC/RDMA transport is waking pending RPCs before the transport has
posted enough Receive buffers to receive the Replies. If a Reply
arrives before enough Receive buffers are posted, the connection
is dropped. A few connection drops happen in quick succession as
the client and server struggle to regain credit synchronization.
This regression was introduced with commit 7c8d9e7c88 ("xprtrdma:
Move Receive posting to Receive handler"). The client is supposed to
post a single Receive when a connection is established because
it's not supposed to send more than one RPC Call before it gets
a fresh credit grant in the first RPC Reply [RFC 8166, Section
3.3.3].
Unfortunately there appears to be a longstanding bug in the Linux
client's credit accounting mechanism. On connect, it simply dumps
all pending RPC Calls onto the new connection. It's possible it has
done this ever since the RPC/RDMA transport was added to the kernel
ten years ago.
Servers have so far been tolerant of this bad behavior. Currently no
server implementation ever changes its credit grant over reconnects,
and servers always repost enough Receives before connections are
fully established.
The Linux client implementation used to post a Receive before each
of these Calls. This has covered up the flooding send behavior.
I could try to correct this old bug so that the client sends exactly
one RPC Call and waits for a Reply. Since we are so close to the
next merge window, I'm going to instead provide a simple patch to
post enough Receives before a reconnect completes (based on the
number of credits granted to the previous connection).
The spurious disconnects will be gone, but the client will still
send multiple RPC Calls immediately after a reconnect.
Addressing the latter problem will wait for a merge window because
a) I expect it to be a large change requiring lots of testing, and
b) obviously the Linux client has interoperated successfully since
day zero while still being broken.
Fixes: 7c8d9e7c88 ("xprtrdma: Move Receive posting to ... ")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Remove trailing whitespace and blank line at EOF
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events. On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.
The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure. After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.
The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Fixes: fb43d17210 ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If there is an error during processing of a callback message, it leads
to refrence leak on the client structure and eventually an unclean
superblock.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Smatch complains that "num" can be uninitialized when kstrtoul() returns
-ERANGE. It's true enough, but basically harmless in this case.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
kstrtoul() can return -ERANGE so Smatch complains that "num" can be
uninitialized. We check that it's within bounds so it's not a huge
deal.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The existing rpc_print_iostats has a few shortcomings. First, the naming
is not consistent with other functions in the kernel that display stats.
Second, it is really displaying stats for an rpc_clnt structure as it
displays both xprt stats and per-op stats. Third, it does not handle
rpc_clnt clones, which is important for the one in-kernel tree caller
of this function, the NFS client's nfs_show_stats function.
Fix all of the above by renaming the rpc_print_iostats to
rpc_clnt_show_stats and looping through any rpc_clnt clones via
cl_parent.
Once this interface is fixed, this addresses a problem with NFSv4.
Before this patch, the /proc/self/mountstats always showed incorrect
counts for NFSv4 lease and session related opcodes such as SEQUENCE,
RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0
even though many ops would go over the wire. The reason for this is
there are multiple rpc_clnt structures allocated for any given NFSv4
mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
which only handled one of them, nfs_server->client. Fix these counts
by calling sunrpc's new rpc_clnt_show_stats() function, which handles
cloned rpc_clnt structs and prints the stats together.
Note that one side-effect of the above is that multiple mounts from
the same NFS server will show identical counts in the above ops due
to the fact the one rpc_clnt (representing the NFSv4 client state)
is shared across mounts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Add a helper function to add the metrics in two rpc_iostats structures.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Refactor the output of the metrics for one RPC op into an internal
function. No functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This turns rpc_auth_create_args into a const as it gets passed through the
auth stack.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
"dev->nr_children" is the number of children which were parsed
successfully in bl_parse_stripe(). It could be all of them and then, in
that case, it is equal to v->stripe.volumes_count. Either way, the >
should be >= so that we don't go beyond the end of what we're supposed
to.
Fixes: 5c83746a0c ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 530ea42192 ("nfs: Referrals should use the same proto setting
as their parent") encloses the fix with #ifdef CONFIG_SUNRPC_XPRT_RDMA.
CONFIG_SUNRPC_XPRT_RDMA is a tristate option, so it should be tested
with #if IS_ENABLED().
Fixes: 530ea42192 ("nfs: Referrals should use the same proto setting as their parent")
Reported-by: Helen Chao <helen.chao@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Bill Baker <bill.baker@oracle.com>
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When reclaiming a delegation via CLAIM_PREVIOUS open, the server can
indicate that the delegation has been recalled since it was issued by
setting the "recalled" flag in the delegation.
Ensure that we respect the flag by initiating a delegation return when
it is set.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Use new return type vm_fault_t for fault handler
in struct vm_operations_struct. For now, this is
just documenting that the function returns a
VM_FAULT value rather than an errno. Once all
instances are converted, vm_fault_t will become
a distinct type.
see commit 1c8f422059 ("mm: change return type to
vm_fault_t") for reference.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Even though the caller of nfs_idmap_prepare_message() checks return
code in their side but it's better to add an error check for match_int()
so that we can avoid unnecessary operations when bad int arg is
detected.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Return -ESTALE to force a lookup when the file has no more links
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
execute_ok() will only check the mode bits if the object is not a
directory, so we don't need to revalidate the attributes in that case.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When nfs_update_inode() sets NFS_INO_INVALID_ACCESS it is a sign that
we want to revalidate the access cache, not the inode attributes.
In fact we only want to revalidate here if we see that the mode bits
are invalid, so check for NFS_INO_INVALID_OTHER instead.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the writes are being rescheduled due to a pNFS error, then we really
want to immediately start a new flush. The O_DIRECT code already does
this, so we only need to worry about buffered writes.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the client is sending a layoutget, but the server issues a callback
to recall what it thinks may be an outstanding layout, then we may find
an uninitialised layout attached to the inode due to the layoutget.
In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT
rather than NFS4ERR_DELAY, as the latter can end up deadlocking.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Even if the results of the permissions checks failed, we should parse
the results of the layout on open call so that we can return the
layout if required.
Note that we also want to ignore the sequence counter for whether or not
a layout recall occurred. If the recall pertained to our OPEN, then the
callback will know, and will attempt to wait for us to finih processing
anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the old layout was recalled, and we returned NFS4ERR_NOMATCHINGLAYOUT
then we need to wait for all outstanding layoutget calls to complete
before we can send a new one.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a layout segment is carrying layoutstats or layout error information,
then we always want to return it rather than using a forgetful model.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If a layout has been recalled, then we should fire off a layoutreturn as
soon as all the layout segments that match the recall have been retired.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
RFC5661 doesn't state directly that the client should update the layout
stateid if it returns NFS4ERR_NOMATCHING_LAYOUT in response to a recall,
however it does state that this error will "cleanly indicate completion"
on par with returning the layout. For this reason, we assume that the
client should update the layout stateid. The Linux pNFS server definitely
does expect this behaviour.
However, if the client replies NFS4ERR_DELAY, then it is stating that
the recall was not processed, so it would be very wrong to update the
layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If there are layout segments that are marked for return, then we need
to ensure that pnfs_mark_matching_lsegs_return() does not just
silently discard them, but it should tell the caller that there is a
layoutreturn scheduled.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Here are a number of USB fixes and new device ids for 4.18-rc7.
The largest number are a bunch of gadget driver fixes that got delayed
in being submitted earlier due to vacation schedules, but nothing really
huge is present in them. There are some new device ids and some PHY
driver fixes that were connected to some USB ones. Full details are in
the shortlog.
All have been in linux-next for a while with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW1nXnQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoWACfTbc0TF6u9hNALIS9nsgLxevZLjYAnA5RJ12y
TTBeXMIAvCUKILAXPQok
=spWc
-----END PGP SIGNATURE-----
Merge tag 'usb-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of USB fixes and new device ids for 4.18-rc7.
The largest number are a bunch of gadget driver fixes that got delayed
in being submitted earlier due to vacation schedules, but nothing
really huge is present in them. There are some new device ids and some
PHY driver fixes that were connected to some USB ones. Full details
are in the shortlog.
All have been in linux-next for a while with no reported issues"
* tag 'usb-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (28 commits)
usb: core: handle hub C_PORT_OVER_CURRENT condition
usb: xhci: Fix memory leak in xhci_endpoint_reset()
usb: typec: tcpm: Fix sink PDO starting index for PPS APDO selection
usb: gadget: f_fs: Only return delayed status when len is 0
usb: gadget: f_uac2: fix endianness of 'struct cntrl_*_lay3'
usb: dwc2: Fix inefficient copy of unaligned buffers
usb: dwc2: Fix DMA alignment to start at allocated boundary
usb: dwc3: rockchip: Fix PHY documentation links.
tools: usb: ffs-test: Fix build on big endian systems
usb: gadget: aspeed: Workaround memory ordering issue
usb: dwc3: gadget: remove redundant variable maxpacket
usb: dwc2: avoid NULL dereferences
usb/phy: fix PPC64 build errors in phy-fsl-usb.c
usb: dwc2: host: do not delay retries for CONTROL IN transfers
usb: gadget: u_audio: protect stream runtime fields with stream spinlock
usb: gadget: u_audio: remove cached period bytes value
usb: gadget: u_audio: remove caching of stream buffer parameters
usb: gadget: u_audio: update hw_ptr in iso_complete after data copied
usb: gadget: u_audio: fix pcm/card naming in g_audio_setup()
usb: gadget: f_uac2: fix error handling in afunc_bind (again)
...
Here are 3 small staging driver fixes for 4.18-rc7. One is a revert of
an earlier patch that turned out to be incorrect, one is a fix for the
speakup drivers, and the last a fix for the ks7010 driver to resolve a
regression.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW1nWRg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn3sgCfRfyR5eWd//zjOd+RPK5pgVmntpsAn2RO4/hF
3/QBUH8Pnd0fQDCE87w8
=T47D
-----END PGP SIGNATURE-----
Merge tag 'staging-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are three small staging driver fixes for 4.18-rc7.
One is a revert of an earlier patch that turned out to be incorrect,
one is a fix for the speakup drivers, and the last a fix for the
ks7010 driver to resolve a regression.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: speakup: fix wraparound in uaccess length check
staging: ks7010: call 'hostif_mib_set_request_int' instead of 'hostif_mib_set_request_bool'
Revert "staging:r8188eu: Use lib80211 to support TKIP"
This is a single driver core fix for 4.18-rc7. It partially reverts a
previous commit to resolve some reported issues.
It has been in linux-next for a while now with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCW1nVXg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylt+ACgnOBM409ab+U2z/0EZcJrd3fXjXcAn1XD1e5p
Y0fBLaJ/sB+VpyiRP/gK
=WxMr
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"This is a single driver core fix for 4.18-rc7. It partially reverts a
previous commit to resolve some reported issues.
It has been in linux-next for a while now with no reported issues"
* tag 'driver-core-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Partially revert "driver core: correct device's shutdown order"
Fix a recent ACPICA regression causing the AML parser to get confused
and fail in some situations involving incorrect AML in an ACPI table
(Erik Schmauss).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbWdSOAAoJEILEb/54YlRxkccP/3kxz1Fr4lmwDaXkReOR1Zxa
ShdoKKdruDXMXYGMnJWw+jilhbRb6fWQq6jWzmpAOnd0l98voGYFhdelMp6JhyNi
4ei5SnM5HML4QJPR3xUhXYhEUwDy4YZIaubjXE7w4IDeh33eJG4h/LO90nzlWb15
Ge08iv8cQmZIOFVrnn2T5OWlseMAelCWVh++v9W40ns2RjDOIisH3eELvM90zlZ+
7BtD3jzeSp7DHzvYYdlSToEsiIAccWhttSiKZ4ViyWu2ncnxcO+ymzKkvh9jxWFr
8olAP9xnV3ceS1/m4vkfuhQb2JmDPz5quChLrLAbe5LfZiA1zLBprBs+tdIYbn/7
jmWvRR7oD0rF8itBoGb6oheOWA1M40hfsH3s+zh+IJRWFDkiFLdV5c3YCZvPLLEc
zv4swo/29I06gDlWWUu1Tq1TmnDX4Z62ZGo7VSw0d+5UHgqWaUpaTLn99Fl+zlaJ
lGyPFlee9ou7NBbi1yzcqLrstv7q79IgLBL1khvrLJOFB8WksPKUW1EnIOB2TjBa
YRj11pdMsJnvo6I5h9YY/fVR6vJxFVcUSITMPjKq2vQ/bOIeELTryEvMSX3FW2y8
KRv39zrTx1tDY2l2NHbDgbEoLd7DZ6u0DVahkU4oSumBSnebqIvhz24shpWzo9nm
Pgt3WRd2siIMm7FtFF+2
=B+ti
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a recent ACPICA regression causing the AML parser to get confused
and fail in some situations involving incorrect AML in an ACPI table
(Erik Schmauss)"
* tag 'acpi-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: AML Parser: ignore dispatcher error status during table load
Fix up the recently introduced cpufreq driver for Qualcomm Kryo
processors by adding a terminating NULL entry to its table of
device IDs (YueHaibing).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJbWdP8AAoJEILEb/54YlRxuxUP/AutbdUh/t83FRRXVtNa+uBx
SAJ3SHn4TCJ4GKAYj5+/ca5ztj7eO9PilGdCk2gtSLW269nE8zvrm0tSjJ3boh0n
QFUu/y53ZwAreB2Xa5p52sRolCpvEybSD0Ha0nY70w3ShvEQ3ggwRG8AskpS8d1j
lw3CQ/r36EYO+B8oo9kvkNq8GUQloQJQK4+pYc6CNHi1I8LPhQtx11c0KJp6TsKm
4eTOqIooyFXvJwLMPUvebgv2CzcZFTEfQFNllrBmxufmfzgL84vAQuS2rsdQLyK6
B0WZwFoktNZ/xKg/U/cmMnpEyZMbmxSXJroCNxAQ8Qbn/VTnaQo72VouWYvIKcrD
fPIbSUJM6QIpvSfVkc9rHt04GS3XCewY83BnkREQOT2IZTu8I5IXso0fNYU4ks3K
8Fkda8eRBHbcVu2+UUo88etGIZqoTmy47A1sJnHnNS2Wvsgecm38y5bIP/rekDrt
fVu4A3mpyzwncXX+hB57NEhit2OKvzV8p8XNEBwZQGLjTl4xgUe4pDpneybxDeJW
0oNPEIOgNB2VdF2yNh936OoVTQna0/3CwVBIM3mijRZOxCGtUAq/SFAe377aR7XX
IYpi1fXuq4Q6pWSXOf3VlwR7atFYCEmQ/u9vEMiJ/Xtox130SVPwlSpl1ZRioXD5
12V6HezJHaF7h1SfQSud
=I8t3
-----END PGP SIGNATURE-----
Merge tag 'pm-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix up the recently introduced cpufreq driver for Qualcomm Kryo
processors by adding a terminating NULL entry to its table of device
IDs (YueHaibing)"
* tag 'pm-4.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: qcom-kryo: add NULL entry to the end of_device_id array
are:
- Amlogic Meson audio divider fix and CPU clk critical marking
- Qualcomm multimedia GDSC marked as 'always on' to keep display working
- Aspeed fixes for critical clks, resets causing clks to stay disabled, and
an incorrect HPLL frequency calculation
- Marvell Armada 3700 cpu clks would undervolt when switching from low
frequencies to high frequencies because the voltage didn't stabilize in
time so now we switch to an intermediate frequency
Plus we have a core framework thinko that messed up the debugfs flag
printing logic to make it not very useful.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAltYo8cRHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSXJqQ/+JTY0qo+zxxWtoucwK2JNZGWPq3+3ZQc3
cce4u6+w8dP4QkbVh0lu6A0IqcLcpKQ+1IYuc4Zz5TW+IUr/5NHL2ynTXAQadGs1
1tqZwLJRLdr82FFGTqpRQ9dibzqqU384XAsKyfeGvXRrCvFFuzpy7P+yjxEj1+CU
Ji7LzkZK8TSC+90cAQpNC6txwvieqP/vI99iBbIBIzz+Dy8FYQ3sBKZoCcxjefhI
Ma0ZUCrCXw+IkG98Q3OkZ10lXmWfJmsUjdWTU9eBj6+V9raIR6d2QzJ2yOM+aShV
SrV2AMKCbKgSIqG/3K6QIhwElNkcNhZzANHfPr8MoYz75JfXtn32B77S3BD0ayPW
zl5PnLAckJyH1WSmNR9DMq0NXDHZnGGu7gbmeyRmgsCsvEof/c8cdH9/rO1PiETz
uJljXx/sk2MwDXRu/lxlqHtnzWaPNuf24oc8JSGRz62tMeBPE7sKPaCzJah17E82
lwnt8bn4p3Q7RtvuC+/fr56v/I1Bfl88L5wfet8/cpek4rSi78Dp/aUUnOXWHHod
hnVMILBTHSfotMBTs0+V+pViLYCX1aOHlcxgURI/9pjvqPQl6wt5bhKdTWkq14xS
l4VUtTKYoljJ8cnEA18MfZjhUO3E4gYCs94Gi2WfTp1c/BDwBwa3nsBoagIT2te7
BQUmGfH8v5k=
=bH+p
-----END PGP SIGNATURE-----
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"One more round of updates for problems seen this -rc series. Drivers
fixes are:
- Amlogic Meson audio divider fix and CPU clk critical marking
- Qualcomm multimedia GDSC marked as 'always on' to keep display
working
- Aspeed fixes for critical clks, resets causing clks to stay
disabled, and an incorrect HPLL frequency calculation
- Marvell Armada 3700 cpu clks would undervolt when switching from
low frequencies to high frequencies because the voltage didn't
stabilize in time so now we switch to an intermediate frequency
Plus we have a core framework thinko that messed up the debugfs flag
printing logic to make it not very useful"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: aspeed: Support HPLL strapping on ast2400
clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz
clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical
clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON
clk: aspeed: Treat a gate in reset as disabled
clk: Really show symbolic clock flags in debugfs
clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock
clk: meson: audio-divider is one based
clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
-----BEGIN PGP SIGNATURE-----
iQIVAwUAW1h/7Pu3V2unywtrAQJFOg/+NQswGJcTsJGeTL8tW+8nGtJVeTP6XaVh
xPEjdqTBmimt6ciaNP1LxLLt9jQ50S1f83rWGZeFBQoNgWinoe3VtzSVdlQKhZcT
jC5LtFVlTxw5rrL4uCsJywLLjD0NeH41ISbvCStcyYJExOZr+f4/VJXKNcwKjAvf
kD1xDGnVZsZiGLWFjwBVaPJwFigquoLEU564InMnZbvMW95uZOPGfnwxAGmKQX2n
BV3WxVizCc0MwlHMJYjs0cVMZNviuC+qg7YBJIoio3+Dq8FIn7ISn98LbhCpG7mi
FoiRi+7xs7VCGm9yqtkXL+euHcSzjnJPnlYxpU8xGqAay0qKxoHecZj2iMEX327K
E4mujQ40oqkMLhwy3GhT9cIpvbbQPu7+kS+k9x7UqVnzlhsKEeMp7TEFqSEebO9H
kIuvfRBD3uZY0B/loLCB3Cc/B9OoWAUi6IGBRclwS9+RUuBnZY/jb7iQsEvcOv9u
0EC0biSs1jizG1tLR0LmvjIyvS567t/DG/peLad1lOqPe6Up2mO4XIeyS9phwXAD
ryupnKPr3tGRgvfJ4jLUZPC8/nrv5Fg9R0YhICEEBqhwKn1uTZgyc085d2EHOdQp
fmfbXR/oz4TwjjwlgzrLMQbLB7GUpkgCFxsIPpEVeFH3RDbZNO1UbLovDMUuRaos
lFWHzd4K4XA=
=yuI9
-----END PGP SIGNATURE-----
Merge tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull fscache/cachefiles fixes from David Howells:
- Allow cancelled operations to be queued so they can be cleaned up.
- Fix a refcounting bug in the monitoring of reads on backend files
whereby a race can occur between monitor objects being listed for
work, the work processing being queued and the work processor running
and destroying the monitor objects.
- Fix a ref overput in object attachment, whereby a tentatively
considered object is put in error handling without first being 'got'.
- Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an
assertion occurs when we retry because it seems the object is now
active.
- Wait rather BUG'ing on an object collision in the depths of
cachefiles as the active object should be being cleaned up - also
depends on the one above.
* tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
cachefiles: Wait rather than BUG'ing on "Unexpected object collision"
cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
fscache: Fix reference overput in fscache_attach_object() error handling
cachefiles: Fix refcounting bug in backing-file read monitoring
fscache: Allow cancelled operations to be enqueued
If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
the active object tree, we have been emitting a BUG after logging
information about it and the new object.
Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
on the old object (or return an error). The ACTIVE flag should be cleared
after it has been removed from the active object tree. A timeout of 60s is
used in the wait, so we shouldn't be able to get stuck there.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
In cachefiles_mark_object_active(), the new object is marked active and
then we try to add it to the active object tree. If a conflicting object
is already present, we want to wait for that to go away. After the wait,
we go round again and try to re-mark the object as being active - but it's
already marked active from the first time we went through and a BUG is
issued.
Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
Analysis from Kiran Kumar Modukuri:
[Impact]
Oops during heavy NFS + FSCache + Cachefiles
CacheFiles: Error: Overlong wait for old active object to go away.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
CacheFiles: Error: Object already active kernel BUG at
fs/cachefiles/namei.c:163!
[Cause]
In a heavily loaded system with big files being read and truncated, an
fscache object for a cookie is being dropped and a new object being
looked. The new object being looked for has to wait for the old object
to go away before the new object is moved to active state.
[Fix]
Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
retrying the object lookup.
[Testcase]
Have run ~100 hours of NFS stress tests and have not seen this bug recur.
[Regression Potential]
- Limited to fscache/cachefiles.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
When a cookie is allocated that causes fscache_object structs to be
allocated, those objects are initialised with the cookie pointer, but
aren't blessed with a ref on that cookie unless the attachment is
successfully completed in fscache_attach_object().
If attachment fails because the parent object was dying or there was a
collision, fscache_attach_object() returns without incrementing the cookie
counter - but upon failure of this function, the object is released which
then puts the cookie, whether or not a ref was taken on the cookie.
Fix this by taking a ref on the cookie when it is assigned in
fscache_object_init(), even when we're creating a root object.
Analysis from Kiran Kumar:
This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
fscache cookie ref count updated incorrectly during fscache object
allocation resulting in following Oops.
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
[Cause]
Two threads are trying to do operate on a cookie and two objects.
(1) One thread tries to unmount the filesystem and in process goes over a
huge list of objects marking them dead and deleting the objects.
cookie->usage is also decremented in following path:
nfs_fscache_release_super_cookie
-> __fscache_relinquish_cookie
->__fscache_cookie_put
->BUG_ON(atomic_read(&cookie->usage) <= 0);
(2) A second thread tries to lookup an object for reading data in following
path:
fscache_alloc_object
1) cachefiles_alloc_object
-> fscache_object_init
-> assign cookie, but usage not bumped.
2) fscache_attach_object -> fails in cant_attach_object because the
cookie's backing object or cookie's->parent object are going away
3) fscache_put_object
-> cachefiles_put_object
->fscache_object_destroy
->fscache_cookie_put
->BUG_ON(atomic_read(&cookie->usage) <= 0);
[NOTE from dhowells] It's unclear as to the circumstances in which (2) can
take place, given that thread (1) is in nfs_kill_super(), however a
conflicting NFS mount with slightly different parameters that creates a
different superblock would do it. A backtrace from Kiran seems to show
that this is a possibility:
kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
...
RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
Call Trace:
__fscache_relinquish_cookie+0x87/0x120 [fscache]
nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
nfs_kill_super+0x29/0x40 [nfs]
deactivate_locked_super+0x48/0x80
deactivate_super+0x5c/0x60
cleanup_mnt+0x3f/0x90
__cleanup_mnt+0x12/0x20
task_work_run+0x86/0xb0
exit_to_usermode_loop+0xc2/0xd0
syscall_return_slowpath+0x4e/0x60
int_ret_from_sys_call+0x25/0x9f
[Fix] Bump up the cookie usage in fscache_object_init, when it is first
being assigned a cookie atomically such that the cookie is added and bumped
up if its refcount is not zero. Remove the assignment in
fscache_attach_object().
[Testcase]
I have run ~100 hours of NFS stress tests and not seen this bug recur.
[Regression Potential]
- Limited to fscache/cachefiles.
Fixes: ccc4fc3d11 ("FS-Cache: Implement the cookie management part of the netfs API")
Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cachefiles_read_waiter() has the right to access a 'monitor' object by
virtue of being called under the waitqueue lock for one of the pages in its
purview. However, it has no ref on that monitor object or on the
associated operation.
What it is allowed to do is to move the monitor object to the operation's
to_do list, but once it drops the work_lock, it's actually no longer
permitted to access that object. However, it is trying to enqueue the
retrieval operation for processing - but it can only do this via a pointer
in the monitor object, something it shouldn't be doing.
If it doesn't enqueue the operation, the operation may not get processed.
If the order is flipped so that the enqueue is first, then it's possible
for the work processor to look at the to_do list before the monitor is
enqueued upon it.
Fix this by getting a ref on the operation so that we can trust that it
will still be there once we've added the monitor to the to_do list and
dropped the work_lock. The op can then be enqueued after the lock is
dropped.
The bug can manifest in one of a couple of ways. The first manifestation
looks like:
FS-Cache:
FS-Cache: Assertion failed
FS-Cache: 6 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:494!
RIP: 0010:fscache_put_operation+0x1e3/0x1f0
...
fscache_op_work_func+0x26/0x50
process_one_work+0x131/0x290
worker_thread+0x45/0x360
kthread+0xf8/0x130
? create_worker+0x190/0x190
? kthread_cancel_work_sync+0x10/0x10
ret_from_fork+0x1f/0x30
This is due to the operation being in the DEAD state (6) rather than
INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
fscache_put_operation().
The bug can also manifest like the following:
kernel BUG at fs/fscache/operation.c:69!
...
[exception RIP: fscache_enqueue_operation+246]
...
#7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
#8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
#9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
entirely clear which assertion failed.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Lei Xue <carmark.dlut@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Reported-by: Anthony DeRobertis <aderobertis@metrics.net>
Reported-by: NeilBrown <neilb@suse.com>
Reported-by: Daniel Axtens <dja@axtens.net>
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Alter the state-check assertion in fscache_enqueue_operation() to allow
cancelled operations to be given processing time so they can be cleaned up.
Also fix a debugging statement that was requiring such operations to have
an object assigned.
Fixes: 9ae326a690 ("CacheFiles: A cache that backs onto a mounted filesystem")
Reported-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
- Fix an off-by-one in reporting PCI resource sizes to userland which
regressed in v3.12.
- Fix writes to DDR controller registers used to flush write buffers,
which regressed with some refactoring in v4.2.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCW1eaYhUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN3KEQD/R35j0nU28kbNtIG4bDABZmjN42qy
KRwLdryf3eRLVYkA/3QUGskbEKzJjvasM2Ia+hyx0CMmgigjtrr6iK5B+8sJ
=0pT0
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A couple more MIPS fixes for 4.18:
- Fix an off-by-one in reporting PCI resource sizes to userland which
regressed in v3.12.
- Fix writes to DDR controller registers used to flush write buffers,
which regressed with some refactoring in v4.2"
* tag 'mips_fixes_4.18_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: ath79: fix register address in ath79_ddr_wb_flush()
MIPS: Fix off-by-one in pci_resource_to_user()
Pull networking fixes from David Miller:
1) Handle stations tied to AP_VLANs properly during mac80211 hw
reconfig. From Manikanta Pubbisetty.
2) Fix jump stack depth validation in nf_tables, from Taehee Yoo.
3) Fix quota handling in aRFS flow expiration of mlx5 driver, from Eran
Ben Elisha.
4) Exit path handling fix in powerpc64 BPF JIT, from Daniel Borkmann.
5) Use ptr_ring_consume_bh() in page pool code, from Tariq Toukan.
6) Fix cached netdev name leak in nf_tables, from Florian Westphal.
7) Fix memory leaks on chain rename, also from Florian Westphal.
8) Several fixes to DCTCP congestion control ACK handling, from Yuchunk
Cheng.
9) Missing rcu_read_unlock() in CAIF protocol code, from Yue Haibing.
10) Fix link local address handling with VRF, from David Ahern.
11) Don't clobber 'err' on a successful call to __skb_linearize() in
skb_segment(). From Eric Dumazet.
12) Fix vxlan fdb notification races, from Roopa Prabhu.
13) Hash UDP fragments consistently, from Paolo Abeni.
14) If TCP receives lots of out of order tiny packets, we do really
silly stuff. Make the out-of-order queue ending more robust to this
kind of behavior, from Eric Dumazet.
15) Don't leak netlink dump state in nf_tables, from Florian Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
net: axienet: Fix double deregister of mdio
qmi_wwan: fix interface number for DW5821e production firmware
ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull
bnx2x: Fix invalid memory access in rss hash config path.
net/mlx4_core: Save the qpn from the input modifier in RST2INIT wrapper
r8169: restore previous behavior to accept BIOS WoL settings
cfg80211: never ignore user regulatory hint
sock: fix sg page frag coalescing in sk_alloc_sg
netfilter: nf_tables: move dumper state allocation into ->start
tcp: add tcp_ooo_try_coalesce() helper
tcp: call tcp_drop() from tcp_data_queue_ofo()
tcp: detect malicious patterns in tcp_collapse_ofo_queue()
tcp: avoid collapses in tcp_prune_queue() if possible
tcp: free batches of packets in tcp_prune_ofo_queue()
ip: hash fragments consistently
ipv6: use fib6_info_hold_safe() when necessary
can: xilinx_can: fix power management handling
can: xilinx_can: fix incorrect clear of non-processed interrupts
can: xilinx_can: fix RX overflow interrupt not being enabled
can: xilinx_can: keep only 1-2 frames in TX FIFO to fix TX accounting
...
Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:
BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
copy_to_user include/linux/uaccess.h:184 [inline]
put_cmsg+0x5ef/0x860 net/core/scm.c:242
ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
[..]
This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.
With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.
Call pskb_may_pull before accessing the pointer to ensure that it lies
in skb head.
Link: http://lkml.kernel.org/r/CAF=yD-LEJwZj5a1-bAAj2Oy_hKmGygV6rsJ_WOrAYnv-fnayiQ@mail.gmail.com
Reported-by: syzbot+9adb4b567003cac781f0@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rx hash/filter table configuration uses rss_conf_obj to configure filters
in the hardware. This object is initialized only when the interface is
brought up.
This patch adds driver changes to configure rss params only when the device
is in opened state. In port disabled case, the config will be cached in the
driver structure which will be applied in the successive load path.
Please consider applying it to 'net' branch.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Function mlx4_RST2INIT_QP_wrapper saved the qp number passed in the qp
context, rather than the one passed in the input modifier.
However, the qp number in the qp context is not defined as a
required parameter by the FW. Therefore, drivers may choose to not
specify the qp number in the qp context for the reset-to-init transition.
Thus, we must save the qp number passed in the command input modifier --
which is always present. (This saved qp number is used as the input
modifier for command 2RST_QP when a slave's qp's are destroyed).
Fixes: c82e9aa0a8 ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 7edf6d314c tried to resolve an inconsistency (BIOS WoL
settings are accepted, but device isn't wakeup-enabled) resulting
from a previous broken-BIOS workaround by making disabled WoL the
default.
This however had some side effects, most likely due to a broken BIOS
some systems don't properly resume from suspend when the MagicPacket
WoL bit isn't set in the chip, see
https://bugzilla.kernel.org/show_bug.cgi?id=200195
Therefore restore the WoL behavior from 4.16.
Reported-by: Albert Astals Cid <aacid@kde.org>
Fixes: 7edf6d314c ("r8169: disable WOL per default")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull s390 fix from Martin Schwidefsky.
Guenter Roeck reports that the s390 allmodconfig build fails because of
a gcc plugin problem. The fix won't be in-tree until 4.19, so for now
disable the gcc plugins on s390.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: disable gcc plugins
Including asm/cacheflush.h first results in the following build error
when trying to build sparc32:allmodconfig, because 'struct page' has not
been declared, and the function declaration ends up creating a separate
(private) declaration of struct page (as a result of function arguments
being in the scope of the function declaration and definition, not in
global scope).
The C scoping rules do not just affect variable visibility, they also
affect type declaration visibility.
The end result is that when the actual call site is seen in
<linux/highmem.h>, the 'struct page' type in the caller is not the same
'struct page' that the function was declared with, resulting in:
In file included from arch/sparc/include/asm/page.h:10:0,
...
from drivers/staging/media/omap4iss/iss_video.c:15:
include/linux/highmem.h: In function 'clear_user_highpage':
include/linux/highmem.h:137:31: error:
passing argument 1 of 'sparc_flush_page_to_ram' from incompatible
pointer type
Include generic includes files first to fix the problem.
Fixes: fc96d58c10 ("[media] v4l: omap4iss: Add support for OMAP4 camera interface - Video devices")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
[ Added explanation of C scope rules - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>