Move code detecting HW flags based on device type and FW API version
into a single function.
Signed-off-by: Piotr Azarewicz <piotr.azarewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix display of parameters "Configured FEC encodings:" and "Advertised
FEC modes:" in ethtool. Implemented by setting proper FEC bits in
“advertising” bitmask of link_modes struct and “fec” bitmask in
ethtool_fecparam struct. Without this patch wrong FEC settings
can be shown.
Signed-off-by: Jaroslaw Gawin <jaroslawx.gawin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes function to read NVM module data and uses it to
read current LLDP agent configuration from NVM API version 1.8.
Signed-off-by: Sylwia Wnuczko <sylwia.wnuczko@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.
This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.
Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5fc ("nbd: fix max number of supported devs")
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This driver is using regulator_get_optional() to handle all the supplies
that it handles, and only ever enables and disables all supplies en masse
without ever doing any other configuration of the device to handle missing
power. These are clear signs that the API is being misused - it should only
be used for supplies that may be physically absent from the system and in
these cases the hardware usually needs different configuration if the
supply is missing. Instead use normal regualtor_get(), if the supply is
not described in DT then the framework will substitute a dummy regulator in
so no special handling is needed by the consumer driver.
In the case of the PHY regulator the handling in the driver is a hack to
deal with integrated PHYs; the supplies are only optional in the sense
that that there's some confusion in the code about where they're bound to.
From a code point of view they function exactly as normal supplies so can
be treated as such. It'd probably be better to model this by instantiating
a PHY object for integrated PHYs.
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We hit the following warning in production
print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
blk_mq_free_request+0xb7/0xf0
blk_mq_complete_request+0x62/0xf0
recv_work+0x29/0xa1 [nbd]
process_one_work+0x1f5/0x3f0
worker_thread+0x2d/0x3d0
? rescuer_thread+0x340/0x340
kthread+0x111/0x130
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---
This was preceded by us timing out everything and shutting down the
sockets for the device. The problem is we had a request in the queue at
the same time, so we completed the request twice. This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.
Fix this by checking cmd->status in nbd_read_stat. We only change this
under the cmd->lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.
For the clear_req cause this is initiated by the user, so again is safe.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Revert __ksymtab_$namespace.$symbol naming scheme back to
__ksymtab_$symbol, as it was causing issues with depmod. Instead,
have modpost extract a symbol's namespace from __kstrtabns and
__ksymtab_strings.
- Fix `make nsdeps` for out of tree kernel builds (make O=...) caused by
unescaped '/'. Use a different sed delimiter to avoid this problem.
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJdsxgtAAoJEMBFfjjOO8Fy4UcP/A7IynJnayv1WYYf9OpwsRjg
YDFaxqcvB9ynnv7hfHA1UK6c8FJZFEBzFc4fOvatuNOfp8Hj8dhX1Hq0nVJNDEoD
ma3hgPIT2pjUstBd9exXFY6SJT3hoPHwmKP6U1ah8NXVkudpr9KjPoKRd4+dDCpG
ZdU6Ff0oDhxmneyqcNJZtXMq9ISl2nWaMuAMet54+NmWySfxoDUd4K4+xTMoiQ05
dPQJUEqzVAM9MRazJR/Kbi0st5b4Y4XnvJMWkN2PDgRDWJg11cS2RMb5Rkq1Nt0X
hW317kH05/r+81BsjFPbVYcx64gJMmoiwBBWOMepPm5VWRJ2Cy3DiH+Vj6d49qNl
c80m1rHmbJIS0GV9PslJyZncRw0eyweT3KQE3wGfVKJFzizxN08sZnW3gpnAVzix
2cB4x0dSSDEZJbB5jzD/dDoRYPIKTeB3geWsWUXY6ILfou22vjT91/A615DwJUjj
1rnmuUsc5fawA6R8P8L1mVQQQ9CMgPnK5GBOzuQdrVZmO8FDAe8h/A/nhKBbBWSE
HdCe1TSfgkun+jaUqnzg/3tFGkXk1xcYKL2zfwUCTikjX03b/dixct9FRSll6LkZ
s6tMQ861l01io15mCCwjiDTcxim+Rlvt7yhG69DLERBMDMOKVL5/fNADbeJQCVg3
PVBW6Bx9h5wWQqyHQ7PY
=Ge3Z
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules fixes from Jessica Yu:
- Revert __ksymtab_$namespace.$symbol naming scheme back to
__ksymtab_$symbol, as it was causing issues with depmod.
Instead, have modpost extract a symbol's namespace from __kstrtabns
and __ksymtab_strings.
- Fix `make nsdeps` for out of tree kernel builds (make O=...) caused
by unescaped '/'.
Use a different sed delimiter to avoid this problem.
* tag 'modules-for-v5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
scripts/nsdeps: use alternative sed delimiter
symbol namespaces: revert to previous __ksymtab name scheme
modpost: make updating the symbol namespace explicit
modpost: delegate updating namespaces to separate function
A slightly larger set of fixes have accrued in the last two weeks.
Mostly a collection of the usual smaller fixes:
- Marvell Armada: USB phy setup issues on Turris Mox
- Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus
width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some
maintainer updates.
- OMAP: Fixlets for display config, interrupt settings for wifi, some
clock/PM pieces. Also IOMMU regression fix and a ti-sysc no-watchdog
regression fix.
- i.MX: A few fixes around PM/settings, some devicetree fixlets and
catching up with config option changes in DRM
- Rockchip: RockRro64 misc DT fixups, Hugsun X99 USB-C, Kevin display
panel settings
... and some smaller fixes for Davinci (backlight, McBSP DMA), Allwinner
(phy regulators, PMU removal on A64, etc).
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAl2zFiIPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3HtgQAI+fVql3qk2qnDhP2CJihxPWTl5tyM26hvRH
dSRmpAIEXyPyjEOkqfaYqNUgHQVR+GlT5JxL5M//nrVZDngpKmIZs3pJlT4FF75o
VyC/lufeKqPaAfEaewkw8ZasN9sOtOW4ZSNB9sWsqQ5wWaz40py0E+XzIb8njz3r
EvN8JvSpCRteyIpqXCwskwLLjjCyWKFrh1DglVdQ5UObRdqboZulwl3ll9koDMJO
GT6FFHr/oc8CHFntPcP2dCgtMLlxtK7AH6scy8RaHX8uysJBrpKH5cAvszi2n4je
vIS+h8/U/NhFt1M6QjvtC4+DqK5medWbw5Opd14PHeuNwSWjyrhIkNuoSLb2jXBG
QvfEQ0daXFAJLzzW4jl+EIHUJ0Ad/64NV3jQ2we4ah4d/eApGizdrKSxb+tRF7ma
s6ju0v1DNZWpzqVsoOprC/00h3Fm5OI5CtvzCO/Oi1jYSP+OVnGCmveleXxz+8Tm
z/MPml18ykeSOgwCmh8yvg0oVu7AGjqQ7JlFqErwdiSmW6dgLERcQANxQk1Bme7B
0aE94L/9SvNPElnCvmuQy1NYIMisE9r4+/7s46rQIlKajdke3GFZvTGQzynrVDAQ
C3EzBnflIjqjJsJ8TEslHld69ZzqcPzkxE1jKkNLHLh6Z13o3MXIhE4/93VDtlwG
6CbfV6T0
=Jy/6
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A slightly larger set of fixes have accrued in the last two weeks.
Mostly a collection of the usual smaller fixes:
- Marvell Armada: USB phy setup issues on Turris Mox
- Broadcom: GPIO/pinmux DT mapping corrections for Stingray, MMC bus
width fix for RPi Zero W, GPIO LED removal for RPI CM3. Also some
maintainer updates.
- OMAP: Fixlets for display config, interrupt settings for wifi, some
clock/PM pieces. Also IOMMU regression fix and a ti-sysc
no-watchdog regression fix.
- i.MX: A few fixes around PM/settings, some devicetree fixlets and
catching up with config option changes in DRM
- Rockchip: RockRro64 misc DT fixups, Hugsun X99 USB-C, Kevin display
panel settings
... and some smaller fixes for Davinci (backlight, McBSP DMA),
Allwinner (phy regulators, PMU removal on A64, etc)"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (42 commits)
ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
MAINTAINERS: Update the Spreadtrum SoC maintainer
MAINTAINERS: Remove Gregory and Brian for ARCH_BRCMSTB
ARM: dts: bcm2837-rpi-cm3: Avoid leds-gpio probing issue
bus: ti-sysc: Fix watchdog quirk handling
ARM: OMAP2+: Add pdata for OMAP3 ISP IOMMU
ARM: OMAP2+: Plug in device_enable/idle ops for IOMMUs
ARM: davinci_all_defconfig: enable GPIO backlight
ARM: davinci: dm365: Fix McBSP dma_slave_map entry
ARM: dts: bcm2835-rpi-zero-w: Fix bus-width of sdhci
ARM: imx_v6_v7_defconfig: Enable CONFIG_DRM_MSM
arm64: dts: imx8mn: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mm: Use correct clock for usdhc's ipg clk
arm64: dts: imx8mq: Use correct clock for usdhc's ipg clk
ARM: dts: imx7s: Correct GPT's ipg clock source
ARM: dts: vf610-zii-scu4-aib: Specify 'i2c-mux-idle-disconnect'
ARM: dts: imx6q-logicpd: Re-Enable SNVS power key
arm64: dts: lx2160a: Correct CPU core idle state name
mailmap: Add Simon Arlott (replacement for expired email address)
arm64: dts: rockchip: Fix override mode for rk3399-kevin panel
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdst7mAAoJEL/70l94x66DZmgIAJJK+LdYja4NljoOd9gCt43g
lGlRJpmfaDUNQHrfuG1ESC+tD73ndaehFfBPSTnpUmgGyq11NCtuMVjVR6ZgIcsh
gUOzgk5PmJIUcb7bgOVkXHXTsqSmC7X8JQqrHmTESY7nEUOGO3GqVdviC/tIdM0Z
lS8F7b21OektJO7PPRgCsgOKwCXKL9SRMClBc7+7AaiShF7WJaKFHbu0iXsENv5D
8QOQDSDWAVWCdNy4Wrv40lJ2DYUydUFh579ekuKkvvus3dBdK+il0epu7kl+HCaU
OpTVQtWmLbgYs++IL4iLj0YIAxoTT19gz5pxOBtMXcPAppGrbfbgKqtpsLYne60=
=PwUK
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Bugfixes for ARM, PPC and x86, plus selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: nVMX: Don't leak L1 MMIO regions to L2
KVM: SVM: Fix potential wrong physical id in avic_handle_ldr_update
kvm: clear kvmclock MSR on reset
KVM: x86: fix bugon.cocci warnings
KVM: VMX: Remove specialized handling of unexpected exit-reasons
selftests: kvm: fix sync_regs_test with newer gccs
selftests: kvm: vmx_dirty_log_test: skip the test when VMX is not supported
selftests: kvm: consolidate VMX support checks
selftests: kvm: vmx_set_nested_state_test: don't check for VMX support twice
KVM: Don't shrink/grow vCPU halt_poll_ns if host side polling is disabled
selftests: kvm: synchronize .gitignore to Makefile
kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID
KVM: arm64: pmu: Reset sample period on overflow handling
KVM: arm64: pmu: Set the CHAINED attribute before creating the in-kernel event
arm64: KVM: Handle PMCR_EL0.LC as RES1 on pure AArch64 systems
KVM: arm64: pmu: Fix cycle counter truncation
KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in use
komeda:
- typo fixes
- flushing pipes fix
amdgpu:
- Fix suspend/resume issue related to multi-media engines
- Fix memory leak in user ptr code related to hmm conversion
- Fix possible VM faults when allocating page table memory
- Fix error handling in bo list ioctl
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJdspNYAAoJEAx081l5xIa+/moP/RF6DVDmdbIsjCeQfVgIGRAd
djKzdjR3pALJFrwb1k78kMydzaE7zf5s3sIW2aIJCjl4+TnC9YsHzViYQtf1vb4G
TqaUpi3xDLKXtgUoG3ruHcXjnQypmJclMZ6JgLraEFVr9JSqKgexEHiYbTMjnmHC
MRChtrlY720iU+SbTCwENOTkOPBT1pu9YjXehzWmmWFWT5QGzO3wPOAr+c9ld2S5
JYdpmXInUY5XKHcpZ9Gaaaa9MjF9rUz5oSJx/PXqjUTS1DrJDfzuWGrD1M0DCi0G
S6K5S657yXoIZrYVbMwfaeCfrZfF0oJh8iR8G5fdwwJ5om44bCuHkFNeeRNqLaA7
xPlPRHILq48UYWO37wHK7Z3tKD+4DGUS12ngFFvPFnp0KbTIdGaupErfLNYCoZSk
JHR0GDaAITT+qPO2fYC12qmv9mBmJQ5lU7f6jpuya9EbouyVT9RIcSvIsD+7fpaD
uXYspo/rv6sWi66XCZlzkqBqk3E1ZctjUnFP5V3Mo1JVwUMR0QkIyVfX2NKFpoE7
TWpbji9oXfu2+UKYVKIu1WgoSG/8HlUSzY8OXke4lbXyIiIwbctKxsJeaanTAjeI
vjCuNFXwzFysFykEKjAMC+xzMtosVIVZ51m+zmBFaU+KUBbL5TvFMkEUC5GHqj9T
4GGYcjiUgX2xX3R3mmFs
=k3rD
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Quiet week this week, which I suspect means some people just didn't
get around to sending me fixes pulls in time. This has 2 komeda and a
bunch of amdgpu fixes in it:
komeda:
- typo fixes
- flushing pipes fix
amdgpu:
- Fix suspend/resume issue related to multi-media engines
- Fix memory leak in user ptr code related to hmm conversion
- Fix possible VM faults when allocating page table memory
- Fix error handling in bo list ioctl"
* tag 'drm-fixes-2019-10-25' of git://anongit.freedesktop.org/drm/drm:
drm/komeda: Fix typos in komeda_splitter_validate
drm/komeda: Don't flush inactive pipes
drm/amdgpu/vce: fix allocation size in enc ring test
drm/amdgpu: fix error handling in amdgpu_bo_list_create
drm/amdgpu: fix potential VM faults
drm/amdgpu: user pages array memory leak fix
drm/amdgpu/vcn: fix allocation size in enc ring test
drm/amdgpu/uvd7: fix allocation size in enc ring test (v2)
drm/amdgpu/uvd6: fix allocation size in enc ring test (v2)
We currently assume that submissions from the sqthread are successful,
and if IO polling is enabled, we use that value for knowing how many
completions to look for. But if we overflowed the CQ ring or some
requests simply got errored and already completed, they won't be
available for polling.
For the case of IO polling and SQTHREAD usage, look at the pending
poll list. If it ever hits empty then we know that we don't have
anymore pollable requests inflight. For that case, simply reset
the inflight count to zero.
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently use the ring values directly, but that can lead to issues
if the application is malicious and changes these values on our behalf.
Created in-kernel cached versions of them, and just overwrite the user
side when we update them. This is similar to how we treat the sq/cq
ring tail/head updates.
Reported-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_ring_submit() finalises with
1. io_commit_sqring(), which releases sqes to the userspace
2. Then calls to io_queue_link_head(), accessing released head's sqe
Reorder them.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_sq_thread() processes sqes by 8 without considering links. As a
result, links will be randomely subdivided.
The easiest way to fix it is to call io_get_sqring() inside
io_submit_sqes() as do io_ring_submit().
Downsides:
1. This removes optimisation of not grabbing mm_struct for fixed files
2. It submitting all sqes in one go, without finer-grained sheduling
with cq processing.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a bug, where failed linked requests are returned not with
specified @user_data, but with garbage from a kernel stack.
The reason is that io_fail_links() uses req->user_data, which is
uninitialised when called from io_queue_sqe() on fail path.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Support for the kernel as Xen 32-bit PV guest will soon be removed.
Issue a warning when booted as such.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
- Sifive PLIC: force driver to skip non-relevant contexts
- GICv4: Don't send VMOVP commands to ITSs that don't have
this vPE mapped
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl2y1PUPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDz54P+wQaHlsTcA6vqI5L3BobmQawzY8la9qKqEOu
nAM2ni8/3hU4Q/VIYySrgK/qIlrrUYHWCvvInlEPQLsU/XiQnUfQfwFkQKAvsw1C
JCPTtrO6WBBvOr2uPGDD4Y/6/AvK2Ff1x3BC9JPdBOJKuNn/8YI0iKoz4jEGTAi4
9JRQIG+jwAPZ7BytRqJtVg54O3F7ukCNY8vFLL7Ej2getJgRZcZuho2ENx/F0iID
bN9o2z+IOtgcG2sumjlcDx8VX6+aNVaHHcWDKLTkMddznBggksvyCoZgeZgKxSt0
CSM5Ol1SS9UtswtKvxLt9aRqjoSm9GxggvKc2VbFMtuLqcNSUHwtu0TYNxbBFkuY
EEXDLYAUVMcU2S2C45V6rOotOrNfXvJm70bMwXE1Zd6FnwHsmT2zRMU2D9Bl3Pxb
vJSbZlo662aHWIVob5vyRt+XrJF5nte9S6tkl2UJU0uf/m7BH+4bBZ6J/lER5xEM
RQWt3kkSK8TRAwI8t8ZhC7XVQjcBa5ASocttd6tgmAlZdgpClWI2yuViR7tfLWZA
YlotswT9vuBdgkzbwTxV7xTTDQSfvggqqkvvunlUdXPUUChb/UWN3iAaxZ1RBnuY
ZJzEjUuvU5Md1umtvqNOPawsX/pxWjWh3CDuOUZNgz8OLAZXIdoTD2K9G0HeoYo2
riv2dfSd
=9wMN
-----END PGP SIGNATURE-----
Merge tag 'irqchip-fixes-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull the second lot of irqchip updates for 5.4 from Marc Zyngier:
- Sifive PLIC: force driver to skip non-relevant contexts
- GICv4: Don't send VMOVP commands to ITSs that don't have
this vPE mapped
This reorganization will allow us to call kvm_arch_destroy_vm in the
event that kvm_create_vm fails after calling kvm_arch_init_vm.
Suggested-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Modify plic_init() to skip .dts interrupt contexts other
than supervisor external interrupt.
The .dts entry for plic may specify multiple interrupt contexts.
For example, it may assign two entries IRQ_M_EXT and IRQ_S_EXT,
in that order, to the same interrupt controller. This patch
modifies plic_init() to skip the IRQ_M_EXT context since
IRQ_S_EXT is currently the only supported context.
If IRQ_M_EXT is not skipped, plic_init() will report "handler
already present for context" when it comes across the IRQ_S_EXT
context in the next iteration of its loop.
Without this patch, .dts would have to be edited to replace the
value of IRQ_M_EXT with -1 for it to be skipped.
Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # arch/riscv
Link: https://lkml.kernel.org/r/1571933503-21504-1-git-send-email-alan.mikhak@sifive.com
The _PPC change notifications from the platform firmware are per-CPU,
so acpi_processor_ppc_init() needs to add a frequency QoS request
for each CPU covered by a cpufreq policy to take all of them into
account.
Even though ACPI thermal control of CPUs sets frequency limits
per processor package, it also needs a frequency QoS request for each
CPU in a cpufreq policy in case some of them are taken offline and
the frequency limit needs to be set through the remaining online
ones (this is slightly excessive, because all CPUs covered by one
cpufreq policy will set the same frequency limit through their QoS
requests, but it is not incorrect).
Modify the code in accordance with the above observations.
Fixes: d15ce41273 ("ACPI: cpufreq: Switch to QoS requests instead of cpufreq notifier")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
drm-fixes-5.4-2019-10-23:
amdgpu:
- Fix suspend/resume issue related to multi-media engines
- Fix memory leak in user ptr code related to hmm conversion
- Fix possible VM faults when allocating page table memory
- Fix error handling in bo list ioctl
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191024031809.3155-1-alexander.deucher@amd.com
when flushing inactive pipes
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXbA41gAKCRDj7w1vZxhR
xS2CAP9XSTy6vXr2wJfCMoDPpCdiDGMvjC1jYJJyzSBn6Wv+wgD/TivQz/9p+jhQ
Y//1ZyHbQ42VMskj9akpTEFWQ3WkSwQ=
=LXWq
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2019-10-23' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
Two fixes for komeda, one for typos and one to prevent an hardware issue
when flushing inactive pipes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023112643.evpp6f23mpjwdsn4@gilmour
There's a deadlock that is possible and can easily be seen with
a test where multiple readers open/read/close of the same file
and a disruption occurs causing reconnect. The deadlock is due
a reader thread inside cifs_strict_readv calling down_read and
obtaining lock_sem, and then after reconnect inside
cifs_reopen_file calling down_read a second time. If in
between the two down_read calls, a down_write comes from
another process, deadlock occurs.
CPU0 CPU1
---- ----
cifs_strict_readv()
down_read(&cifsi->lock_sem);
_cifsFileInfo_put
OR
cifs_new_fileinfo
down_write(&cifsi->lock_sem);
cifs_reopen_file()
down_read(&cifsi->lock_sem);
Fix the above by changing all down_write(lock_sem) calls to
down_write_trylock(lock_sem)/msleep() loop, which in turn
makes the second down_read call benign since it will never
block behind the writer while holding lock_sem.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Suggested-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed--by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Currently the code assumes that if a file info entry belongs
to lists of open file handles of an inode and a tcon then
it has non-zero reference. The recent changes broke that
assumption when putting the last reference of the file info.
There may be a situation when a file is being deleted but
nothing prevents another thread to reference it again
and start using it. This happens because we do not hold
the inode list lock while checking the number of references
of the file info structure. Fix this by doing the proper
locking when doing the check.
Fixes: 487317c994 ("cifs: add spinlock for the openFileList to cifsInodeInfo")
Fixes: cb248819d2 ("cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic")
Cc: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
When the client hits reconnect it iterates over the mid
pending queue marking entries for retry and moving them
to a temporary list to issue callbacks later without holding
GlobalMid_Lock. In the same time there is no guarantee that
mids can't be removed from the temporary list or even
freed completely by another thread. It may cause a temporary
list corruption:
[ 430.454897] list_del corruption. prev->next should be ffff98d3a8f316c0, but was 2e885cb266355469
[ 430.464668] ------------[ cut here ]------------
[ 430.466569] kernel BUG at lib/list_debug.c:51!
[ 430.468476] invalid opcode: 0000 [#1] SMP PTI
[ 430.470286] CPU: 0 PID: 13267 Comm: cifsd Kdump: loaded Not tainted 5.4.0-rc3+ #19
[ 430.473472] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 430.475872] RIP: 0010:__list_del_entry_valid.cold+0x31/0x55
...
[ 430.510426] Call Trace:
[ 430.511500] cifs_reconnect+0x25e/0x610 [cifs]
[ 430.513350] cifs_readv_from_socket+0x220/0x250 [cifs]
[ 430.515464] cifs_read_from_socket+0x4a/0x70 [cifs]
[ 430.517452] ? try_to_wake_up+0x212/0x650
[ 430.519122] ? cifs_small_buf_get+0x16/0x30 [cifs]
[ 430.521086] ? allocate_buffers+0x66/0x120 [cifs]
[ 430.523019] cifs_demultiplex_thread+0xdc/0xc30 [cifs]
[ 430.525116] kthread+0xfb/0x130
[ 430.526421] ? cifs_handle_standard+0x190/0x190 [cifs]
[ 430.528514] ? kthread_park+0x90/0x90
[ 430.530019] ret_from_fork+0x35/0x40
Fix this by obtaining extra references for mids being retried
and marking them as MID_DELETED which indicates that such a mid
has been dequeued from the pending list.
Also move mid cleanup logic from DeleteMidQEntry to
_cifs_mid_q_entry_release which is called when the last reference
to a particular mid is put. This allows to avoid any use-after-free
of response buffers.
The patch needs to be backported to stable kernels. A stable tag
is not mentioned below because the patch doesn't apply cleanly
to any actively maintained stable kernel.
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-and-tested-by: David Wysochanski <dwysocha@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This patch makes a few changes to btf_ctx_access() to prepare
it for non raw_tp use case where the attach_btf_id is not
necessary a BTF_KIND_TYPEDEF.
It moves the "btf_trace_" prefix check and typedef-follow logic to a new
function "check_attach_btf_id()" which is called only once during
bpf_check(). btf_ctx_access() only operates on a BTF_KIND_FUNC_PROTO
type now. That should also be more efficient since it is done only
one instead of every-time check_ctx_access() is called.
"check_attach_btf_id()" needs to find the func_proto type from
the attach_btf_id. It needs to store the result into the
newly added prog->aux->attach_func_proto. func_proto
btf type has no name, so a proper name should be stored into
"attach_func_name" also.
v2:
- Move the "btf_trace_" check to an earlier verifier phase (Alexei)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20191025001811.1718491-1-kafai@fb.com
When rdmacm module is not loaded, and when netlink message is received to
get char device info, it results into a deadlock due to recursive locking
of rdma_nl_mutex with the below call sequence.
[..]
rdma_nl_rcv()
mutex_lock()
[..]
rdma_nl_rcv_msg()
ib_get_client_nl_info()
request_module()
iw_cm_init()
rdma_nl_register()
mutex_lock(); <- Deadlock, acquiring mutex again
Due to above call sequence, following call trace and deadlock is observed.
kernel: __mutex_lock+0x35e/0x860
kernel: ? __mutex_lock+0x129/0x860
kernel: ? rdma_nl_register+0x1a/0x90 [ib_core]
kernel: rdma_nl_register+0x1a/0x90 [ib_core]
kernel: ? 0xffffffffc029b000
kernel: iw_cm_init+0x34/0x1000 [iw_cm]
kernel: do_one_initcall+0x67/0x2d4
kernel: ? kmem_cache_alloc_trace+0x1ec/0x2a0
kernel: do_init_module+0x5a/0x223
kernel: load_module+0x1998/0x1e10
kernel: ? __symbol_put+0x60/0x60
kernel: __do_sys_finit_module+0x94/0xe0
kernel: do_syscall_64+0x5a/0x270
kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
process stack trace:
[<0>] __request_module+0x1c9/0x460
[<0>] ib_get_client_nl_info+0x5e/0xb0 [ib_core]
[<0>] nldev_get_chardev+0x1ac/0x320 [ib_core]
[<0>] rdma_nl_rcv_msg+0xeb/0x1d0 [ib_core]
[<0>] rdma_nl_rcv+0xcd/0x120 [ib_core]
[<0>] netlink_unicast+0x179/0x220
[<0>] netlink_sendmsg+0x2f6/0x3f0
[<0>] sock_sendmsg+0x30/0x40
[<0>] ___sys_sendmsg+0x27a/0x290
[<0>] __sys_sendmsg+0x58/0xa0
[<0>] do_syscall_64+0x5a/0x270
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
To overcome this deadlock and to allow multiple netlink messages to
progress in parallel, following scheme is implemented.
1. Split the lock protecting the cb_table into a per-index lock, and make
it a rwlock. This lock is used to ensure no callbacks are running after
unregistration returns. Since a module will not be registered once it
is already running callbacks, this avoids the deadlock.
2. Use smp_store_release() to update the cb_table during registration so
that no lock is required. This avoids lockdep problems with thinking
all the rwsems are the same lock class.
Fixes: 0e2d00eb6f ("RDMA: Add NLDEV_GET_CHARDEV to allow char dev discovery and autoload")
Link: https://lore.kernel.org/r/20191015080733.18625-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix a ref count, memory leak, and Risc-V cpu schema warnings.
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl2wtukQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhwyTdD/4lVzqzzip1CH6yQ4PrNQiSYqMIYhYwnZCJ
uqaq0OdOQGrLFtM4/YbsU+oeTeV+tBHen5k/C3XtwAowH49TQwPZcgFas43eWVqW
/FoWqDPwyB94JjgkAsTbpnieHII7ZSKwMGEuCcHPf6ppOZe2Y9VLgsJWG576kwYe
bZUFJQ7EV+rCaFxWlEYq7kJ1KDPKPIVEqAsULi5TYLRTPPiKUKaYoM525IS+2M0U
NDn9Szvqoy4+ZC2yBn9bBuG6gdwPp0ZlxhRRY/hyjXYn4keC9x0e1rKpvs1DLE+L
Uxbb0OZ7mFWyqLdzptSD1WfghxkhjdGhgwYRJccjkPG+iVDHNAsTTdgPsEBSa4x1
YDnwfkcxkqoYh2tK/TaG7ZGpG5dv7nVhek8IL+bpuvvc9ZdT8mEL6Q86GsKTAEw1
yBzNrEGryZZV62a3xeaNQKQ8v1b3KJXT110UQyZA0yOT/q7k3QcCJ4tfBB4lNdUS
9YLig/llunwyJI+vBMu2M3GuiQy0ClsEPemsm1CIvKEZ5uVMiIzhI5MWu5P/KDJ4
LOYcSYVBtNPhUxMwU0PvIDROJKDTe3g0bVUzarQq+lvsR7muDmmRGvDJT0J6Ziyz
qqMEqRPYsbJoWiE5PgDQIDBaw1wObZ4mjuBvclffyrUiFX2MHk0O0IydzTwM54I/
BEm1DnQGbg==
=vXnQ
-----END PGP SIGNATURE-----
Merge tag 'devicetree-fixes-for-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree fixes from Rob Herring:
"A couple more DT fixes for 5.4: fix a ref count, memory leak, and
Risc-V cpu schema warnings"
* tag 'devicetree-fixes-for-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
of: reserved_mem: add missing of_node_put() for proper ref-counting
of: unittest: fix memory leak in unittest_data_add
dt-bindings: riscv: Fix CPU schema errors
Madalin Bucur says:
====================
DPAA Ethernet changes
v3: add newline at the end of error messages
v2: resending with From: field matching signed-off-by
Here's a series of changes for the DPAA Ethernet, addressing minor
or unapparent issues in the codebase, adding probe ordering based on
a recently added DPAA QMan API, removing some redundant code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Newline was missing at the end of the error message.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove unused struct member second_largest_buf_size. Also, an out of
bounds access would have occurred in the removed code if there was only
one buffer pool in use.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DPAA Ethernet driver is using the FMan MAC as the device for DMA
mapping. This is not actually correct, as the real DMA device is the
FMan port (the FMan Rx port for reception and the FMan Tx port for
transmission). Changing the device used for DMA mapping to the Fman
Rx and Tx port devices.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an API that retrieves the 'struct device' that the specified FMan
port probed against. The new API will be used in a subsequent patch
that corrects the DMA devices used by the dpaa_eth driver.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Condition was previously checked, removing duplicate code.
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the DPAA 1 Ethernet driver gets probed before the QBMan driver it will
cause a boot crash. Add predictability in the probing order by deferring
the Ethernet driver probe after QBMan and portals by using the recently
introduced QBMan APIs.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The liodn base registers are specific to PAMU based NXP systems and are
reserved on SMMU based ones. Don't access them unless PAMU is compiled in.
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo says:
====================
net: fix nested device bugs
This patchset fixes several bugs that are related to nesting
device infrastructure.
Current nesting infrastructure code doesn't limit the depth level of
devices. nested devices could be handled recursively. at that moment,
it needs huge memory and stack overflow could occur.
Below devices type have same bug.
VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, and VXLAN.
But I couldn't test all interface types so there could be more device
types, which have similar problems.
Maybe qmi_wwan.c code could have same problem.
So, I would appreciate if someone test qmi_wwan.c and other modules.
Test commands:
ip link add dummy0 type dummy
ip link add vlan1 link dummy0 type vlan id 1
for i in {2..100}
do
let A=$i-1
ip link add name vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
1st patch actually fixes the root cause.
It adds new common variables {upper/lower}_level that represent
depth level. upper_level variable is depth of upper devices.
lower_level variable is depth of lower devices.
[U][L] [U][L]
vlan1 1 5 vlan4 1 4
vlan2 2 4 vlan5 2 3
vlan3 3 3 |
| |
+------------+
|
vlan6 4 2
dummy0 5 1
After this patch, the nesting infrastructure code uses this variable to
check the depth level.
2nd patch fixes Qdisc lockdep related problem.
Before this patch, devices use static lockdep map.
So, if devices that are same types are nested, lockdep will warn about
recursive situation.
These patches make these devices use dynamic lockdep key instead of
static lock or subclass.
3rd patch fixes unexpected IFF_BONDING bit unset.
When nested bonding interface scenario, bonding interface could lost it's
IFF_BONDING flag. This should not happen.
This patch adds a condition before unsetting IFF_BONDING.
4th patch fixes nested locking problem in bonding interface
Bonding interface has own lock and this uses static lock.
Bonding interface could be nested and it uses same lockdep key.
So that unexisting lockdep warning occurs.
5th patch fixes nested locking problem in team interface
Team interface has own lock and this uses static lock.
Team interface could be nested and it uses same lockdep key.
So that unexisting lockdep warning occurs.
6th patch fixes a refcnt leak in the macsec module.
When the macsec module is unloaded, refcnt leaks occur.
But actually, that holding refcnt is unnecessary.
So this patch just removes these code.
7th patch adds ignore flag to an adjacent structure.
In order to exchange an adjacent node safely, ignore flag is needed.
8th patch makes vxlan add an adjacent link to limit depth level.
Vxlan interface could set it's lower interface and these lower interfaces
are handled recursively.
So, if the depth of lower interfaces is too deep, stack overflow could
happen.
9th patch removes unnecessary variables and callback.
After 1st patch, subclass callback and variables are unnecessary.
This patch just removes these variables and callback.
10th patch fix refcnt leaks in the virt_wifi module
Like every nested interface, the upper interface should be deleted
before the lower interface is deleted.
In order to fix this, the notifier routine is added in this patch.
v4 -> v5 :
- Update log messages
- Move variables position, 1st patch
- Fix iterator routine, 1st patch
- Add generic lockdep key code, which replaces 2, 4, 5, 6, 7 patches.
- Log message update, 10th patch
- Fix wrong error value in error path of __init routine, 10th patch
- hold module refcnt when interface is created, 10th patch
v3 -> v4 :
- Add new 12th patch to fix refcnt leaks in the virt_wifi module
- Fix wrong usage netdev_upper_dev_link() in the vxlan.c
- Preserve reverse christmas tree variable ordering in the vxlan.c
- Add missing static keyword in the dev.c
- Expose netdev_adjacent_change_{prepare/commit/abort} instead of
netdev_adjacent_dev_{enable/disable}
v2 -> v3 :
- Modify nesting infrastructure code to use iterator instead of recursive.
v1 -> v2 :
- Make the 3rd patch do not add a new priv_flag.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to link an adjacent node, netdev_upper_dev_link() is used
and in order to unlink an adjacent node, netdev_upper_dev_unlink() is used.
unlink operation does not fail, but link operation can fail.
In order to exchange adjacent nodes, we should unlink an old adjacent
node first. then, link a new adjacent node.
If link operation is failed, we should link an old adjacent node again.
But this link operation can fail too.
It eventually breaks the adjacent link relationship.
This patch adds an ignore flag into the netdev_adjacent structure.
If this flag is set, netdev_upper_dev_link() ignores an old adjacent
node for a moment.
This patch also adds new functions for other modules.
netdev_adjacent_change_prepare()
netdev_adjacent_change_commit()
netdev_adjacent_change_abort()
netdev_adjacent_change_prepare() inserts new device into adjacent list
but new device is not allowed to use immediately.
If netdev_adjacent_change_prepare() fails, it internally rollbacks
adjacent list so that we don't need any other action.
netdev_adjacent_change_commit() deletes old device in the adjacent list
and allows new device to use.
netdev_adjacent_change_abort() rollbacks adjacent list.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a macsec interface is created, it increases a refcnt to a lower
device(real device). when macsec interface is deleted, the refcnt is
decreased in macsec_free_netdev(), which is ->priv_destructor() of
macsec interface.
The problem scenario is this.
When nested macsec interfaces are exiting, the exit routine of the
macsec module makes refcnt leaks.
Test commands:
ip link add dummy0 type dummy
ip link add macsec0 link dummy0 type macsec
ip link add macsec1 link macsec0 type macsec
modprobe -rv macsec
[ 208.629433] unregister_netdevice: waiting for macsec0 to become free. Usage count = 1
Steps of exit routine of macsec module are below.
1. Calls ->dellink() in __rtnl_link_unregister().
2. Checks refcnt and wait refcnt to be 0 if refcnt is not 0 in
netdev_run_todo().
3. Calls ->priv_destruvtor() in netdev_run_todo().
Step2 checks refcnt, but step3 decreases refcnt.
So, step2 waits forever.
This patch makes the macsec module do not hold a refcnt of the lower
device because it already holds a refcnt of the lower device with
netdev_upper_dev_link().
Fixes: c09440f7dc ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
team interface could be nested and it's lock variable could be nested too.
But this lock uses static lockdep key and there is no nested locking
handling code such as mutex_lock_nested() and so on.
so the Lockdep would warn about the circular locking scenario that
couldn't happen.
In order to fix, this patch makes the team module to use dynamic lock key
instead of static key.
Test commands:
ip link add team0 type team
ip link add team1 type team
ip link set team0 master team1
ip link set team0 nomaster
ip link set team1 master team0
ip link set team1 nomaster
Splat that looks like:
[ 40.364352] WARNING: possible recursive locking detected
[ 40.364964] 5.4.0-rc3+ #96 Not tainted
[ 40.365405] --------------------------------------------
[ 40.365973] ip/750 is trying to acquire lock:
[ 40.366542] ffff888060b34c40 (&team->lock){+.+.}, at: team_set_mac_address+0x151/0x290 [team]
[ 40.367689]
but task is already holding lock:
[ 40.368729] ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.370280]
other info that might help us debug this:
[ 40.371159] Possible unsafe locking scenario:
[ 40.371942] CPU0
[ 40.372338] ----
[ 40.372673] lock(&team->lock);
[ 40.373115] lock(&team->lock);
[ 40.373549]
*** DEADLOCK ***
[ 40.374432] May be due to missing lock nesting notation
[ 40.375338] 2 locks held by ip/750:
[ 40.375851] #0: ffffffffabcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 40.376927] #1: ffff888051201c40 (&team->lock){+.+.}, at: team_del_slave+0x29/0x60 [team]
[ 40.377989]
stack backtrace:
[ 40.378650] CPU: 0 PID: 750 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 40.379368] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 40.380574] Call Trace:
[ 40.381208] dump_stack+0x7c/0xbb
[ 40.381959] __lock_acquire+0x269d/0x3de0
[ 40.382817] ? register_lock_class+0x14d0/0x14d0
[ 40.383784] ? check_chain_key+0x236/0x5d0
[ 40.384518] lock_acquire+0x164/0x3b0
[ 40.385074] ? team_set_mac_address+0x151/0x290 [team]
[ 40.385805] __mutex_lock+0x14d/0x14c0
[ 40.386371] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387038] ? team_set_mac_address+0x151/0x290 [team]
[ 40.387632] ? mutex_lock_io_nested+0x1380/0x1380
[ 40.388245] ? team_del_slave+0x60/0x60 [team]
[ 40.388752] ? rcu_read_lock_sched_held+0x90/0xc0
[ 40.389304] ? rcu_read_lock_bh_held+0xa0/0xa0
[ 40.389819] ? lock_acquire+0x164/0x3b0
[ 40.390285] ? lockdep_rtnl_is_held+0x16/0x20
[ 40.390797] ? team_port_get_rtnl+0x90/0xe0 [team]
[ 40.391353] ? __module_text_address+0x13/0x140
[ 40.391886] ? team_set_mac_address+0x151/0x290 [team]
[ 40.392547] team_set_mac_address+0x151/0x290 [team]
[ 40.393111] dev_set_mac_address+0x1f0/0x3f0
[ ... ]
Fixes: 3d249d4ca7 ("net: introduce ethernet teaming device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All bonding device has same lockdep key and subclass is initialized with
nest_level.
But actual nest_level value can be changed when a lower device is attached.
And at this moment, the subclass should be updated but it seems to be
unsafe.
So this patch makes bonding use dynamic lockdep key instead of the
subclass.
Test commands:
ip link add bond0 type bond
for i in {1..5}
do
let A=$i-1
ip link add bond$i type bond
ip link set bond$i master bond$A
done
ip link set bond5 master bond0
Splat looks like:
[ 307.992912] WARNING: possible recursive locking detected
[ 307.993656] 5.4.0-rc3+ #96 Tainted: G W
[ 307.994367] --------------------------------------------
[ 307.995092] ip/761 is trying to acquire lock:
[ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.997045]
but task is already holding lock:
[ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.999215]
other info that might help us debug this:
[ 308.000251] Possible unsafe locking scenario:
[ 308.001137] CPU0
[ 308.001533] ----
[ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.003302]
*** DEADLOCK ***
[ 308.004310] May be due to missing lock nesting notation
[ 308.005319] 3 locks held by ip/761:
[ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
[ 308.009422]
stack backtrace:
[ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96
[ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 308.012179] Call Trace:
[ 308.012601] dump_stack+0x7c/0xbb
[ 308.013089] __lock_acquire+0x269d/0x3de0
[ 308.013669] ? register_lock_class+0x14d0/0x14d0
[ 308.014318] lock_acquire+0x164/0x3b0
[ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.015520] _raw_spin_lock_nested+0x2e/0x60
[ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.017215] bond_get_stats+0xb8/0x500 [bonding]
[ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding]
[ 308.019710] ? rcu_read_lock_held+0x90/0xa0
[ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding]
[ 308.021953] dev_get_stats+0x1ec/0x270
[ 308.022508] bond_get_stats+0x1d1/0x500 [bonding]
Fixes: d3fff6c443 ("net: add netdev_lockdep_set_classes() helper")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code doesn't limit the number of nested devices.
Nested devices would be handled recursively and this needs huge stack
memory. So, unlimited nested devices could make stack overflow.
This patch adds upper_level and lower_level, they are common variables
and represent maximum lower/upper depth.
When upper/lower device is attached or dettached,
{lower/upper}_level are updated. and if maximum depth is bigger than 8,
attach routine fails and returns -EMLINK.
In addition, this patch converts recursive routine of
netdev_walk_all_{lower/upper} to iterator routine.
Test commands:
ip link add dummy0 type dummy
ip link add link dummy0 name vlan1 type vlan id 1
ip link set vlan1 up
for i in {2..55}
do
let A=$i-1
ip link add vlan$i link vlan$A type vlan id $i
done
ip link del dummy0
Splat looks like:
[ 155.513226][ T908] BUG: KASAN: use-after-free in __unwind_start+0x71/0x850
[ 155.514162][ T908] Write of size 88 at addr ffff8880608a6cc0 by task ip/908
[ 155.515048][ T908]
[ 155.515333][ T908] CPU: 0 PID: 908 Comm: ip Not tainted 5.4.0-rc3+ #96
[ 155.516147][ T908] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 155.517233][ T908] Call Trace:
[ 155.517627][ T908]
[ 155.517918][ T908] Allocated by task 0:
[ 155.518412][ T908] (stack is not available)
[ 155.518955][ T908]
[ 155.519228][ T908] Freed by task 0:
[ 155.519885][ T908] (stack is not available)
[ 155.520452][ T908]
[ 155.520729][ T908] The buggy address belongs to the object at ffff8880608a6ac0
[ 155.520729][ T908] which belongs to the cache names_cache of size 4096
[ 155.522387][ T908] The buggy address is located 512 bytes inside of
[ 155.522387][ T908] 4096-byte region [ffff8880608a6ac0, ffff8880608a7ac0)
[ 155.523920][ T908] The buggy address belongs to the page:
[ 155.524552][ T908] page:ffffea0001822800 refcount:1 mapcount:0 mapping:ffff88806c657cc0 index:0x0 compound_mapcount:0
[ 155.525836][ T908] flags: 0x100000000010200(slab|head)
[ 155.526445][ T908] raw: 0100000000010200 ffffea0001813808 ffffea0001a26c08 ffff88806c657cc0
[ 155.527424][ T908] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 155.528429][ T908] page dumped because: kasan: bad access detected
[ 155.529158][ T908]
[ 155.529410][ T908] Memory state around the buggy address:
[ 155.530060][ T908] ffff8880608a6b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.530971][ T908] ffff8880608a6c00: fb fb fb fb fb f1 f1 f1 f1 00 f2 f2 f2 f3 f3 f3
[ 155.531889][ T908] >ffff8880608a6c80: f3 fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 155.532806][ T908] ^
[ 155.533509][ T908] ffff8880608a6d00: fb fb fb fb fb fb fb fb fb f1 f1 f1 f1 00 00 00
[ 155.534436][ T908] ffff8880608a6d80: f2 f3 f3 f3 f3 fb fb fb 00 00 00 00 00 00 00 00
[ ... ]
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>