Retry the PCIe link training up to 'pcie_retry' times
if the PCIe link width is narrower than the previous width.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current method of allocating MSIx resources is a bit cumbersome,
and not very easily added to.
Refactor and re-order the code paths into a more consistent interface.
Update the interface so that allocations are not order dependent.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The current HFI1 MSIx API is difficult to follow, change, or add to.
In anticipation of moving to an more flexible API, move the current
MSIx functionality to the new msix.c module.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently several things occur before the hfi1_devdata structure is
allocated. This leads to an inconsistent logging ability and makes
it more difficult to restructure some code paths.
Allocate (and do a minimal init) the structure as soon as possible.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The tune_pcie_caps needs to occur sometime after PCI is enabled, but
before the HFI is enabled. Currently it is placed in the MSIx
allocation code which doesn't really fit. Moving it to just after
the gen3 bump.
Clean up the associated code (modules, etc.).
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
TXREQ defines are duplicated, incompletely, in the sdma header file.
Remove duplicate defines.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We want to keep files in alphabetical order in our makefile, however this
just makes for messy diffs when adding (or removing) files. Let's just clean
this up and make it line by line.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
- Switch SMC over to rdma_get_gid_attr and remove the compat
- Fix a crash in HFI1 with some BIOS's
- Fix a randconfig failure
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlt9z6kACgkQOG33FX4g
mxqLag//Zr3RrNSmswNjl+6PrWeg88GQstAm4gQps8mR03eI8Ha5fAS6MWuq1U8+
LGP8bbBtE6fHSCkPSnN/owReZSTcTIbgWB021vBIUKUpytmxie+ugMVXIHwm6cIM
E3pgPsRngTrqtuRxyOhyoaMKSjRCbSdMe43SWf42/S+9SxDkmXtBzymYT119bu/A
eqU7z/HuRqcVxrgOinhPjcQkEFdRYUaRk+g6jzaQmZl8IjKHp/d3BSzmzbE6txMt
txP5hu/msz4xkyusX/I6ZS3do+4WMIAMNhq9bVMo5pNu2RG1eo0LLmHgjsDsFI3u
ZTaZQj6eYWlbwWRiOU+2Frf4kH4CGAyxlFVCr4yEvfe2VvEbZsLZbYuF1iasnqrP
3Mrbh7Esj4MfDT/QbunnoX+49X/Rzma/a+tu6hqAhnGjvYsaDyy8WSNwHMs65f7m
sii58arik9exJHVKMXZ5FJJydBYBcPt5IlkUdoTTZ1eLkoc2B9lG6lKZbWxUYNPx
uS6XOyR4fcsmLfKfEvQuLZznWi3Jo4R17QKlL5ulqHrZUJV4PZ4Eiu5izScZ4ci/
xcPGz1a4MT7Zg3V611qGSGt7zYDdD5+R+k/sDIOleCFIpueFzzh/I1oLkSIoXFIx
DmwqCdRGeB5F7ZhfArFTezgiaMe55SPJKY5PAiThxzoostUGOfw=
=6z1N
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull more rdma updates from Jason Gunthorpe:
"This is the SMC cleanup promised, a randconfig regression fix, and
kernel oops fix.
Summary:
- Switch SMC over to rdma_get_gid_attr and remove the compat
- Fix a crash in HFI1 with some BIOS's
- Fix a randconfig failure"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/ucm: fix UCM link error
IB/hfi1: Invalid NUMA node information can cause a divide by zero
RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr
There are several blockable mmu notifiers which might sleep in
mmu_notifier_invalidate_range_start and that is a problem for the
oom_reaper because it needs to guarantee a forward progress so it cannot
depend on any sleepable locks.
Currently we simply back off and mark an oom victim with blockable mmu
notifiers as done after a short sleep. That can result in selecting a new
oom victim prematurely because the previous one still hasn't torn its
memory down yet.
We can do much better though. Even if mmu notifiers use sleepable locks
there is no reason to automatically assume those locks are held. Moreover
majority of notifiers only care about a portion of the address space and
there is absolutely zero reason to fail when we are unmapping an unrelated
range. Many notifiers do really block and wait for HW which is harder to
handle and we have to bail out though.
This patch handles the low hanging fruit.
__mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
are not allowed to sleep if the flag is set to false. This is achieved by
using trylock instead of the sleepable lock for most callbacks and
continue as long as we do not block down the call chain.
I think we can improve that even further because there is a common pattern
to do a range lookup first and then do something about that. The first
part can be done without a sleeping lock in most cases AFAICS.
The oom_reaper end then simply retries if there is at least one notifier
which couldn't make any progress in !blockable mode. A retry loop is
already implemented to wait for the mmap_sem and this is basically the
same thing.
The simplest way for driver developers to test this code path is to wrap
userspace code which uses these notifiers into a memcg and set the hard
limit to hit the oom. This can be done e.g. after the test faults in all
the mmu notifier managed memory and set the hard limit to something really
small. Then we are looking for a proper process tear down.
[akpm@linux-foundation.org: coding style fixes]
[akpm@linux-foundation.org: minor code simplification]
Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
Reported-by: David Rientjes <rientjes@google.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Sudeep Dutt <sudeep.dutt@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the system BIOS does not supply NUMA node information to the
PCI devices, the NUMA node is selected by choosing the current
node.
This can lead to the following crash:
divide error: 0000 SMP
CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G IOE
------------ 3.10.0-693.21.1.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
SE5C610.86B.01.01.0005.101720141054 10/17/2014
Workqueue: events work_for_cpu_fn
task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
RSP: 0018:ffff88017448bbf8 EFLAGS: 00010246
RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
FS: 0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
hfi1_init_dd+0x14b3/0x27a0 [hfi1]
? pcie_capability_write_word+0x46/0x70
? hfi1_pcie_init+0xc0/0x200 [hfi1]
do_init_one+0x153/0x4c0 [hfi1]
? sched_clock_cpu+0x85/0xc0
init_one+0x1b5/0x260 [hfi1]
local_pci_probe+0x4a/0xb0
work_for_cpu_fn+0x1a/0x30
process_one_work+0x17f/0x440
worker_thread+0x278/0x3c0
? manage_workers.isra.24+0x2a0/0x2a0
kthread+0xd1/0xe0
? insert_kthread_work+0x40/0x40
ret_from_fork+0x77/0xb0
? insert_kthread_work+0x40/0x40
If the BIOS is not supplying NUMA information:
- set the default table count to 1 for all possible nodes
- select node 0 (instead of current NUMA) node to get consistent
performance
- generate an error indicating that the BIOS should be upgraded
Reviewed-by: Gary Leshner <gary.s.leshner@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma.git merge resolution for the 4.19 merge window
Conflicts:
drivers/infiniband/core/rdma_core.c
- Use the rdma code and revise with the new spelling for
atomic_fetch_add_unless
drivers/nvme/host/rdma.c
- Replace max_sge with max_send_sge in new blk code
drivers/nvme/target/rdma.c
- Use the blk code and revise to use NULL for ib_post_recv when
appropriate
- Replace max_sge with max_recv_sge in new blk code
net/rds/ib_send.c
- Use the net code and revise to use NULL for ib_post_recv when
appropriate
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltwm2geHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGITkH/iSzkVhT2OxHoir0
mLVzTi7/Z17L0e/ELl7TvAC0iLFlWZKdlGR0g3b4/QpXLPmNK4HxiDRTQuWn8ke0
qDZyDq89HqLt+mpeFZ43PCd9oqV8CH2xxK3iCWReqv6bNnowGnRpSStlks4rDqWn
zURC/5sUh7TzEG4s997RrrpnyPeQWUlf/Mhtzg2/WvK2btoLWgu5qzjX1uFh3s7u
vaF2NXVJ3X03gPktyxZzwtO1SwLFS1jhwUXWBZ5AnoJ99ywkghQnkqS/2YpekNTm
wFk80/78sU+d91aAqO8kkhHj8VRrd+9SGnZ4mB2aZHwjZjGcics4RRtxukSfOQ+6
L47IdXo=
=sJkt
-----END PGP SIGNATURE-----
Merge tag 'v4.18' into rdma.git for-next
Resolve merge conflicts from the -rc cycle against the rdma.git tree:
Conflicts:
drivers/infiniband/core/uverbs_cmd.c
- New ifs added to ib_uverbs_ex_create_flow in -rc and for-next
- Merge removal of file->ucontext in for-next with new code in -rc
drivers/infiniband/core/uverbs_main.c
- for-next removed code from ib_uverbs_write() that was modified
in for-rc
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlt1f9AUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxbdhAArnhRvkwOk4m4/LCuKF6HpmlxbBNC
TjnBCenNf+lFXzWskfDFGFl/Wif4UzGbRTSCNQrwMzj3Ww3f/6R2QIq9rEJvyNC4
VdxQnaBEZSUgN87q5UGqgdjMTo3zFvlFH6fpb5XDiQ5IX/QZeXeYqoB64w+HvKPU
M+IsoOvnA5gb7pMcpchrGUnSfS1e6AqQbbTt6tZflore6YCEA4cH5OnpGx8qiZIp
ut+CMBvQjQB01fHeBc/wGrVte4NwXdONrXqpUb4sHF7HqRNfEh0QVyPhvebBi+k1
kquqoBQfPFTqgcab31VOcQhg70dEx+1qGm5/YBAwmhCpHR/g2gioFXoROsr+iUOe
BtF6LZr+Y8cySuhJnkCrJBqWvvBaKbJLg0KMbI+7p4o9MZpod2u7LS5LFrlRDyKW
3nz3o+b1+v3tCCKVKIhKo0ljolgkweQtR1f6KIHvq93wBODHVQnAOt9NlPfHVyks
ryGBnOhMjoU5hvfexgIWFk9Ph9MEVQSffkI+TeFPO/tyGBfGfQyGtESiXuEaMQaH
FGdZHX2RLkY3pWHOtWeMzRHzOnr2XjpDFcAqL3HBGPdJ30K3Umv3WOgoFe2SaocG
0gaddPjKSwwM4Sa/VP+O5cjGuzi7QnczSDdpYjxIGZzBav32hqx4/rsnLw7bHH8y
XkEme7cYJc8MGsA=
=2Dmn
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci updates from Bjorn Helgaas:
- Decode AER errors with names similar to "lspci" (Tyler Baicar)
- Expose AER statistics in sysfs (Rajat Jain)
- Clear AER status bits selectively based on the type of recovery (Oza
Pawandeep)
- Honor "pcie_ports=native" even if HEST sets FIRMWARE_FIRST (Alexandru
Gagniuc)
- Don't clear AER status bits if we're using the "Firmware-First"
strategy where firmware owns the registers (Alexandru Gagniuc)
- Use sysfs_match_string() to simplify ASPM sysfs parsing (Andy
Shevchenko)
- Remove unnecessary includes of <linux/pci-aspm.h> (Bjorn Helgaas)
- Defer DPC event handling to work queue (Keith Busch)
- Use threaded IRQ for DPC bottom half (Keith Busch)
- Print AER status while handling DPC events (Keith Busch)
- Work around IDT switch ACS Source Validation erratum (James
Puthukattukaran)
- Emit diagnostics for all cases of PCIe Link downtraining (Links
operating slower than they're capable of) (Alexandru Gagniuc)
- Skip VFs when configuring Max Payload Size (Myron Stowe)
- Reduce Root Port Max Payload Size if necessary when hot-adding a
device below it (Myron Stowe)
- Simplify SHPC existence/permission checks (Bjorn Helgaas)
- Remove hotplug sample skeleton driver (Lukas Wunner)
- Convert pciehp to threaded IRQ handling (Lukas Wunner)
- Improve pciehp tolerance of missed events and initially unstable
links (Lukas Wunner)
- Clear spurious pciehp events on resume (Lukas Wunner)
- Add pciehp runtime PM support, including for Thunderbolt controllers
(Lukas Wunner)
- Support interrupts from pciehp bridges in D3hot (Lukas Wunner)
- Mark fall-through switch cases before enabling -Wimplicit-fallthrough
(Gustavo A. R. Silva)
- Move DMA-debug PCI init from arch code to PCI core (Christoph
Hellwig)
- Fix pci_request_irq() usage of IRQF_ONESHOT when no handler is
supplied (Heiner Kallweit)
- Unify PCI and DMA direction #defines (Shunyong Yang)
- Add PCI_DEVICE_DATA() macro (Andy Shevchenko)
- Check for VPD completion before checking for timeout (Bert Kenward)
- Limit Netronome NFP5000 config space size to work around erratum
(Jakub Kicinski)
- Set IRQCHIP_ONESHOT_SAFE for PCI MSI irqchips (Heiner Kallweit)
- Document ACPI description of PCI host bridges (Bjorn Helgaas)
- Add "pci=disable_acs_redir=" parameter to disable ACS redirection for
peer-to-peer DMA support (we don't have the peer-to-peer support yet;
this is just one piece) (Logan Gunthorpe)
- Clean up devm_of_pci_get_host_bridge_resources() resource allocation
(Jan Kiszka)
- Fixup resizable BARs after suspend/resume (Christian König)
- Make "pci=earlydump" generic (Sinan Kaya)
- Fix ROM BAR access routines to stay in bounds and check for signature
correctly (Rex Zhu)
- Add DMA alias quirk for Microsemi Switchtec NTB (Doug Meyer)
- Expand documentation for pci_add_dma_alias() (Logan Gunthorpe)
- To avoid bus errors, enable PASID only if entire path supports
End-End TLP prefixes (Sinan Kaya)
- Unify slot and bus reset functions and remove hotplug knowledge from
callers (Sinan Kaya)
- Add Function-Level Reset quirks for Intel and Samsung NVMe devices to
fix guest reboot issues (Alex Williamson)
- Add function 1 DMA alias quirk for Marvell 88SS9183 PCIe SSD
Controller (Bjorn Helgaas)
- Remove Xilinx AXI-PCIe host bridge arch dependency (Palmer Dabbelt)
- Remove Aardvark outbound window configuration (Evan Wang)
- Fix Aardvark bridge window sizing issue (Zachary Zhang)
- Convert Aardvark to use pci_host_probe() to reduce code duplication
(Thomas Petazzoni)
- Correct the Cadence cdns_pcie_writel() signature (Alan Douglas)
- Add Cadence support for optional generic PHYs (Alan Douglas)
- Add Cadence power management ops (Alan Douglas)
- Remove redundant variable from Cadence driver (Colin Ian King)
- Add Kirin MSI support (Xiaowei Song)
- Drop unnecessary root_bus_nr setting from exynos, imx6, keystone,
armada8k, artpec6, designware-plat, histb, qcom, spear13xx (Shawn
Guo)
- Move link notification settings from DesignWare core to individual
drivers (Gustavo Pimentel)
- Add endpoint library MSI-X interfaces (Gustavo Pimentel)
- Correct signature of endpoint library IRQ interfaces (Gustavo
Pimentel)
- Add DesignWare endpoint library MSI-X callbacks (Gustavo Pimentel)
- Add endpoint library MSI-X test support (Gustavo Pimentel)
- Remove unnecessary GFP_ATOMIC from Hyper-V "new child" allocation
(Jia-Ju Bai)
- Add more devices to Broadcom PAXC quirk (Ray Jui)
- Work around corrupted Broadcom PAXC config space to enable SMMU and
GICv3 ITS (Ray Jui)
- Disable MSI parsing to work around broken Broadcom PAXC logic in some
devices (Ray Jui)
- Hide unconfigured functions to work around a Broadcom PAXC defect
(Ray Jui)
- Lower iproc log level to reduce console output during boot (Ray Jui)
- Fix mobiveil iomem/phys_addr_t type usage (Lorenzo Pieralisi)
- Fix mobiveil missing include file (Lorenzo Pieralisi)
- Add mobiveil Kconfig/Makefile support (Lorenzo Pieralisi)
- Fix mvebu I/O space remapping issues (Thomas Petazzoni)
- Use generic pci_host_bridge in mvebu instead of ARM-specific API
(Thomas Petazzoni)
- Whitelist VMD devices with fast interrupt handlers to avoid sharing
vectors with slow handlers (Keith Busch)
* tag 'pci-v4.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (153 commits)
PCI/AER: Don't clear AER bits if error handling is Firmware-First
PCI: Limit config space size for Netronome NFP5000
PCI/MSI: Set IRQCHIP_ONESHOT_SAFE for PCI-MSI irqchips
PCI/VPD: Check for VPD access completion before checking for timeout
PCI: Add PCI_DEVICE_DATA() macro to fully describe device ID entry
PCI: Match Root Port's MPS to endpoint's MPSS as necessary
PCI: Skip MPS logic for Virtual Functions (VFs)
PCI: Add function 1 DMA alias quirk for Marvell 88SS9183
PCI: Check for PCIe Link downtraining
PCI: Add ACS Redirect disable quirk for Intel Sunrise Point
PCI: Add device-specific ACS Redirect disable infrastructure
PCI: Convert device-specific ACS quirks from NULL termination to ARRAY_SIZE
PCI: Add "pci=disable_acs_redir=" parameter for peer-to-peer support
PCI: Allow specifying devices using a base bus and path of devfns
PCI: Make specifying PCI devices in kernel parameters reusable
PCI: Hide ACS quirk declarations inside PCI core
PCI: Delay after FLR of Intel DC P3700 NVMe
PCI: Disable Samsung SM961/PM961 NVMe before FLR
PCI: Export pcie_has_flr()
PCI: mvebu: Drop bogus comment above mvebu_pcie_map_registers()
...
Pull networking updates from David Miller:
"Highlights:
- Gustavo A. R. Silva keeps working on the implicit switch fallthru
changes.
- Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
Luca Coelho.
- Re-enable ASPM in r8169, from Kai-Heng Feng.
- Add virtual XFRM interfaces, which avoids all of the limitations of
existing IPSEC tunnels. From Steffen Klassert.
- Convert GRO over to use a hash table, so that when we have many
flows active we don't traverse a long list during accumluation.
- Many new self tests for routing, TC, tunnels, etc. Too many
contributors to mention them all, but I'm really happy to keep
seeing this stuff.
- Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.
- Lots of cleanups and fixes in L2TP code from Guillaume Nault.
- Add IPSEC offload support to netdevsim, from Shannon Nelson.
- Add support for slotting with non-uniform distribution to netem
packet scheduler, from Yousuk Seung.
- Add UDP GSO support to mlx5e, from Boris Pismenny.
- Support offloading of Team LAG in NFP, from John Hurley.
- Allow to configure TX queue selection based upon RX queue, from
Amritha Nambiar.
- Support ethtool ring size configuration in aquantia, from Anton
Mikaev.
- Support DSCP and flowlabel per-transport in SCTP, from Xin Long.
- Support list based batching and stack traversal of SKBs, this is
very exciting work. From Edward Cree.
- Busyloop optimizations in vhost_net, from Toshiaki Makita.
- Introduce the ETF qdisc, which allows time based transmissions. IGB
can offload this in hardware. From Vinicius Costa Gomes.
- Add parameter support to devlink, from Moshe Shemesh.
- Several multiplication and division optimizations for BPF JIT in
nfp driver, from Jiong Wang.
- Lots of prepatory work to make more of the packet scheduler layer
lockless, when possible, from Vlad Buslov.
- Add ACK filter and NAT awareness to sch_cake packet scheduler, from
Toke Høiland-Jørgensen.
- Support regions and region snapshots in devlink, from Alex Vesker.
- Allow to attach XDP programs to both HW and SW at the same time on
a given device, with initial support in nfp. From Jakub Kicinski.
- Add TLS RX offload and support in mlx5, from Ilya Lesokhin.
- Use PHYLIB in r8169 driver, from Heiner Kallweit.
- All sorts of changes to support Spectrum 2 in mlxsw driver, from
Ido Schimmel.
- PTP support in mv88e6xxx DSA driver, from Andrew Lunn.
- Make TCP_USER_TIMEOUT socket option more accurate, from Jon
Maxwell.
- Support for templates in packet scheduler classifier, from Jiri
Pirko.
- IPV6 support in RDS, from Ka-Cheong Poon.
- Native tproxy support in nf_tables, from Máté Eckl.
- Maintain IP fragment queue in an rbtree, but optimize properly for
in-order frags. From Peter Oskolkov.
- Improvde handling of ACKs on hole repairs, from Yuchung Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
hv/netvsc: Fix NULL dereference at single queue mode fallback
net: filter: mark expected switch fall-through
xen-netfront: fix warn message as irq device name has '/'
cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
net: dsa: mv88e6xxx: missing unlock on error path
rds: fix building with IPV6=m
inet/connection_sock: prefer _THIS_IP_ to current_text_addr
net: dsa: mv88e6xxx: bitwise vs logical bug
net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
ieee802154: hwsim: using right kind of iteration
net: hns3: Add vlan filter setting by ethtool command -K
net: hns3: Set tx ring' tc info when netdev is up
net: hns3: Remove tx ring BD len register in hns3_enet
net: hns3: Fix desc num set to default when setting channel
net: hns3: Fix for phy link issue when using marvell phy driver
net: hns3: Fix for information of phydev lost problem when down/up
net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
net: hns3: Add support for serdes loopback selftest
bnxt_en: take coredump_record structure off stack
...
hns bitmap allocation functions return 0 on success and -1 on failure.
Callers of these functions wrongly used their return value as an errno,
fix that by making a proper conversion.
Fixes: a598c6f4c5 ("IB/hns: Simplify function of pd alloc and qp alloc")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Acked-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds support for SRQ's created in user space and update
qedr_affiliated_event to deal with general SRQ events.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Implement the SRQ specific verbs and update the poll_cq verb to deal with
SRQ completions.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Today, we are using idr mechanism for QP's only.
This patch prepares the qedr_idr stuctures and the idr routines for
both QP's and SRQ's.
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5_ib_create_qp_resp was never initialized and only the first 4 bytes
were written.
Fixes: 41d902cb7c ("RDMA/mlx5: Fix definition of mlx5_ib_create_qp_resp")
Cc: <stable@vger.kernel.org>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Several handlers need temporary allocations for the life of the method,
switch them to use the uverbs_alloc allocator.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
There is no reason for drivers to do this, the core code should take of
everything. The drivers will provide their information from rodata to
describe their modifications to the core's base uapi specification.
The core uses this to build up the runtime uapi for each device.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
This will allow FW to not send more data to TP (which would then need to
be buffered). Pass the negotiated TCP window scale to FW in the FLOWC WR.
Also refactor send_flowc() a bit to clean it up.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.
The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
- priv_destructor needs to switch to point to the ULP's destructor
which will then call the rdma_ndev's in the right order
- We need to be careful around the error unwind of register_netdev
as it sometimes calls priv_destructor on failure
- ULPs need to use ndo_init/uninit to ensure proper ordering
of failures around register_netdev
Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.
The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
To optimize NVME-oF READ IOPs, use a specialized WQE that combines
the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target
driver.
This reduces uP overhead per NVME-oF IO, and results in over 10%
improvement in NVME-oF 4K READ IOPs.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In c4iw_create_qp() there are several struct members which potentially
aren't inintialized like uresp.rq_key. I've fixed this code before in
in commit ae1fe07f3f ("RDMA/cxgb4: Fix stack info leak in
c4iw_create_qp()") so this time I'm just going to take a big hammer
approach and memset the whole struct to zero. Hopefully, it will stay
fixed this time.
In c4iw_create_srq() we don't clear uresp.reserved.
Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to IB protocol, there are some cases that work requests must
return the flush error completion status through the completion queue. Due
to hardware limitation, the driver needs to assist the flush process.
This patch adds the support of flush cqe for hip08 in the cases that
needed, such as poll cqe, post send, post recv and aeqe handle.
The patch also considered the compatibility between kernel and user space.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This does the same as the patch before, except for ioctl. The rules are
the same, but for the ioctl methods the core code handles setting up the
uobject.
- Retrieve the ib_dev from the uobject->context->device. This is
safe under ioctl as the core has already done rdma_alloc_begin_uobject
and so CREATE calls are entirely protected by the rwsem.
- Retrieve the ib_dev from uobject->object
- Call ib_uverbs_get_ucontext()
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported, all drivers should generate this and all users should check
for it when detecting not supported functionality.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse reports the following warning:
drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that sparse complains about casts to restricted __be32.
Fixes: a3cdaa69e4 ("cxgb4: Adds CPL support for Shared Receive Queues")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that the following warning is reported when building with
W=1:
drivers/infiniband/hw/cxgb4/cm.c:1860:5: warning: variable 'status' set but not used [-Wunused-but-set-variable]
u8 status;
^~~~~~
Fixes: 6a0b6174d3 ("rdma/cxgb4: Add support for kernel mode SRQ's")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This was missed in a few places, and was just using 0.
Also correct the spelling of HNS_ROCE_FLOW_LABEL_MASK
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch mainly uses CMD_CSQ_DESC_NUM instead of magic number in order
to improve readability.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Set for ret was missing in the error path here, resulting in incorrect
error code for modify_qp.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch mainly fills the correct value into the vlan id field of qp
context as well as update the vlan field name according to the latest
hardware user manual.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Only when the IB_QP_AV flag of attr_mask is set is it valid to assign the
related fields of the av into the qp context.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The rdma core is taking care of return the right error code when the
rdma device callbacks aren't supported.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is "-EOPNOTSUPP" when the create_srq() callback
is not supported.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In the current implementation, the driver tries to allocate contiguous
memory, and if it fails, it falls back to 4K fragmented allocation.
Once the memory is fragmented, the first allocation might take a lot
of time, and even fail, which can cause connection failures.
This patch changes the logic to always allocate with 4K granularity,
since it's more robust and more likely to succeed.
This patch was tested with Lustre and no performance degradation
was observed.
Note: This commit eliminates the "shrinking WQE" feature. This feature
depended on using vmap to create a virtually contiguous send WQ.
vmap use was abandoned due to problems with several processors (see the
commit cited below). As a result, shrinking WQE was available only with
physically contiguous send WQs. Allocating such send WQs caused the
problems described above.
Therefore, as a side effect of eliminating the use of large physically
contiguous send WQs, the shrinking WQE feature became unavailable.
Warning example:
worker/20:1: page allocation failure: order:8, mode:0x80d0
CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<ffffffff81686d81>] dump_stack+0x19/0x1b
[<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[<ffffffff81184fae>] __get_free_pages+0xe/0x50
[<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
Fixes: 73898db043 ("net/mlx4: Avoid wrong virtual mappings")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This clearly indicates that the input is a bitwise combination of values
in an enum, and identifies which enum contains the definition of the bits.
Special accessors are provided that handle the mandatory validation of the
allowed bits and enforce the correct type for bitwise flags.
If we had introduced this at the start then the kabi would have uniformly
used u64 data to pass flags, however today there is a mixture of u64 and
u32 flags. All places are converted to accept both sizes and the accessor
fixes it. This allows all existing flags to grow to u64 in future without
any hassle.
Finally all flags are, by definition, optional. If flags are not passed
the accessor does not fail, but provides a value of zero.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.
To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since the next patch will constify the wr pointer, do not modify the data
that pointer points at.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When posting a send work request, the work request that is posted is not
modified by any of the RDMA drivers. Make this explicit by constifying
most ib_send_wr pointers in RDMA transport drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The driver implements the modify_cq callback, but did not set the bit to
expose it to userspace.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Because the data structure of hip08 is little endian, it needs to fix the
immediate field of wqe and cqe into __le32.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In order to avoid using usleep function in lock function, we use delay
function instead of it. Besides, it also use brackets for standardized
the computed order.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When hop_num is more than three, it need to return -EINVAL. This patch
fixes it.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When create loop qp fail, it will return the correct result when
modify_qp() fails.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When init cmq fail in initial flow of RoCE, it should return the errno of
cmq_init function, not of the rest call.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When a CX5 device is configured in dual-port RoCE mode, after creating
many VFs against port 1, creating the same number of VFs against port 2
will flood kernel/syslog with something like
"mlx5_*:mlx5_ib_bind_slave_port:4266:(pid 5269): port 2 already
affiliated."
So basically, when traversing mlx5_ib_dev_list, mlx5_ib_add_slave_port()
repeatedly attempts to bind the new mpi structure to every device on the
list until it finds an unbound device.
Change the log level from warn to dbg to avoid log flooding as the warning
should be harmless.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:
drivers/infiniband/hw/usnic/usnic_fwd.c:95:2: warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 20 [-Wstringop-truncation]
strncpy(ufdev->name, netdev_name(ufdev->netdev),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(ufdev->name) - 1);
~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This driver doesn't provide any kernel services, it only provides
an interface via uverbs, so it should depend on, not select, uverbs
support.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch implements the srq specific verbs such as create/destroy/modify
and post_srq_recv. And adds srq specific structures and defines to t4.h
and uapi.
Also updates the cq poll logic to deal with completions that are
associated with the SRQ's.
This patch also handles kernel mode SRQ_LIMIT events as well as flushed
SRQ buffers
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds kernel mode t4_srq structures and support functions,
uapi structures and defines, as well as firmware work request structures.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch avoids that the following compiler warning is reported when
building with gcc 8 and W=1:
In function 'ocrdma_mbx_get_ctrl_attribs',
inlined from 'ocrdma_init_hw' at drivers/infiniband/hw/ocrdma/ocrdma_hw.c:3224:11:
drivers/infiniband/hw/ocrdma/ocrdma_hw.c:1368:3: warning: 'strncpy' output may be truncated copying 31 bytes from a string of length 31 [-Wstringop-truncation]
strncpy(dev->model_number,
^~~~~~~~~~~~~~~~~~~~~~~~~~
hba_attribs->controller_model_number, 31);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
We have a parallel unlocked reader and writer with ib_uverbs_get_context()
vs everything else, and nothing guarantees this works properly.
Audit and fix all of the places that access ucontext to use one of the
following locking schemes:
- Call ib_uverbs_get_ucontext() under SRCU and check for failure
- Access the ucontext through an struct ib_uobject context member
while holding a READ or WRITE lock on the uobject.
This value cannot be NULL and has no race.
- Hold the ucontext_lock and check for ufile->ucontext !NULL
This also re-implements ib_uverbs_get_ucontext() in a way that is safe
against concurrent ib_uverbs_get_context() and disassociation.
As a side effect, every access to ucontext in the commands is via
ib_uverbs_get_context() with an error check, or via the uobject, so there
is no longer any need for the core code to check ucontext on every command
call. These checks are also removed.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This approach matches the standard flow of the typical write method that
relies on the HW object to store the device and the uobject to access the
ucontext. Avoids the use of the devx_ufile2uctx in several places will
make revising the semantics of ib_uverbs_get_ucontext() in the next patch
simpler.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Expose the mlx5 flow steering parsing trees, exposing the functionality to
user space.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to set a destination that is a flow table, this can come from
the DEVX destination.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add support to set a public flow steering rule when its destination is a
TIR by using raw specification data.
The logic follows the verbs API but instead of using ib_spec(s) the raw,
device specific, description is used.
This allows supporting specialty matchers without having to define new
matches in the verbs struct based language.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce driver create and destroy flow methods on the uverbs flow
object.
This allows the driver to get its specific device attributes to match the
underlay specification while still using the generic ib_flow object for
cleanup and code sharing.
The IB object's attributes are set via the ib_set_flow() helper function.
The specific implementation for the given specification is added in
downstream patches.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce flow steering matcher object and its create and destroy methods.
This matcher object holds some mlx5 specific driver properties that
matches the underlay device specification when an mlx5 flow steering group
is created.
It will be used in downstream patches to be part of mlx5 specific create
flow method.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git
This is required to resolve dependencies of the next series of RDMA
patches.
* branch 'mellanox/mlx5-next':
net/mlx5: Add support for flow table destination number
net/mlx5: Add forward compatible support for the FTE match data
net/mlx5: Fix tristate and description for MLX5 module
net/mlx5: Better return types for CQE API
net/mlx5: Use ERR_CAST() instead of coding it
net/mlx5: Add missing SET_DRIVER_VERSION command translation
net/mlx5: Add XRQ commands definitions
net/mlx5: Add core support for double vlan push/pop steering action
net/mlx5: Expose MPEGC (Management PCIe General Configuration) structures
net/mlx5: FW tracer, add hardware structures
net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
mlx5 core infrastructure updates and fixes.
From Eran:
- Add MPEGC (Management PCIe General Configuration) registers and btis
- Fix tristate and description for MLX5 module
rom Feras:
- Add hardware structures for the firmware tracer
From Jainbo:
- Core support for double vlan push/pop steering action
From Max:
- Add XRQ commands definitions
From Noa:
- Add missing SET_DRIVER_VERSION command translation
From Roi:
- Use ERR_CAST() instead of coding it
From Tariq:
- Better return types for CQE API
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch avoids that gcc reports the following warning when building
with W=1:
drivers/infiniband/hw/bnxt_re/ib_verbs.c:2404:4: warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove "uctx" and "pa" variables that were set but not used.
Fixes: a8b92ca1b0 ("IB/mlx5: Introduce DEVX")
Fixes: 8f06228733 ("RDMA/mlx5: Remove debug prints of VMA pointers")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Convert all users of struct acpi_reference_args to more generic
fwnode_reference_args. This will
1) avoid an ACPI specific references to device nodes with integer
arguments as well as
2) allow making references to nodes other than device nodes in ACPI.
As a by-product, convert the fwnode interger arguments to u64. The
arguments were 64-bit integers on ACPI but the fwnode arguments were
just 32-bit.
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that the old implementation of pci_reset_bus() is gone, replace
pci_try_reset_bus() with pci_reset_bus().
Compared to the old implementation, new code will fail immmediately with
-EAGAIN if object lock cannot be obtained.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by
querying if a system supports hotplug or not. A survey showed that most
drivers don't do this and we are leaking hotplug capability to the user.
Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset().
Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Getting ready to hide pci_reset_bridge_secondary_bus() from the drivers.
pci_reset_bridge_secondary_bus() should only be used internally by the
PCI code itself.
Other drivers should rely on higher level pci_try_reset_bus() API.
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Current description did not include new devices. Fix that by proving the
correct generic description.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
- cxgb4 could wrongly fail MR creation due to a typo
- Various crashes if the wrong QP type is mixed in with APIs that expect
other types
- Syzkaller oops
- Using ERR_PTR and NULL together cases HFI1 to crash in some cases
- mlx5 memory leak in error unwind
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJbSNzSAAoJEDht9xV+IJsarfQP/38i9pbDqWdniEhv42CisS1D
aZYygJ6yAsoEPmEI1oIGtJURte44wxHWmWO2Jbz8aTFl19tSh2cZgFfsNKImSU5A
5ON19gxOeQ/KiZPmfbgP8cgunU41DpYq3twW8NvW9u5JTl1nbFKtpWfJxKVvjlsu
PwiFrQxG43/9BrooHJc4eogJHmB77iypR3NmkagAM3oSx/d35zt+Wnw45bybIl8e
T6OvyEvNHlGLQoqE8j4JYDN6whLwr7uqtcJXv/ukjhkD4WMb8ti9QZH6FPA+8pGG
oRO5AlbWpcSHThu4tYYoThEdVMLjS5RnzOOUyMiHr7CS+MlEPrKtR5D23f3egSyU
lUETkyPfkVRSrfD9LnOrj+W6xjwNMJm2Rq/zexVnoKjRrl5asfmDTTAhWJxu7CMK
Oeb39ue6r7A3wEpaLWlqqrVnzIS+rKKKFCxqc+ni/JUTX0EPr+OcVJav+JmfjpKd
Q24sbiSkoAp7HRJnsenCX4Kv2x55ClhLktGnltgoJtzsMbT/EOO38d2LrYpWkCkX
aoi4Xpg1rOvY9biy4dDtGmawsKgEKwJ2j/xr6RQdKZZAifeabnoro/ekSj5/SquQ
vUfATYG+JEKGethWgO2A7/tgdo7artrN77w0eM0yKtNuJwUGLHCuXbDFUgFxapvm
RIXvnUEp9Q27bc2JjX3z
=WDR8
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Things have been quite slow, only 6 RC patches have been sent to the
list. Regression, user visible bugs, and crashing fixes:
- cxgb4 could wrongly fail MR creation due to a typo
- various crashes if the wrong QP type is mixed in with APIs that
expect other types
- syzkaller oops
- using ERR_PTR and NULL together cases HFI1 to crash in some cases
- mlx5 memory leak in error unwind"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/mlx5: Fix memory leak in mlx5_ib_create_srq() error path
RDMA/uverbs: Don't fail in creation of multiple flows
IB/hfi1: Fix incorrect mixing of ERR_PTR and NULL return values
RDMA/uverbs: Fix slab-out-of-bounds in ib_uverbs_ex_create_flow
RDMA/uverbs: Protect from attempts to create flows on unsupported QP
iw_cxgb4: correctly enforce the max reg_mr depth
User's supplied index is checked again total number of system pages, but
this number already includes num_static_sys_pages, so addition of that
value to supplied index causes to below error while trying to access
sys_pages[].
BUG: KASAN: slab-out-of-bounds in bfregn_to_uar_index+0x34f/0x400
Read of size 4 at addr ffff880065561904 by task syz-executor446/314
CPU: 0 PID: 314 Comm: syz-executor446 Not tainted 4.18.0-rc1+ #256
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0xef/0x17e
print_address_description+0x83/0x3b0
kasan_report+0x18d/0x4d0
bfregn_to_uar_index+0x34f/0x400
create_user_qp+0x272/0x227d
create_qp_common+0x32eb/0x43e0
mlx5_ib_create_qp+0x379/0x1ca0
create_qp.isra.5+0xc94/0x22d0
ib_uverbs_create_qp+0x21b/0x2a0
ib_uverbs_write+0xc2c/0x1010
vfs_write+0x1b0/0x550
ksys_write+0xc6/0x1a0
do_syscall_64+0xa7/0x590
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x433679
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 91 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fff2b3d8e48 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002f8 RCX: 0000000000433679
RDX: 0000000000000040 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 00000000006d4018 R08: 00000000004002f8 R09: 00000000004002f8
R10: 00000000004002f8 R11: 0000000000000217 R12: 0000000000000000
R13: 000000000040cb00 R14: 000000000040cb90 R15: 0000000000000006
Allocated by task 314:
kasan_kmalloc+0xa0/0xd0
__kmalloc+0x1a9/0x510
mlx5_ib_alloc_ucontext+0x966/0x2620
ib_uverbs_get_context+0x23f/0xa60
ib_uverbs_write+0xc2c/0x1010
__vfs_write+0x10d/0x720
vfs_write+0x1b0/0x550
ksys_write+0xc6/0x1a0
do_syscall_64+0xa7/0x590
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 1:
__kasan_slab_free+0x12e/0x180
kfree+0x159/0x630
kvfree+0x37/0x50
single_release+0x8e/0xf0
__fput+0x2d8/0x900
task_work_run+0x102/0x1f0
exit_to_usermode_loop+0x159/0x1c0
do_syscall_64+0x408/0x590
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The buggy address belongs to the object at ffff880065561100
which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 2052 bytes inside of
4096-byte region [ffff880065561100, ffff880065562100)
The buggy address belongs to the page:
page:ffffea0001955800 count:1 mapcount:0 mapping:ffff88006c402480 index:0x0 compound_mapcount: 0
flags: 0x4000000000008100(slab|head)
raw: 4000000000008100 ffffea0001a7c000 0000000200000002 ffff88006c402480
raw: 0000000000000000 0000000080070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff880065561800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff880065561880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff880065561900: 04 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff880065561980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff880065561a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Cc: <stable@vger.kernel.org> # 4.15
Fixes: 1ee47ab3e8 ("IB/mlx5: Enable QP creation with a given blue flame index")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need for three consecutive calls to alloc_bfreg(). It can be
implemented with one function.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds support for iw_cxb4 to extend cqes from existing 32Byte
size to 64Byte.
Also includes adds backward compatibility support (for 32Byte) to work
with older libraries.
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Avoid that the following compiler warning is reported when building
with gcc 8:
drivers/infiniband/hw/hfi1/verbs.c:1896:2: warning: 'strncpy' output may be truncated copying 64 bytes from a string of length 64 [-Wstringop-truncation]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix memory leak in the error path of mlx5_ib_create_srq() by making sure
to free the allocated srq.
Fixes: c2b37f7648 ("IB/mlx5: Fix integer overflows in mlx5_ib_create_srq")
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch updates the implementation of set_mac by using
command queue instead of directly writing registers.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch updates the implementation of set_gid by using
command queue instead of directly writing registers.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In hip08, the TPQ(Timer Poll Queue) should be extended
to host memory. This patch adds the support of TPQ.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In hip08, TSQ(Transport Service Queue) should be extended
to host memory to store the doorbells. This patch adds the
support of creating TSQ, and then configured to the hardware.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch does not change any functionality but avoids that sparse
reports the following:
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c:1818:31: warning: context imbalance in 'ocrdma_destroy_qp' - different lock contexts for basic block
Compile-tested only.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Cc: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The nes infiniband driver uses current_kernel_time() to get a nanosecond
granunarity timestamp to initialize its tcp sequence counters. This is
one of only a few remaining users of that deprecated function, so we
should try to get rid of it.
Aside from using a deprecated API, there are several problems I see here:
- Using a CLOCK_REALTIME based time source makes it predictable in
case the time base is synchronized.
- Using a coarse timestamp means it only gets updated once per jiffie,
making it even more predictable in order to avoid having to access
the hardware clock source
- The upper 2 bits are always zero because the nanoseconds are at most
999999999.
For the Linux TCP implementation, we use secure_tcp_seq(), which appears
to be appropriate here as well, and solves all the above problems.
i40iw uses a variant of the same code, so I do that same thing there
for ipv4. Unlike nes, i40e also supports ipv6, which needs to call
secure_tcpv6_seq instead.
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Avoid that the compiler reports the following when building with W=1:
drivers/infiniband/hw/nes/nes_utils.c: In function 'nes_arp_table':
drivers/infiniband/hw/nes/nes_utils.c:689:9: warning: variable 'tmp_addr' set but not used [-Wunused-but-set-variable]
__be32 tmp_addr;
^~~~~~~~
drivers/infiniband/hw/nes/nes_hw.c: In function 'flush_wqes':
drivers/infiniband/hw/nes/nes_hw.c:3840:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^~~
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_setup_virt_qp':
drivers/infiniband/hw/nes/nes_verbs.c:811:6: warning: variable 'pbl_entries' set but not used [-Wunused-but-set-variable]
u32 pbl_entries;
^~~~~~~~~~~
drivers/infiniband/hw/nes/nes_verbs.c: In function 'nes_dereg_mr':
drivers/infiniband/hw/nes/nes_verbs.c:2487:6: warning: variable 'minor_code' set but not used [-Wunused-but-set-variable]
u16 minor_code;
^~~~~~~~~~
drivers/infiniband/hw/nes/nes_cm.c: In function 'mini_cm_recv_pkt':
drivers/infiniband/hw/nes/nes_cm.c:2570:20: warning: variable 'tmp_saddr' set but not used [-Wunused-but-set-variable]
__be32 tmp_daddr, tmp_saddr;
^~~~~~~~~
drivers/infiniband/hw/nes/nes_cm.c:2570:9: warning: variable 'tmp_daddr' set but not used [-Wunused-but-set-variable]
__be32 tmp_daddr, tmp_saddr;
^~~~~~~~~
drivers/infiniband/hw/nes/nes_cm.c: In function 'cm_event_connected':
drivers/infiniband/hw/nes/nes_cm.c:3578:22: warning: variable 'raddr' set but not used [-Wunused-but-set-variable]
struct sockaddr_in *raddr;
^~~~~
drivers/infiniband/hw/nes/nes_cm.c: In function 'cm_event_reset':
drivers/infiniband/hw/nes/nes_cm.c:3753:6: warning: variable 'ret' set but not used [-Wunused-but-set-variable]
int ret;
^~~
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In some configurations even gcc 7 cannot unravel this complexity and still
throws a warning.
Fixes: 4ab39e2f98 ("RDMA/cxgb4: Make c4iw_poll_cq_one() easier to analyze")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Enable uverbs_destroy_def_handler to be used by drivers and replace
current code to use it.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Extend the existing grh_required flag to check when AV's are handled that
a GRH is present.
Since we don't want to do query_port during the AV checks for performance
reasons move the flag into the immutable_data.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
grh_required is intended to be a global setting where all AV's will
require a GRH, not just the sm_lid. Move the special logic to the creation
of the SM AH.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The internal flag IP_BASED_GIDS was added to a field that was being used
to hold the port Info CapabilityMask without considering the effects this
will have. Since most drivers just use the value from the HW MAD it means
IP_BASED_GIDS will also become set on any HW that sets the IBA flag
IsOtherLocalChangesNoticeSupported - which is not intended.
Fix this by keeping port_cap_flags only for the IBA CapabilityMask value
and store unrelated flags externally. Move the bit definitions for this to
ib_mad.h to make it clear what is happening.
To keep the uAPI unchanged define a new set of flags in the uapi header
that are only used by ib_uverbs_query_port_resp.port_cap_flags which match
the current flags supported in rdma-core, and the values exposed by the
current kernel.
Fixes: b4a26a2728 ("IB: Report using RoCE IP based gids in port caps")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This patch makes it so that instead of passing a void pointer as the
accel_priv we instead pass a net_device pointer as sb_dev. Making this
change allows us to pass the subordinate device through to the fallback
function eventually so that we can keep the actual code in the
ndo_select_queue call as focused on possible on the exception cases.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It is incorrect to depend on set_id value to know if counters were
allocated or not. set_id_valid field is set to true when counters
were allocated. Therefore, use set_id_valid while deciding to
free counters.
Cc: <stable@vger.kernel.org> # 4.15
Fixes: aac4492ef2 ("IB/mlx5: Update counter implementation for dual port RoCE")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Clean up a little bit code to drop unused port_num parameter.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.
In this case, the affected files are in debugfs (and should therefore only
be accessible to root), and the read handlers check that *pos is zero
(meaning that at least sys_splice() can't trigger kernel memory
corruption). Because of the root requirement, this is not a security fix,
but rather a cleanup.
For the read handlers, fix it by using simple_read_from_buffer() instead
of custom logic. Add min() calls to the write handlers.
Fixes: 4a2da0b8c0 ("IB/mlx5: Add debug control parameters for congestion control")
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce the function __c4iw_poll_cq_one() such that c4iw_poll_cq_one()
becomes easier to analyze for static source code analyzers. This patch
avoids that sparse reports the following:
drivers/infiniband/hw/cxgb4/cq.c:401:36: warning: context imbalance in 'c4iw_flush_hw_cq' - unexpected unlock
drivers/infiniband/hw/cxgb4/cq.c:824:9: warning: context imbalance in 'c4iw_poll_cq_one' - different lock contexts for basic block
Compile-tested only.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>