linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-25 21:35:39 +07:00

Author	SHA1	Message	Date
Gal Pressman	f89adedaf3	RDMA/uverbs: Initialize udata struct on destroy flows Cited commit introduced the udata parameter to different destroy flows but the uapi method definition does not have udata (i.e has_udata flag is not set). As a result, an uninitialized udata struct is being passed down to the driver callbacks. Fix that by clearing the driver udata even in cases where has_udata flag is not set. Fixes: `c4367a2635` ("IB: Pass uverbs_attr_bundle down ib_x destroy path") Cc: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Co-developed-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-02 17:07:02 -03:00
Shiraz Saleem	7872168a83	RDMA/umem: Handle page combining avoidance correctly in ib_umem_add_sg_table() The flag update_cur_sg tracks whether contiguous pages from a new set of page_list pages can be merged into the SGE passed into ib_umem_add_sg_table(). If this flag is true, but the total segment length exceeds the max_seg_size supported by HW, we avoid combining to this SGE and move to a new SGE (x) and merge 'len' pages to it. However, if i < npages, the next iteration can incorrectly merge 'len' contiguous pages into x instead of into a new SGE since update_cur_sg is still true. Reset update_cur_sg to false always after the check to merge pages into the first SGE passed in to ib_umem_add_sg_table(). Also, prevent a new SGE's segment length from ever exceeding HW max_seg_sz. There is a crash on hfi1 as result of this where-in max_seg_sz is defaulting to 64K. Due to above bug, unfolding SGE's in __ib_umem_release points to a bad page ptr. TEST comp-wfr.perfnative.STL-22166-WDT _ perftest native 2-Write_4097QP_4MB STARTING at 1555387093 BUG: Bad page state in process ib_write_bw pfn:7ebca0 page:ffffcd675faf2800 count:0 mapcount:1 mapping:0000000000000000 index:0x1 flags: 0x17ffffc0000000() raw: 0017ffffc0000000 dead000000000100 dead000000000200 0000000000000000 raw: 0000000000000001 0000000000000000 0000000000000000 0000000000000000 page dumped because: nonzero mapcount CPU: 18 PID: 15853 Comm: ib_write_bw Tainted: G B 5.1.0-rc4 #1 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015 Call Trace: dump_stack+0x5a/0x73 bad_page+0xf5/0x10f free_pcppages_bulk+0x62c/0x680 free_unref_page+0x54/0x70 __ib_umem_release+0x148/0x1a0 [ib_uverbs] ib_umem_release+0x22/0x80 [ib_uverbs] rvt_dereg_mr+0x67/0xb0 [rdmavt] ib_dereg_mr_user+0x37/0x60 [ib_core] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] uobj_destroy+0x4d/0x60 [ib_uverbs] __uobj_get_destroy+0x33/0x50 [ib_uverbs] __uobj_perform_destroy+0xa/0x30 [ib_uverbs] ib_uverbs_dereg_mr+0x66/0x90 [ib_uverbs] ib_uverbs_write+0x3e1/0x500 [ib_uverbs] vfs_write+0xad/0x1b0 ksys_write+0x5a/0xd0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `d10bcf947a` ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs") Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-05-02 16:50:12 -03:00
Saeed Mahameed	c515e70d67	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge commit includes some misc shared code updates from mlx5-next branch needed for net-next. 1) From Aya: Enable general events on all physical link types and restrict general event handling of subtype DELAY_DROP_TIMEOUT in mlx5 rdma driver to ethernet links only as it was intended. 2) From Eli: Introduce low level bits for prio tag mode 3) From Maor: Low level steering updates to support RDMA RX flow steering and enables RoCE loopback traffic when switchdev is enabled. 4) From Vu and Parav: Two small mlx5 core cleanups 5) From Yevgeny add HW definitions of geneve offloads Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-05-01 13:57:48 -07:00
Gal Pressman	923abb9d79	RDMA/core: Introduce RDMA subsystem ibdev_* print functions Similarly to dev/netdev/etc printk helpers, add standard printk helpers for the RDMA subsystem. Example output: efa 0000:00:06.0 efa_0: Hello World! efa_0: Hello World! (no parent device set) (NULL ib_device): Hello World! (ibdev is NULL) Cc: Jason Baron <jbaron@akamai.com> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Suggested-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2019-05-01 12:29:28 -04:00
Aya Levin	6cfdc7e468	IB/mlx5: Restrict 'DELAY_DROP_TIMEOUT' subtype to Ethernet interfaces Subtype 'DELAY_DROP_TIMEOUT' (under 'GENERAL' event) is restricted to Ethernet interfaces. This patch doesn't change functionality or breaks current flow. In the downstream patch, non Ethernet (like IB) interfaces will receive 'GENERAL' event. Fixes: `5d3c537f90` ("net/mlx5: Handle event of power detection in the PCIE slot") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-29 16:55:05 -07:00
Vu Pham	c42260f195	net/mlx5: Separate and generalize dma device from pci device The mlx5 Sub-Function (SF) sub device will be introduced in subsequent patches. It will be created as mediated device and belong to mdev bus. It is necessary to treat dma operations on PF, VF and SF in uniform way, hence reduce the dependency on pdev pci dev struct and work directly out of newly introduced 'struct device' from previous patch. This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-29 16:55:05 -07:00
Linus Torvalds	6a5c5d26c4	rdma: fix build errors on s390 and MIPS due to bad ZERO_PAGE use The parameter to ZERO_PAGE() was wrong, but since all architectures except for MIPS and s390 ignore it, it wasn't noticed until 0-day reported the build error. Fixes: `67f269b37f` ("RDMA/ucontext: Fix regression with disassociate") Cc: stable@vger.kernel.org Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-04-29 09:48:53 -07:00
Linus Torvalds	14f974d7f0	5.1 Third RC pull request One core bug fix and a few driver ones - FRWR memory registration for hfi1/qib didn't work with with some iovas causing a NFSoRDMA failure regression due to a fix in the NFS side - A command flow error in mlx5 allowed user space to send a corrupt command (and also smash the kernel stack we've since learned) - Fix a regression and some bugs with device hot unplug that was discovered while reviewing Andrea's patches - hns has a failure if the user asks for certain QP configurations -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzFkh8ACgkQOG33FX4g mxrWDQ/8CFK0TNGIf+LTQk2urQ5XAT0amNDNjEvi5kT4Vk2PFdkT5IZxlfK2FU+W 68FKzP0zpUfSgz83BS26wBH939mJZV+4hUE/6ESyHtsEV9Hsin1zIgrraiad0l4E WOXQMB76rIzKLj1Ws1G8udW7Tr4d9tm0kNb/PQhlhZW8+yt6lsAcJRdoetKT+kYj WaSqJ+U2Y1LhOxHfc+w3M8NJOvIW3qx9ju7sx2RyIYxU46M4f4r+pT8Z25LnMrh1 7PoOsfoDXZlng6UNueSmM1glTlRQDbiy3XdW4wQcvQABmmJfSLOLf9beeSn6pgPC YfNT6fznOTPGUrLhpiMMSsA5R6S/4cGZ9CVpGuojGl7VOWu/fr/Aja3JY2krNpWn jIcvh6nnGg5GuGTg/ZCmBYyAF22xbFmEmV7K0FP+dXZJyDVEiuC02j+JkTCknZYJ DaqzV/K/l1ROlKD+CBwWewrDztXjnxu3BvnNfMeAE9C8X/AGNdNY/86/IdIAgJSe QRrjf4rV8dqvb0i7lgkEe7swjwLoocjcM6OqMW42J35HUXjnkytrNhhZcgtQzSsq M1SM8ascnXE5OxIKfuAWQdHRR46rkgZVIsf8JLXaJQp+ZP55uiq355txwkeKgYrg oyC/7yuADZtXwEYsMDGgbI1RMpgMlAyAkDoPEumSol2LtmUNSgk= =K4Hb -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "One core bug fix and a few driver ones - FRWR memory registration for hfi1/qib didn't work with with some iovas causing a NFSoRDMA failure regression due to a fix in the NFS side - A command flow error in mlx5 allowed user space to send a corrupt command (and also smash the kernel stack we've since learned) - Fix a regression and some bugs with device hot unplug that was discovered while reviewing Andrea's patches - hns has a failure if the user asks for certain QP configurations" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/hns: Bugfix for mapping user db RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page IB/mlx5: Fix scatter to CQE in DCT QP creation IB/rdmavt: Fix frwr memory registration	2019-04-28 10:00:45 -07:00
Johannes Berg	8cb081746c	netlink: make validation more configurable for future strictness We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be everything. The current _strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to _parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:07:21 -04:00
Michal Kubecek	ae0be8de9a	netlink: make nla_nest_start() add NLA_F_NESTED flag Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 \| NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-27 17:03:44 -04:00
David S. Miller	8b44836583	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-25 23:52:29 -04:00
Matthew Wilcox	b9b0f34531	uverbs: Convert idr to XArray The word 'idr' is scattered throughout the API, so I haven't changed it, but the 'idr' variable is now an XArray. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 12:27:11 -03:00
Matthew Wilcox	a7b36d5fa8	ib/bnxt: Remove mention of idr_alloc from comment Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 12:02:18 -03:00
Lijun Ou	2557fabd6e	RDMA/hns: Bugfix for mapping user db When the maximum send wr delivered by the user is zero, the qp does not have a sq. When allocating the sq db buffer to store the user sq pi pointer and map it to the kernel mode, max_send_wr is used as the trigger condition, while the kernel does not consider the max_send_wr trigger condition when mapmping db. It will cause sq record doorbell map fail and create qp fail. The failed print information as follows: hns3 0000:7d:00.1: Send cmd: tail - 418, opcode - 0x8504, flag - 0x0011, retval - 0x0000 hns3 0000:7d:00.1: Send cmd: 0xe59dc000 0x00000000 0x00000000 0x00000000 0x00000116 0x0000ffff hns3 0000:7d:00.1: sq record doorbell map failed! hns3 0000:7d:00.1: Create RC QP failed Fixes: `0425e3e6e0` ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 10:40:04 -03:00
Jason Gunthorpe	1d045aa76f	Merge branch 'mlx5_tir_icm' into rdma.git for-next Ariel Levkovich says: ==================== The series exposes the ICM address of the receive transport interface (TIR) of Raw Packet and RSS QPs to the user since they are required to properly create and insert steering rules that direct flows to these QPs. ==================== For dependencies this branch is based on mlx5-next from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'mlx5_tir_icm': IB/mlx5: Expose TIR ICM address to user space net/mlx5: Introduce new TIR creation core API net/mlx5: Expose TIR ICM address in command outbox Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 10:33:00 -03:00
Ariel Levkovich	1f1d6abbf0	IB/mlx5: Expose TIR ICM address to user space This patch exposes the TIR ICM address of raw packet and RSS QPs to user space. In order to pass the new field, the patch extends the mlx5 specific QP creation response structure and fills it with the icm address returned by the FW command, if available. Signed-off-by: Ariel Levkovich <lariel@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-25 10:03:19 -03:00
Eric Biggers	877b5691f2	crypto: shash - remove shash_desc::flags The flags field in 'struct shash_desc' never actually does anything. The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP. However, no shash algorithm ever sleeps, making this flag a no-op. With this being the case, inevitably some users who can't sleep wrongly pass MAY_SLEEP. These would all need to be fixed if any shash algorithm actually started sleeping. For example, the shash_ahash_() functions, which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP from the ahash API to the shash API. However, the shash functions are called under kmap_atomic(), so actually they're assumed to never sleep. Even if it turns out that some users do need preemption points while hashing large buffers, we could easily provide a helper function crypto_shash_update_large() which divides the data into smaller chunks and calls crypto_shash_update() and cond_resched() for each chunk. It's not necessary to have a flag in 'struct shash_desc', nor is it necessary to make individual shash algorithms aware of this at all. Therefore, remove shash_desc::flags, and document that the crypto_shash_() functions can be called from any context. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2019-04-25 15:38:12 +08:00
Jason Gunthorpe	449a224c10	Merge branch 'rdma_mmap' into rdma.git for-next Jason Gunthorpe says: ==================== Upon review it turns out there are some long standing problems in BAR mapping area: * BAR pages intended for read-only can be switched to writable via mprotect. * Missing use of rdma_user_mmap_io for the mlx5 clock BAR page. * Disassociate causes SIGBUS when touching the pages. * CPU pages are being mapped through to the process via remap_pfn_range instead of the more appropriate vm_insert_page, causing weird behaviors during disassociation. This series adds the missing VM_* flag manipulation, adds faulting a zero page for disassociation and revises the CPU page mappings to use vm_insert_page. ==================== For dependencies this branch is based on for-rc from git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git * branch 'rdma_mmap': RDMA: Remove rdma_user_mmap_page RDMA/mlx5: Use get_zeroed_page() for clock_info RDMA/ucontext: Fix regression with disassociate RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages RDMA/mlx5: Do not allow the user to write to the clock page Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 16:20:34 -03:00
Jason Gunthorpe	4eb6ab13b9	RDMA: Remove rdma_user_mmap_page Upon further research drivers that want this should simply call the core function vm_insert_page(). The VMA holds a reference on the page and it will be automatically freed when the last reference drops. No need for disassociate to sequence the cleanup. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 16:18:36 -03:00
Jason Gunthorpe	ddcdc368b1	RDMA/mlx5: Use get_zeroed_page() for clock_info get_zeroed_page() returns a virtual address for the page which is better than allocating a struct page and doing a permanent kmap on it. Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 13:40:50 -03:00
Jason Gunthorpe	67f269b37f	RDMA/ucontext: Fix regression with disassociate When this code was consolidated the intention was that the VMA would become backed by anonymous zero pages after the zap_vma_pte - however this very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED bits to transform the VMA into an anonymous VMA. Since the vm_ops was removed this broke. Now userspace gets a SIGBUS if it touches the vma after disassociation. Instead of converting the VMA to anonymous provide a fault handler that puts a zero'd page into the VMA when user-space touches it after disassociation. Cc: stable@vger.kernel.org Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Fixes: `5f9794dc94` ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 13:32:25 -03:00
Jason Gunthorpe	d5e560d3f7	RDMA/mlx5: Use rdma_user_map_io for mapping BAR pages Since mlx5 supports device disassociate it must use this API for all BAR page mmaps, otherwise the pages can remain mapped after the device is unplugged causing a system crash. Cc: stable@vger.kernel.org Fixes: `5f9794dc94` ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-04-24 13:06:40 -03:00
Jason Gunthorpe	c660133c33	RDMA/mlx5: Do not allow the user to write to the clock page The intent of this VMA was to be read-only from user space, but the VM_MAYWRITE masking was missed, so mprotect could make it writable. Cc: stable@vger.kernel.org Fixes: `5c99eaecb1` ("IB/mlx5: Mmap the HCA's clock info to user-space") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2019-04-24 13:06:24 -03:00
John Fleck	3c176c9d72	IB/hfi1: Remove reference to RHF.VCRCErr The bit VCRCErr in the receive header flag is actually a reserved field. Remove bit operations on this field. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: John Fleck <john.fleck@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:48:11 -03:00
Mike Marciniszyn	a9c62e0078	IB/hfi1: Add selected Rcv counters These counters are required for error analysis and debug. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:48:10 -03:00
Mike Marciniszyn	d40f69c9b9	IB/{rdmavt, qib, hfi1}: Use new routine to release reference counts The reference count adjustments on reference count completion are open coded throughout. Add a routine to do all reference count adjustments and use. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Mike Marciniszyn	52cdbcc2b1	IB/rdmavt: Use more efficient allowed_ops QP creation already records the allowed_ops. Take advantage of that single field to replace multiple qp_type specific tests. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Mike Marciniszyn	715ab1a862	IB/rdmavt: Fix ab/ba include issues The currently include file ordering for rdmavt headers has an ab/ba include issue the precludes using inlines from rdma_vt.h in rdmavt_qp.h. At the heart of the issue is that rdma_vt.h includes rdmavt_qp.h. Fix the ordering issue by adjusting rdma_vt.h to not require rdmavt_qp.h and move qp related inlines to rdmavt_qp.h. Additionally, promote rvt_mmap_info to rdma_vt.h since it is shared by rdmavt_cq.h and rdmavt_qp.h. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Mike Marciniszyn	62644c1d2b	IB/hfi1: Make opfn.h self sufficient The opfn.h include file build-ablility depends on the including file having the correct includes. Fix by making opfn.h self sufficient. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:49 -03:00
Kaike Wan	ea752bc5e5	IB/{rdmavt, hfi1): Miscellaneous comment fixes This patch fixes miscellaneous comment errors. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:31:48 -03:00
Josh Collier	07c5ba9124	IB/hfi1: Add debugfs to control expansion ROM write protect Some kernels now enable CONFIG_IO_STRICT_DEVMEM which prevents multiple handles to PCI resource0. In order to continue to support expansion ROM updates while the driver is loaded, the driver must now provide an interface to control the expansion ROM write protection. This patch adds an exprom_wp debugfs interface that allows the hfi1_eprom user tool to disable the expansion ROM write protection by opening the file and writing a '1'. The write protection is released when writing a '0' or automatically re-enabled when the file handle is closed. The current implementation will only allow one handle to be opened at a time across all hfi1 devices. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 11:26:41 -03:00
Leon Romanovsky	5742582222	RDMA/hns: Remove asynchronic QP destroy Verbs destroy callbacks are synchronous operations and can't be delayed. The expectation is that after driver returned from destroy function, the memory can be freed and user won't be able to access it again. Ditch workqueue implementation used in HNS driver. Fixes: `d838c481e0` ("IB/hns: Fix the bug when destroy qp") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: oulijun <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 10:55:31 -03:00
Parav Pandit	5d7ed2f27b	RDMA/cma: Consider scope_id while binding to ipv6 ll address When two netdev have same link local addresses (such as vlan and non vlan), two rdma cm listen id should be able to bind to following different addresses. listener-1: addr=lla, scope_id=A, port=X listener-2: addr=lla, scope_id=B, port=X However while comparing the addresses only addr and port are considered, due to which 2nd listener fails to listen. In below example of two listeners, 2nd listener is failing with address in use error. $ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1 -p 4545& $ rping -sv -a fe80::268a:7ff:feb3:d113%ens2f1.200 -p 4545 rdma_bind_addr: Address already in use To overcome this, consider the scope_ids as well which forms the accurate IPv6 link local address. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 10:39:16 -03:00
Parav Pandit	823b23da71	IB/core: Allow vlan link local address based RoCE GIDs IPv6 link local address for a VLAN netdevice has nothing to do with its resemblance with the default GID, because VLAN link local GID is in different layer 2 domain. Now that RoCE MAD packet processing and route resolution consider the right GID index, there is no need for an unnecessary check which prevents the addition of vlan based IPv6 link local GIDs. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-24 10:08:15 -03:00
Saeed Mahameed	c3bdd5e651	Linux 5.1-rc1 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ 7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z 4+dwGBk= =pyXR -----END PGP SIGNATURE----- Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next Linux 5.1-rc1 We forgot to reset the branch last merge window thus mlx5-next is outdated and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next branch with 5.1-rc1. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-22 15:25:39 -07:00
Mark Bloch	5fb58c9e2f	RDMA/mlx5: Don't create IB representors when in multiport RoCE mode Switchdev mode and mutiport RoCE mode aren't compatible at this point. Don't create IB reps when a user switches to switchdev mode and the driver operates in that mode. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	d3b5cc1cd9	RDMA/mlx5: Initialize roce port info before multiport master init When working in mutliport RoCE mode it is possible to attach a slave before the master. In that case the slave is waiting for a master to be attached. When the master is attached it goes over the list of waiting slaves, finds a slave that is compatible and tries to bind it to itself. The call stack is: mlx5_ib_init_multiport_master() -> mlx5_ib_bind_slave_port() In the bind function we will create a netdev notifier, but this is done before we initialize the RoCE structure (this is done at a later stage by the master in the ROCE stage). Once events are delivered to that notifier we will use mlx5_ib_get_native_port_mdev() to get the actual port and as the native port is zero we will access an invalid index in the port structure. Move the RoCE structure initialization to an earlier stage. Fixes: `32f69e4be2` ("{net, IB}/mlx5: Manage port association for multiport RoCE") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	7f575103b0	RDMA/mlx5: Allow DEVX and raw creation flow on reps Remove the limitations that were in place and provide support for DEVX and raw flow creation on reps. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Maor Gottlieb	56e5acd405	RDMA/mlx5: Add query e-switch vport context to devx white list Add MLX5_OP_QUERY_ESW_VPORT_CONTEXT to devx white list. It will be allowed only if HCA_CAP.eswitch_manager==1. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	52438be441	RDMA/mlx5: Allow inserting a steering rule to the FDB Allow this only via mlx5 raw create flow API, legacy verbs are not supported. To accommodate that, we add a new attribute to matcher creation to indicate the type of flow table to be used. MLX5_IB_ATTR_FLOW_MATCHER_FT_TYPE With this new attribute MLX5_IB_ATTR_FLOW_MATCHER_FLOW_FLAGS is no longer needed, we keep it for compatibility but at most only a single attribute can be passed of the two. When inserting a flow rule to the FDB we require that a DEVX FT is provided as a destination, no other configuration is allowed. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	3b70508a6b	RDMA/mlx5: Create flow table with max size supported Instead of failing the request, just use the supported number of flow entries. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Mark Bloch	13a4376568	RDMA/mlx5: Access the prio bypass inside the FDB flow table namespace Now that we have a specific prio inside the FDB namespace allow retrieving it from the RDMA side. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 15:24:05 -03:00
Parav Pandit	2e5b8a0116	RDMA/core: Add a netlink command to change net namespace of rdma device Provide an option to change the net namespace of a rdma device through a netlink command. When multiple rdma devices exists in a system, and when containers are used, this will limit rdma device visibility to a specified net namespace. An example command to change net namespace of mlx5_1 device to the previously created net namespace 'foo' is: $ ip netns add foo $ rdma dev set mlx5_1 netns foo Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 14:44:58 -03:00
Parav Pandit	decbc7a6b0	RDMA/core: Introduce a helper function to change net namespace of rdma device Introduce a helper function that changes rdma device's net namespace which performs mini disable/enable sequence to have device visible only in assigned net namespace. Device unregistration, device rename and device change net namespace may be invoked concurrently. (a) device unregistration needs to wait if a device change (rename or net namespace change) operation is in progress. (b) device net namespace change should not proceed if the unregistration has started. (c) while one cpu is changing device net namespace, other cpu should not be able to rename or change net namespace. To address above concurrency, (a) Use unreg_mutex to synchronize between ib_unregister_device() and net namespace change operation (b) In cases where unregister_device() has started unregistration before change_netns got chance to acquire unreg_mutex, validate the refcount - if it dropped to zero, abort the net namespace change operation. Finally use the helper function to change net namespace of ib device to move the device back to init_net when such net is deleted. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 14:44:58 -03:00
Parav Pandit	3042492bd1	RDMA/core: Avoid freeing netdevs in disable_device() So we can use the disable_device() helper while changing the net namespace of the rdma device in a subsequent patch, move free_netdevs() out of it. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 14:44:58 -03:00
Chengguang Xu	2d95984977	infiniband/qib: Fix typo in comment Fix typo 'faspath' -> 'pastpath'. Signed-off-by: Chengguang Xu <cgxu519@gmx.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-22 14:08:15 -03:00
Andrea Arcangeli	04f5866e41	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3b4e6" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: `86039bd3b4` ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Jann Horn <jannh@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Jann Horn <jannh@google.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-04-19 09:46:05 -07:00
David Howells	5dd50aaeb1	Make anon_inodes unconditional Make the anon_inodes facility unconditional so that it can be used by core VFS code and pidfd code. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [christian@brauner.io: adapt commit message to mention pidfds] Signed-off-by: Christian Brauner <christian@brauner.io>	2019-04-19 14:03:11 +02:00
Colin Ian King	ff5eefe6d3	RDMA/cxgb4: Fix spelling mistake "immedate" -> "immediate" There is a spelling mistake in a module parameter description. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-18 03:17:42 -03:00
Guy Levi	7249c8ea22	IB/mlx5: Fix scatter to CQE in DCT QP creation When scatter to CQE is enabled on a DCT QP it corrupts the mailbox command since it tried to treat it as as QP create mailbox command instead of a DCT create command. The corrupted mailbox command causes userspace to malfunction as the device doesn't create the QP as expected. A new mlx5 capability is exposed to user-space which ensures that it will not enable the feature on DCT without this fix in the kernel. Fixes: `5d6ff1babe` ("IB/mlx5: Support scatter to CQE for DC transport type") Signed-off-by: Guy Levi <guyle@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-18 03:13:41 -03:00
David S. Miller	6b0a7f84ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-17 11:26:25 -07:00
Colin Ian King	a6d2a5a92e	RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure Currently if alloc_skb fails to allocate the skb a null skb is passed to t4_set_arp_err_handler and this ends up dereferencing the null skb. Avoid the NULL pointer dereference by checking for a NULL skb and returning early. Addresses-Coverity: ("Dereference null return") Fixes: `b38a0ad8ec` ("RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messages") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-16 08:03:17 -03:00
Josh Collier	7c39f7f671	IB/rdmavt: Fix frwr memory registration Current implementation was not properly handling frwr memory registrations. This was uncovered by commit 27f26cec761das ("xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR)") in which xprtrdma, which is used for NFS over RDMA, started failing as it was the first ULP to modify the ib_mr iova resulting in the NFS server getting REMOTE ACCESS ERROR when attempting to perform RDMA Writes to the client. The fix is to properly capture the true iova, offset, and length in the call to ib_map_mr_sg, and then update the iova when processing the IB_WR_REG_MEM on the send queue. Fixes: `a41081aa59` ("IB/rdmavt: Add support for ib_map_mr_sg") Cc: stable@vger.kernel.org Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Josh Collier <josh.d.collier@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-16 07:11:29 -03:00
Colin Ian King	1db86318c4	RDMA/mlx5: Check for error return in flow_rule rather than err Currently when the call to create_flow_rule_vport_sq fails, the error check is being performed on err rather than on the return pointer flow_rule. The return flow_rule maybe NULL (which is not considered an error) or an error code, so check for the error on flow_rule. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: `d5ed8ac34c` ("RDMA/mlx5: Move default representors SQ steering to rule to modify QP") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-12 11:19:49 -03:00
Devesh Sharma	1c00d7bc96	RDMA/ocrdma: Remove use of idr use pci bdf instead Removing the use of IDR variable just to name the function ids. Using the PCI_FUNC(pdev->devfn) instead to create the device name, associated resources and to print driver into at various places. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-12 10:59:02 -03:00
Kaike Wan	d737b25b1a	IB/hfi1: Do not flush send queue in the TID RDMA second leg When a QP is put into error state, the send queue will be flushed. This mechanism is implemented in both the first and the second leg of the send engine. Since the second leg is only responsible for data transactions in the KDETH space for the TID RDMA WRITE request, it should not perform the flushing of the send queue. This patch removes the flushing function of the second leg, but still keeps the bailing out of the QP if it is put into error state. Fixes: `70dcb2e3dc` ("IB/hfi1: Add the TID second leg send packet builder") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:09:30 -03:00
Mark Bloch	fb652d3299	RDMA/mlx5: Remove VF representor profile Now that we have a single IB device with multiple ports we can remove the VF representor profile. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:40 -03:00
Mark Bloch	26628e2d58	RDMA/mlx5: Move to single device multiport ports in switchdev mode Move from IB device (representor) per virtual function to single IB device with port per virtual function (port 1 represents the uplink). As number of ports is a static property of an IB device, declare the IB device with as many port as the possible according to the PCI bus. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	a989ea01cb	RDMA/mlx5: Move SMI caps logic We store the SMI information in the core device's struct, make sure we set that information only once (and not per port), while here make the for loop based on the actual size of the array. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	35b0aa67b2	RDMA/mlx5: Refactor netdev affinity code The design of representors is such that once an IB representor is created, the netdev of representor already exists, we can use that fact to simplify the netdev affinity code. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	d5ed8ac34c	RDMA/mlx5: Move default representors SQ steering to rule to modify QP Currently the steering for SQs created on representors is done on creation, once we move to representors as ports of an IB device we need the port argument which is given only at the modify QP stage, adjust the code appropriately. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	6a4d00be08	RDMA/mlx5: Move rep into port struct In preparation of moving into a model of single IB device multiple ports move rep to be part of the port structure. We mark a representor device by setting is_rep, no functional change with this patch. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	5d8f6a0e92	RDMA/mlx5: Use correct size for device resources On allocation we use the array size and on destruction num_ports, use the array size of destruction as well, in this context the array corresponds to the native/actual ports on the NIC so no need to adjust this logic for representors. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	da796ccb3e	RDMA/mlx5: Move ports allocation to outside of INIT stage In downstream patches we will need access to the ports before doing any stages, in order to set net device per representor. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	4a6dc8552a	RDMA/mlx5: Free IB device on remove Simplify the code and move the deallocation of the IB device into the remove function. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Mark Bloch	95579e785a	RDMA/mlx5: Move netdev info into the port struct Netdev info is stored in a separate array and holds data relevant on a per port basis, move it to be part of the port struct. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 15:05:39 -03:00
Jason Gunthorpe	5331fa0db7	Merge branch 'mlx5-next' into rdma.git for-next From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies on the next series * branch 'mlx5-next': net/mlx5: E-Switch, add a new prio to be used by the RDMA side net/mlx5: E-Switch, don't use hardcoded values for FDB prios net/mlx5: Fix false compilation warning net/mlx5: Expose MPEIN (Management PCIE INfo) register layout net/mlx5: Add rate limit print macros net/mlx5: Add explicit bar address field net/mlx5: Replace dev_err/warn/info by mlx5_core_err/warn/info net/mlx5: Use dev->priv.name instead of dev_name net/mlx5: Make mlx5_core messages independent from mdev->pdev net/mlx5: Break load_one into three stages net/mlx5: Function setup/teardown procedures net/mlx5: Move health and page alloc init to mdev_init net/mlx5: Split mdev init and pci init net/mlx5: Remove redundant init functions parameter net/mlx5: Remove spinlock support from mlx5_write64 net/mlx5: Remove unused MLX5_*_DOORBELL_LOCK macros Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-10 14:59:27 -03:00
Steve Wise	ab7efbe24b	RDMA/cxgb4: Use ib_device_set_netdev() cxgb4 has a simple non-dynamic use of get_netdev, so conversion is straightforward. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-09 10:14:54 -03:00
Jason Gunthorpe	4b38da75e0	RDMA/drivers: Convert easy drivers to use ib_device_set_netdev() Drivers that never change their ndev dynamically do not need to use the get_netdev callback. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Acked-by: Adit Ranadive <aditr@vmware.com>	2019-04-09 10:14:54 -03:00
David Ahern	1550c17193	ipv4: Prepare rtable for IPv6 gateway To allow the gateway to be either an IPv4 or IPv6 address, remove rt_uses_gateway from rtable and replace with rt_gw_family. If rt_gw_family is set it implies rt_uses_gateway. Rename rt_gateway to rt_gw4 to represent the IPv4 version. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-04-08 15:22:40 -07:00
Yangyang Li	00fb67ec6b	RDMA/hns: Bugfix for SCC hem free The method of hem free for SCC context is different from qp context. In the current version, if free SCC hem during the execution of qp free, there may be smmu error as below: arm-smmu-v3 arm-smmu-v3.1.auto: event 0x10 received: arm-smmu-v3 arm-smmu-v3.1.auto: 0x00007d0000000010 arm-smmu-v3 arm-smmu-v3.1.auto: 0x000012000000017c arm-smmu-v3 arm-smmu-v3.1.auto: 0x00000000000009e0 arm-smmu-v3 arm-smmu-v3.1.auto: 0x0000000000000000 As SCC context is still used by hardware after qp free, we can solve this problem by removing SCC hem free from hns_roce_qp_free. Fixes: `6a157f7d1b` ("RDMA/hns: Add SCC context allocation support for hip08") Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:42:16 -03:00
Lijun Ou	4772e03d23	RDMA/hns: Fix bug that caused srq creation to fail Due to the incorrect use of the seg and obj information, the position of the mtt is calculated incorrectly, and the free space of the page is not enough to store the entire mtt, resulting in access to the next page. This patch fixes this problem. Unable to handle kernel paging request at virtual address ffff00006e3cd000 ... Call trace: hns_roce_write_mtt+0x154/0x2f0 [hns_roce] hns_roce_buf_write_mtt+0xa8/0xd8 [hns_roce] hns_roce_create_srq+0x74c/0x808 [hns_roce] ib_create_srq+0x28/0xc8 Fixes: `0203b14c4f` ("RDMA/hns: Unify the calculation for hem index in hip08") Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:27:58 -03:00
chenglang	2b277dae06	RDMA/hns: Support to create 1M srq queue In mhop 0 mode, 64bt_num queues can be supported. In mhop 1 mode, 32Kbt_num queues can be supported. Config srqc_hop_num to 1 to support 1M SRQ queues. Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:26:44 -03:00
Shiraz Saleem	d0b5c01bb4	RDMA/umem: Use correct value for SG entries in sg_copy_to_buffer() With page combining, the assumption that number of SG entries in umem SGL equal to number of system pages in umem no longer holds. umem->sg_nents tracks the SG entries in umem SGL. Use it in sg_pcopy_to_buffer() as opposed to ib_umem_num_pages(umem). Fixes: `d10bcf947a` ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs") Reported-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Lijun Ou	e1c9a0dc29	RDMA/hns: Dump detailed driver-specific CQ This patch adds support of resource track for hip08 and take dumping cq context state used for debugging as an example. More resources track supports for hns driver will be added in future. The output should be as follows. $ rdma res show cq dev hnseth0 -d dev hnseth0 cqe 1023 users 2 poll-ctx WORKQUEUE pid 0 comm [ib_core] drv_state 2 drv_ceq n 0 drv_cqn 0 drv_hopnum 1 drv_pi 0 drv_ci 0 drv_coalesce 0 drv_period 0 drv_cnt 0 Signed-off-by: Tao Tian <tiantao6@huawei.com> Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: chenglang <chenglang@huawei.com> Signed-off-by: Lijun Ou <oulijun@huawei.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Leon Romanovsky	68e326dea1	RDMA: Handle SRQ allocations by IB/core Convert SRQ allocation from drivers to be in the IB/core Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Leon Romanovsky	d345691471	RDMA: Handle AH allocations by IB/core Simplify drivers by ensuring lifetime of ib_ah object. The changes in .create_ah() go hand in hand with relevant update in .destroy_ah(). We will use this opportunity and convert .destroy_ah() to don't fail, as it was suggested a long time ago, because there is nothing to do in case of failure during destroy. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Jason Gunthorpe	feec576a6a	IB: When attrs.udata/ufile is available use that instead of uobject The ucontext and ufile should not be accessed via the uobject, all these cases have an attrs so use that instead. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Jason Gunthorpe	e79c9c6062	IB/mlx5: Remove references to uboject->context These should all go through udata now. Add mlx5_udata_to_mdev to convert a udata into the struct mlx5_ib_dev as these call sites require. Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Leon Romanovsky	9e886b39a7	RDMA/nldev: Return device protocol Add new RDMA_NLDEV_ATTR_DEV_PROTOCOL attribute to give ability for UDEV rules create IB device stable names based on link type protocol. The assumption that devices like mlx4 with duality in their link type under one IB device struct won't be allowed in the future. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:25 -03:00
Leon Romanovsky	c87e65cfb9	RDMA/cm: Move debug counters to be under relevant IB device The sysfs layout is created by CM incorrectly presented RDMA devices with InfiniBand link layer. Layout of such devices represents device tree of connections. By moving CM statistics to be under relevant port of IB device, we will fix the following issues: * Symlink name - It used device name instead of specific identifier. * Target location - It was supposed to point to PCI-ID/infiniband_cm/ instead of PCI-ID/infiniband/ * Target name - It created extra device file under already existing device folder, e.g. mlx5_0/mlx5_0 * Crash during boot with RDMA persistent naming patches. sysfs: cannot create duplicate filename '/class/infiniband_cm/mlx5_0' CPU: 29 PID: 433 Comm: modprobe Not tainted 5.0.0-rc5+ #178 Call Trace: dump_stack+0xcc/0x180 sysfs_warn_dup.cold.3+0x17/0x2d sysfs_do_create_link_sd.isra.2+0xd0/0xf0 device_add+0x7cb/0x1450 device_create_groups_vargs+0x1ae/0x220 device_create+0x93/0xc0 cm_add_one+0x38f/0xf60 [ib_cm] add_client_context+0x167/0x210 [ib_core] enable_device_and_get+0x230/0x3f0 [ib_core] ib_register_device+0x823/0xbf0 [ib_core] __mlx5_ib_add+0x45/0x150 [mlx5_ib] mlx5_ib_add+0x1b3/0x5e0 [mlx5_ib] mlx5_add_device+0x130/0x3a0 [mlx5_core] mlx5_register_interface+0x1a9/0x270 [mlx5_core] do_one_initcall+0x14f/0x5de do_init_module+0x247/0x7c0 load_module+0x4c2f/0x60d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe After this change: [leonro@server ~]$ ls -al /sys/class/infiniband/ibp0s12f0/ports/1/ drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_duplicates drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_rx_msgs drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_msgs drwxr-xr-x 2 root root 0 Mar 11 11:17 cm_tx_retries Fixes: `110cf374a8` ("infiniband: make cm_device use a struct device and not a kobject.") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:24 -03:00
Colin Ian King	4d2e11d42f	opa_vnic: fix check on record->event, incorrect operator used The check on record->event is always true because the wrong operator is being used, used && instead of \|\| Addresses-Coverity: ("Constant expression result") Fixes: `fae7a699a9` ("opa_vnic: Convert vport_idr to XArray") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:24 -03:00
Shiraz Saleem	d10bcf947a	RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs Combine contiguous regions of PAGE_SIZE pages into single scatter list entry while building the scatter table for a umem. This minimizes the number of the entries in the scatter list and reduces the DMA mapping overhead, particularly with the IOMMU. Set default max_seg_size in core for IB devices to 2G and do not combine if we exceed this limit. Also, purge npages in struct ib_umem as we now DMA map the umem SGL with sg_nents and npage computation is not needed. Drivers should now be using ib_umem_num_pages(), so fix the last stragglers. Move npages tracking to ib_umem_odp as ODP drivers still need it. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Tested-by: Gal Pressman <galpress@amazon.com> Tested-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:05:24 -03:00
Kamal Heib	ea7a5c706f	RDMA/vmw_pvrdma: Fix memory leak on pvrdma_pci_remove Make sure to free the DSR on pvrdma_pci_remove() to avoid the memory leak. Fixes: `29c8d9eba5` ("IB: Add vmw_pvrdma driver") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-08 13:04:46 -03:00
Will Deacon	1b8546d7e2	i40iw: Redefine i40iw_mmiowb() to do nothing mmiowb() is now implicit in spin_unlock(), so there's no reason to call it from driver code. Redefine i40iw_mmiowb() to do nothing instead. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2019-04-08 12:09:15 +01:00
Will Deacon	fb24ea52f7	drivers: Remove explicit invocations of mmiowb() mmiowb() is now implied by spin_unlock() on architectures that require it, so there is no reason to call it from driver code. This patch was generated using coccinelle: @mmiowb@ @@ - mmiowb(); and invoked as: $ for d in drivers include/linux/qed sound; do \ spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done NOTE: mmiowb() has only ever guaranteed ordering in conjunction with spin_unlock(). However, pairing each mmiowb() removal in this patch with the corresponding call to spin_unlock() is not at all trivial, so there is a small chance that this change may regress any drivers incorrectly relying on mmiowb() to order MMIO writes between CPUs using lock-free synchronisation. If you've ended up bisecting to this commit, you can reintroduce the mmiowb() calls using wmb() instead, which should restore the old behaviour on all architectures other than some esoteric ia64 systems. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2019-04-08 12:01:02 +01:00
Will Deacon	949b8c7276	drivers: Remove useless trailing comments from mmiowb() invocations In preparation for using coccinelle to remove all mmiowb() instances from drivers, remove all trailing comments since they won't be picked up by spatch later on and will end up being preserved in the code. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2019-04-08 12:00:56 +01:00
Saeed Mahameed	b6460c72c3	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux This merge commit includes some misc shared code updates from mlx5-next branch needed for net-next. 1) From Maxim, Remove un-used macros and spinlock from mlx5 code. 2) From Aya, Expose Management PCIE info register layout and add rate limit print macros. 3) From Tariq, Compilation warning fix in fs_core.c 4) From Vu, Huy and Saeed, Improve mlx5 initialization flow: The goal is to provide a better logical separation of mlx5 core device initialization flow and will help to seamlessly support creating different mlx5 device types such as PF, VF and SF mlx5 sub-function virtual devices. Mlx5_core driver needs to separate HCA resources from pci resources. Its initialize/load/unload will be broken into stages: 1. Initialize common data structures 2. Setup function which initializes pci resources (for PF/VF) or some other specific resources for virtual device 3. Initialize software objects according to hardware capabilities 4. Load all mlx5_core components It is also necessary to detach mlx5_core mdev name/message from pci device mdev->pdev name/message for a clearer report/debug of different mlx5 device types. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-05 14:10:16 -07:00
Leon Romanovsky	c7252a6532	RDMA/cm: Remove useless zeroing of static global variable Static global variables are initialized to zero by C standard, there is no need to zero them again. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-04 08:29:04 -03:00
Potnuri Bharat Teja	d2c33370ae	RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state On receiving a TERM from tje peer, Host moves the QP to TERMINATE state and then moves the adapter out of RDMA mode. After issuing a TERM, peer issues a CLOSE and at this point of time if the connectivity between peer and host is lost for a significant amount of time, the QP remains in TERMINATE state. Therefore c4iw_modify_qp() needs to initiate a close on entering terminate state. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-04 08:29:04 -03:00
Leon Romanovsky	0f51427bd0	RDMA/mlx5: Cleanup WQE page fault handler Refactor the page fault handler to be more readable and extensible, this cleanup was triggered by the error reported below. The code structure made it unclear to the automatic tools to identify that such a flow is not possible in real life because "requestor != NULL" means that "qp != NULL" too. drivers/infiniband/hw/mlx5/odp.c:1254 mlx5_ib_mr_wqe_pfault_handler() error: we previously assumed 'qp' could be null (see line 1230) Fixes: `08100fad5c` ("IB/mlx5: Add ODP SRQ support") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-04 08:28:59 -03:00
Jason Gunthorpe	1c726c4421	Merge HFI1 updates into k.o/for-next Based on rdma.git for-rc for dependencies. From Dennis Dalessandro: ==================== Here are some code improvement patches and fixes for less serious bugs to TID RDMA than we sent for RC. ==================== * HFI1 updates: IB/hfi1: Implement CCA for TID RDMA protocol IB/hfi1: Remove WARN_ON when freeing expected receive groups IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE IB/hfi1: Add a function to read next expected psn from hardware flow IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:28:05 -03:00
Kaike Wan	747b931fbe	IB/hfi1: Implement CCA for TID RDMA protocol Currently, FECN handling is not implemented on TID RDMA expected receive packets and therefore CCA can't be turned on when TID RDMA is enabled. This patch adds the CCA support to TID RDMA protocol by: - modifying FECN RSM rule to include kernel receive contexts - For TID_RDMA READ RESP or TID RDMA ACK packet, a CNP will be sent out if the FECN bit is set. For other TID RDMA packets that generate at least one response packet, the BECN bit will be set in the first response packet - Copying expected packet data to destination buffer when FECN bit is set in the TID RDMA READ RESP or TID RDMA WRITE DATA packet. In this case, the expected packet is received as an eager packet - Handling the TID sequence error for subsequent normal expected packets. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:27:39 -03:00
Kaike Wan	8da0f0f26f	IB/hfi1: Remove WARN_ON when freeing expected receive groups When PSM user receive context is freed, the expected receive groups allocated by the receive context will also been freed. However, if there are still TID entries in use, the receive groups rcd->tid_full_list or rcd->tid_used_list will not be empty, and thus triggering the WARN_ONs in the function hfi1_free_ctxt_rcv_groups(). Even if the two lists may not be empty, the hfi1 driver will free all TID entries and receive groups associated with the receive context to prevent any resource leakage. Since a clean user application exit is not controlled by the hfi1 driver, this patch will remove the WARN_ONs in hfi1_free_ctxt_rcv_groups(). Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:27:30 -03:00
Kaike Wan	b885d5be9c	IB/hfi1: Unify the software PSN check for TID RDMA READ/WRITE For expected packet receiving, the hfi1 hardware checks the KDETH PSN automatically. However, when sequence error occurs, the hfi1 driver can check the sequence instead until the hardware flow generation is reloaded. TID RDMA READ and WRITE protocols implement similar software checking mechanisms, but with different flags and different local variables to store next expected PSN. Unify the handling by using only one set of flag and local variable for both TID RDMA READ and WRITE protocols. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:27:30 -03:00
Kaike Wan	6a40693a88	IB/hfi1: Add a function to read next expected psn from hardware flow This patch adds a function to read next expected KDETH PSN from hardware flow to simplify the code. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:27:30 -03:00
Kaike Wan	f6f3f53255	IB/hfi1: Delay the release of destination mr for TID RDMA WRITE DATA The reference of destination memory region is first obtained when TID RDMA WRITE request is first received on the responder side. This reference is released once all TID RDMA WRITE RESP packets are sent to the requester side, even though not all TID RDMA WRITE DATA packets may have been received. This early release will especially be undesired if the software needs to access the destination memory before the last data packet is received. This patch delays the release of the MR until all TID RDMA DATA packets have been received. A helper function to release the reference is also created to simplify the code. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:27:30 -03:00
Leon Romanovsky	061ccb52d2	RDMA/cma: Set proper port number as index Conversion from IDR to XArray missed the fact that idr_alloc() returned index as a return value, this index was saved in port variable and used as query index later on. This caused to the following error. BUG: KASAN: use-after-free in cma_check_port+0x86a/0xa20 [rdma_cm] Read of size 8 at addr ffff888069fde998 by task ucmatose/387 CPU: 3 PID: 387 Comm: ucmatose Not tainted 5.1.0-rc2+ #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0x7c/0xc0 print_address_description+0x6c/0x23c ? cma_check_port+0x86a/0xa20 [rdma_cm] kasan_report.cold.3+0x1c/0x35 ? cma_check_port+0x86a/0xa20 [rdma_cm] ? cma_check_port+0x86a/0xa20 [rdma_cm] cma_check_port+0x86a/0xa20 [rdma_cm] rdma_bind_addr+0x11bc/0x1b00 [rdma_cm] ? find_held_lock+0x33/0x1c0 ? cma_ndev_work_handler+0x180/0x180 [rdma_cm] ? wait_for_completion+0x3d0/0x3d0 ucma_bind+0x120/0x160 [rdma_ucm] ? ucma_resolve_addr+0x1a0/0x1a0 [rdma_ucm] ucma_write+0x1f8/0x2b0 [rdma_ucm] ? ucma_open+0x260/0x260 [rdma_ucm] vfs_write+0x157/0x460 ksys_write+0xb8/0x170 ? __ia32_sys_read+0xb0/0xb0 ? trace_hardirqs_off_caller+0x5b/0x160 ? do_syscall_64+0x18/0x3c0 do_syscall_64+0x95/0x3c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 381: __kasan_kmalloc.constprop.5+0xc1/0xd0 cma_alloc_port+0x4d/0x160 [rdma_cm] rdma_bind_addr+0x14e7/0x1b00 [rdma_cm] ucma_bind+0x120/0x160 [rdma_ucm] ucma_write+0x1f8/0x2b0 [rdma_ucm] vfs_write+0x157/0x460 ksys_write+0xb8/0x170 do_syscall_64+0x95/0x3c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 381: __kasan_slab_free+0x12e/0x180 kfree+0xed/0x290 rdma_destroy_id+0x6b6/0x9e0 [rdma_cm] ucma_close+0x110/0x300 [rdma_ucm] __fput+0x25a/0x740 task_work_run+0x10e/0x190 do_exit+0x85e/0x29e0 do_group_exit+0xf0/0x2e0 get_signal+0x2e0/0x17e0 do_signal+0x94/0x1570 exit_to_usermode_loop+0xfa/0x130 do_syscall_64+0x327/0x3c0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Reported-by: <syzbot+2e3e485d5697ea610460@syzkaller.appspotmail.com> Reported-by: Ran Rozenstein <ranro@mellanox.com> Fixes: `638267537a` ("cma: Convert portspace IDRs to XArray") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-03 15:20:32 -03:00
Huy Nguyen	aa8106f137	net/mlx5: Add explicit bar address field Add bar_addr field to store bar-0 address to avoid calling pci_resource_start with hard-coded bar-0 as parameter. Also note that different mlx5 device types will have bar_addr on different bars. This patch does not change any functionality. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-02 12:49:38 -07:00
Maxim Mikityanskiy	bbf29f618e	net/mlx5: Remove spinlock support from mlx5_write64 As there is no user of mlx5_write64 that passes a spinlock to mlx5_write64, remove this functionality and simplify the function. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2019-04-02 12:49:37 -07:00
Leon Romanovsky	6734b29735	RDMA/hns: Fix bad endianess of port_pd variable port_pd is treated as le32 in declaration and read, fix assignment to be in le32 too. This change fixes the following compilation warnings. drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: warning: incorrect type in assignment (different base types) drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: expected restricted __le32 [usertype] port_pd drivers/infiniband/hw/hns/hns_roce_ah.c:67:24: got restricted __be32 [usertype] Fixes: `9a4435375c` ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Lijun Ou <ouliun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-02 10:40:41 -03:00
Shamir Rabinovitch	ff23dfa134	IB: Pass only ib_udata in function prototypes Now when ib_udata is passed to all the driver's object create/destroy APIs the ib_udata will carry the ib_ucontext for every user command. There is no need to also pass the ib_ucontext via the functions prototypes. Make ib_udata the only argument psssed. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 15:00:47 -03:00
Shamir Rabinovitch	bdeacabd1a	IB: Remove 'uobject->context' dependency in object destroy APIs Now that we have the udata passed to all the ib_xxx object destroy APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 14:59:35 -03:00
Shamir Rabinovitch	c4367a2635	IB: Pass uverbs_attr_bundle down ib_x destroy path The uverbs_attr_bundle with the ucontext is sent down to the drivers ib_x destroy path as ib_udata. The next patch will use the ib_udata to free the drivers destroy path from the dependency in 'uobject->context' as we already did for the create path. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 14:57:35 -03:00
Shamir Rabinovitch	a6a3797df2	IB: Pass uverbs_attr_bundle down uobject destroy path Pass uverbs_attr_bundle down the uobject destroy path. The next patch will use this to eliminate the dependecy of the drivers in ib_x->uobject pointers. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 14:55:36 -03:00
Shamir Rabinovitch	70f06b26f0	IB: ucontext should be set properly for all cmd & ioctl paths the Attempt to use the below commit to initialize the ucontext for the uobject destroy path has shown that the below commit is incomplete. Parts were reverted and the ucontext set up in the uverbs_attr_bundle was moved to rdma_lookup_get_uobject which is called from the uobj_get_XXX macros and rdma_alloc_begin_uobject which is called when uobject is created. Fixes: `3d9dfd0603` ("IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows") Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 14:55:03 -03:00
Matthew Wilcox	fae7a699a9	opa_vnic: Convert vport_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 14:18:15 -03:00
Matthew Wilcox	059d48fbf6	qib: Convert qib_unit_table to XArray Also remove qib_devs_list. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 13:31:35 -03:00
Matthew Wilcox	03b92789e5	hfi1: Convert hfi1_unit_table to XArray Also remove hfi1_devs_list. Signed-off-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-04-01 13:27:35 -03:00
Matthew Wilcox	9fd15987ed	qedr: Convert srqidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-29 14:54:51 -03:00
Matthew Wilcox	b6014f9e5f	qedr: Convert qpidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-29 14:53:57 -03:00
David Ahern	3616d08bcb	ipv6: Move ipv6 stubs to a separate header file The number of stubs is growing and has nothing to do with addrconf. Move the definition of the stubs to a separate header file and update users. In the move, drop the vxlan specific comment before ipv6_stub. Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-29 10:53:45 -07:00
Matthew Wilcox	0ee3b915b1	hfi1: Convert vesw_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-29 14:51:50 -03:00
Matthew Wilcox	736b5a70db	RDMA/hns: Convert qp_table_tree to XArray Also fully initialise the qp before storing it in the XArray. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-29 14:51:50 -03:00
Matthew Wilcox	27e19f4510	RDMA/hns: Convert cq_table to XArray Change the locking to not disable interrupts as the lookup in interrupt context will not see a freed CQ, thanks to the synchronize_irq() call before freeing the CQ. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-29 14:40:40 -03:00
Parav Pandit	2b34c55802	RDMA/core: Add command to set ib_core device net namspace sharing mode Add netlink command that enables/disables sharing rdma device among multiple net namespaces. Using rdma tool, $rdma sys set netns shared (default mode) When rdma subsystem netns mode is set to shared mode, rdma devices will be accessible in all net namespaces. Using rdma tool, $rdma sys set netns exclusive When rdma subsystem netns mode is set to exclusive mode, devices will be accessible in only one net namespace at any given point of time. If there are any net namespaces other than default init_net exists, while executing this command, it will fail and mode cannot be changed. To change this mode, netlink command is used instead of sysctl, because netlink command allows to auto load a module. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	cb7e0e1305	RDMA/core: Add interface to read device namespace sharing mode Add an interface via netlink command to query whether rdma devices are shared among multiple net namespaces or not. When using RDMAtool, it can be queried as, $rdma system show netns netns shared Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	37eeab55ae	RDMA/core: Extend ib_device_get_by_index for net namespace Extend ib_device_get_by_index() API to check device access for net namespace for serving netlink commands. Also enforce net ns check on dumpit commands which iterate over all registered rdma devices and which don't call ib_device_get_by_index(). Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	41c6140189	RDMA: Check net namespace access for uverbs, umad, cma and nldev Introduce an API rdma_dev_access_netns() to check whether a rdma device can be accessed from the specified net namespace or not. Use rdma_dev_access_netns() while opening character uverbs, umad network device and also check while rdma cm_id binds to rdma device. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	a56bc45b27	RDMA/core: Add module param to disable device sharing among net ns Add module parameter to change a sharing mode of ib_core early in the boot process. This parameter helps to those systems where modern up to date rdma tool (iproute2) package may not be available during kernel upgrade cycle. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	5417783eab	RDMA/core: Support core port attributes in non init_net Now that sysfs compatibility layer for non init_net exists, add core port attributes such as pkey and gid table to non init_net ns. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	4e0f7b9070	RDMA/core: Implement compat device/sysfs tree in net namespace Implement compatibility layer sysfs entries of ib_core so that non init_net net namespaces can also discover rdma devices. Each non init_net net namespace has ib_core_device created in it. Such ib_core_device sysfs tree resembles rdma devices found in init_net namespace. This allows discovering rdma devices in multiple non init_net net namespaces via sysfs entries and helpful to rdma-core userspace. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	62dfa7955e	RDMA/core: Restrict sysfs entries view to init_net This is a preparation patch to provide isolation of rdma device in a network namespace. As first step, make rdma device visible only in init net namespace. Subsequent patch will enable rdma device visibility back in multiple net namespaces using compat ib_core_device device/sysfs tree. Given that the IB subsystem depends on net stack, it needs to be initialized after netdev and since it support devices, it needs to be initialized before the device subsystem; therefore, change initcall sequence to fs_initcall, so that when ib_core is compiled in the kernel image, the right init sequence is followed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Parav Pandit	cebe556bd7	RDMA/core: Introduce ib_core_device to hold device In order to support sysfs entries in multiple net namespaces for a rdma device, introduce a ib_core_device whose scope is limited to hold core device and per port sysfs related entries. This is preparation patch so that multiple ib_core_devices in each net namespace can be created in subsequent patch who all can share ib_device. (a) Move sysfs specific fields to ib_core_device. (b) Make sysfs and device life cycle related routines to work on ib_core_device. (c) Introduce and use rdma_init_coredev() helper to initialize coredev fields. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:52:02 -03:00
Shiraz Saleem	629e6f9db6	RDMA/rdmavt: Use correct sizing on buffers holding page DMA addresses The buffer that holds the page DMA addresses is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this buffer. Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:13:27 -03:00
Shiraz Saleem	93923d309b	RDMA/rxe: Use correct sizing on buffers holding page DMA addresses The buffer that holds the page DMA addresses is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this buffer. Cc: Moni Shoua <monis@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:13:27 -03:00
Shiraz Saleem	41d34865b2	RDMA/mthca: Use correct sizing on buffers holding page DMA addresses The buffer that holds the page DMA addresses is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this buffer. Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:13:27 -03:00
Shiraz Saleem	5f818d676a	RDMA/cxbg: Use correct sizing on buffers holding page DMA addresses The PBL array that hold the page DMA address is sized off umem->nmap. This can potentially cause out of bound accesses on the PBL array when iterating the umem DMA-mapped SGL. This is because if umem pages are combined, umem->nmap can be much lower than the number of system pages in umem. Use ib_umem_num_pages() to size this array. Cc: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:13:27 -03:00
Selvin Xavier	5aa8484080	RDMA/bnxt_re: Use correct sizing on buffers holding page DMA addresses umem->nmap is used while allocating internal buffer for storing page DMA addresses. This causes out of bounds array access while iterating the umem DMA-mapped SGL with umem page combining as umem->nmap can be less than number of system pages in umem. Use ib_umem_num_pages() instead of umem->nmap to size the page array. Add a new structure (bnxt_qplib_sg_info) to pass sglist, npages and nmap. Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 14:13:27 -03:00
Bart Van Assche	196b4ce57d	IB/qib: Remove a set-but-not-used variable This patch avoids that a compiler warning is reported when building with W=1. Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: `49c0e2414b` ("IB/qib: Change SDMA progression mode depending on single- or multi-rail") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 11:03:49 -03:00
Bart Van Assche	920d10e458	IB/hfi1: Fix two format strings Enable format string checking for hfi1_cdbg() and fix the resulting compiler warnings. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 11:03:49 -03:00
Bart Van Assche	1f687edee2	IB/mlx5: Declare devx_async_cmd_event_fops static Avoid that sparse complains about a missing declaration. Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: `6bf8f22aea` ("IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 10:22:48 -03:00
Bart Van Assche	0080aed4e4	RDMA/uverbs: Allow the compiler to verify declaration and definition consistency This patch avoids that sparse reports the following warnings: drivers/infiniband/core/uverbs_std_types_flow_action.c:442:30: warning: symbol 'uverbs_def_obj_flow_action' was not declared. Should it be static? drivers/infiniband/core/uverbs_std_types_dm.c:112:30: warning: symbol 'uverbs_def_obj_dm' was not declared. Should it be static? drivers/infiniband/core/uverbs_std_types_counters.c:153:30: warning: symbol 'uverbs_def_obj_counters' was not declared. Should it be static? drivers/infiniband/core/uverbs_std_types_mr.c:213:30: warning: symbol 'uverbs_def_obj_mr' was not declared. Should it be static? Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: `0bd01f3d09` ("RDMA/uverbs: Require all objects to have a driver destroy function") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 10:22:48 -03:00
Bart Van Assche	2dcdebff5e	RDMA/uverbs: Annotate uverbs_request_next_ptr() return value as a __user pointer This patch avoids that sparse complains about a mismatch between the returned value and the function return type. Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: `c3bea3d2dc` ("RDMA/uverbs: Use the iterator for ib_uverbs_unmarshall_recv()") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 10:22:48 -03:00
Bart Van Assche	259e66bcdf	RDMA/uverbs: Add a __user annotation to a pointer This patch avoids that sparse and smatch report the following: warning: cast removes address space of expression Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Fixes: `3a6532c9af` ("RDMA/uverbs: Use uverbs_attr_bundle to pass udata for write") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-28 10:22:48 -03:00
David S. Miller	356d71e00d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2019-03-27 17:37:58 -07:00
Zhu Yanjun	08304d7146	IB/rxe: Replace av->network_type with skb->protocol In the function rxe_init_packet, based on av->network_type, skb->protocol is set to ipv4 or ipv6. The functions rxe_prepare and rxe_send are called after the functin rxe_init_packet. So in these functions, av->network_type can be replaced with skb->protocol. The functions are in the xmit fast path. So with skb->protocol, the performance will be better. Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 16:23:09 -03:00
Ira Weiny	2ccfbb70c2	IB/MAD: Add SMP details to MAD tracing Decode more information from the packet and include it in the trace. Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:52:01 -03:00
Ira Weiny	056533192a	IB/UMAD: Add umad trace points Trace MADs going to/from user space. Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:52:01 -03:00
Ira Weiny	0e65bae205	IB/MAD: Add agent trace points Trace agent details when agents are [un]registered. In addition, report agent details on send/recv. Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:52:00 -03:00
Ira Weiny	821bf1de45	IB/MAD: Add recv path trace point Trace received MAD details. Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:52:00 -03:00
Ira Weiny	4d60cad5db	IB/MAD: Add send path trace points Use the standard Linux trace mechanism to trace MADs being sent. 4 trace points are added, when the MAD is posted to the qp, when the MAD is completed, if a MAD is resent, and when the MAD completes in error. Reviewed-by: "Ruhl, Michael J" <michael.j.ruhl@intel.com> Suggested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:52:00 -03:00
Yuval Shaia	6a1096611c	RDMA/vmw_pvrdma: Skip zeroing device attrs Caller already clears props before calling query_device. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Acked-by: Adit Ranadive <aditr@vmware.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:30:43 -03:00
Artemy Kovalyov	d623dfd283	IB/mlx5: Compare only index part of a memory window rkey The InfiniBand Architecture Specification section 10.6.7.2.4 TYPE 2 MEMORY WINDOWS says that if the CI supports the Base Memory Management Extensions defined in this specification, the R_Key format for a Type 2 Memory Window must consist of: * 24 bit index in the most significant bits of the R_Key, which is owned by the CI, and * 8 bit key in the least significant bits of the R_Key, which is owned by the Consumer. This means that the kernel should compare only the index part of a R_Key to determine equality with another R_Key. Fixes: `db570d7dea` ("IB/mlx5: Add ODP support to MW") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:27:56 -03:00
Artemy Kovalyov	1e5887b700	IB/mlx5: WQE dump jumps over first 16 bytes Move index increment after its is used or otherwise it will start the dump of the WQE from second WQE BB. Fixes: `34f4c9554d` ("IB/mlx5: Use fragmented QP's buffer for in-kernel users") Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:27:33 -03:00
Moni Shoua	1abe186ed8	IB/mlx5: Reset access mask when looping inside page fault handler If page-fault handler spans multiple MRs then the access mask needs to be reset before each MR handling or otherwise write access will be granted to mapped pages instead of read-only. Cc: <stable@vger.kernel.org> # 3.19 Fixes: `7bdf65d411` ("IB/mlx5: Handle page faults") Reported-by: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:27:11 -03:00
Leon Romanovsky	e95e52a178	RDMA/hns: Limit scope of hns_roce_cmq_send() The forgotten static keyword causes to the following error to appear while building HNS driver. Declare hns_roce_cmq_send() to be static function to fix this warning. drivers/infiniband/hw/hns/hns_roce_hw_v2.c:1089:5: warning: no previous prototype for _hns_roce_cmq_send_ [-Wmissing-prototypes] int hns_roce_cmq_send(struct hns_roce_dev *hr_dev, Fixes: `6a04aed6af` ("RDMA/hns: Fix the chip hanging caused by sending mailbox&CMQ during reset") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 15:25:50 -03:00
Kaike Wan	d029434447	IB/hfi1: Fix the allocation of RSM table The receive side mapping (RSM) on hfi1 hardware is a special matching mechanism to direct an incoming packet to a given hardware receive context. It has 4 instances of matching capabilities (RSM0 - RSM3) that share the same RSM table (RMT). The RMT has a total of 256 entries, each of which points to a receive context. Currently, three instances of RSM have been used: 1. RSM0 by QOS; 2. RSM1 by PSM FECN; 3. RSM2 by VNIC. Each RSM instance should reserve enough entries in RMT to function properly. Since both PSM and VNIC could allocate any receive context between dd->first_dyn_alloc_ctxt and dd->num_rcv_contexts, PSM FECN must reserve enough RMT entries to cover the entire receive context index range (dd->num_rcv_contexts - dd->first_dyn_alloc_ctxt) instead of only the user receive contexts allocated for PSM (dd->num_user_contexts). Consequently, the sizing of dd->num_user_contexts in set_up_context_variables is incorrect. Fixes: `2280740f01` ("IB/hfi1: Virtual Network Interface Controller (VNIC) HW support") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 14:34:31 -03:00
Kaike Wan	a8639a79e8	IB/hfi1: Eliminate opcode tests on mr deref When an old ack_queue entry is used to store an incoming request, it may need to clean up the old entry if it is still referencing the MR. Originally only RDMA READ request needed to reference MR on the responder side and therefore the opcode was tested when cleaning up the old entry. The introduction of tid rdma specific operations in the ack_queue makes the specific opcode tests wrong. Multiple opcodes (RDMA READ, TID RDMA READ, and TID RDMA WRITE) may need MR ref cleanup. Remove the opcode specific tests associated with the ack_queue. Fixes: `f48ad614c1` ("IB/hfi1: Move driver out of staging") Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 14:34:31 -03:00
Kaike Wan	93b289b9af	IB/hfi1: Clear the IOWAIT pending bits when QP is put into error state When a QP is put into error state, it may be waiting for send engine resources. In this case, the QP will be removed from the send engine's waiting list, but its IOWAIT pending bits are not cleared. This will normally not have any major impact as the QP is being destroyed. However, the QP still needs to wind down its operations, such as draining the send queue by scheduling the send engine. Clearing the pending bits will avoid any potential complications. In addition, if the QP will eventually hang, clearing the pending bits can help debugging by presenting a consistent picture if the user dumps the qp_stats. This patch clears a QP's IOWAIT_PENDING_IB and IO_PENDING_TID bits in priv->s_iowait.flags in this case. Fixes: `5da0fc9dbf` ("IB/hfi1: Prepare resource waits for dual leg") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 14:34:31 -03:00
Kaike Wan	662d664666	IB/hfi1: Failed to drain send queue when QP is put into error state When a QP is put into error state, all pending requests in the send work queue should be drained. The following sequence of events could lead to a failure, causing a request to hang: (1) The QP builds a packet and tries to send through SDMA engine. However, PIO engine is still busy. Consequently, this packet is put on the QP's tx list and the QP is put on the PIO waiting list. The field qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN; (2) The QP is put into error state by the user application and notify_error_qp() is called, which removes the QP from the PIO waiting list and the packet from the QP's tx list. In addition, qp->s_flags is cleared of RVT_S_ANY_WAIT_IO bits, which does not include HFI1_S_WAIT_PIO_DRAIN bit; (3) The hfi1_schdule_send() function is called to drain the QP's send queue. Subsequently, hfi1_do_send() is called. Since the flag bit HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails. As a result, hfi1_do_send() bails out without draining any request from the send queue; (4) The PIO engine completes the sending and tries to wake up any QP on its waiting list. But the QP has been removed from the PIO waiting list and therefore is kept in sleep forever. The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2). HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN. Fixes: `2e2ba09e48` ("IB/rdmavt, IB/hfi1: Create device dependent s_flags") Cc: <stable@vger.kernel.org> # 4.19.x+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 14:34:31 -03:00
Colin Ian King	9513ea4f67	IB/iser: remove uninitialized variable len The variable len is not being inintialized and the uninitialized value is being returned. However, this return path is never reached because the default case in the switch statement returns -ENOSYS. Clean up the code by replacing the return -ENOSYS with a break for the default case and returning -ENOSYS at the end of the function. This allows len to be removed. Also remove redundant break that follows a return statement. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 10:20:33 -03:00
Kangjie Lu	e2a438bd71	RDMA/i40iw: Handle workqueue allocation failure alloc_ordered_workqueue may fail and return NULL. The fix captures the failure and handles it properly to avoid potential NULL pointer dereferences. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Reviewed-by: Shiraz, Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-27 10:19:07 -03:00
Ira Weiny	4ae2744410	IB/core: Ensure an invalidate_range callback on ODP MR No device supports ODP MR without an invalidate_range callback. Warn on any any device which attempts to support ODP without supplying this callback. Then we can remove the checks for the callback within the code. This stems from the discussion https://www.spinics.net/lists/linux-rdma/msg76460.html ...which concluded this code was no longer necessary. Acked-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 16:39:40 -03:00
Leon Romanovsky	a4b7013db2	RDMA/rxe: Fix slab-out-bounds access which lead to kernel crash later BUG: KASAN: slab-out-of-bounds in rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] Read of size 8 at addr ffff88805c01a608 by task ib_send_bw/573 CPU: 24 PID: 573 Comm: ib_send_bw Not tainted 5.0.0-rc5+ #189 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 Call Trace: rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] rxe_mem_init_user+0x6c1/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs] ib_uverbs_ioctl+0x202/0x310 [ib_uverbs] do_vfs_ioctl+0x193/0x1440 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x13f/0x570 entry_SYSCALL_64_after_hwframe+0x49/0xbe Allocated by task 573: __kasan_kmalloc.constprop.5+0xc1/0xd0 __kmalloc+0x161/0x310 rxe_mem_alloc+0x52/0x470 [rdma_rxe] rxe_mem_init_user+0x113/0x740 [rdma_rxe] rxe_reg_user_mr+0x9b/0x110 [rdma_rxe] ib_uverbs_reg_mr+0x428/0x9c0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2b0/0x410 [ib_uverbs] ib_uverbs_run_method+0x79c/0x1da0 [ib_uverbs] ib_uverbs_cmd_verbs+0x5f2/0xf20 [ib_uverbs] ib_uverbs_ioctl+0x202/0x310 [ib_uverbs] do_vfs_ioctl+0x193/0x1440 ksys_ioctl+0x3a/0x70 __x64_sys_ioctl+0x6f/0xb0 do_syscall_64+0x13f/0x570 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 0: __kasan_slab_free+0x12e/0x180 kfree+0x10a/0x2c0 rcu_process_callbacks+0xa77/0x1260 __do_softirq+0x2ad/0xacb Test scenario: ib_send_bw -x 1 -d rxe0 -a & ib_send_bw -x 1 -d rxe0 -a localhost Fixes: `8700e3e7c4` ("Soft RoCE driver") Reported-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Tested-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 12:51:10 -03:00
Enrico Weigelt, metux IT consult	1a2e158327	drivers: infiniband: Fix whitespace in kconfig Adjust the kconfig whitespace in bnxt_re/iser to match the kernel standard. Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 12:49:33 -03:00
Colin Ian King	a6a9274a7c	RDMA/nes: remove redundant check on udata The non-null check on udata is redundant as this check was performed just a few statements earlier and the check is always true as udata must be non-null at this point. Remove redundant the check on udata and the redundant else part that can never be executed. Detected by CoverityScan, CID#1477317 ("Logically dead code") Fixes: `8994445054` ("IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIs") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 12:07:45 -03:00
Matthew Wilcox	638267537a	cma: Convert portspace IDRs to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 12:00:15 -03:00
Matthew Wilcox	81cc440883	ucm: Convert ctx_id_table to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 11:50:29 -03:00
Matthew Wilcox	8e5a9d61e2	ib core: Convert query_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 11:47:05 -03:00
Matthew Wilcox	ae78ff3a0f	RDMA/cm: Convert local_id_table to XArray Also introduce cm_local_id() to reduce the amount of boilerplate when converting a local ID to an XArray index. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 11:44:22 -03:00
Matthew Wilcox	949a237046	IB/mad: Convert ib_mad_clients to XArray Pull the allocation function out into its own function to reduce the length of ib_register_mad_agent() a little and keep all the allocation logic together. Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 11:36:20 -03:00
Matthew Wilcox	f1430536e0	mlx4: Convert pv_id_table to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 10:09:54 -03:00
Matthew Wilcox	b02a29eb88	mlx5: Convert mlx5_srq_table to XArray Remove the custom spinlock as the XArray handles its own locking. Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 09:39:42 -03:00
Mike Marciniszyn	270a9833b2	IB/hfi1: Add running average for adaptive pio The adaptive PIO implementation only considers the current packet size when deciding between SDMA and pio for a packet. This causes credit return forces if small and large packets are interleaved. Add a running average to avoid costly credit forces so that a large sequence of small packets is required to go below the threshold that chooses pio. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-26 09:33:21 -03:00
Erez Alfasi	19b1a294b0	RDMA: Use __packed annotation instead of __attribute__ ((packed)) "__attribute__" set of macros has been standardized, have became more potentially portable and consistent code back in v2.6.21 by commit `82ddcb040` ("[PATCH] extend the set of "__attribute__" shortcut macros"). Moreover, nowadays checkpatch.pl warns about using __attribute__((packed)) instead of __packed. This patch converts all the "__attribute__ ((packed))" annotations to "__packed" within the RDMA subsystem. Signed-off-by: Erez Alfasi <ereza@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 21:14:12 -03:00
Lijun Ou	d0a935563b	RDMA/hns: Delete unused variable in hns_roce_v2_modify_qp function The src_mac array is not used in hns_roce_v2_modify_qp function. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 21:11:30 -03:00
Lijun Ou	82342e493b	RDMA/hns: Bugfix for sending with invalidate According to IB protocol, the send with invalidate operation will not invalidate mr that was created through a register mr or reregister mr. Fixes: `e93df01085` ("RDMA/hns: Support local invalidate for hip08 in kernel space") Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:59:56 -03:00
Lijun Ou	07c2339a91	RDMA/hns: Hide error print information with roce vf device The driver should not print the error information when the hip08 driver not support virtual function. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:59:55 -03:00
Lijun Ou	5b01b243b0	RDMA/hns: Only assgin some fields if the relatived attr_mask is set According to IB protocol, some fields of qp context are filled with optional when the relatived attr_mask are set. The relatived attr_mask include IB_QP_TIMEOUT, IB_QP_RETRY_CNT, IB_QP_RNR_RETRY and IB_QP_MIN_RNR_TIMER. Besides, we move some assignments of the fields of qp context into the outside of the specific qp state jump function. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:59:55 -03:00
Lijun Ou	834fa8cf6f	RDMA/hns: Update the range of raq_psn field of qp context According to hip08 UM(User Manual), the raq_psn field size is [23:0]. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:59:55 -03:00
Lijun Ou	601f3e6d06	RDMA/hns: Only assign the fields of the rq psn if IB_QP_RQ_PSN is set Only when the IB_QP_RQ_PSN flags of attr_mask is set is it valid to assign the relatived fields of rq'psn into the qp context when modified qp. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:59:55 -03:00
Lijun Ou	f04cc17878	RDMA/hns: Only assign the relatived fields of psn if IB_QP_SQ_PSN is set Only when the IB_QP_SQ_PSN flags of attr_mask is set is it valid to assign the relatived fields of psn into the qp context when modified qp. Signed-off-by: Lijun Ou <oulijun@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:59:55 -03:00
Matthew Wilcox	401b44804c	cxgb4: Convert stid_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:56:36 -03:00
Matthew Wilcox	9f5a9632e4	cxgb4: Convert atid_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 20:56:36 -03:00
Matthew Wilcox	f254ba6ae5	cxgb4: Convert hwtid_idr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:42:12 -03:00
Matthew Wilcox	7a268a9397	cxgb4: Convert mmidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:40:37 -03:00
Matthew Wilcox	2f43129127	cxgb4: Convert qpidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:39:18 -03:00
Matthew Wilcox	52e124c27e	cxgb4: Convert cqidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:38:18 -03:00
Matthew Wilcox	e64a7c02f1	cxgb3: Convert mmidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:36:29 -03:00
Matthew Wilcox	27114876ce	cxgb3: Convert qpidr to XArray Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:35:08 -03:00
Matthew Wilcox	a2f409713e	cxgb3: Convert cqidr to XArray It would make sense to convert this to an allocating XArray and remove the kfifo that is currently used to allocate the CQID, but that work is better done by someone who has the hardware to test with. Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-25 15:34:22 -03:00
Paolo Abeni	a350eccee5	net: remove 'fallback' argument from dev->ndo_select_queue() After the previous patch, all the callers of ndo_select_queue() provide as a 'fallback' argument netdev_pick_tx. The only exceptions are nested calls to ndo_select_queue(), which pass down the 'fallback' available in the current scope - still netdev_pick_tx. We can drop such argument and replace fallback() invocation with netdev_pick_tx(). This avoids an indirect call per xmit packet in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen) with device drivers implementing such ndo. It also clean the code a bit. Tested with ixgbe and CONFIG_FCOE=m With pktgen using queue xmit: threads vanilla patched (kpps) (kpps) 1 2334 2428 2 4166 4278 4 7895 8100 v1 -> v2: - rebased after helper's name change Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-20 11:18:55 -07:00
Feng Tang	ec4fe4bcc5	i40iw: Avoid panic when handling the inetdev event There is a panic reported that on a system with x722 ethernet, when doing the operations like: # ip link add br0 type bridge # ip link set eno1 master br0 # systemctl restart systemd-networkd The system will panic "BUG: unable to handle kernel null pointer dereference at 0000000000000034", with call chain: i40iw_inetaddr_event notifier_call_chain blocking_notifier_call_chain notifier_call_chain __inet_del_ifa inet_rtm_deladdr rtnetlink_rcv_msg netlink_rcv_skb rtnetlink_rcv netlink_unicast netlink_sendmsg sock_sendmsg __sys_sendto It is caused by "local_ipaddr = ntohl(in->ifa_list->ifa_address)", while the in->ifa_list is NULL. So add a check for the "in->ifa_list == NULL" case, and skip the ARP operation accordingly. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-17 21:40:40 -03:00
Aya Levin	cd27287562	IB/mlx5: Fix mapping of link-mode to IB width and speed Add mapping of link mode: CAUI4 100Gbps CR4/KR4 with 4 lines and 25Gbps. Fix mapping of link mode: GAUI2 50Gbps CR2/KR2 to be 2 lines with 25Gbps. Fixes: `08e8676f16` ("IB/mlx5: Add support for 50Gbps per lane link modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-17 21:40:39 -03:00
Yishai Hadas	c5ae1954c4	IB/mlx5: Use mlx5 core to create/destroy a DEVX DCT To prevent a hardware memory leak when a DEVX DCT object is destroyed without calling DRAIN DCT before, (e.g. under cleanup flow), need to manage its creation and destruction via mlx5 core. In that case the DRAIN DCT command will be called and only once that it will be completed the DESTROY DCT command will be called. Otherwise, the DESTROY DCT may fail and a hardware leak may occur. As of that change the DRAIN DCT command should not be exposed any more from DEVX, it's managed internally by the driver to work as expected by the device specification. Fixes: `7efce3691d` ("IB/mlx5: Add obj create and destroy functionality") Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-17 21:40:39 -03:00
Jack Morgenstein	587443e777	IB/mlx4: Fix race condition between catas error reset and aliasguid flows Code review revealed a race condition which could allow the catas error flow to interrupt the alias guid query post mechanism at random points. Thiis is fixed by doing cancel_delayed_work_sync() instead of cancel_delayed_work() during the alias guid mechanism destroy flow. Fixes: `a0c64a17ab` ("mlx4: Add alias_guid mechanism") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-17 21:40:39 -03:00
Linus Torvalds	ea295481b6	XArray updates for 5.1-rc1 -----BEGIN PGP SIGNATURE----- iQFIBAABCgAyFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAlyHF2oUHHdpbGx5QGlu ZnJhZGVhZC5vcmcACgkQDpNsjXcpgj5j9AgAlpeptRfnPO0+VXj+EbxaOOI8tOG+ w+vBasWoQB+lZ9ctf1qUQVSeLn0ErxTM7BaIP7plfDrEWiIbRWkV18B+heS5d1Yz aTV1d/8tG6/eo61K2VqXHbUhymgMtbXDsg1rwWTF8+Q4xIcMqfYAR0f9ptU1Oejc pNAn16dYgKi6+4eluY7gXxruBosQ6yNml6iEje9A3uR8nhzTI/P3Yf2GGIZnQLsL +UIx4Ps38dJ3VCYBPfbnszZfYPpILUH9/Bdx+mAMUtZwvpM3JYqc8XsiFfqDO7n1 3003yUytnRkb1UK3QIvkbPt0G8UOI4s9fxRPsA8lLSww/f2y1r5kC4Mxbg== =HSP/ -----END PGP SIGNATURE----- Merge tag 'xarray-5.1-rc1' of git://git.infradead.org/users/willy/linux-dax Pull XArray updates from Matthew Wilcox: "This pull request changes the xa_alloc() API. I'm only aware of one subsystem that has started trying to use it, and we agree on the fixup as part of the merge. The xa_insert() error code also changed to match xa_alloc() (EEXIST to EBUSY), and I added xa_alloc_cyclic(). Beyond that, the usual bugfixes, optimisations and tweaking. I now have a git tree with all users of the radix tree and IDR converted over to the XArray that I'll be feeding to maintainers over the next few weeks" * tag 'xarray-5.1-rc1' of git://git.infradead.org/users/willy/linux-dax: XArray: Fix xa_reserve for 2-byte aligned entries XArray: Fix xa_erase of 2-byte aligned entries XArray: Use xa_cmpxchg to implement xa_reserve XArray: Fix xa_release in allocating arrays XArray: Mark xa_insert and xa_reserve as must_check XArray: Add cyclic allocation XArray: Redesign xa_alloc API XArray: Add support for 1s-based allocation XArray: Change xa_insert to return -EBUSY XArray: Update xa_erase family descriptions XArray tests: RCU lock prohibits GFP_KERNEL	2019-03-11 20:06:18 -07:00
Linus Torvalds	92fff53b71	SCSI misc on 20190306 This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc, hisi_sas, target/iscsi and target/core. Additionally Christoph refactored gdth as part of the dma changes. The major mid-layer change this time is the removal of bidi commands and with them the whole of the osd/exofs driver and filesystem. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIC54SYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishT1GAPwJEV23 ExPiPsnuVgKj49nLTagZ3rILRQcYNbL+MNYqxQEA0cT8FHzSDBfWY5OKPNE+RQ8z f69LpXGmMpuagKGvvd4= =Fhy1 -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc, hisi_sas, target/iscsi and target/core. Additionally Christoph refactored gdth as part of the dma changes. The major mid-layer change this time is the removal of bidi commands and with them the whole of the osd/exofs driver and filesystem. This is a major simplification for block and mq in particular" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits) scsi: cxgb4i: validate tcp sequence number only if chip version <= T5 scsi: cxgb4i: get pf number from lldi->pf scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c scsi: mpt3sas: Add missing breaks in switch statements scsi: aacraid: Fix missing break in switch statement scsi: kill command serial number scsi: csiostor: drop serial_number usage scsi: mvumi: use request tag instead of serial_number scsi: dpt_i2o: remove serial number usage scsi: st: osst: Remove negative constant left-shifts scsi: ufs-bsg: Allow reading descriptors scsi: ufs: Allow reading descriptor via raw upiu scsi: ufs-bsg: Change the calling convention for write descriptor scsi: ufs: Remove unused device quirks Revert "scsi: ufs: disable vccq if it's not needed by UFS device" scsi: megaraid_sas: Remove a bunch of set but not used variables scsi: clean obsolete return values of eh_timed_out scsi: sd: Optimal I/O size should be a multiple of physical block size scsi: MAINTAINERS: SCSI initiator and target tweaks scsi: fcoe: make use of fip_mode enum complete ...	2019-03-09 16:53:47 -08:00
Linus Torvalds	a50243b1dd	5.1 Merge Window Pull Request This has been a slightly more active cycle than normal with ongoing core changes and quite a lot of collected driver updates. - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe - A new data transfer mode for HFI1 giving higher performance - Significant functional and bug fix update to the mlx5 On-Demand-Paging MR feature - A chip hang reset recovery system for hns - Change mm->pinned_vm to an atomic64 - Update bnxt_re to support a new 57500 chip - A sane netlink 'rdma link add' method for creating rxe devices and fixing the various unregistration race conditions in rxe's unregister flow - Allow lookup up objects by an ID over netlink - Various reworking of the core to driver interface: * Drivers should not assume umem SGLs are in PAGE_SIZE chunks * ucontext is accessed via udata not other means * Start to make the core code responsible for object memory allocation * Drivers should convert struct device to struct ib_device via a helper * Drivers have more tools to avoid use after unregister problems -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlyAJYYACgkQOG33FX4g mxrWwQ/+OyAx4Moru7Aix0C6GWxTJp/wKgw21CS3reZxgLai6x81xNYG/s2wCNjo IccObVd7mvzyqPdxOeyHBsJBbQDqWvoD6O2duH8cqGMgBRgh3CSdUep2zLvPpSAx 2W1SvWYCLDnCuarboFrCA8c4AN3eCZiqD7z9lHyFQGjy3nTUWzk1uBaOP46uaiMv w89N8EMdXJ/iY6ONzihvE05NEYbMA8fuvosKLLNdghRiHIjbMQU8SneY23pvyPDd ZziPu9NcO3Hw9OVbkwtJp47U3KCBgvKHmnixyZKkikjiD+HVoABw2IMwcYwyBZwP Bic/ddONJUvAxMHpKRnQaW7znAiHARk21nDG28UAI7FWXH/wMXgicMp6LRcNKqKF vqXdxHTKJb0QUR4xrYI+eA8ihstss7UUpgSgByuANJ0X729xHiJtlEvPb1DPo1Dz 9CB4OHOVRl5O8sA5Jc6PSusZiKEpvWoyWbdmw0IiwDF5pe922VLl5Nv88ta+sJ38 v2Ll5AgYcluk7F3599Uh9D7gwp5hxW2Ph3bNYyg2j3HP4/dKsL9XvIJPXqEthgCr 3KQS9rOZfI/7URieT+H+Mlf+OWZhXsZilJG7No0fYgIVjgJ00h3SF1/299YIq6Qp 9W7ZXBfVSwLYA2AEVSvGFeZPUxgBwHrSZ62wya4uFeB1jyoodPk= =p12E -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This has been a slightly more active cycle than normal with ongoing core changes and quite a lot of collected driver updates. - Various driver fixes for bnxt_re, cxgb4, hns, mlx5, pvrdma, rxe - A new data transfer mode for HFI1 giving higher performance - Significant functional and bug fix update to the mlx5 On-Demand-Paging MR feature - A chip hang reset recovery system for hns - Change mm->pinned_vm to an atomic64 - Update bnxt_re to support a new 57500 chip - A sane netlink 'rdma link add' method for creating rxe devices and fixing the various unregistration race conditions in rxe's unregister flow - Allow lookup up objects by an ID over netlink - Various reworking of the core to driver interface: - drivers should not assume umem SGLs are in PAGE_SIZE chunks - ucontext is accessed via udata not other means - start to make the core code responsible for object memory allocation - drivers should convert struct device to struct ib_device via a helper - drivers have more tools to avoid use after unregister problems" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (280 commits) net/mlx5: ODP support for XRC transport is not enabled by default in FW IB/hfi1: Close race condition on user context disable and close RDMA/umem: Revert broken 'off by one' fix RDMA/umem: minor bug fix in error handling path RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp cxgb4: kfree mhp after the debug print IB/rdmavt: Fix concurrency panics in QP post_send and modify to error IB/rdmavt: Fix loopback send with invalidate ordering IB/iser: Fix dma_nents type definition IB/mlx5: Set correct write permissions for implicit ODP MR bnxt_re: Clean cq for kernel consumers only RDMA/uverbs: Don't do double free of allocated PD RDMA: Handle ucontext allocations by IB/core RDMA/core: Fix a WARN() message bnxt_re: fix the regression due to changes in alloc_pbl IB/mlx4: Increase the timeout for CM cache IB/core: Abort page fault handler silently during owning process exit IB/mlx5: Validate correct PD before prefetch MR IB/mlx5: Protect against prefetch of invalid MR RDMA/uverbs: Store PR pointer before it is overwritten ...	2019-03-09 15:53:03 -08:00
Michael J. Ruhl	bc5add0976	IB/hfi1: Close race condition on user context disable and close When disabling and removing a receive context, it is possible for an asynchronous event (i.e IRQ) to occur. Because of this, there is a race between cleaning up the context, and the context being used by the asynchronous event. cpu 0 (context cleanup) rc->ref_count-- (ref_count == 0) hfi1_rcd_free() cpu 1 (IRQ (with rcd index)) rcd_get_by_index() lock ref_count+++ <-- reference count race (WARNING) return rcd unlock cpu 0 hfi1_free_ctxtdata() <-- incorrect free location lock remove rcd from array unlock free rcd This race will cause the following WARNING trace: WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 Call Trace: dump_stack+0x19/0x1b __warn+0xd8/0x100 warn_slowpath_null+0x1d/0x20 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1] is_rcv_urgent_int+0x24/0x90 [hfi1] general_interrupt+0x1b6/0x210 [hfi1] __handle_irq_event_percpu+0x44/0x1c0 handle_irq_event_percpu+0x32/0x80 handle_irq_event+0x3c/0x60 handle_edge_irq+0x7f/0x150 handle_irq+0xe4/0x1a0 do_IRQ+0x4d/0xf0 common_interrupt+0x162/0x162 The race can also lead to a use after free which could be similar to: general protection fault: 0000 1 SMP CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1 Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015 task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230 Call Trace: ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1] hfi1_aio_write+0xba/0x110 [hfi1] do_sync_readv_writev+0x7b/0xd0 do_readv_writev+0xce/0x260 ? handle_mm_fault+0x39d/0x9b0 ? pick_next_task_fair+0x5f/0x1b0 ? sched_clock_cpu+0x85/0xc0 ? __schedule+0x13a/0x890 vfs_writev+0x35/0x60 SyS_writev+0x7f/0x110 system_call_fastpath+0x22/0x27 Use the appropriate kref API to verify access. Reorder context cleanup to ensure context removal before cleanup occurs correctly. Cc: stable@vger.kernel.org # v4.14.0+ Fixes: `f683c80ca6` ("IB/hfi1: Resolve kernel panics by reference counting receive contexts") Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-06 14:47:09 -04:00
John Hubbard	0c507d8f84	RDMA/umem: Revert broken 'off by one' fix The previous attempted bug fix overlooked the fact that ib_umem_odp_map_dma_single_page() was doing a put_page() upon hitting an error. So there was not really a bug there. Therefore, this reverts the off-by-one change, but keeps the change to use release_pages() in the error path. Fixes: `75a3e6a3c1` ("RDMA/umem: minor bug fix in error handling path") Suggested-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-06 14:42:37 -04:00
Anshuman Khandual	98fa15f34c	mm: replace all open encodings for NUMA_NO_NODE Patch series "Replace all open encodings for NUMA_NO_NODE", v3. All these places for replacement were found by running the following grep patterns on the entire kernel code. Please let me know if this might have missed some instances. This might also have replaced some false positives. I will appreciate suggestions, inputs and review. 1. git grep "nid == -1" 2. git grep "node == -1" 3. git grep "nid = -1" 4. git grep "node = -1" This patch (of 2): At present there are multiple places where invalid node number is encoded as -1. Even though implicitly understood it is always better to have macros in there. Replace these open encodings for an invalid node number with the global macro NUMA_NO_NODE. This helps remove NUMA related assumptions like 'invalid node' from various places redirecting them to a common definition. Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx] Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-03-05 21:07:14 -08:00
John Hubbard	75a3e6a3c1	RDMA/umem: minor bug fix in error handling path 1. Bug fix: fix an off by one error in the code that cleans up if it fails to dma-map a page, after having done a get_user_pages_remote() on a range of pages. 2. Refinement: for that same cleanup code, release_pages() is better than put_page() in a loop. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 16:41:31 -04:00
YueHaibing	4e69cf1fe2	RDMA/hns: Use GFP_ATOMIC in hns_roce_v2_modify_qp The the below commit, hns_roce_v2_modify_qp is called inside spinlock while using GFP_KERNEL. Change it to GFP_ATOMIC. Fixes: `0425e3e6e0` ("RDMA/hns: Support flush cqe for hip08 in kernel space") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 16:41:30 -04:00
Shaobo He	952a3cc9c0	cxgb4: kfree mhp after the debug print In function `c4iw_dealloc_mw`, variable mhp's value is printed after freed, it is clearer to have the print before the kfree. Otherwise racing threads could allocate another mhp with the same pointer value and create confusing tracing. Signed-off-by: Shaobo He <shaobo@cs.utah.edu> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 16:41:30 -04:00
Michael J. Ruhl	d757c60eca	IB/rdmavt: Fix concurrency panics in QP post_send and modify to error The RC/UC code path can go through a software loopback. In this code path the receive side QP is manipulated. If two threads are working on the QP receive side (i.e. post_send, and modify_qp to an error state), QP information can be corrupted. (post_send via loopback) set r_sge loop update r_sge (modify_qp) take r_lock update r_sge <---- r_sge is now incorrect (post_send) update r_sge <---- crash, etc. ... This can lead to one of the two following crashes: BUG: unable to handle kernel NULL pointer dereference at (null) IP: hfi1_copy_sge+0xf1/0x2e0 [hfi1] PGD 8000001fe6a57067 PUD 1fd9e0c067 PMD 0 Call Trace: ruc_loopback+0x49b/0xbc0 [hfi1] hfi1_do_send+0x38e/0x3e0 [hfi1] _hfi1_do_send+0x1e/0x20 [hfi1] process_one_work+0x17f/0x440 worker_thread+0x126/0x3c0 kthread+0xd1/0xe0 ret_from_fork_nospec_begin+0x21/0x21 or: BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 IP: rvt_clear_mr_refs+0x45/0x370 [rdmavt] PGD 80000006ae5eb067 PUD ef15d0067 PMD 0 Call Trace: rvt_error_qp+0xaa/0x240 [rdmavt] rvt_modify_qp+0x47f/0xaa0 [rdmavt] ib_security_modify_qp+0x8f/0x400 [ib_core] ib_modify_qp_with_udata+0x44/0x70 [ib_core] modify_qp.isra.23+0x1eb/0x2b0 [ib_uverbs] ib_uverbs_modify_qp+0xaa/0xf0 [ib_uverbs] ib_uverbs_write+0x272/0x430 [ib_uverbs] vfs_write+0xc0/0x1f0 SyS_write+0x7f/0xf0 system_call_fastpath+0x1c/0x21 Fix by using the appropriate locking on the receiving QP. Fixes: `1570346153` ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt") Cc: <stable@vger.kernel.org> #v4.9+ Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 15:47:23 -04:00
Mike Marciniszyn	38bbc9f038	IB/rdmavt: Fix loopback send with invalidate ordering The IBTA spec notes: o9-5.2.1: For any HCA which supports SEND with Invalidate, upon receiving an IETH, the Invalidate operation must not take place until after the normal transport header validation checks have been successfully completed. The rdmavt loopback code does the validation after the invalidate. Fix by relocating the operation specific logic for all SEND variants until after the validity checks. Cc: <stable@vger.kernel.org> #v4.20+ Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 15:47:23 -04:00
Max Gurtovoy	c1545f1a20	IB/iser: Fix dma_nents type definition The retured value from ib_dma_map_sg saved in dma_nents variable. To avoid future mismatch between types, define dma_nents as an integer instead of unsigned. Fixes: `57b26497fa` ("IB/iser: Pass the correct number of entries for dma mapped SGL") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 15:40:25 -04:00
Moni Shoua	7095ec3ca0	IB/mlx5: Set correct write permissions for implicit ODP MR The write access of an implicit MR is inherited to all of its children. Therefore we must set the correct write access to the parent MR. Pass full access_flags when creating umem to let it calculate write access correctly. Fixes: `da6a496a34` ("IB/mlx5: Ranges in implicit ODP MR inherit its write access") Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-03-04 10:57:19 -04:00

... 2 3 4 5 6 ...

10019 Commits