Commit Graph

8322 Commits

Author SHA1 Message Date
oulijun
05d6a4ddb6 RDMA/hns: Bugfix for cq record db for kernel
When the CQ record doorbell is used in kernel space, hr_cq->db_en needs
to be set to 1 and the DMA address of the record CQ doorbell needs to be
configured in the QP context.

Fixes: 86188a8810 ("RDMA/hns: Support cq record doorbell for kernel space")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23 15:45:44 -06:00
Kalderon, Michal
30bf066cd9 RDMA/qedr: Fix doorbell bar mapping for dpi > 1
Each user_context receives a separate dpi value and thus a different
address on the doorbell bar. The qedr_mmap function needs to validate
the address and map the doorbell bar accordingly.
The current implementation always checked against the dpi=0 doorbell range,
leading to a wrong mapping for the doorbell bar (it entered an else case
that mapped the address differently). qedr_mmap should only be used
for doorbells, so the else case was wrong in the first place.
This only has an effect on the arm architecture and is not an issue on
x86 based architectures.
This led to doorbells not occurring on arm based systems and left
applications that use more than one dpi (or several applications run
simultaneously) hanging.

Fixes: ac1b36e55a ("qedr: Add support for user context verbs")
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-23 15:12:14 -06:00
Lidong Chen
8e907ed488 IB/umem: Use the correct mm during ib_umem_release
User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.

Instead of using the thread to locate the mm, use the overall tgid from the
ib_ucontext struct. This matches the behavior of ODP and disassociate in
handling the mm of the process that called ibv_reg_mr.

Cc: <stable@vger.kernel.org>
Fixes: 87773dd56d ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-15 17:09:10 -06:00
Christophe Jaillet
3d69191086 iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()'
The error handling path of 'c4iw_get_dma_mr()' does not free resources
in the correct order.
If an error occurs, it can leak 'mhp->wr_waitp'.

Fixes: a3f12da0e9 ("iw_cxgb4: allocate wait object for each memory object")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:19 -04:00
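
A minimal sketch of the pattern this fix restores: resources are released
in the reverse order of their allocation, so an early failure cannot leave
'mhp->wr_waitp' behind. The structure and helper names below are
hypothetical, not the actual iw_cxgb4 code.

#include <linux/slab.h>
#include <linux/err.h>

struct demo_mr {
	void *wr_waitp;
	void *dma_buf;
};

static struct demo_mr *demo_get_dma_mr(void)
{
	struct demo_mr *mhp;
	int err;

	mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
	if (!mhp)
		return ERR_PTR(-ENOMEM);

	mhp->wr_waitp = kzalloc(64, GFP_KERNEL);	/* allocated first */
	if (!mhp->wr_waitp) {
		err = -ENOMEM;
		goto err_free_mhp;
	}

	mhp->dma_buf = kzalloc(128, GFP_KERNEL);	/* allocated second */
	if (!mhp->dma_buf) {
		err = -ENOMEM;
		goto err_free_waitp;	/* unwind in reverse order */
	}

	return mhp;

err_free_waitp:
	kfree(mhp->wr_waitp);
err_free_mhp:
	kfree(mhp);
	return ERR_PTR(err);
}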
Andrew Boyer
43731753c4 RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint
The current code sets an affinity hint with a cpumask_t stored on the
stack. This value can then be accessed through /proc/irq/*/affinity_hint/,
causing a segfault or returning corrupt data.

Move the cpumask_t into struct i40iw_msix_vector so it is available later.

Backtrace:
BUG: unable to handle kernel paging request at ffffb16e600e7c90
IP: irq_affinity_hint_proc_show+0x60/0xf0
PGD 17c0c6d067
PUD 17c0c6e067
PMD 15d4a0e067
PTE 0

Oops: 0000 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 172543 Comm: grep Tainted: G           OE   ... #1
Hardware name: ...
task: ffff9a5caee08000 task.stack: ffffb16e659d8000
RIP: 0010:irq_affinity_hint_proc_show+0x60/0xf0
RSP: 0018:ffffb16e659dbd20 EFLAGS: 00010086
RAX: 0000000000000246 RBX: ffffb16e659dbd20 RCX: 0000000000000000
RDX: ffffb16e600e7c90 RSI: 0000000000000003 RDI: 0000000000000046
RBP: ffffb16e659dbd88 R08: 0000000000000038 R09: 0000000000000001
R10: 0000000070803079 R11: 0000000000000000 R12: ffff9a59d1d97a00
R13: ffff9a5da47a6cd8 R14: ffff9a5da47a6c00 R15: ffff9a59d1d97a00
FS:  00007f946c31d740(0000) GS:ffff9a5dc1800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffb16e600e7c90 CR3: 00000016a4339000 CR4: 00000000007406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 seq_read+0x12d/0x430
 ? sched_clock_cpu+0x11/0xb0
 proc_reg_read+0x48/0x70
 __vfs_read+0x37/0x140
 ? security_file_permission+0xa0/0xc0
 vfs_read+0x96/0x140
 SyS_read+0x58/0xc0
 do_syscall_64+0x5a/0x190
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f946bbc97e0
RSP: 002b:00007ffdd0c4ae08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000000096b000 RCX: 00007f946bbc97e0
RDX: 000000000096b000 RSI: 00007f946a2f0000 RDI: 0000000000000004
RBP: 0000000000001000 R08: 00007f946a2ef011 R09: 000000000000000a
R10: 0000000000001000 R11: 0000000000000246 R12: 00007f946a2f0000
R13: 0000000000000004 R14: 0000000000000000 R15: 00007f946a2f0000
Code: b9 08 00 00 00 49 89 c6 48 89 df 31 c0 4d 8d ae d8 00 00 00 f3 48 ab 4c 89 ef e8 6c 9a 56 00 49 8b 96 30 01 00 00 48 85 d2 74 3f <48> 8b 0a 48 89 4d 98 48 8b 4a 08 48 89 4d a0 48 8b 4a 10 48 89
RIP: irq_affinity_hint_proc_show+0x60/0xf0 RSP: ffffb16e659dbd20
CR2: ffffb16e600e7c90

Fixes: 8e06af711b ("i40iw: add main, hdr, status")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:19 -04:00
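
A minimal sketch of the fix pattern described above, keeping the cpumask
that backs the affinity hint in a long-lived structure rather than on the
stack; the struct and field names are illustrative and only approximate
the real i40iw_msix_vector.

#include <linux/interrupt.h>
#include <linux/cpumask.h>
#include <linux/types.h>

struct demo_msix_vector {
	u32 irq;
	u32 cpu_affinity;
	cpumask_t mask;		/* must outlive the hint, so not on the stack */
};

static void demo_set_affinity_hint(struct demo_msix_vector *vec)
{
	cpumask_clear(&vec->mask);
	cpumask_set_cpu(vec->cpu_affinity, &vec->mask);
	/* The kernel only stores this pointer; reads through
	 * /proc/irq/<n>/affinity_hint happen much later. */
	irq_set_affinity_hint(vec->irq, &vec->mask);
}

static void demo_clear_affinity_hint(struct demo_msix_vector *vec)
{
	/* Drop the hint before the structure is freed. */
	irq_set_affinity_hint(vec->irq, NULL);
}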
Andrew Boyer
9f7b16afab RDMA/i40iw: Avoid reference leaks when processing the AEQ
In this switch there is a reference held on the QP. 'continue' will grab
the next event without releasing the reference, causing a leak.

Change it to 'break' to drop the reference before grabbing the next event.

Fixes: 4e9042e647 ("i40iw: add hw and utils files")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
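
A toy user-space illustration of why 'continue' leaked here: 'continue'
jumps straight back to the loop condition and skips the put at the bottom
of the loop body, while 'break' only leaves the switch and still reaches
it. The counter stands in for the QP reference count.

#include <stdio.h>

int main(void)
{
	int refcnt = 0;

	for (int event = 0; event < 3; event++) {
		refcnt++;		/* reference taken for this event */

		switch (event) {
		case 0:
			/* 'continue' here would skip the decrement below
			 * and leak one reference per matching event. */
			break;
		default:
			break;
		}

		refcnt--;		/* reference released */
	}

	printf("leaked references: %d\n", refcnt);	/* prints 0 with break */
	return 0;
}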
Andrew Boyer
a75895b1eb RDMA/i40iw: Avoid panic when objects are being created and destroyed
A panic occurs when there is a newly-registered element on the QP/CQ MR
list waiting to be attached, but a different MR is deregistered. The
current code only checks for whether the list is empty, not whether the
element being deregistered is actually on the list.

Fix the panic by adding a boolean to track if the object is on the list.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
a0403be8af RDMA/hns: Fix the bug with NULL pointer
When the last of the eight QPs does not exist in the
hns_roce_v1_mr_free_work_fn function, printing the qpn of
hr_qp may trigger a NULL pointer dereference call trace.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
79d442071a RDMA/hns: Set NULL for __internal_mr
This patch sets the __internal_mr member of mr_free_pd to NULL.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
85e0274dc6 RDMA/hns: Enable inner_pa_vld field of mpt
When the inner_pa_vld field of the mpt is enabled, pa0 and pa1 are
valid and the hardware uses them directly instead of the base address
of the pbl. As a result, it can reduce the delay.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
90e7a4d506 RDMA/hns: Set desc_dma_addr for zero when free cmq desc
In order to avoid illegal use of the ring's desc_dma_addr, it needs
to be set to zero when the cmq desc is freed.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
778cc5a8b7 RDMA/hns: Fix the bug with rq sge
When multiple RQ SGEs are received, the invalid lkey should be tagged
for the last non-zero length SGE when some SGEs' lengths are zero.
This patch fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
391bd5fc7d RDMA/hns: Not support qp transition from reset to reset for hip06
Because hip06 hardware does not support the QP transition from reset
to reset state, an error needs to be returned when a QP transitions
from reset to reset. This patch fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
2349fdd483 RDMA/hns: Add return operation when configured global param fail
When the configure global param function fails, it should return
directly so that the initialization flow stops.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
ad18e20ba2 RDMA/hns: Update convert function of endian format
Because the sys_image_guid of the ib_device_attr structure is __be64,
cpu_to_be64 needs to be used for the conversion.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
f97a62c394 RDMA/hns: Load the RoCE driver automatically
To enable the kernel to load the hns-roce-hw-v2 driver automatically
when the device is plugged into the PCI bus, a MODULE_DEVICE_TABLE
needs to be created to expose the pci_table of hns-roce-hw-v2 to
user space.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Tested-by: Xiaojun Tan <tanxiaojun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
3a39bbecc8 RDMA/hns: Bugfix for rq record db for kernel
When the RQ record doorbell is used in kernel space, the rdb_en of
hr_qp needs to be set to 1 and the DMA address of the record RQ
doorbell needs to be configured in the QP context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
oulijun
ecaaf1e26a RDMA/hns: Add rq inline flags judgement
The rqie field of the QP context needs to be set according to the
configured RQ inline flags. Besides, the RQ inline flags need to be
checked to decide whether an inline RQ WQE should be posted.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:45:18 -04:00
Alexandru Moise
1661d3b0e2 nvmet,rxe: defer ip datagram sending to tasklet
This addresses 3 separate problems:

1. When using NVMe over Fabrics we may end up sending IP
packets in interrupt context; this work should be deferred
to a tasklet.

[   50.939957] WARNING: CPU: 3 PID: 0 at kernel/softirq.c:161 __local_bh_enable_ip+0x1f/0xa0
[   50.942602] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         4.17.0-rc3-ARCH+ #104
[   50.945466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[   50.948163] RIP: 0010:__local_bh_enable_ip+0x1f/0xa0
[   50.949631] RSP: 0018:ffff88009c183900 EFLAGS: 00010006
[   50.951029] RAX: 0000000080010403 RBX: 0000000000000200 RCX: 0000000000000001
[   50.952636] RDX: 0000000000000000 RSI: 0000000000000200 RDI: ffffffff817e04ec
[   50.954278] RBP: ffff88009c183910 R08: 0000000000000001 R09: 0000000000000614
[   50.956000] R10: ffffea00021d5500 R11: 0000000000000001 R12: ffffffff817e04ec
[   50.957779] R13: 0000000000000000 R14: ffff88009566f400 R15: ffff8800956c7000
[   50.959402] FS:  0000000000000000(0000) GS:ffff88009c180000(0000) knlGS:0000000000000000
[   50.961552] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   50.963798] CR2: 000055c4ec0ccac0 CR3: 0000000002209001 CR4: 00000000000606e0
[   50.966121] Call Trace:
[   50.966845]  <IRQ>
[   50.967497]  __dev_queue_xmit+0x62d/0x690
[   50.968722]  dev_queue_xmit+0x10/0x20
[   50.969894]  neigh_resolve_output+0x173/0x190
[   50.971244]  ip_finish_output2+0x2b8/0x370
[   50.972527]  ip_finish_output+0x1d2/0x220
[   50.973785]  ? ip_finish_output+0x1d2/0x220
[   50.975010]  ip_output+0xd4/0x100
[   50.975903]  ip_local_out+0x3b/0x50
[   50.976823]  rxe_send+0x74/0x120
[   50.977702]  rxe_requester+0xe3b/0x10b0
[   50.978881]  ? ip_local_deliver_finish+0xd1/0xe0
[   50.980260]  rxe_do_task+0x85/0x100
[   50.981386]  rxe_run_task+0x2f/0x40
[   50.982470]  rxe_post_send+0x51a/0x550
[   50.983591]  nvmet_rdma_queue_response+0x10a/0x170
[   50.985024]  __nvmet_req_complete+0x95/0xa0
[   50.986287]  nvmet_req_complete+0x15/0x60
[   50.987469]  nvmet_bio_done+0x2d/0x40
[   50.988564]  bio_endio+0x12c/0x140
[   50.989654]  blk_update_request+0x185/0x2a0
[   50.990947]  blk_mq_end_request+0x1e/0x80
[   50.991997]  nvme_complete_rq+0x1cc/0x1e0
[   50.993171]  nvme_pci_complete_rq+0x117/0x120
[   50.994355]  __blk_mq_complete_request+0x15e/0x180
[   50.995988]  blk_mq_complete_request+0x6f/0xa0
[   50.997304]  nvme_process_cq+0xe0/0x1b0
[   50.998494]  nvme_irq+0x28/0x50
[   50.999572]  __handle_irq_event_percpu+0xa2/0x1c0
[   51.000986]  handle_irq_event_percpu+0x32/0x80
[   51.002356]  handle_irq_event+0x3c/0x60
[   51.003463]  handle_edge_irq+0x1c9/0x200
[   51.004473]  handle_irq+0x23/0x30
[   51.005363]  do_IRQ+0x46/0xd0
[   51.006182]  common_interrupt+0xf/0xf
[   51.007129]  </IRQ>

2. Work must always be offloaded to a tasklet for rxe_post_send_kernel()
when using NVMEoF in order to solve the lock ordering between the
neigh->ha_lock seqlock and the nvme queue lock:

[   77.833783]  Possible interrupt unsafe locking scenario:
[   77.833783]
[   77.835831]        CPU0                    CPU1
[   77.837129]        ----                    ----
[   77.838313]   lock(&(&n->ha_lock)->seqcount);
[   77.839550]                                local_irq_disable();
[   77.841377]                                lock(&(&nvmeq->q_lock)->rlock);
[   77.843222]                                lock(&(&n->ha_lock)->seqcount);
[   77.845178]   <Interrupt>
[   77.846298]     lock(&(&nvmeq->q_lock)->rlock);
[   77.847986]
[   77.847986]  *** DEADLOCK ***

3. Same goes for the lock ordering between sch->q.lock and nvme queue lock:

[   47.634271]  Possible interrupt unsafe locking scenario:
[   47.634271]
[   47.636452]        CPU0                    CPU1
[   47.637861]        ----                    ----
[   47.639285]   lock(&(&sch->q.lock)->rlock);
[   47.640654]                                local_irq_disable();
[   47.642451]                                lock(&(&nvmeq->q_lock)->rlock);
[   47.644521]                                lock(&(&sch->q.lock)->rlock);
[   47.646480]   <Interrupt>
[   47.647263]     lock(&(&nvmeq->q_lock)->rlock);
[   47.648492]
[   47.648492]  *** DEADLOCK ***

Using NVMEoF after this patch seems to finally be stable; without it,
rxe eventually deadlocks the whole system and causes RCU stalls.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:51 -04:00
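
A hedged sketch of the deferral pattern the patch adopts: the interrupt
path only queues the skb and schedules a tasklet, and the transmit
(ending in ip_local_out) runs later in softirq context. All names below
are illustrative stand-ins, not the actual rxe/nvmet code.

#include <linux/interrupt.h>
#include <linux/skbuff.h>

static struct sk_buff_head demo_send_q;
static struct tasklet_struct demo_send_tasklet;

static void demo_transmit(struct sk_buff *skb);	/* hypothetical helper ending in ip_local_out() */

static void demo_send_tasklet_fn(unsigned long data)
{
	struct sk_buff *skb;

	/* Softirq context: safe to take the transmit-path locks. */
	while ((skb = skb_dequeue(&demo_send_q)) != NULL)
		demo_transmit(skb);
}

static void demo_queue_from_irq(struct sk_buff *skb)
{
	/* Hard IRQ context: just queue and defer the real work. */
	skb_queue_tail(&demo_send_q, skb);
	tasklet_schedule(&demo_send_tasklet);
}

static void demo_init(void)
{
	skb_queue_head_init(&demo_send_q);
	tasklet_init(&demo_send_tasklet, demo_send_tasklet_fn, 0);
}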
Mustafa Ismail
eeb1af4f53 i40iw: Use correct address in dst_neigh_lookup for IPv6
Use of an incorrect structure address for the IPv6 neighbor lookup
causes connections to IPv6 addresses to fail. Fix this by using the
correct address in the call to dst_neigh_lookup.

Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:51 -04:00
Mustafa Ismail
5a7189d529 i40iw: Fix memory leak in error path of create QP
If i40iw_allocate_dma_mem fails when creating a QP, the
memory allocated for the QP structure using kzalloc is not
freed because iwqp->allocated_buffer is used to free the
memory and it is not set up until later. Fix this by setting
iwqp->allocated_buffer before allocating the dma memory.

Fixes: d374984179 ("i40iw: add files for iwarp interface")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
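
A simplified sketch of the ordering the fix establishes: the pointer the
error path frees is recorded before the allocation that can fail, so the
kzalloc'd block is always reachable from the cleanup code. Structure and
helper names are hypothetical.

#include <linux/slab.h>

struct demo_qp {
	void *allocated_buffer;		/* what the error path frees */
};

static int demo_allocate_dma_mem(struct demo_qp *qp)
{
	return -ENOMEM;			/* stand-in for i40iw_allocate_dma_mem failing */
}

static int demo_create_qp(struct demo_qp **out)
{
	void *mem;
	struct demo_qp *qp;

	mem = kzalloc(sizeof(*qp) + 256, GFP_KERNEL);
	if (!mem)
		return -ENOMEM;

	qp = mem;
	qp->allocated_buffer = mem;	/* set before anything that can fail */

	if (demo_allocate_dma_mem(qp)) {
		kfree(qp->allocated_buffer);	/* no longer leaks 'mem' */
		return -ENOMEM;
	}

	*out = qp;
	return 0;
}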
Daria Velikovsky
37da2a03c0 RDMA/mlx5: Use proper spec flow label type
The flow label is defined as u32 in the IPv6 flow spec, but is used
internally in the flow spec parsing as u8. That was causing loss of
part of the flow_label value.

Fixes: 2d1e697e9b ('IB/mlx5: Add support to match inner packet fields')
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Daria Velikovsky <daria@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Yishai Hadas
18b0362e87 RDMA/mlx5: Don't assume that a medium BlueFlame register exists
A user can leave the system without medium BlueFlame registers;
however, the code assumed that at least one such register exists.

This patch fixes that assumption.

Fixes: c1be5232d2 ("IB/mlx5: Fix micro UAR allocator")
Reported-by: Rohit Zambre <rzambre@uci.edu>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Michael J. Ruhl
f9e76ca377 IB/hfi1: Use after free race condition in send context error path
A pio send egress error can occur when the PSM library attempts to
send a bad packet.  That issue is still being investigated.

The pio error interrupt handler then attempts to progress the recovery
of the errored pio send context.

Code inspection reveals that the handling lacks the necessary locking
if that recovery interleaves with a PSM close of the "context" object
that contains the pio send context.

The lack of the locking can cause the recovery to access the already
freed pio send context object and incorrectly deduce that the pio
send context is actually a kernel pio send context as shown by the
NULL deref stack below:

[<ffffffff8143d78c>] _dev_info+0x6c/0x90
[<ffffffffc0613230>] sc_restart+0x70/0x1f0 [hfi1]
[<ffffffff816ab124>] ? __schedule+0x424/0x9b0
[<ffffffffc06133c5>] sc_halted+0x15/0x20 [hfi1]
[<ffffffff810aa3ba>] process_one_work+0x17a/0x440
[<ffffffff810ab086>] worker_thread+0x126/0x3c0
[<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
[<ffffffff810b252f>] kthread+0xcf/0xe0
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
[<ffffffff816b8798>] ret_from_fork+0x58/0x90
[<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40

This is the best case scenario and other scenarios can corrupt the
already freed memory.

Fix by adding the necessary locking in the pio send context error
handler.

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 10:39:50 -04:00
Greg Thelen
9533b292a7 IB: remove redundant INFINIBAND kconfig dependencies
INFINIBAND_ADDR_TRANS depends on INFINIBAND.  So there's no need for
options which depend on INFINIBAND_ADDR_TRANS to also depend on
INFINIBAND.  Remove the unnecessary INFINIBAND depends.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-09 08:51:03 -04:00
Parav Pandit
9aa169213d RDMA/cma: Do not query GID during QP state transition to RTR
When commit [1] was added, SGID was queried to derive the SMAC address.
Then, later on during a refactor [2], SMAC was no longer needed. However,
the now useless GID query remained.  Then during additional code changes
later on, the GID query was being done in such a way that it caused iWARP
queries to start breaking.  Remove the useless GID query and resolve the
iWARP breakage at the same time.

This is discussed in [3].

[1] commit dd5f03beb4 ("IB/core: Ethernet L2 attributes in verbs/cm structures")
[2] commit 5c266b2304 ("IB/cm: Remove the usage of smac and vid of qp_attr and cm_av")
[3] https://www.spinics.net/lists/linux-rdma/msg63951.html

Suggested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:45:18 -04:00
Jack Morgenstein
b03bcde962 IB/mlx4: Fix integer overflow when calculating optimal MTT size
When the kernel was compiled using the UBSAN option,
we saw the following stack trace:

[ 1184.827917] UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx4/mr.c:349:27
[ 1184.828114] signed integer overflow:
[ 1184.828247] -2147483648 - 1 cannot be represented in type 'int'

The problem was caused by calling round_up in procedure
mlx4_ib_umem_calc_optimal_mtt_size (on line 349, as noted in the stack
trace) with the second parameter (1 << block_shift) (which is an int).
The second parameter should have been (1ULL << block_shift) (which
is an unsigned long long).

(1 << block_shift) is treated by the compiler as an int (because 1 is
an integer).

Now, local variable block_shift is initialized to 31.
If block_shift is 31, 1 << block_shift is 1 << 31 = 0x80000000 = -2147483648.
This is the most negative int value.

Inside the round_up macro, there is a cast applied to ((1 << 31) - 1).
However, this cast is applied AFTER ((1 << 31) - 1) is calculated.
Since (1 << 31) is treated as an int, we get the negative overflow
identified by UBSAN in the process of calculating ((1 << 31) - 1).

The fix is to change (1 << block_shift) to (1ULL << block_shift) on
line 349.

Fixes: 9901abf583 ("IB/mlx4: Use optimal numbers of MTT entries")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:31:43 -04:00
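
The arithmetic can be reproduced in a few lines of user-space C; the
int-typed shift is the value UBSAN complains about, and the 1ULL
promotion is the fix.

#include <stdio.h>

int main(void)
{
	int block_shift = 31;

	/* int arithmetic: 1 << 31 is already INT_MIN, and subtracting 1
	 * from it (as round_up() does internally) overflows. */
	long long bad = (long long)(1 << block_shift);

	/* 64-bit arithmetic: 2147483648 is representable, and round_up()
	 * can safely subtract 1 from it. */
	long long good = (long long)(1ULL << block_shift);

	printf("(1    << 31) = %lld\n", bad);	/* -2147483648 on typical compilers */
	printf("(1ULL << 31) = %lld\n", good);	/*  2147483648 */
	return 0;
}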
Sebastian Sanchez
59482a1491 IB/hfi1: Fix memory leak in exception path in get_irq_affinity()
When IRQ affinity is set and the interrupt type is unknown, a cpu
mask allocated within the function is never freed. Fix this memory
leak by allocating memory within the scope where it is used.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:48 -04:00
Sebastian Sanchez
e9777ad439 IB/{hfi1, rdmavt}: Fix memory leak in hfi1_alloc_devdata() upon failure
When allocating device data, if there's an allocation failure, the
already allocated memory, such as the per-cpu counters, won't be freed.

Fix memory leaks in exception path by creating a common reentrant
clean up function hfi1_clean_devdata() to be used at driver unload
time and device data allocation failure.

To accomplish this, free_platform_config() and clean_up_i2c() are
changed to be reentrant to remove dependencies when they are called
in different order. This helps avoid NULL pointer dereferences
introduced by this patch if those two functions weren't reentrant.

In addition, set dd->int_counter, dd->rcv_limit,
dd->send_schedule and dd->tx_opstats to NULL after they're freed in
hfi1_clean_devdata(), so that hfi1_clean_devdata() is fully reentrant.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:48 -04:00
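
A small sketch of the reentrancy idiom the patch relies on: freeing a
field and immediately setting it to NULL lets the same clean-up function
run again safely, because kfree(NULL) is a no-op. The field names only
mirror the ones listed above; the real counters are not plain kfree()d
buffers.

#include <linux/slab.h>

struct demo_devdata {
	void *int_counter;
	void *rcv_limit;
	void *send_schedule;
	void *tx_opstats;
};

static void demo_clean_devdata(struct demo_devdata *dd)
{
	kfree(dd->int_counter);
	dd->int_counter = NULL;
	kfree(dd->rcv_limit);
	dd->rcv_limit = NULL;
	kfree(dd->send_schedule);
	dd->send_schedule = NULL;
	kfree(dd->tx_opstats);
	dd->tx_opstats = NULL;
	/* Calling this again is harmless: every kfree() now sees NULL. */
}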
Sebastian Sanchez
45d924571a IB/hfi1: Fix NULL pointer dereference when invalid num_vls is used
When an invalid num_vls is used as a module parameter, the code
execution follows an exception path where the macro dd_dev_err()
expects dd->pcidev->dev not to be NULL in hfi1_init_dd(). This
causes a NULL pointer dereference.

Fix hfi1_init_dd() by initializing dd->pcidev and dd->pcidev->dev
earlier in the code. If a dd exists, then dd->pcidev and
dd->pcidev->dev always exist.

BUG: unable to handle kernel NULL pointer dereference
at 00000000000000f0
IP: __dev_printk+0x15/0x90
Workqueue: events work_for_cpu_fn
RIP: 0010:__dev_printk+0x15/0x90
Call Trace:
 dev_err+0x6c/0x90
 ? hfi1_init_pportdata+0x38d/0x3f0 [hfi1]
 hfi1_init_dd+0xdd/0x2530 [hfi1]
 ? pci_conf1_read+0xb2/0xf0
 ? pci_read_config_word.part.9+0x64/0x80
 ? pci_conf1_write+0xb0/0xf0
 ? pcie_capability_clear_and_set_word+0x57/0x80
 init_one+0x141/0x490 [hfi1]
 local_pci_probe+0x3f/0xa0
 work_for_cpu_fn+0x10/0x20
 process_one_work+0x152/0x350
 worker_thread+0x1cf/0x3e0
 kthread+0xf5/0x130
 ? max_active_store+0x80/0x80
 ? kthread_bind+0x10/0x10
 ? do_syscall_64+0x6e/0x1a0
 ? SyS_exit_group+0x10/0x10
 ret_from_fork+0x35/0x40

Cc: <stable@vger.kernel.org> # 4.9.x
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:47 -04:00
Mike Marciniszyn
0a0bcb046b IB/hfi1: Fix loss of BECN with AHG
AHG may be armed to use the stored header, which by design is limited
to edits in the PSN/A 32 bit word (bth2).

When the code is trying to send a BECN, the use of the stored header
will lose the BECN bit.

Fix by avoiding AHG when getting ready to send a BECN. This is
accomplished by always claiming the packet is not a middle packet which
is an AHG precursor.  BECNs are not a normal case and this should not
hurt AHG optimizations.

Cc: <stable@vger.kernel.org> # 4.14.x
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:47 -04:00
Michael J. Ruhl
5da9e742be IB/hfi1: Use correct type for num_user_context
The module parameter num_user_context is defined as 'int' and
defaults to -1.  The module_param_named() declaration says it is uint.

Correct module_param_named() type information and update the modinfo
text to reflect the default value.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:47 -04:00
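
A hedged sketch of keeping the C type and the module_param_named() type
in agreement, so a default of -1 is representable and the modinfo text
matches it; the variable name follows the commit, the rest is
illustrative.

#include <linux/module.h>
#include <linux/moduleparam.h>

static int num_user_contexts = -1;	/* -1: let the driver choose */
module_param_named(num_user_contexts, num_user_contexts, int, 0444);
MODULE_PARM_DESC(num_user_contexts,
		 "Set max number of user contexts to use (default: -1)");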
Mike Marciniszyn
f59fb9e051 IB/hfi1: Fix handling of FECN marked multicast packet
The code for handling a marked UD packet unconditionally returns the
dlid in the header of the FECN marked packet.  This is not correct
for multicast packets where the DLID is in the multicast range.

The subsequent attempt to send the CNP with the multicast lid will
cause the chip to halt the ack send context because the source
lid doesn't match the chip programming.   The send context will
be halted and flush any other pending packets in the pio ring causing
the CNP to not be sent.

As part of investigating the fix, it was determined that the 16B work
broke the FECN routine badly, with inconsistent use of 16 bit and 32 bit
types for lids and pkeys.  Since the port's source lid was correctly 32
bits, the type mismatches need to be dealt with at the same time as
fixing the CNP header issue.

Fix these issues by:
- Using the port's lid as the SLID for responding to FECN marked UD
  packets
- Ensuring pkey is always 16 bits in this and subordinate routines
- Ensuring lids are 32 bits in this and subordinate routines

Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: 88733e3b84 ("IB/hfi1: Add 16B UD support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-05-03 15:24:44 -04:00
Håkon Bugge
db82476f37 IB/core: Make ib_mad_client_id atomic
Currently, the kernel protects access to the agent ID allocator on a per
port basis using a spinlock, so it is impossible for two apps/threads on
the same port to get the same TID, but it is entirely possible for two
threads on different ports to end up with the same TID.

As this can be confusing (regardless of it being legal according to the
IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID usage),
and as the rdma-core user space API for /dev/umad devices implies unique
TIDs even across ports, make the TID an atomic type so that no two
allocations, regardless of port number, will be the same.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-30 13:07:28 -04:00
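
The allocation scheme the patch moves to can be sketched as a single
shared atomic counter, so concurrent MAD agent registrations on any
port never receive the same value; the helper name is illustrative.

#include <linux/atomic.h>
#include <linux/types.h>

static atomic_t ib_mad_client_id = ATOMIC_INIT(0);

static u32 demo_alloc_agent_id(void)
{
	/* atomic_inc_return() is one indivisible step across all CPUs
	 * and ports, unlike the old per-port spinlock-protected counter. */
	return (u32)atomic_inc_return(&ib_mad_client_id);
}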
Bharat Potnuri
2df19e19ae iw_cxgb4: Atomically flush per QP HW CQEs
When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire
the corresponding QP lock before moving the CQEs into its corresponding SW
queue and accessing the SQ contents for completing a WR.
Ignore CQEs if the corresponding QP is already flushed.

Cc: stable@vger.kernel.org
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:38:44 -04:00
Ariel Levkovich
54e7e48b13 IB/uverbs: Fix kernel crash during MR deregistration flow
This patch fixes a crash that happens due to access to an
uninitialized DM pointer within the MR object.

The change makes sure the DM pointer in the MR object is set to
NULL during a non-DM MR creation to prevent a false indication
that this MR is related to a DM in the dereg flow.

Fixes: be934cca9e ("IB/uverbs: Add device memory registration ioctl support")
Reported-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:22:24 -04:00
Ariel Levkovich
5ccbf63f87 IB/uverbs: Prevent reregistration of DM_MR to regular MR
This patch adds a check in the ib_uverbs_rereg_mr flow to make
sure there's no attempt to rereg a device memory MR to a regular MR.
In such a case the command will fail with -EINVAL status.

Fixes: be934cca9e ("IB/uverbs: Add device memory registration ioctl support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:22:24 -04:00
Leon Romanovsky
4f9ca2d868 RDMA/mlx4: Add missed RSS hash inner header flag
Despite being advertised to user space applications, the RSS inner
header flag was filtered out by checks at the beginning of the QP
creation routine.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 4d02ebd9bb ("IB/mlx4: Fix RSS hash fields restrictions")
Fixes: 07d84f7b6a ("IB/mlx4: Add support to RSS hash for inner headers")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:22:23 -04:00
oulijun
ab17884903 RDMA/hns: Fix a couple misspellings
This patch fixes two spelling errors.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:21:51 -04:00
oulijun
137ae32084 RDMA/hns: Submit bad wr
When a bad work request is generated, it needs to be reported to
the user. This patch mainly fixes that.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:48 -04:00
oulijun
634f639022 RDMA/hns: Update assignment method for owner field of send wqe
When posting a work request, the owner bit of the send WQE needs to
be updated. This patch mainly fixes the bug seen when posting multiple
work requests.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
ae25db0028 RDMA/hns: Adjust the order of cleanup hem table
This patch updates the order of cleaning up the hem tables for trrl_table
and irrl_table, as well as mtt_cqe_table and mtt_table.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
b6dd9b3483 RDMA/hns: Only assign dqpn if IB_QP_PATH_DEST_QPN bit is set
Only when the IB_QP_PATH_DEST_QPN flag of attr_mask is set is it
valid to assign the dqpn field of the QP context.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
734f38638d RDMA/hns: Remove some unnecessary attr_mask judgement
This patch deletes some unnecessary attr_mask if conditions in hip08
according to the IB protocol.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
6852af8662 RDMA/hns: Only assign mtu if IB_QP_PATH_MTU bit is set
Only when the IB_QP_PATH_MTU flag of attr_mask is set is it valid to
assign the mtu field of the QP context, and only when the QP type is
neither GSI nor UD.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
6e1a70943c RDMA/hns: Fix the qp context state diagram
According to the RoCE protocol, it is possible to transition from the
error state to the error state when modifying a QP in hip08. This
patch fixes it.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
328d405b3d RDMA/hns: Intercept illegal RDMA operation when use inline data
The RDMA read operation does not support inline data. If the user
issues an RDMA read with inline data, a hardware error will occur.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
oulijun
215a8c09e5 RDMA/hns: Bugfix for init hem table
During hem table init, type should be used instead of table->type,
which is only finally initialized with type.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
Zhu Yanjun
9fd4350ba8 IB/rxe: avoid double kfree_skb
When an skb is sent, it passes through the following functions in soft RoCE.

rxe_send [rdma_rxe]
    ip_local_out
        __ip_local_out
        ip_output
            ip_finish_output
                ip_finish_output2
                    dev_queue_xmit
                        __dev_queue_xmit
                            dev_hard_start_xmit

If an error occurs in the above functions, or iptables rules drop the
skb after ip_local_out, kfree_skb will be called. So it is not
necessary to call kfree_skb in the soft RoCE module again, or else a
crash will occur.

The steps to reproduce:

     server                       client
    ---------                    ---------
    |1.1.1.1|<----rxe-channel--->|1.1.1.2|
    ---------                    ---------

On server: rping -s -a 1.1.1.1 -v -C 10000 -S 512
On client: rping -c -a 1.1.1.1 -v -C 10000 -S 512

The kernel configs CONFIG_DEBUG_KMEMLEAK and
CONFIG_DEBUG_OBJECTS are enabled on both server and client.

When rping runs, run the following command on the server:

iptables -I OUTPUT -p udp  --dport 4791 -j DROP

Without this patch, crash will occur.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00
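
The ownership rule behind this fix, as a simplified stand-in for the
driver's send path: once the skb has been handed to ip_local_out(), the
network stack (or netfilter) frees it on failure, so the caller must not
call kfree_skb() again.

#include <net/ip.h>
#include <linux/skbuff.h>
#include <linux/printk.h>

static int demo_send(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	int err;

	err = ip_local_out(net, sk, skb);
	if (err) {
		/* The skb was already consumed further down the stack;
		 * calling kfree_skb(skb) here would be a double free. */
		pr_debug("demo_send: ip_local_out failed (%d)\n", err);
	}

	return err;
}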
Jianchao Wang
2da36d44a9 IB/rxe: add RXE_START_MASK for rxe_opcode IB_OPCODE_RC_SEND_ONLY_INV
Without RXE_START_MASK, the last_psn of IB_OPCODE_RC_SEND_ONLY_INV
will not be updated in update_wqe_psn, and the corresponding wqe will
not be acked in rxe_completer because its last_psn is zero. Finally,
the other wqes will also not be able to be acked, because the wqe of
IB_OPCODE_RC_SEND_ONLY_INV with last_psn 0 is still there. This causes
a large number of IO timeouts when nvmeof runs over rxe.

Add RXE_START_MASK for IB_OPCODE_RC_SEND_ONLY_INV to fix this.

Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-04-27 14:20:47 -04:00