This change updates the mlx5 interface for creating an mkey
on the device.
The updates in the command mailbox include widening the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC, which represents the
device memory access type and will be used when registering an MR
on allocated device memory.
All the places that use the old access mode format are adjusted as
well.
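As an illustrative sketch of the new format (the field names
access_mode_1_0 and access_mode_4_2 are assumptions based on the split
described above; MLX5_SET() is the usual mlx5 mailbox accessor):

  /* Write a 5-bit access mode into the mkey context mailbox. */
  static void set_mkc_access_mode(void *mkc, u8 mode)
  {
          MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);        /* low 2 bits */
          MLX5_SET(mkc, mkc, access_mode_4_2, (mode >> 2) & 0x7); /* high 3 bits */
  }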
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds querying of device memory capabilities by the mlx5_core
driver during initialization.
Device memory capabilities are a new capability type, with a
structure that contains the data needed for future device memory
allocation.
The presence of this new capabilities struct is indicated in the
general capabilities struct which is queried first by the driver.
If the presence bit is set, the driver will also query the new
capabilities struct and save it in the device context.
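A minimal sketch of the two-step query, assuming the capability
helpers MLX5_CAP_GEN() and mlx5_core_get_caps() and a MLX5_CAP_DEV_MEM
capability type:

  static int query_device_memory_caps(struct mlx5_core_dev *dev)
  {
          /* Presence bit in the general capabilities struct */
          if (!MLX5_CAP_GEN(dev, device_memory))
                  return 0;

          /* Query the new caps struct; the result is saved in the
           * device context for future device memory allocations. */
          return mlx5_core_get_caps(dev, MLX5_CAP_DEV_MEM);
  }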
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add two new parameters: max_burst_sz and typical_pkt_size (both
in bytes) to rate limit configurations.
max_burst_sz: The device will schedule bursts of packets for an
SQ connected to this rate, each smaller than or equal to this value.
A value of 0x0 indicates that packet bursts will be limited to the
device defaults. This field should be used if bursts of packets must
be strictly kept under a certain value.
typical_pkt_size: When the rate limit is intended for a stream of
similar packets, stating the typical packet size can improve the
accuracy of the rate limiter. The expected packet size will be
the same for all SQs associated with the same rate limit index.
The Ethernet driver is updated accordingly, but these two parameters
are kept at 0 because there is currently no proper way to obtain these
configurations from user space; doing so would require changing the
ndo_set_tx_maxrate interface.
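A sketch of the resulting configuration, with an assumed struct
layout:

  struct mlx5_rate_limit {
          u32 rate;             /* rate limit */
          u32 max_burst_sz;     /* burst bound in bytes, 0 = device default */
          u16 typical_pkt_size; /* typical packet size in bytes, 0 = unknown */
  };

All SQs pointing at the same rate limit index are expected to use the
same values for these fields.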
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
MLX4_USER_DEV_CAP_LARGE_CQE (via mlx4_ib_alloc_ucontext_resp.dev_caps)
and MLX4_IB_QUERY_DEV_RESP_MASK_CORE_CLOCK_OFFSET (via
mlx4_uverbs_ex_query_device_resp.comp_mask) are copied directly to
userspace and form part of the uAPI.
Move them to the uapi header where they belong.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
added as part of the devel cycle; both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently, ESN is not supported with IPsec device offload.
This patch adds ESN support to IPsec device offload.
It implements a new xfrm device operation to synchronize the
offloading device's ESN with the SN received by xfrm, and a new QP
command to update the SA state as illustrated below:
ESN 1 ESN 2 ESN 3
|-----------*-----------|-----------*-----------|-----------*
^ ^ ^ ^ ^ ^
^ - marks where the QP command is invoked to update the SA ESN state
machine.
| - marks the start of the ESN scope (0-2^32-1). At this point the SA
ESN overlap bit is set to zero and the ESN is incremented.
* - marks the middle of the ESN scope (2^31). At this point the SA
ESN overlap bit is set to one.
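A sketch of the wiring, assuming the xfrm device-offload hook is named
xdo_dev_state_advance_esn and with the handler body elided:

  static void mlx5e_xfrm_advance_esn_state(struct xfrm_state *x)
  {
          /* Issue the QP command that updates the SA ESN state
           * machine: set the overlap bit at 2^31, clear it and
           * increment the ESN at the start of each new scope. */
  }

  static const struct xfrmdev_ops mlx5e_ipsec_xfrmdev_ops = {
          .xdo_dev_state_advance_esn = mlx5e_xfrm_advance_esn_state,
          /* remaining xdo_* callbacks elided */
  };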
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a new function for getting the driver's internal SA entry from an
xfrm state. All checks are done in one function.
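A sketch of such a helper, with the struct name assumed; the xfrm
offload handle is where the driver keeps its per-state pointer:

  static struct mlx5e_ipsec_sa_entry *to_ipsec_sa_entry(struct xfrm_state *x)
  {
          if (!x || !x->xso.offload_handle)
                  return NULL;

          return (struct mlx5e_ipsec_sa_entry *)x->xso.offload_handle;
  }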
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to add a context to the FPGA, we need to get both the software
transform context (which includes the keys, etc.) and the
source/destination IPs (which are included in the steering
rule). Therefore, we register a new set of firmware-like commands for
the FPGA. Each time a rule is added, the steering core infrastructure
calls the FPGA command layer. If the rule is intended for the FPGA,
it combines the IPs information with the software transformation
context and creates the respective hardware transform.
Afterwards, it calls the standard steering command layer.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current code has one layer that executes FPGA commands, and the
Ethernet part uses this code directly. Since downstream patches
introduce support for IPsec in mlx5_ib, we need to provide some
abstractions. This patch refactors the accel code into one layer
that creates a software IPSec transformation and another one which
creates the actual hardware context.
The internal command implementation is now hidden in the FPGA
core layer. The code also adds the ability to share FPGA hardware
contexts. If two contexts are the same, only a reference count
is taken.
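A sketch of the sharing scheme (all names hypothetical): identical
contexts take a reference instead of creating a second hardware
transform.

  struct fpga_ipsec_ctx {
          struct hlist_node node;    /* hashed by xfrm context + IPs */
          refcount_t        refcnt;
          u32               hw_handle;
  };

  static struct fpga_ipsec_ctx *fpga_ipsec_ctx_get(struct fpga_ipsec_ctx *ctx)
  {
          refcount_inc(&ctx->refcnt);    /* share the existing transform */
          return ctx;
  }

  static void fpga_ipsec_ctx_put(struct fpga_ipsec_ctx *ctx)
  {
          if (refcount_dec_and_test(&ctx->refcnt))
                  kfree(ctx);            /* last user: tear down HW context */
  }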
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch adds V2 command support.
New FPGA devices support extended features (UDP encap, ESN, etc.);
these features require a new hardware SADB format, and therefore a new
version of the commands to manipulate it.
Signed-off-by: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Current hardware decrypts and authenticates incoming ESP packets.
Subsequently, the software extracts the nexthdr field, truncates the
trailer and adjusts csum accordingly.
With this patch and a capable device, the trailer is removed by
the hardware and the nexthdr field is conveyed via PET. This way
we avoid both the need to access the trailer (a cache miss) and the
need to compute its relative checksum, which significantly improves
performance.
Experiments (netperf) show that trailer removal improves performance
by 2Gbps, in both forwarding and host-to-host configurations.
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current code assumes only SA QP commands.
Refactor in order to pave the way for new QP commands:
1. Generic cmd response format.
2. SA cmd checks are in dedicated functions.
3. Aligned debug prints.
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix a build break: mlx5_accel_ipsec_device_caps is not defined when
MLX5_ACCEL is not selected. Use MLX5_IPSEC_DEV instead, which handles
that case.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Doug Ledford <dledford@redhat.com>
Previously, deleting a flow steering entry took only the index.
Since the FPGA implementation of FTE deletion might need to dig
inside the FTE itself, we would like to have the FTE's context.
Change the interface to pass the FTE context.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
fte objects contain the match value and action. Currently, extending
the actions requires adding them both to the API and to fs_fte.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, we don't support egress flow steering namespace in mlx5
flow steering core implementation. However, when we want to encrypt
a packet, we model it as a flow steering rule in the egress path.
To overcome this, we add an empty egress namespace to flow steering.
This namespace is initialized only when ipsec support exists.
In the future, this will grow into a full blown flow steering
implementation, resembling the ingress path.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The shim layer allows each namespace to define possibly different
functionality for add/delete/update commands. The shim layer
introduced here will be used to support flow steering with the FPGA.
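The pattern is roughly the following (illustrative names, not the
exact API): a table of function pointers per namespace, so an
FPGA-backed namespace can install FPGA-aware callbacks while other
namespaces keep the default firmware commands.

  struct fs_cmds {
          int (*create_fte)(struct mlx5_core_dev *dev,
                            struct mlx5_flow_table *ft, struct fs_fte *fte);
          int (*update_fte)(struct mlx5_core_dev *dev,
                            struct mlx5_flow_table *ft, struct fs_fte *fte);
          int (*delete_fte)(struct mlx5_core_dev *dev,
                            struct mlx5_flow_table *ft, struct fs_fte *fte);
  };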
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The has_tag member will indicate whether a tag action was specified
in the flow specification.
A flow tag of 0 (MLX5_FS_DEFAULT_FLOW_TAG) is assumed to be a valid
flow tag and is currently used by the mlx5 RDMA driver, whereas in HW
flow_tag = 0 means that the user doesn't care about the flow_tag. HW
always provides a flow_tag = 0 if all flow tags requested on a
specific flow are 0.
So we need a way (in the driver) to differentiate between a user
really requesting flow_tag = 0 and a user who does not care, in order
to be able to report conflicting flow tags on a specific flow.
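A sketch of the disambiguation this enables (flow_act layout assumed):
two users of the same flow conflict only if both actually set a tag
and the tags differ.

  static bool flow_tags_conflict(const struct mlx5_flow_act *a,
                                 const struct mlx5_flow_act *b)
  {
          return a->has_flow_tag && b->has_flow_tag &&
                 a->flow_tag != b->flow_tag;
  }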
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Some flow steering namespace initialization (i.e. the egress namespace)
might depend on FPGA capabilities. Change the initialization order
so that the FPGA is initialized before flow steering.
Flow steering fs cmds initialization might depend on IPsec
capabilities, so change the initialization order such that IPsec is
initialized before flow steering as well.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This is already done by the xfrm layer, between the state_dev_del
callback and the state_dev_free callback.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Generally, FPGA IPsec commands must always complete.
We want to wait up to one minute for them to complete gracefully, even
when a process is being killed.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
All other mlx5_events report the port number as 1-based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.
All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.
Fixes: 89d44f0a6c73 ("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Up until this point it wasn't possible to activate IB representors
when switching to switchdev mode; remove this limitation.
We trigger a reload of the PF IB interface in order to make sure that
already allocated resources are invalidated and new resources will be
opened correctly, with all the limitations of switchdev mode applied
(only raw packet capabilities, without RoCE). We also move the
remove/add to the place where the E-Switch mode is set/unset, to better
control when to trigger this action; this will allow the IB side to
start in the correct mode.
For better code reuse, create a function which reloads an interface and
export it.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Under switchdev mode we insert an eswitch miss rule causing any
unmatched traffic to be sent towards the PF vport. This miss rule can
be optimized by breaking it in two: one rule for multicast traffic
and the other for unicast.
Breaking the miss rule into two (unicast and multicast) allows the firmware
to program the hardware in a more efficient way.
Using ConnectX-5 Ex with IXIA and testpmd (which use IB representors):
IXIA -> NIC -> PF -> IB representor -> NIC -> VF:
- Without this optimization: 9.2 MPPS.
- With this optimization: 18 MPPS.
VF -> NIC -> IB representor-> PF -> NIC -> IXIA:
- Without this optimization: 17 MPPS.
- With this optimization: 23.4 MPPS.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The max FTE number should be the max number of SQs that can be opened.
Ethernet representors open one SQ each. Once we add IB representors
this will increase (depending on the user). For now let's start with 31
per IB representor and increase it in the future if needed.
This increase only affects the number of FTEs in the slow path FDB,
offloaded rules (done via TC on the fast path portion of the FDB)
aren't affected.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In preparation for IB representors, move the representor structs to
global scope and expose the functions needed for registration,
unregistration, eswitch mode, and creating a flow rule to direct
traffic from SQs to the right VF.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a callback interface to get a protocol device (per representor type).
The Ethernet representors will expose their netdev via this interface.
This functionality can later be used by the IB representor in order to
find the corresponding net device representor.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current implementation of CQ creation requires contiguous
memory. Such a requirement is problematic once memory is
fragmented or the system is low on memory; it causes failures in
dma_zalloc_coherent().
This patch implements a new scheme of fragmented CQs to overcome
this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl',
for allocating fragmented buffers rather than contiguous ones.
Base the Completion Queues (CQs) on this new fragmented buffer.
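Addressing an entry then becomes a (fragment, offset) lookup rather
than simple pointer arithmetic; a sketch following the
mlx5_frag_buf_ctrl idea (field names illustrative):

  static void *frag_buf_get_entry(struct mlx5_frag_buf_ctrl *fbc, u32 ix)
  {
          unsigned int frag = ix >> fbc->log_frag_strides;  /* which fragment */

          return fbc->frag_buf.frags[frag].buf +
                 ((fbc->frag_sz_m1 & ix) << fbc->log_stride); /* offset inside */
  }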
It fixes crashes like the following:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The EQ structure and API are private to the mlx5_core driver only;
external drivers should not have access to, or the means to manipulate,
EQ objects.
Remove redundant exports and move API functions out of the linux/mlx5
include directory into the driver's mlx5_core.h private include file.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Since the CQ tree is now per EQ, CQ completion and event forwarding
became an implementation detail of the EQ logic. This patch moves that
logic to eq.c and makes those functions static.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Now that the CQ table is per EQ, add an API to hold/put a CQ, to be
used from eq.c in a downstream patch.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Add an API to add/delete a CQ to/from the EQ's CQ table, to be used in
cq.c upon CQ creation/destruction, as the CQ table is now private to
eq.c.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
If a hardware event is targeting a CQ, that CQ should exist.
Add unlikely() to the error handling flows.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Before this patch the driver had one CQ database protected by one
spinlock; this spinlock synchronizes between CQ adding/removing and
CQ IRQ interrupt handling.
On a system with a large number of CPUs and a workload that requires
lots of interrupts, this global spinlock becomes a very nasty hotspot,
introducing contention between the active cores, which significantly
hurts performance and becomes a bottleneck that prevents seamless cpu
scaling.
To solve this we simply move the CQ database and its spinlock to be per
EQ (IRQ), thus per core.
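Roughly, the database moves from one global instance to a copy
embedded in each EQ (layout illustrative):

  struct mlx5_cq_table {
          spinlock_t             lock;  /* protects the tree */
          struct radix_tree_root tree;  /* cqn -> mlx5_core_cq */
  };

  struct mlx5_eq {
          struct mlx5_cq_table   cq_table;  /* was global, now per EQ/IRQ */
          /* ... remaining EQ fields ... */
  };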
Tested with:
system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
netperf command: ./super_netperf 200 -P 0 -t TCP_RR -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M
WITHOUT THIS PATCH:
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 4.32 0.00 36.15 0.09 0.00 34.02 0.00 0.00 0.00 25.41
Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
Overhead Command Shared Object Symbol
+ 14.28% swapper [kernel.vmlinux] [k] intel_idle
+ 12.25% swapper [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 10.29% netserver [kernel.vmlinux] [k] queued_spin_lock_slowpath
+ 1.32% netserver [kernel.vmlinux] [k] mlx5e_xmit
WITH THIS PATCH:
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: all 4.27 0.00 34.31 0.01 0.00 18.71 0.00 0.00 0.00 42.69
Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
Overhead Command Shared Object Symbol
+ 23.33% swapper [kernel.vmlinux] [k] intel_idle
+ 1.69% netserver [kernel.vmlinux] [k] mlx5e_xmit
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Having these checks in ibmvnic_xmit causes problems with VLAN
tagging and balance-alb/tlb bonding modes. The restriction they
imposed can be removed.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For dwmac4, GMAC_INT_DEFAULT_ENABLE already includes
GMAC_INT_PMT_EN, so it is redundant to check if hw->pmt
is set and, if so, to set the bit again.
For dwmac1000, GMAC_INT_DEFAULT_MASK does not include
GMAC_INT_DISABLE_PMT, so it is redundant to check if
hw->pmt is set and, if so, to clear an already-cleared bit.
Improve code readability by removing this redundant code.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GMAC_INT_DEFAULT_MASK is written to the interrupt enable register.
In previous versions of the IP (e.g. dwmac1000), this register was
instead an interrupt mask register.
To improve clarity and reflect reality, rename GMAC_INT_DEFAULT_MASK
to GMAC_INT_DEFAULT_ENABLE.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The interrupt status register in both dwmac1000 and dwmac4 ignores
interrupt enable (for dwmac4) / interrupt mask (for dwmac1000).
Therefore, if we want to check only the bits that can actually trigger
an irq, we have to filter the interrupt status register manually.
Commit 0a764db103 ("stmmac: Discard masked flags in interrupt status
register") fixed this for dwmac1000. Fix the same issue for dwmac4.
Just like commit 0a764db103 ("stmmac: Discard masked flags in
interrupt status register"), this makes sure that we do not get
spurious link up/link down prints.
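A sketch of the fix in the dwmac4 irq-status path (the function name
is hypothetical; register names follow the dwmac4 convention):

  static u32 dwmac4_pending_irqs(void __iomem *ioaddr)
  {
          u32 intr_status = readl(ioaddr + GMAC_INT_STATUS);
          u32 intr_enable = readl(ioaddr + GMAC_INT_EN);

          /* Discard status bits whose interrupt is not enabled, so
           * masked events cannot trigger spurious link up/down prints. */
          return intr_status & intr_enable;
  }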
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When allocating RX or TX buffer pools, the driver needs to provide a
unique mapping ID to firmware for each pool. This value is assigned
using a counter which is incremented after a new pool is created. The
ID can be an integer ranging from 1 to 255. When migrating to a device
that requests a different number of queues, this value was not being
reset properly. As a result, after enough migrations, the counter
exceeded the upper bound and pool creation failed. This is fixed by
resetting the counter to one in this case.
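In sketch form (the field name is an assumption):

  static void reset_pool_map_ids(struct ibmvnic_adapter *adapter)
  {
          /* Mapping IDs handed to firmware must stay within 1-255;
           * restart numbering when queue counts are renegotiated. */
          adapter->map_id = 1;
  }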
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2018-02-09
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Two fixes for BPF sockmap in order to break up circular map references
from programs attached to sockmap, and detaching related sockets in
case of socket close() event. For the latter we get rid of the
smap_state_change() and plug into ULP infrastructure, which will later
also be used for additional features anyway such as TX hooks. For the
second issue, dependency chain is broken up via map release callback
to free parse/verdict programs, all from John.
2) Fix a libbpf relocation issue that was found while implementing XDP
support for the Suricata project. The issue was that when clang was
invoked with the default target instead of the bpf target, various
other (e.g. debugging-relevant) sections were added to the ELF file,
containing relocation entries that point to non-BPF related sections,
which libbpf trips over instead of skipping. Test cases for libbpf are
added as well, from Jesper.
3) Various misc fixes for bpftool and one for libbpf: a small addition
to libbpf to make sure it recognizes all standard section prefixes.
Then, the Makefile in bpftool/Documentation is improved to explicitly
check for rst2man being installed on the system as we otherwise risk
installing empty man pages; the man page for bpftool-map is corrected
and a set of missing bash completions added in order to avoid shipping
bpftool where the completions are only partially working, from Quentin.
4) Fix applying the relocation to immediate load instructions in the
nfp JIT which were missing a shift, from Jakub.
5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y
gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL
in this case; and explicitly delete the veth devices in the two tests
test_xdp_{meta,redirect}.sh before dismantling the netnses, as when
selftests are run in batch mode, the workqueue handling destruction
might not have finished yet and thus veth creation in the next test
under the same dev name would fail, from Yonghong.
6) Fix test_kmod.sh to check the test_bpf.ko module path before performing
an insmod, and fall back to modprobe. Especially the latter is useful
when having a device under test that has the modules installed instead,
from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Cavium thunder nicvf driver supports rx/tx rings of up to 65536
entries each. The number of entries is stored in the q_len member of
struct q_desc_mem. The problem is that q_len, being a u16, results in
65536 becoming 0.
In getting pointers to descriptors in the rings, the driver uses q_len minus 1
as a mask after incrementing the pointer, in order to go back to the beginning
and not go past the end of the ring.
With the q_len set to 0 the mask is no longer correct and the driver does go
beyond the end of the ring, causing various ills. Usually the first thing that
shows up is a "NETDEV WATCHDOG: enP2p1s0f1 (nicvf): transmit queue 7 timed out"
warning.
This patch remedies the problem by changing q_len to a u32.
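In sketch form, the failure mode:

  static u32 next_desc_broken(u32 idx)
  {
          u16 q_len = 65536;          /* truncates to 0 */

          /* 0 - 1 underflows to an all-ones mask, so the index is
           * never wrapped and runs past the end of the ring. With
           * q_len declared u32, the mask is the intended 0xffff. */
          return (idx + 1) & (q_len - 1);
  }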
Signed-off-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While handling a driver reset we get a H_CLOSED return trying
to send a CRQ event. When this occurs we need to queue up another
reset attempt. Without doing this we see instances where the driver
is left in a closed state because the reset failed and there are no
further attempts to reset the driver.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DKMS and similar out-of-tree module replacement services use
module version to make sure the out-of-tree software is not
older than the module shipped with the kernel. We use the
kernel version in the ethtool -i output, so put it into
MODULE_VERSION as well.
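A sketch of the change, using the kernel's generated release string:

  #include <generated/utsrelease.h>

  MODULE_VERSION(UTS_RELEASE);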
Reported-by: Jan Gutter <jan.gutter@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most FWs limit the number of TSO segments a frame can produce
to 64. This is for fairness and efficiency (of FW datapath)
reasons. If a frame with a larger number of segments is submitted,
the FW will drop it.
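A sketch of advertising the limit to the stack (the define and helper
name are assumptions; the 64-segment cap comes from the description
above):

  #define NFP_NET_LSO_MAX_SEGS  64  /* FW drops frames above this */

  static void nfp_net_set_tso_limit(struct net_device *netdev)
  {
          netdev->gso_max_segs = NFP_NET_LSO_MAX_SEGS;
  }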
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All netdevs which can accept TC offloads must implement
.ndo_set_features(). nfp_reprs currently do not do that, which
means hw-tc-offload can be turned on and off even when offloads
are active.
Whether the offloads are active is really a question to nfp_ports,
so remove the per-app tc_busy callback indirection thing, and
simply count the number of offloaded items in nfp_port structure.
Fixes: 8a2768732a ("nfp: provide infrastructure for offloading flower based TC filters")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Tested-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>