linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-27 19:55:05 +07:00

Author	SHA1	Message	Date
Petr Machata	23effa2479	mlxsw: reg: Add max_shaper_bs to QoS ETS Element Configuration The QEEC register configures scheduling elements. One of the bits of configuration is the burst size to use for the shaper installed on the element. Add the necessary fields to support this configuration. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:56:31 +01:00
Petr Machata	be1d5a8a77	mlxsw: spectrum_qdisc: Extract a common leaf unoffload function When the RED Qdisc is unoffloaded, it needs to reduce the reported backlog by the amount that is in the HW, so that only the SW backlog is contained in the counter. The same thing will need to be done by TBF, and likely any other leaf Qdisc as well. Extract a helper mlxsw_sp_qdisc_leaf_unoffload() and call it from mlxsw_sp_qdisc_red_unoffload(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:56:31 +01:00
Petr Machata	3d0d592193	mlxsw: spectrum_qdisc: Add mlxsw_sp_qdisc_get_class_stats() Add a wrapper around mlxsw_sp_qdisc_collect_tc_stats() and mlxsw_sp_qdisc_update_stats() for the simple case of doing both in one go: mlxsw_sp_qdisc_get_class_stats(). Dispatch to that function from mlxsw_sp_qdisc_get_red_stats(). This new function will be useful for other leaf Qdiscs as well. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:56:31 +01:00
Petr Machata	cf9af379cd	mlxsw: spectrum_qdisc: Extract a per-TC stat function Extract from mlxsw_sp_qdisc_get_prio_stats() two new functions: mlxsw_sp_qdisc_collect_tc_stats() to accumulate stats for that one TC only, and mlxsw_sp_qdisc_update_stats() that makes the stats relative to base values stored earlier. Use them from mlxsw_sp_qdisc_get_red_stats(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:56:31 +01:00
Petr Machata	ef6aadcc76	net: sched: Make TBF Qdisc offloadable Invoke ndo_setup_tc as appropriate to signal init / replacement, destroying and dumping of TBF Qdisc. Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:56:31 +01:00
Petr Machata	c207015274	net: sched: sch_tbf: Don't overwrite backlog before dumping In 2011, in commit `b0460e4484` ("sch_tbf: report backlog information"), TBF started copying backlog depth from the child Qdisc before dumping, with the motivation that the backlog was otherwise not visible in "tc -s qdisc show". Later, in 2016, in commit `8d5958f424` ("sch_tbf: update backlog as well"), TBF got a full-blown backlog tracking. However it kept copying the child's backlog over before dumping. That line is now unnecessary, so remove it. As shown in the following example, backlog is still reported correctly: # tc -s qdisc show dev veth0 invisible qdisc tbf 1: root refcnt 2 rate 1Mbit burst 128Kb lat 82.8s Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 812544 requeues 0) backlog 81972b 66p requeues 0 qdisc bfifo 0: parent 1:1 limit 10Mb Sent 505475370 bytes 406985 pkt (dropped 0, overlimits 0 requeues 0) backlog 81972b 66p requeues 0 Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:56:30 +01:00
Michael Ellerman	3546d8f1bb	net: cxgb3_main: Add CAP_NET_ADMIN check to CHELSIO_GET_MEM The cxgb3 driver for "Chelsio T3-based gigabit and 10Gb Ethernet adapters" implements a custom ioctl as SIOCCHIOCTL/SIOCDEVPRIVATE in cxgb_extension_ioctl(). One of the subcommands of the ioctl is CHELSIO_GET_MEM, which appears to read memory directly out of the adapter and return it to userspace. It's not entirely clear what the contents of the adapter memory contains, but the assumption is that it shouldn't be accessible to all users. So add a CAP_NET_ADMIN check to the CHELSIO_GET_MEM case. Put it after the is_offload() check, which matches two of the other subcommands in the same function which also check for is_offload() and CAP_NET_ADMIN. Found by Ilja by code inspection, not tested as I don't have the required hardware. Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:50:42 +01:00
David S. Miller	2f64ab27c8	Merge branch 'hv_netvsc-Add-XDP-support' Haiyang Zhang says: ==================== hv_netvsc: Add XDP support Add XDP support and update related document. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:43:19 +01:00
Haiyang Zhang	12fa74383e	hv_netvsc: Update document for XDP support Added the new section in the document regarding XDP support by hv_netvsc driver. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:43:19 +01:00
Haiyang Zhang	351e158139	hv_netvsc: Add XDP support This patch adds support of XDP in native mode for hv_netvsc driver, and transparently sets the XDP program on the associated VF NIC as well. Setting / unsetting XDP program on synthetic NIC (netvsc) propagates to VF NIC automatically. Setting / unsetting XDP program on VF NIC directly is not recommended, also not propagated to synthetic NIC, and may be overwritten by setting of synthetic NIC. The Azure/Hyper-V synthetic NIC receive buffer doesn't provide headroom for XDP. We thought about re-use the RNDIS header space, but it's too small. So we decided to copy the packets to a page buffer for XDP. And, most of our VMs on Azure have Accelerated Network (SRIOV) enabled, so most of the packets run on VF NIC. The synthetic NIC is considered as a fallback data-path. So the data copy on netvsc won't impact performance significantly. XDP program cannot run with LRO (RSC) enabled, so you need to disable LRO before running XDP: ethtool -K eth0 lro off XDP actions not yet supported: XDP_REDIRECT Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:43:19 +01:00
Moshe Shemesh	6ec8b6cd79	devlink: Add health recover notifications on devlink flows Devlink health recover notifications were added only on driver direct updates of health_state through devlink_health_reporter_state_update(). Add notifications on updates of health_state by devlink flows of report and recover. Moved functions devlink_nl_health_reporter_fill() and devlink_recover_notify() to avoid forward declaration. Fixes: `97ff3bd37f` ("devlink: add devink notification when reporter update health state") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:34:42 +01:00
Florian Fainelli	148965df1a	net: bcmgenet: Use netif_tx_napi_add() for TX NAPI Before commit `7587935cfa` ("net: bcmgenet: move NAPI initialization to ring initialization") moved the code, this used to be netif_tx_napi_add(), but we lost that small semantic change in the process, restore that. Fixes: `7587935cfa` ("net: bcmgenet: move NAPI initialization to ring initialization") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Doug Berger <opendmb@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:31:28 +01:00
Jon Maloy	61b1f2aff4	tipc: change maintainer email address Reflecting new realities. Signed-off-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:18:02 +01:00
Andy Shevchenko	53c6770095	net: fddi: skfp: Use print_hex_dump() helper Use the print_hex_dump() helper, instead of open-coding the same operations. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:15:49 +01:00
Andy Shevchenko	79ac522402	net: atm: use %ph to print small buffer Use %ph format to print small buffer as hex string. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:14:40 +01:00
Ajay Gupta	b9f0b2f634	net: stmmac: platform: fix probe for ACPI devices Use generic device API to get phy mode to fix probe failure with ACPI based devices. Signed-off-by: Ajay Gupta <ajayg@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 10:09:47 +01:00
Mat Martineau	edc7e4898d	mptcp: Fix code formatting checkpatch.pl had a few complaints in the last set of MPTCP patches: ERROR: code indent should use tabs where possible +^I subflow, sk->sk_family, icsk->icsk_af_ops, target, mapped);$ CHECK: Comparison to NULL could be written "!new_ctx" + if (new_ctx == NULL) { ERROR: "foo * bar" should be "foo bar" +static const struct proto_ops tcp_proto_ops(struct sock *sk) Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 08:14:39 +01:00
Florian Westphal	e42f1ac626	mptcp: do not inherit inet proto ops We need to initialise the struct ourselves, else we expose tcp-specific callbacks such as tcp_splice_read which will then trigger splat because the socket is an mptcp one: BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478 CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3 Call Trace: tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57 tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612 tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674 tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791 do_splice+0x1259/0x1560 fs/splice.c:1205 To prevent build error with ipv6, add the recv/sendmsg function declaration to ipv6.h. The functions are already accessible "thanks" to retpoline related work, but they are currently only made visible by socket.c specific INDIRECT_CALLABLE macros. Reported-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-25 08:14:39 +01:00
Linus Torvalds	d5d359b0ac	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: - add sanity checks to USB endpoints in various dirvers - max77650-onkey was missing an OF table which was preventing module autoloading - a revert and a different fix for F54 handling in Synaptics dirver - a fixup for handling register in pm8xxx vibrator driver * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: pm8xxx-vib - fix handling of separate enable register Input: keyspan-remote - fix control-message timeouts Input: max77650-onkey - add of_match table Input: rmi_f54 - read from FIFO in 32 byte blocks Revert "Input: synaptics-rmi4 - don't increment rmiaddr for SMBus transfers" Input: sur40 - fix interface sanity checks Input: gtco - drop redundant variable reinit Input: gtco - fix extra-descriptor debug message Input: gtco - fix endpoint sanity check Input: aiptek - use descriptors of current altsetting Input: aiptek - fix endpoint sanity check Input: pegasus_notetaker - fix endpoint sanity check Input: sun4i-ts - add a check for devm_thermal_zone_of_sensor_register Input: evdev - convert kzalloc()/vzalloc() to kvzalloc()	2020-01-24 19:27:42 -08:00
Tony Nguyen	31ad4e4ee1	ice: Allocate flow profile Create an extraction sequence based on the packet header protocols to be programmed and allocate a flow profile for the extraction sequence. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2020-01-24 16:06:32 -08:00
Tony Nguyen	c90ed40cef	ice: Enable writing hardware filtering tables Enable the driver to write the filtering hardware tables to allow for changing of RSS rules. Upon loading of DDP package, a minimal configuration should be written to hardware. Introduce and initialize structures for storing configuration and make the top level calls to configure the RSS tables to initial values. A packet segment will be created but nothing is written to hardware yet. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2020-01-24 13:18:19 -08:00
Olof Johansson	6716cb162d	Few minor fixes for omaps Looks like we have wrong default memory size for beaglebone black, it has at least 512 MB of RAM and not 256 MB. This causes an issue when booted with GRUB2 that does not seem to pass memory info to the kernel. And for am43x-epos-evm the SPI pin directions need to be configured for SPI to work. -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAl4rSQkRHHRvbnlAYXRv bWlkZS5jb20ACgkQG9Q+yVyrpXMSeg//f607jg+6RqkJ6dMczxBSHpkAYSdGBjpB nwa8O7jNbHfjtUGXTm+9Xx7sKXTR+600lobgMYguRlitTYNtC2jQG5ux92/6sTU/ 5qB6m0qkhMqOKDOI2AZ1wIcc6PP067M7x+il0Rs9/PRdIaDNUMHsNZ6+sxEWUR7N 1vcYMz3PL8KW/ucAWxBO4NeMDwg5vi6dIILBl63sBlrH6AgYTVT8LqNhSysZWmXY Gj5JNIt+FjvhKGYNnD1jeFYX/+TQH10B8Ouepj6ixdcXaA4v0553S7S7R3HYR+l8 +AFcWB1TMuvlNv5vbnVd+PUXhhyByVDQv15/92i+rKMXUxxdLZDuKeqa3/GbL/vI oU5ShNXWQ7C7g+SrxYfHT0OQz38j7TUOKrRJKBxFYykneoC33uwpSaAUTSN0PuXT MA7RJVBtjHe8tfbkII3UqqvHxmVB1TAkj9TE9y21Dznlq2fyq7w/13UNS67VCZ57 4vO722ULOHVLnTmwH6gA+e1Cjhiz17fJJq3wpDTqqn0As4cuLOAio7PmHgNcm2YG Qg9MVQV2Rj3YGdxGVDKCfTOLt4tQzJ8qbN0o7XIp0CgyK7Sg2djcjzUMtJDx++Ez 6tRmyfcY/GD5ykPfEUXvJcDWw0tIFSRUQMtXMi3vnuNh5mYFQ2ciQB5KkPBhyTST CaAAg8vCvek= =XXZc -----END PGP SIGNATURE----- Merge tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes Few minor fixes for omaps Looks like we have wrong default memory size for beaglebone black, it has at least 512 MB of RAM and not 256 MB. This causes an issue when booted with GRUB2 that does not seem to pass memory info to the kernel. And for am43x-epos-evm the SPI pin directions need to be configured for SPI to work. * tag 'omap-for-fixes-whenever-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: dts: am43x-epos-evm: set data pin directions for spi0 and spi1 ARM: dts: am335x-boneblack-common: fix memory size Link: https://lore.kernel.org/r/pull-1579895109-287828@atomide.com Signed-off-by: Olof Johansson <olof@lixom.net>	2020-01-24 12:05:33 -08:00
Olof Johansson	088307d216	Fix OP-TEE compile error with nommu -----BEGIN PGP SIGNATURE----- iQJOBAABCgA4FiEEcK3MsDvGvFp6zV9ztbC4QZeP7NMFAl4pbiIaHGplbnMud2lr bGFuZGVyQGxpbmFyby5vcmcACgkQtbC4QZeP7NP14w//ZIO2PDmn/f7w3qafb+Gx bXWqpqNgr/kC7T1Z21vyTozTyqW3ZXv3VpONGqyYSzD8skW80LPq6MjySTCBcCIS AcpooDIA/jXMa52lmDv8nNzr2IuULKGEZk4kBY4zzlzW6hrASUuLqo7W7qkN4xkh Tpu38r7k6FPir0R4oCphG4TPUS9B7vN0fmgN+bJXNFC1pn/fUzJa14k5FVPVqi8w Z71s2NXTV95dmRgNM5DQNXUbdS4TY5mhnZfdFCxo0f39cob7oTM4e0Ii1b280lQv xsiAfTbPAOPhchrMf12TqD53TvECUe/7SoAoaZpHSepkxQ2vqD+JBCkADfdTmPje FhCaTVix6eJLIMgvrFaiWfmKkuLwQY2k50nLDlIfHVgRZhsChrjsaP8gAJf8VYcN 4zzmDeDm4dDewpp/+KjnUVXuxtYLI9Zc1QnKXyoRfuBBQuVH5pcSbCrt4cn8PPtk 4KpW6Of+oKPt7uQSCEIejell8ew/X/WPFQN0dDaaJYtlT4kuityDWwNrebfdtZ/2 1GeS5yrc+weG4Sl+ATK7wtAeIA84c4wYyI2NkJffZAikXmy9VkYmmOyJL6+SPAp4 Ff7hvbVJatEeCeMcM44R8C8KmsCY5n4cz1SpxpACzrAxNiWzOfbsI0HCDeelFlVD wF6kGTYn3B3afDTcRdeRQGs= =JTLy -----END PGP SIGNATURE----- Merge tag 'tee-optee-fix2-for-5.5' of https://git.linaro.org:/people/jens.wiklander/linux-tee into arm/fixes Fix OP-TEE compile error with nommu * tag 'tee-optee-fix2-for-5.5' of https://git.linaro.org:/people/jens.wiklander/linux-tee: tee: optee: Fix compilation issue with nommu Link: https://lore.kernel.org/r/20200123101310.GA10320@jax Signed-off-by: Olof Johansson <olof@lixom.net>	2020-01-24 12:05:08 -08:00
Tariq Toukan	342508c1c7	net/mlx5e: kTLS, Do not send decrypted-marked SKBs via non-accel path When TCP out-of-order is identified (unexpected tcp seq mismatch), driver analyzes the packet and decides what handling should it get: 1. go to accelerated path (to be encrypted in HW), 2. go to regular xmit path (send w/o encryption), 3. drop. Packets marked with skb->decrypted by the TLS stack in the TX flow skips SW encryption, and rely on the HW offload. Verify that such packets are never sent un-encrypted on the wire. Add a WARN to catch such bugs, and prefer dropping the packet in these cases. Fixes: `46a3ea9807` ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:40 -08:00
Tariq Toukan	1e92899791	net/mlx5e: kTLS, Remove redundant posts in TX resync flow The call to tx_post_resync_params() is done earlier in the flow, the post of the control WQEs is unnecessarily repeated. Remove it. Fixes: `700ec49742` ("net/mlx5e: kTLS, Fix missing SQ edge fill") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:37 -08:00
Tariq Toukan	ffbd9ca94e	net/mlx5e: kTLS, Fix corner-case checks in TX resync flow There are the following cases: 1. Packet ends before start marker: bypass offload. 2. Packet starts before start marker and ends after it: drop, not supported, breaks contract with kernel. 3. packet ends before tls record info starts: drop, this packet was already acknowledged and its record info was released. Add the above as comment in code. Mind possible wraparounds of the TCP seq, replace the simple comparison with a call to the TCP before() method. In addition, remove logic that handles negative sync_len values, as it became impossible. Fixes: `d2ead1f360` ("net/mlx5e: Add kTLS TX HW offload support") Fixes: `46a3ea9807` ("net/mlx5e: kTLS, Enhance TX resync flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Reviewed-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:35 -08:00
Dmytro Linkin	3b83b6c2e0	net/mlx5e: Clear VF config when switching modes Currently VF in LEGACY mode are not able to go up. Also in OFFLOADS mode, when switching to it first time, VF can go up independently to his representor, which is not expected. Perform clearing of VF config when switching modes and set link state to AUTO as default value. Also, when switching to OFFLOADS mode set link state to DOWN, which allow VF link state to be controlled by its REP. Fixes: `1ab2068a4c` ("net/mlx5: Implement vports admin state backup/restore") Fixes: `556b9d16d3` ("net/mlx5: Clear VF's configuration on disabling SRIOV") Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com> Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:32 -08:00
Erez Shitrit	c0702a4bd4	net/mlx5: DR, use non preemptible call to get the current cpu number Use raw_smp_processor_id instead of smp_processor_id() otherwise we will get the following trace in debug-kernel: BUG: using smp_processor_id() in preemptible [00000000] code: devlink caller is dr_create_cq.constprop.2+0x31d/0x970 [mlx5_core] Call Trace: dump_stack+0x9a/0xf0 debug_smp_processor_id+0x1f3/0x200 dr_create_cq.constprop.2+0x31d/0x970 genl_family_rcv_msg+0x5fd/0x1170 genl_rcv_msg+0xb8/0x160 netlink_rcv_skb+0x11e/0x340 Fixes: `297cccebdc` ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:29 -08:00
Eli Cohen	e401a1848b	net/mlx5: E-Switch, Prevent ingress rate configuration of uplink rep Since the implementation relies on limiting the VF transmit rate to simulate ingress rate limiting, and since either uplink representor or ecpf are not associated with a VF, we limit the rate limit configuration for those ports. Fixes: `fcb64c0f56` ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:27 -08:00
Erez Shitrit	b850a82114	net/mlx5: DR, Enable counter on non-fwd-dest objects The current code handles only counters that attached to dest, we still have the cases where we have counter on non-dest, like over drop etc. Fixes: `6a48faeeca` ("net/mlx5: Add direct rule fs_cmd implementation") Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com> Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:25 -08:00
Meir Lichtinger	505a7f5478	net/mlx5: Update the list of the PCI supported devices Add the upcoming ConnectX-7 device ID. Fixes: `85327a9c41` ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:22 -08:00
Paul Blakey	93b8a7ecb7	net/mlx5: Fix lowest FDB pool size The pool sizes represent the pool sizes in the fw. when we request a pool size from fw, it will return the next possible group. We track how many pools the fw has left and start requesting groups from the big to the small. When we start request 4k group, which doesn't exists in fw, fw wants to allocate the next possible size, 64k, but will fail since its exhausted. The correct smallest pool size in fw is 128 and not 4k. Fixes: `e52c280240` ("net/mlx5: E-Switch, Add chains and priorities") Signed-off-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-01-24 12:04:20 -08:00
Praveen Chaudhary	189c9b1e94	net: Fix skb->csum update in inet_proto_csum_replace16(). skb->csum is updated incorrectly, when manipulation for NF_NAT_MANIP_SRC\DST is done on IPV6 packet. Fix: There is no need to update skb->csum in inet_proto_csum_replace16(), because update in two fields a.) IPv6 src/dst address and b.) L4 header checksum cancels each other for skb->csum calculation. Whereas inet_proto_csum_replace4 function needs to update skb->csum, because update in 3 fields a.) IPv4 src/dst address, b.) IPv4 Header checksum and c.) L4 header checksum results in same diff as L4 Header checksum for skb->csum calculation. [ pablo@netfilter.org: a few comestic documentation edits ] Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com> Signed-off-by: Zhenggen Xu <zxu@linkedin.com> Signed-off-by: Andy Stracner <astracner@linkedin.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-24 20:54:30 +01:00
Pablo Neira Ayuso	eb014de4fd	netfilter: nf_tables: autoload modules from the abort path This patch introduces a list of pending module requests. This new module list is composed of nft_module_request objects that contain the module name and one status field that tells if the module has been already loaded (the 'done' field). In the first pass, from the preparation phase, the netlink command finds that a module is missing on this list. Then, a module request is allocated and added to this list and nft_request_module() returns -EAGAIN. This triggers the abort path with the autoload parameter set on from nfnetlink, request_module() is called and the module request enters the 'done' state. Since the mutex is released when loading modules from the abort phase, the module list is zapped so this is iteration occurs over a local list. Therefore, the request_module() calls happen when object lists are in consistent state (after fulling aborting the transaction) and the commit list is empty. On the second pass, the netlink command will find that it already tried to load the module, so it does not request it again and nft_request_module() returns 0. Then, there is a look up to find the object that the command was missing. If the module was successfully loaded, the command proceeds normally since it finds the missing object in place, otherwise -ENOENT is reported to userspace. This patch also updates nfnetlink to include the reason to enter the abort phase, which is required for this new autoload module rationale. Fixes: `ec7470b834` ("netfilter: nf_tables: store transaction list locally while requesting module") Reported-by: syzbot+29125d208b3dae9a7019@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-24 20:54:29 +01:00
Pablo Neira Ayuso	826035498e	netfilter: nf_tables: add __nft_chain_type_get() This new helper function validates that unknown family and chain type coming from userspace do not trigger an out-of-bound array access. Bail out in case __nft_chain_type_get() returns NULL from nft_chain_parse_hook(). Fixes: `9370761c56` ("netfilter: nf_tables: convert built-in tables/chains to chain types") Reported-by: syzbot+156a04714799b1d480bc@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-24 20:54:28 +01:00
wenxu	c83de17dd6	netfilter: nf_tables_offload: fix check the chain offload flag In the nft_indr_block_cb the chain should check the flag with NFT_CHAIN_HW_OFFLOAD. Fixes: `9a32669fec` ("netfilter: nf_tables_offload: support indr block call") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-24 20:54:11 +01:00
Linus Torvalds	6381b44283	IOMMU Fixes for Linux v5.5-rc7 Two Fixes: - Fix NULL-ptr dereference bug in Intel IOMMU driver - Properly safe and restore AMD IOMMU performance counter registers when testing if they are writable. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAl4rKJ0ACgkQK/BELZcB GuM8BxAAse7i4EAB0DZ26aDVU/ZZAGBTc5pH+7IOvuGFWpu3Ik7siltMZKTa56XD 5reQl1hUHyBesdwNcd+ZQz/0QzgMu0zEcxJtzLD5nbk2Rp1pDYab84+wA0oqftmN 7TBqPJjqGYfnfkobPW+4SaiCSxM6XKEXYaJvXqF6dsz5fLHHqTVjGyEXuu0pWey5 1AG1nNWey5TobR39ZUNVIGUEf4ftN8tOGutjGKch29OmMg1UkHZjDyYpVJurjSUl 1YDf0WA2X5cUWzRg519o9Uxs+5y/tBItS2qMJ/Uwh1fp/kOYrGoEPXRH9brU3owh j8kVKyz/cNlvhdJN/QPVq/A+GBgzAq+u9FUGH68LBVx05cuCRGWb6qP/blEZWUSA zQvbHPIPqPBK0pYtbVCgJFCpOS5C6X33RMgbRr3XncAItmfEp2FH7tcHIsDZUzmO EGBgSPrhRFJrgfa3hopc93L4OEHmGy+20o7oT4kk0SBcHVDSyDZVBu1lX+/gAZKd ZoV5FyF9KNB+qQQ6E+vpL5gLh7BWiBgOI41Bdw5ut09I0gYURPdFWUPHRFwHK1/1 KTSWLyuLkS4VxdhcwhA30eVmSrbfocPPwuUxcGn2p1YnaycFkDNoJnhifvFiOS6w tFOJ3wEVL2OX7On6U3jviVy7MQapX3MvWrWKlgg/UoMv36dbqRA= =ViDp -----END PGP SIGNATURE----- Merge tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "Two fixes: - Fix NULL-ptr dereference bug in Intel IOMMU driver - Properly save and restore AMD IOMMU performance counter registers when testing if they are writable" * tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix IOMMU perf counter clobbering during init iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer	2020-01-24 09:56:59 -08:00
Linus Torvalds	3c45d7510c	powerpc fixes for 5.5 #6 Fix our hash MMU code to avoid having overlapping ids between user and kernel, which isn't as bad as it sounds but led to crashes on some machines. A fix for the Power9 XIVE interrupt code, which could return the wrong interrupt state in obscure error conditions. A minor Kconfig fix for the recently added CONFIG_PPC_UV code. Thanks to: Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic Barrat. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl4q3iMTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgM1fEAC7YefQZsuIstV2bNH9xFJe/oEOSLWt DqZYfWZRBRZRdRMAP7+hmXpsLfHAB3QSGv4yEc2Pgm5Y5vYuv8MWqEKkpJoivIPW /Fu7mNGI06IWinMyl59dLA5fXPNeOzSNcdPnBdIUC/62XU414IGsy273X3LN4YiM hfXSpPA3tpSjVBTeqfKTJowXWMOvbkwiHMs2ziAu4Y9GDauJsU0UiOxMkl2MYT7Z PULO/SrjYSbP8+MazD/YdJ5k7F4rNV3bVDPUCU305tKIkrqRmBs+8DXjeUAMWVjj y2huUZ/0XXsr8QZyyGfMhEC1sJKAD3colG4LHUysbWz1oInY93CvXELdPAMXhqrH xXTzx9ae74Xk7t7WaGgq5xJGHJCDSg1aA2HPkUg6Gbk6iJIztOCkKMdSPvFMo18S p+Kjis8db83RPXQm3ooW7s/xhp6AaDiXd8khQ7A2efgt4SDTSPFvgv2q4CMDC1lP njchgzgPJ635MSd3CC/u6i7eViuUylb6SfhFVYY0DqOOvNJVrj1W6M4N7tpWSrRS PXJI2/NwlL3WIajKO3KfqTRI7QaSLtccTrXGLiGBxx/ooAowP94/WL3CYJAeuO2F NmA1GXH6/WXvzMMNm5NB0kCOUOOhtJk/MFuO7Agly6YL9p36fqVz4WIJVAqtwJLO uPu0xASJCgp9Mg== =r/RM -----END PGP SIGNATURE----- Merge tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.5: - Fix our hash MMU code to avoid having overlapping ids between user and kernel, which isn't as bad as it sounds but led to crashes on some machines. - A fix for the Power9 XIVE interrupt code, which could return the wrong interrupt state in obscure error conditions. - A minor Kconfig fix for the recently added CONFIG_PPC_UV code. Thanks to Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic Barrat" * tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm/hash: Fix sharing context ids between kernel & userspace powerpc/xive: Discard ESB load value when interrupt is invalid powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV	2020-01-24 09:49:20 -08:00
Linus Torvalds	274adbff45	drm fixes for 5.5-rc8 core/mst: - Fix SST branch device handling amdgpu: - enable renoir outside experimental i915: - Avoid overflow with huge userptr objects - uAPI fix to correctly handle negative values in engine->uabi_class/instance (cc: stable) panfrost: - Fix mapping of globally visible BO's (Boris) -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJeKoezAAoJEAx081l5xIa+hRMP/jC45Om4YDtsoH+75v41mYfO yUt1jBsCDXHAvPjBbRcoM2kbZU+ZcaDR4EZS7lA7QTW5CAosqzxrGdkzGobjmj5v TN/0BWWyQYtQ+lbt/pUxM+kktaV9IkjZIAUlW3msCwKsYE+nPIWfXDnW2ioFpitk sXk3vZYC4Lh6IVJ5PG879nz5XPpo2eb4sYYQ4PTFj65u436NEd8AFqmcDzp64Jhb xh1jLSxeS/JROyb/AmTHt0PgrGB5B08U8V7CAJXhIgf7zM28iiA3ynanoO3PauYb 3uPg8eVdP2tOqcpplV5CBkiw/in2sPgRprMEg66cKYzo9rieWu7nPL+1zmaxD16s NIXSmov4Sm7yOS4LAGAckS12ejniy7L6OcP5N2S2Gz0HpIMXluBfLw3d4ZiIW5Q5 yKdpmrwTfl0IfTiblweaHi8uT4K19vSqoXUf4bFa4UxqeYOtipHnNP907+nDs0JN GiHjIvrQJ4KH/A/JAkjEyNzHRAqJhJke84E519UP+mI+8paqzftsbDvf0ADA7nzZ q6TbZpTbEPV+NqnaH6n8volfh/WougW2BpxozMO3UP/474AZq1bXJIXi45ksatRW Cq0KQP8iI//C0xhCDPs7LQ/XzgkcQtDPZ1Nx6vWjrn6O3DMjKCm0RxZgfKj5CfbJ PovYqZQXJ01989DfXfQv =K7TY -----END PGP SIGNATURE----- Merge tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "This one has a core mst fix and two i915 fixes. amdgpu just enables some hw outside experimental. The panfrost fix is a little bigger than I'd like at this stage but it fixes a fairly fundamental problem with global shared buffers in that driver, and since it's confined to that driver and I've taken a look at it, I think it's fine to get into the tree now, so it can get stable propagated as well. core/mst: - Fix SST branch device handling amdgpu: - enable renoir outside experimental i915: - Avoid overflow with huge userptr objects - uAPI fix to correctly handle negative values in engine->uabi_class/instance (cc: stable) panfrost: - Fix mapping of globally visible BO's (Boris)" * tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm: drm/amdgpu: remove the experimental flag for renoir drm/panfrost: Add the panfrost_gem_mapping concept drm/i915: Align engine->uabi_class/instance with i915_drm.h drm/i915/userptr: fix size calculation drm/dp_mst: Handle SST-only branch device case	2020-01-24 09:38:04 -08:00
Christophe Leroy	ab10ae1c3b	lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user() The range passed to user_access_begin() by strncpy_from_user() and strnlen_user() starts at 'src' and goes up to the limit of userspace although reads will be limited by the 'count' param. On 32 bits powerpc (book3s/32) access has to be granted for each 256Mbytes segment and the cost increases with the number of segments to unlock. Limit the range with 'count' param. Fixes: `594cc251fd` ("make 'user_access_begin()' do 'access_ok()'") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-24 09:27:34 -08:00
Jiri Wiesner	ab658b9fa7	netfilter: conntrack: sctp: use distinct states for new SCTP connections The netlink notifications triggered by the INIT and INIT_ACK chunks for a tracked SCTP association do not include protocol information for the corresponding connection - SCTP state and verification tags for the original and reply direction are missing. Since the connection tracking implementation allows user space programs to receive notifications about a connection and then create a new connection based on the values received in a notification, it makes sense that INIT and INIT_ACK notifications should contain the SCTP state and verification tags available at the time when a notification is sent. The missing verification tags cause a newly created netfilter connection to fail to verify the tags of SCTP packets when this connection has been created from the values previously received in an INIT or INIT_ACK notification. A PROTOINFO event is cached in sctp_packet() when the state of a connection changes. The CLOSED and COOKIE_WAIT state will be used for connections that have seen an INIT and INIT_ACK chunk, respectively. The distinct states will cause a connection state change in sctp_packet(). Signed-off-by: Jiri Wiesner <jwiesner@suse.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-24 18:26:53 +01:00
Shuah Khan	8c17bbf6c8	iommu/amd: Fix IOMMU perf counter clobbering during init init_iommu_perf_ctr() clobbers the register when it checks write access to IOMMU perf counters and fails to restore when they are writable. Add save and restore to fix it. Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Fixes: `30861ddc9c` ("perf/x86/amd: Add IOMMU Performance Counter resource management") Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-01-24 15:28:40 +01:00
Jerry Snitselaar	bf708cfb2f	iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer It is possible for archdata.iommu to be set to DEFER_DEVICE_DOMAIN_INFO or DUMMY_DEVICE_DOMAIN_INFO so check for those values before calling __dmar_remove_one_dev_info. Without a check it can result in a null pointer dereference. This has been seen while booting a kdump kernel on an HP dl380 gen9. Cc: Joerg Roedel <joro@8bytes.org> Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: stable@vger.kernel.org # 5.3+ Cc: linux-kernel@vger.kernel.org Fixes: `ae23bfb68f` ("iommu/vt-d: Detach domain before using a private one") Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <jroedel@suse.de>	2020-01-24 15:23:50 +01:00
Qu Wenruo	1bbb97b8ce	btrfs: scrub: Require mandatory block group RO for dev-replace [BUG] For dev-replace test cases with fsstress, like btrfs/06[45] btrfs/071, looped runs can lead to random failure, where scrub finds csum error. The possibility is not high, around 1/20 to 1/100, but it's causing data corruption. The bug is observable after commit `b12de52896` ("btrfs: scrub: Don't check free space before marking a block group RO") [CAUSE] Dev-replace has two source of writes: - Write duplication All writes to source device will also be duplicated to target device. Content: Not yet persisted data/meta - Scrub copy Dev-replace reused scrub code to iterate through existing extents, and copy the verified data to target device. Content: Previously persisted data and metadata The difference in contents makes the following race possible: Regular Writer \| Dev-replace ----------------------------------------------------------------- ^ \| \| Preallocate one data extent \| \| at bytenr X, len 1M \| v \| ^ Commit transaction \| \| Now extent [X, X+1M) is in \| v commit root \| ================== Dev replace starts ========================= \| ^ \| \| Scrub extent [X, X+1M) \| \| Read [X, X+1M) \| \| (The content are mostly garbage \| \| since it's preallocated) ^ \| v \| Write back happens for \| \| extent [X, X+512K) \| \| New data writes to both \| \| source and target dev. \| v \| \| ^ \| \| Scrub writes back extent [X, X+1M) \| \| to target device. \| \| This will over write the new data in \| \| [X, X+512K) \| v This race can only happen for nocow writes. Thus metadata and data cow writes are safe, as COW will never overwrite extents of previous transaction (in commit root). This behavior can be confirmed by disabling all fallocate related calls in fsstress (), then all related tests can pass a 2000 run loop. : FSSTRESS_AVOID="-f fallocate=0 -f allocsp=0 -f zero=0 -f insert=0 \ -f collapse=0 -f punch=0 -f resvsp=0" I didn't expect resvsp ioctl will fallback to fallocate in VFS... [FIX] Make dev-replace to require mandatory block group RO, and wait for current nocow writes before calling scrub_chunk(). This patch will mostly revert commit `76a8efa171` ("btrfs: Continue replace when set_block_ro failed") for dev-replace path. The side effect is, dev-replace can be more strict on avaialble space, but definitely worth to avoid data corruption. Reported-by: Filipe Manana <fdmanana@suse.com> Fixes: `76a8efa171` ("btrfs: Continue replace when set_block_ro failed") Fixes: `b12de52896` ("btrfs: scrub: Don't check free space before marking a block group RO") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-01-24 14:35:56 +01:00
David S. Miller	08a45c59f1	Merge branch 'mptcp-part-two' Christoph Paasch says: ==================== Multipath TCP part 2: Single subflow & RFC8684 support v2 -> v3: Added RFC8684-style handshake (see below fore more details) and some minor fixes v1 -> v2: Rebased on latest "Multipath TCP: Prerequisites" v3 series This set adds MPTCP connection establishment, writing & reading MPTCP options on data packets, a sysctl to allow MPTCP per-namespace, and self tests. This is sufficient to establish and maintain a connection with a MPTCP peer, but will not yet allow or initiate establishment of additional MPTCP subflows. We also add the necessary code for the RFC8684-style handshake. RFC8684 obsoletes the experimental RFC6824 and makes MPTCP move-on to version 1. Originally our plan was to submit single-subflow and RFC8684 support in two patchsets, but to simplify the merging-process and ensure that a coherent MPTCP-version lands in Linux we decided to merge the two sets into a single one. The MPTCP patchset exclusively supports RFC 8684. Although all MPTCP deployments are currently based on RFC 6824, future deployments will be migrating to MPTCP version 1. 3GPP's 5G standardization also solely supports RFC 8684. In addition, we believe that this initial submission of MPTCP will be cleaner by solely supporting RFC 8684. If later on support for the old MPTCP-version is required it can always be added in the future. The major difference between RFC 8684 and RFC 6824 is that it has a better support for servers using TCP SYN-cookies by reliably retransmitting the MP_CAPABLE option. Before ending this cover letter with some refs, it is worth mentioning that we promise David Miller that merging this series will be rewarded by Twitter dopamine hits :-D Clone/fetch: https://github.com/multipath-tcp/mptcp_net-next.git (tag: netdev-v3-part2) Browse: https://github.com/multipath-tcp/mptcp_net-next/tree/netdev-v3-part2 Thank you for your review. You can find us at mptcp@lists.01.org and https://is.gd/mptcp_upstream ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 13:44:08 +01:00
Paolo Abeni	8ab183deb2	mptcp: cope with later TCP fallback With MPTCP v1, passive connections can fallback to TCP after the subflow becomes established: syn + MP_CAPABLE -> <- syn, ack + MP_CAPABLE ack, seq = 3 -> // OoO packet is accepted because in-sequence // passive socket is created, is in ESTABLISHED // status and tentatively as MP_CAPABLE ack, seq = 2 -> // no MP_CAPABLE opt, subflow should fallback to TCP We can't use the 'subflow' socket fallback, as we don't have it available for passive connection. Instead, when the fallback is detected, replace the mptcp socket with the underlying TCP subflow. Beyond covering the above scenario, it makes a TCP fallback socket as efficient as plain TCP ones. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 13:44:08 +01:00
Christoph Paasch	d22f4988ff	mptcp: process MP_CAPABLE data option This patch implements the handling of MP_CAPABLE + data option, as per RFC 6824 bis / RFC 8684: MPTCP v1. On the server side we can receive the remote key after that the connection is established. We need to explicitly track the 'missing remote key' status and avoid emitting a mptcp ack until we get such info. When a late/retransmitted/OoO pkt carrying MP_CAPABLE[+data] option is received, we have to propagate the mptcp seq number info to the msk socket. To avoid ABBA locking issue, explicitly check for that in recvmsg(), where we own msk and subflow sock locks. The above also means that an established mp_capable subflow - still waiting for the remote key - can be 'downgraded' to plain TCP. Such change could potentially block a reader waiting for new data forever - as they hook to msk, while later wake-up after the downgrade will be on subflow only. The above issue is not handled here, we likely have to get rid of msk->fallback to handle that cleanly. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 13:44:08 +01:00
Christoph Paasch	cc7972ea19	mptcp: parse and emit MP_CAPABLE option according to v1 spec This implements MP_CAPABLE options parsing and writing according to RFC 6824 bis / RFC 8684: MPTCP v1. Local key is sent on syn/ack, and both keys are sent on 3rd ack. MP_CAPABLE messages len are updated accordingly. We need the skbuff to correctly emit the above, so we push the skbuff struct as an argument all the way from tcp code to the relevant mptcp callbacks. When processing incoming MP_CAPABLE + data, build a full blown DSS-like map info, to simplify later processing. On child socket creation, we need to record the remote key, if available. Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 13:44:08 +01:00
Paolo Abeni	65492c5a6a	mptcp: move from sha1 (v0) to sha256 (v1) For simplicity's sake use directly sha256 primitives (and pull them as a required build dep). Add optional, boot-time self-tests for the hmac function. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 13:44:08 +01:00
Florian Westphal	048d19d444	mptcp: add basic kselftest for mptcp Add mptcp_connect tool: xmit two files back and forth between two processes, several net namespaces including some adding delays, losses and reordering. Wrapper script tests that data was transmitted without corruption. The "-c" command line option for mptcp_connect.sh is there for debugging: The script will use tcpdump to create one .pcap file per test case, named according to the namespaces, protocols, and connect address in use. For example, the first test case writes the capture to ns1-ns1-MPTCP-MPTCP-10.0.1.1.pcap. The stderr output from tcpdump is printed after the test completes to show tcpdump's "packets dropped by kernel" information. Also check that userspace can't create MPTCP sockets when mptcp.enabled sysctl is off. The "-b" option allows to tune/lower send buffer size. "-m mmap" can be used to test blocking io. Default is non-blocking io using read/write/poll. Will run automatically on "make kselftest". Note that the default timeout of 45 seconds is used even if there is a "settings" changing it to 450. 45 seconds should be enough in most cases but this depends on the machine running the tests. A fix to correctly read the "settings" file has been proposed upstream but not applied yet. It is not blocking the execution of these new tests but it would be nice to have it: https://patchwork.kernel.org/patch/11204935/ Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Co-developed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-24 13:44:08 +01:00

1 2 3 4 5 ...

890270 Commits