linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-25 17:49:54 +07:00

Author	SHA1	Message	Date
Oliver Hartkopp	48452c169d	can: remove obsolete pernet_operations definitions The namespace support for the CAN subsystem does not need any additional memory. So when ".size = 0" there's no extra memory allocated by the system. And therefore ".id" is obsolete too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:28 +02:00
Oliver Hartkopp	a7bbd28f04	can: fix memory leak in initial namespace support The can_rx_alldev_list is a per-net data structure now and allocated in can_pernet_init(). Make sure the memory is free'd in can_pernet_exit() too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:04:27 +02:00
Remigiusz Kołłątaj	51f3baad7d	can: mcba_usb: Add support for Microchip CAN BUS Analyzer SocketCAN driver for Microchip CAN BUS Analyzer (http://www.microchip.com/development-tools/) Changes in v4: - possible memory leak fixed in mcba_usb_write_bulk_callback - LED support added - failure handling in mcba_usb_probe improved - C99 initializers for structs on stack Changes in v3: - improved/simplified CAN ID conversion - functions for transmission of skb and cmd separated - fixed/improved netif_stop_queue handling - style/cosmetic corrections Changes in v2: - Termination handling reimplemented to fit new netlink API (IFLA_CAN_TERMINATION) - Bitrate handling reimplemented to fit new netlink API (IFLA_CAN_BITRATE) - CAN ID conversion refactored (changed from macro to inline functions) - CAN DLC handling using get_can_dlc() - Endianness handling for can_speed introduced - Debugging removed - Redundant error prints removed - Style/cosmetic corrections (i.e. macro names, redefs, inits etc.) Signed-off-by: Remigiusz Kołłątaj <remigiusz.kollataj@mobica.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:49 +02:00
Mario Huettel	10c1c3975a	can: m_can: Enable TX FIFO Handling for M_CAN IP version >= v3.1.x * Added defines for TX Event FIFO Element * Adapted ndo_start_xmit function. For versions >= v3.1.x it uses the TX FIFO to optimize the data throughput. It stores the echo skb at the same index as in the M_CAN's TX FIFO. The frame's message marker is set to this index. This message marker is received in the TX Event FIFO after the message was successfully transmitted. It is used to echo the correct echo skb back to the network stack. * Added m_can_echo_tx_event function. It reads all received message markers in the TX Event FIFO and loops back the corresponding echo skbs. * ISR checks for new TX Event Entry interrupt for version >= 3.1.x. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:48 +02:00
Mario Huettel	428479e471	can: m_can: Configuration for TX and TX event FIFOs * TX/TX Event FIFO sizes are configured for version >= v3.1.x Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:48 +02:00
Mario Huettel	b03cfc5bb0	can: m_can: Enable M_CAN version dependent initialization This patch adapts the initialization of the M_CAN. So it can be used with all versions >= 3.0.x. Changes: * Added version element to m_can_priv structure to hold M_CAN version. * Renamed bittiming structs for version 3.0.x * Added new bittiming structs for version >= 3.1.x * Function alloc_m_can_dev takes 2 new arguments. The TX FIFO size and the base address of the module. * Chip configuration for CAN_CTRLMODE_LOOPBACK is changed: Enabled CCCR_MON bit. In combination with TEST_LBCK it activates the internal loopback mode. Leaving CCCR_MON '0' results in external loopback mode. * Clocks are temporarily enabled by platform_probe function in order to allow read access to the Core Release register and the Control Register. Registers are used to detect M_CAN version and optional Non-ISO Feature. Initialization of M_CAN for version >= 3.1.x: * TX FIFO of M_CAN is used to transmit frames. The driver does not need to stop the tx queue after each frame sent. * Initialization of TX Event FIFO is added. * NON-ISO is fixed for all M_CAN versions < 3.2.x. Version 3.2.x _can_ have the NISO (Non-ISO) bit which can switch the mode of the M_CAN to Non-ISO mode. This bit does not have to be writeable. Therefore it is checked. If it is writable Non-ISO support is added to the controller's supported CAN modes. New Functions: * Function to check the Core Release version. The read value determines the behaviour of the driver. * Function to check if the NISO bit for version >= 3.2.x is implemented. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:47 +02:00
Mario Huettel	5e1bd15a37	can: m_can: Updated register defines to newest version * Updated register defines to newest M_CAN version (v3.2.1). * Changed defines in the whole code. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:47 +02:00
Mario Huettel	ee8c3f6f75	can: m_can: Removed virtual address from print The virtual address of the device was printed. I removed it because it leaks internal information. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:46 +02:00
Mario Huettel	8f265895df	can: m_can: Removed initialization of FIFO water marks FIFO water marks disabled because the driver doesn't handle water mark events. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:46 +02:00
Mario Huettel	52973810b5	can: m_can: Disabled Interrupt Line 1 * Disabled interrupt line 1. The driver didn't use it. Signed-off-by: Mario Huettel <mario.huettel@gmx.net> Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:45 +02:00
Stephane Grosjean	8ac8321e4a	can: peak: add support for PEAK PCAN-PCIe FD CAN-FD boards This patch adds the support of the PCAN-PCI Express FD boards made by PEAK-System, for computers using the PCI Express slot. The PCAN-PCI Express FD has one or two CAN FD channels, depending on the model. A galvanic isolation of the CAN ports protects the electronics of the card and the respective computer against disturbances of up to 500 Volts. The PCAN-PCI Express FD can be operated with ambient temperatures in a range of -40 to +85 °C. Such boards run an extended version of the CAN-FD IP running in the USB CAN-FD interfaces from PEAK-System, so this patch adds several new commands and their corresponding data types to the PEAK CAN-FD common definitions header file too. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:45 +02:00
Stephane Grosjean	c3df7c5755	can: peak: move header file to new can common subdir The CAN-FD IP from PEAK-System runs in several kinds of PC CAN-FD interfaces. Up to now, only the USB CAN-FD adapters were supported by the Kernel. In order to prepare for adding some new non-USB CAN-FD interfaces, this patch moves - and renames - the IP definitions file from its private (usb) sub-directory into a - newly created - CAN specific one. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:44 +02:00
Stephane Grosjean	113ab88b2b	can: peak: fix usage of const qualifier in pointers args Fixes the usage of the const qualifier in the memory pointer arguments of the declared inline functions. By changing the line containing "const", this patch also changes the name of the arg into a more usual one. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:44 +02:00
Stephane Grosjean	81c5e13d90	can: peak: fix usage of usb specific data type This patch fixes the wrong usage of a specific USB data type into a common header file. This common header file is intended to define the common data types and values that define access to the PEAK-System CAN-FD IP, whatever the PC interface is. Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2017-04-25 09:00:43 +02:00
David S. Miller	86a5df1495	Merge branch 'virtio-net-tx-napi' Willem de Bruijn says: ==================== virtio-net tx napi Add napi for virtio-net transmit completion processing. Changes: v2 -> v3: - convert __netif_tx_trylock to __netif_tx_lock on tx napi poll ensure that the handler always cleans, to avoid deadlock - unconditionally clean in start_xmit avoid adding an unnecessary "if (use_napi)" branch - remove virtqueue_disable_cb in patch 5/5 a noop in the common event_idx based loop - document affinity_hint_set constraint v1 -> v2: - disable by default - disable unless affinity_hint_set because cache misses add up to a third higher cycle cost, e.g., in TCP_RR tests. This is not limited to the patch that enables tx completion cleaning in rx napi. - use trylock to avoid contention between tx and rx napi - keep interrupts masked during xmit_more (new patch 5/5) this improves cycles especially for multi UDP_STREAM, which does not benefit from cleaning tx completions on rx napi. - move free_old_xmit_skbs (new patch 3/5) to avoid forward declaration not changed: - deduplicate virnet_poll_tx and virtnet_poll_txclean they look similar, but differ too much to make it worthwhile. 
- delay netif_wake_subqueue for more than 2 + MAX_SKB_FRAGS evaluated, but made no difference - patch 1/5 RFC -> v1: - dropped vhost interrupt moderation patch: not needed and likely expensive at light load - remove tx napi weight - always clean all tx completions - use boolean to toggle tx-napi, instead - only clean tx in rx if tx-napi is enabled - then clean tx before rx - fix: add missing braces in virtnet_freeze_down - testing: add 4KB TCP_RR + UDP test results Based on previous patchsets by Jason Wang: [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html Before commit `b0c39dbdc2` ("virtio_net: don't free buffers in xmit ring") the virtio-net driver would free transmitted packets on transmission of new packets in ndo_start_xmit and, to catch the edge case when no new packet is sent, also in a timer at 10HZ. A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls due to low free descriptor count. It does not address stalls due to low socket SO_SNDBUF. Increasing timer frequency decreases that stall time, but increases interrupt rate and, thus, cycle count. Currently, with no timer, packets are freed only at ndo_start_xmit. Latency of consume_skb is now unbounded. To avoid a deadlock if a sock reaches SO_SNDBUF, packets are orphaned on tx. This breaks TCP small queues. Reenable TCP small queues by removing the orphan. Instead of using a timer, convert the driver to regular tx napi. This does not have the unresolved stall issue and does not have any frequency to tune. By keeping interrupts enabled by default, napi increases tx interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if one is already unacknowledged, so makes this more feasible today. 
Combine that with an optimization that brings interrupt rate back in line with the existing version for most workloads: Tx completion cleaning on rx interrupts elides most explicit tx interrupts by relying on the fact that many rx interrupts fire. Tested by running {1, 10, 100} {TCP, UDP} STREAM, RR, 4K_RR benchmarks from a guest to a server on the host, on an x86_64 Haswell. The guest runs 4 vCPUs pinned to 4 cores. vhost and the test server are pinned to a core each. All results are the median of 5 runs, with variance well < 10%. Used neper (github.com/google/neper) as test process. Napi increases single stream throughput, but increases cycle cost. The optimizations bring this down. The previous patchset saw a regression with UDP_STREAM, which does not benefit from cleaning tx interrupts in rx napi. This regression is now gone for 10x, 100x. Remaining difference is higher 1x TCP_STREAM, lower 1x UDP_STREAM. The latest results are with process, rx napi and tx napi affine to the same core. All numbers are lower than the previous patchset. upstream napi TCP_STREAM: 1x: Mbps 27816 39805 Gcycles 274 285 10x: Mbps 42947 42531 Gcycles 300 296 100x: Mbps 31830 28042 Gcycles 279 269 TCP_RR Latency (us): 1x: p50 21 21 p99 27 27 Gcycles 180 167 10x: p50 40 39 p99 52 52 Gcycles 214 211 100x: p50 281 241 p99 411 337 Gcycles 218 226 TCP_RR 4K: 1x: p50 28 29 p99 34 36 Gcycles 177 167 10x: p50 70 71 p99 85 134 Gcycles 213 214 100x: p50 442 611 p99 802 785 Gcycles 237 216 UDP_STREAM: 1x: Mbps 29468 26800 Gcycles 284 293 10x: Mbps 29891 29978 Gcycles 285 312 100x: Mbps 30269 30304 Gcycles 318 316 UDP_RR: 1x: p50 19 19 p99 23 23 Gcycles 180 173 10x: p50 35 40 p99 54 64 Gcycles 245 237 100x: p50 234 286 p99 484 473 Gcycles 224 214 Note that GSO is enabled, so 4K RR still translates to one packet per request. Lower throughput at 100x vs 10x can be (at least in part) explained by looking at bytes per packet sent (nstat). 
It likely also explains the lower throughput of 1x for some variants. upstream: N=1 bytes/pkt=16581 N=10 bytes/pkt=61513 N=100 bytes/pkt=51558 at_rx: N=1 bytes/pkt=65204 N=10 bytes/pkt=65148 N=100 bytes/pkt=56840 ==================== Acked-by: Michael S. Tsirkin <mst@redhat.com>	2017-04-24 23:55:20 -04:00
Willem de Bruijn	bdb12e0d2f	virtio-net: keep tx interrupts disabled unless kick Tx napi mode increases the rate of transmit interrupts. Suppress some by masking interrupts while more packets are expected. The interrupts will be reenabled before the last packet is sent. This optimization reduces the throughput drop with tx napi for unidirectional flows such as UDP_STREAM that do not benefit from cleaning tx completions in the receive napi handler. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	7b0411ef4a	virtio-net: clean tx descriptors from rx napi Amortize the cost of virtual interrupts by doing both rx and tx work on reception of a receive interrupt if tx napi is enabled. With VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion interrupts for bidirectional workloads. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	ea7735d97b	virtio-net: move free_old_xmit_skbs An upcoming patch will call free_old_xmit_skbs indirectly from virtnet_poll. Move the function above this to avoid having to introduce a forward declaration. This is a pure move: no code changes. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	b92f1e6751	virtio-net: transmit napi Convert virtio-net to a standard napi tx completion path. This enables better TCP pacing using TCP small queues and increases single stream throughput. The virtio-net driver currently cleans tx descriptors on transmission of new packets in ndo_start_xmit. Latency depends on new traffic, so is unbounded. To avoid deadlock when a socket reaches its snd limit, packets are orphaned on transmission. This breaks socket backpressure, including TSQ. Napi increases the number of interrupts generated compared to the current model, which keeps interrupts disabled as long as the ring has enough free descriptors. Keep tx napi optional and disabled for now. Follow-on patches will reduce the interrupt cost. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
Willem de Bruijn	e4e8452a4a	virtio-net: napi helper functions Prepare virtio-net for tx napi by converting existing napi code to use helper functions. This also deduplicates some logic. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 23:55:19 -04:00
David S. Miller	14933dc8d9	sparc64: Improve 64-bit constant loading in eBPF JIT. Doing a full 64-bit decomposition is really stupid especially for simple values like 0 and -1. But if we are going to optimize this, go all the way and try for all 2 and 3 instruction sequences not requiring a temporary register as well. First we do the easy cases where it's a zero or sign extended 32-bit number (sethi+or, sethi+xor, respectively). Then we try to find a range of set bits we can load simply then shift up into place, in various ways. Then we try negating the constant and see if we can do a simple sequence using that with a xor at the end. (f.e. the range of set bits can't be loaded simply, but for the negated value it can) The final optimized strategy involves 4 instructions sequences not needing a temporary register. Otherwise we sadly fully decompose using a temp.. Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000: 0000000000000000 <foo>: 0: 9d e3 bf 50 save %sp, -176, %sp 4: 01 00 00 00 nop 8: 90 10 00 18 mov %i0, %o0 c: 13 3f ff ff sethi %hi(0xfffffc00), %o1 10: 92 12 63 ff or %o1, 0x3ff, %o1 ! ffffffff <foo+0xffffffff> 14: 93 2a 70 10 sllx %o1, 0x10, %o1 18: 15 3f ff ff sethi %hi(0xfffffc00), %o2 1c: 94 12 a3 ff or %o2, 0x3ff, %o2 ! ffffffff <foo+0xffffffff> 20: 95 2a b0 10 sllx %o2, 0x10, %o2 24: 92 1a 60 00 xor %o1, 0, %o1 28: 12 e2 40 8a cxbe %o1, %o2, 38 <foo+0x38> 2c: 9a 10 20 02 mov 2, %o5 30: 10 60 00 03 b,pn %xcc, 3c <foo+0x3c> 34: 01 00 00 00 nop 38: 9a 10 20 01 mov 1, %o5 ! 1 <foo+0x1> 3c: 81 c7 e0 08 ret 40: 91 eb 40 00 restore %o5, %g0, %o0 Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 20:32:15 -07:00
David S. Miller	e3a724edee	sparc64: Support cbcond instructions in eBPF JIT. cbcond combines a compare with a branch into a single instruction. The limitations are: 1) Only newer chips support it 2) For immediate compares we are limited to 5-bit signed immediate values 3) The branch displacement is limited to 10-bit signed 4) We cannot use it for JSET Also, cbcond (unlike all other sparc control transfers) lacks a delay slot. Currently we don't have a useful instruction we can push into the delay slot of normal branches. So using cbcond pretty much always increases code density, and is therefore a win. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 15:56:21 -07:00
David S. Miller	0e43d1009d	Merge branch 'bpf-misc-cleanups' Alexander Alemayhu says: ==================== Misc BPF cleanup while looking into making the Makefile in samples/bpf better handle O= I saw several warnings when running `make clean && make samples/bpf/`. This series reduces those warnings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:20 -04:00
Alexander Alemayhu	dfc5be0dc0	samples/bpf: check before defining offsetof Fixes the following warning samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) In file included from ./tools/lib/bpf/bpf.h:25:0, from samples/bpf/libbpf.h:5, from samples/bpf/test_lru_dist.c:24: /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER) Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:19 -04:00
Alexander Alemayhu	4784726f69	samples/bpf: add static to function with no prototype Fixes the following warning samples/bpf/cookie_uid_helper_example.c: At top level: samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes] void finish(int ret) ^~~~~~ HOSTLD samples/bpf/per_socket_stats_example Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:19 -04:00
Alexander Alemayhu	69b6a7f743	samples/bpf: add -Wno-unknown-warning-option to clang I was initially going to remove '-Wno-address-of-packed-member' because I thought it was not supposed to be there but Daniel suggested using '-Wno-unknown-warning-option'. This silences several warnings similar to the one below warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option] 1 warning generated. clang -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h \ -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \ -Wno-compare-distinct-pointer-types \ -Wno-gnu-variable-sized-type-not-at-end \ -Wno-address-of-packed-member -Wno-tautological-compare \ -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -\| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o $ clang --version clang version 3.9.1 (tags/RELEASE_391/final) Target: x86_64-unknown-linux-gnu Thread model: posix InstalledDir: /usr/bin Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:20:19 -04:00
Daniel Borkmann	e390b55d5a	bpf: make bpf_xdp_adjust_head support mandatory Now that also the last in-tree user of the xdp_adjust_head bit has been removed, we can remove the flag from struct bpf_prog altogether. This, at the same time, also makes sure that any future driver for XDP comes with bpf_xdp_adjust_head() support right away. A rejection based on this flag would also mean that tail calls couldn't be used with such driver as per `c2002f9837` ("bpf: fix checking xdp_adjust_head on tail calls") fix, thus lets not allow for it in the first place. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:18:10 -04:00
Pan Bian	91ec701a55	qlcnic: fix unchecked return value Function pci_find_ext_capability() may return 0, which is an invalid address. In function qlcnic_sriov_virtid_fn(), its return value is used without validation. This may result in invalid memory access bugs. This patch fixes the bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 16:10:53 -04:00
Pan Bian	2a39e7aa8a	wan: pc300too: abort path on failure In function pc300_pci_init_one(), on the ioremap error path, function pc300_pci_remove_one() is called to free the allocated memory. However, the path is not terminated, and the freed memory will be used later, resulting in use-after-free bugs. This patch fixes the bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 15:51:31 -04:00
Pan Bian	78302fd405	tipc: check return value of nlmsg_new Function nlmsg_new() will return a NULL pointer if there is not enough memory, and its return value should be checked before it is used. However, in function tipc_nl_node_get_monitor(), the validation of the return value of function nlmsg_new() is missed. This patch fixes the bug. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 15:51:30 -04:00
Pan Bian	a50fe0ffd7	lwtunnel: check return value of nla_nest_start Function nla_nest_start() may return a NULL pointer on error. However, in function lwtunnel_fill_encap(), the return value of nla_nest_start() is not validated before it is used. This patch checks the return value of nla_nest_start() against NULL. Signed-off-by: Pan Bian <bianpan2016@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 15:51:30 -04:00
David S. Miller	09d36071cf	Merge branch 'nfp-dma-adjust_head-fixes' Jakub Kicinski says: ==================== nfp: DMA flags, adjust head and fixes This series takes advantage of Alex's DMA_ATTR_SKIP_CPU_SYNC to make XDP packet modifications "correct" from DMA API point of view. It also allows us to parse the metadata before we run XDP at no additional DMA sync cost. That way we can get rid of the metadata memcpy, and remove the last upstream user of bpf_prog->xdp_adjust_head. David's patch adds a way to read capabilities from the management firmware. There are also two net-next fixes. Patch 4 which fixes what seems to be a result of a botched rebase on my part. Patch 5 corrects locking when state of ethernet ports is being refreshed. v3: move the sync from alloc func to the actual give to hw func v2: sync rx buffers before giving them to the card (Alex) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:35:45 -04:00
Jakub Kicinski	90fdc561b0	nfp: remove the refresh of all ports optimization The code refreshing the eth port state was trying to update state of all ports of the card. Unfortunately to safely walk the port list we would have to hold the port lock, which we can't due to lock ordering constraints against rtnl. Make the per-port sync refresh and async refresh of all ports completely separate routines. Fixes: `172f638c93` ("nfp: add port state refresh") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:35:44 -04:00
Jakub Kicinski	ee200a7377	nfp: fix free list buffer size reporting XDP headroom should not be included in free list buffer size. Fixes: `6fe0c3b438` ("nfp: add support for xdp_adjust_head()") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:35:44 -04:00
David Brunecz	010e2f9cc5	nfp: add NSP routine to get static information Retrieve identifying information from the NSP. For now it only contains versions of firmware subcomponents. Signed-off-by: David Brunecz <david.brunecz@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:35:44 -04:00
Jakub Kicinski	e524a6a9cd	nfp: parse metadata prepend before XDP runs Calling memcpy to shift metadata out of the way for XDP to run seems like an overkill. The most common metadata contents are 8 bytes containing type and flow hash. Simply parse the metadata before we run XDP. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:35:44 -04:00
Jakub Kicinski	5cd4fbeab2	nfp: make use of the DMA_ATTR_SKIP_CPU_SYNC attr DMA unmap may destroy changes CPU made to the buffer. To make XDP run correctly on non-x86 platforms we should use the DMA_ATTR_SKIP_CPU_SYNC attribute. Thanks to using the attribute we can now push the sync operation to the common code path from XDP handler. A little bit of variable name reshuffling is required to bring the code back to readable state. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:35:44 -04:00
David S. Miller	0d688a0d33	Merge branch 'cls_flower-MPLS' Benjamin LaHaise says: ==================== flower: add MPLS matching support This patch series adds support for parsing MPLS flows in the flow dissector and the flower classifier. Each of the MPLS TTL, BOS, TC and Label fields can be used for matching. v2: incorporate style feedback, move #defines to linux/include/mpls.h Note: this omits Jiri's request to remove tabs between the type and field names in struct declarations. This would be inconsistent with numerous other struct definitions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:30:47 -04:00
Benjamin LaHaise	a577d8f793	cls_flower: add support for matching MPLS fields (v2) Add support to the tc flower classifier to match based on fields in MPLS labels (TTL, Bottom of Stack, TC field, Label). Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:30:46 -04:00
Benjamin LaHaise	029c1ecbb2	flow_dissector: add mpls support (v2) Add support for parsing MPLS flows to the flow dissector in preparation for adding MPLS match support to cls_flower. Signed-off-by: Benjamin LaHaise <benjamin.lahaise@netronome.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Simon Horman <simon.horman@netronome.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Eric Dumazet <jhs@mojatatu.com> Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Gao Feng <fgao@ikuai8.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:30:46 -04:00
David S. Miller	3ec21b6580	Merge branch 'tcp-fastopen-middlebox-fixes' Wei Wang says: ==================== net/tcp_fastopen: Fix for various TFO firewall issues Currently there are still some firewall issues in the middlebox which make the middlebox drop packets silently for TFO sockets. This kind of issue is hard to be detected by the end client. This patch series tries to detect such issues in the kernel and disable TFO temporarily. More details about the issues and the fixes are included in the following patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:27:18 -04:00
Wei Wang	59450f8d83	net/tcp_fastopen: Remove mss check in tcp_write_timeout() Christoph Paasch from Apple found another firewall issue for TFO: After successful 3WHS using TFO, server and client start to exchange data. Afterwards, a 10s idle time occurs on this connection. After that, firewall starts to drop every packet on this connection. The fix for this issue is to extend existing firewall blackhole detection logic in tcp_write_timeout() by removing the mss check. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:27:17 -04:00
Wei Wang	46c2fa3987	net/tcp_fastopen: Add snmp counter for blackhole detection This counter records the number of times the firewall blackhole issue is detected and active TFO is disabled. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:27:17 -04:00
Wei Wang	cf1ef3f071	net/tcp_fastopen: Disable active side TFO in certain scenarios Middlebox firewall issues can potentially cause server's data being blackholed after a successful 3WHS using TFO. Following are the related reports from Apple: https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf Slide 31 identifies an issue where the client ACK to the server's data sent during a TFO'd handshake is dropped. C ---> syn-data ---> S C <--- syn/ack ----- S C (accept & write) C <---- data ------- S C ----- ACK -> X S [retry and timeout] https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf Slide 5 shows a similar situation that the server's data gets dropped after 3WHS. C ---- syn-data ---> S C <--- syn/ack ----- S C ---- ack --------> S S (accept & write) C? X <- data ------ S [retry and timeout] This is the worst failure b/c the client can not detect such behavior to mitigate the situation (such as disabling TFO). Failing to proceed, the application (e.g., SSL library) may simply timeout and retry with TFO again, and the process repeats indefinitely. The proposed solution is to disable active TFO globally under the following circumstances: 1. client side TFO socket detects out of order FIN 2. client side TFO socket receives out of order RST We disable active side TFO globally for 1hr at first. Then if it happens again, we disable it for 2h, then 4h, 8h, ... And we reset the timeout to 1hr if a client side TFO sockets not opened on loopback has successfully received data segs from server. And we examine this condition during close(). The rational behind it is that when such firewall issue happens, application running on the client should eventually close the socket as it is not able to get the data it is expecting. Or application running on the server should close the socket as it is not able to receive any response from client. 
In both cases, out of order FIN or RST will get received on the client given that the firewall will not block them as no data are in those frames. And we want to disable active TFO globally as it helps if the middle box is very close to the client and most of the connections are likely to fail. Also, add a debug sysctl: tcp_fastopen_blackhole_detect_timeout_sec: the initial timeout to use when firewall blackhole issue happens. This can be set and read. When setting it to 0, it means to disable the active disable logic. Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:27:17 -04:00
David S. Miller	bc95cd8e8b	mlx5-updates-2017-04-22 Sparse and compiler warnings fixes from Stephen Hemminger. From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling E-Switch encapsulation mode, this knob will enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJY+5cRAAoJEEg/ir3gV/o+5c8H/1/khPzy26B2lWyjPC8CRCQF eSd0tiHLgIqbZTbnIHTR+NbZ/SUFaukoJi8OKn1fGFHCCajWvPP4xkENVKrUdi3q kOgNZb/R1V0j6SdELyoMalFPjAscTgdmwYMnry+vcjOxJ+H2uUTnMKXwFf8IsBjz EINy8oZ5jZcejmft0c2O5HN4Bt/7U5ttM3CroAdcvPT9lq2DFJL2uCABhTO/1DdY b7uVa47FnkqxX19Ebn7fjp5r3diGYOmCPMjdC89C//rbkLB8FN61EkcSLpGY3YNm djmCPQ+xaa3ielmBpOk3AMayFEtYW0nDMj9eWECVByadRQZ2qz9wTVXBp5CX9zg= =E3Jt -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2017-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2017-04-22 Sparse and compiler warnings fixes from Stephen Hemminger. From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling E-Switch encapsulation mode, this knob will enable HW support for applying encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:11:10 -04:00
David Ahern	58c4c6a3f7	net: add rcu locking when changing early demux systemd-sysctl is triggering a suspicious RCU usage message when net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via a sysctl config file: [ 33.896184] =============================== [ 33.899558] [ ERR: suspicious RCU usage. ] [ 33.900624] 4.11.0-rc7+ #104 Not tainted [ 33.901698] ------------------------------- [ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage! [ 33.905724] other info that might help us debug this: [ 33.907656] rcu_scheduler_active = 2, debug_locks = 0 [ 33.909288] 1 lock held by systemd-sysctl/143: [ 33.910373] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48 [ 33.912407] stack backtrace: [ 33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104 [ 33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 33.917870] Call Trace: [ 33.918431] dump_stack+0x81/0xb6 [ 33.919241] lockdep_rcu_suspicious+0x10f/0x118 [ 33.920263] proc_configure_early_demux+0x65/0x10a [ 33.921391] proc_udp_early_demux+0x3a/0x41 add rcu locking to proc_configure_early_demux. Fixes: `dddb64bcb3` ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:08:19 -04:00
David S. Miller	ad0cb27ce9	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-04-22 Here are some more Bluetooth patches (and one 802.15.4 patch) in the bluetooth-next tree targeting the 4.12 kernel. Most of them are pure fixes. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 14:05:40 -04:00
Colin Ian King	11a9ec4330	net: netcp: fix spelling mistake: "memomry" -> "memory" Trivial fix to spelling mistake in dev_err message and rejoin line. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 13:59:32 -04:00
Geliang Tang	4cc17bcf7f	net: atheros: atl1: use offset_in_page() macro Use offset_in_page() macro instead of open-coding. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 13:58:01 -04:00
David S. Miller	4dd42b94b4	Merge branch 'bnxt_en-misc-next' Michael Chan says: ==================== bnxt_en: Updates for net-next. Miscellaneous updates include passing DCBX RoCE VLAN priority to firmware, checking one more new firmware flag before allowing DCBX to run on the host, adding 100Gbps speed support, adding check to disallow speed settings on Multi-host NICs, and a minor fix for reporting VF attributes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-24 13:54:48 -04:00

665510 Commits