linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-05 09:26:44 +07:00

Author	SHA1	Message	Date
Alexander Duyck	3a80e1facd	ip6gre: Add support for GSO This patch adds code borrowed from bits and pieces of other protocols to the IPv6 GRE path so that we can support GSO over IPv6 based GRE tunnels. By adding this support we are able to significantly improve the throughput for GRE tunnels as we are able to make use of GSO. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:09:13 -04:00
Alexander Duyck	e0c20967c8	GRE: Add support for GRO/GSO of IPv6 GRE traffic Since GRE doesn't really care about L3 protocol we can support IPv4 and IPv6 using the same offloads. With that being the case we can add a call to register the offloads for IPv6 as a part of our GRE offload initialization. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:09:13 -04:00
Alexander Duyck	ac4eb009e4	ip6gre: Add support for basic offloads offloads excluding GSO This patch adds support for the basic offloads we support on most devices. Specifically with this patch set we can support checksum offload, basic scatter-gather, and highdma. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:09:13 -04:00
Alexander Duyck	a9e242ca43	ip6gretap: Fix MTU to allow for Ethernet header When we were creating an ip6gretap interface the MTU was about 6 bytes short of what was needed. It turns out we were not taking the Ethernet header into account and as a result we were eating into the 8 bytes reserved for the encap limit. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:09:13 -04:00
Alexander Duyck	aed069df09	ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 19:09:13 -04:00
David S. Miller	ec9dcd3507	Merge branch 'w5100-spi-and-w5200-support' Akinobu Mita says: ==================== net: w5100: add support W5100/W5200 for SPI interface This series add support for Wiznet W5100 and W5200 for SPI interface. We can easily find the ethernet modules and shield for Arduino with these chips for purchase. I've tested them with BeagleBone. Wiznet W5100 for mmio access has already supported by w5100 driver. In order to share the code between mmio mode and SPI mode, this series firstly adds ability to support another register access interface to the existing w5100 driver. This ground work also requires to introduce workqueue and threaded irq because SPI transfers are callable only from contexts that can sleep unlike mmio access. The latter part of this series adds w5100-spi driver which actually support W5100 and W5200 for SPI interface. Supporting W5100 is straight forward because it only required to add a register access interface by the SPI transfer. W5100 and W5200 have similar memory map which justifies adding W5200 support to w5100 driver. * Changes from v2 to v3 - Add comment for reg_lock - Add ability to allocate ops specific data structure - Allocate w5200 ops specific data structure to put DMA-safe buffer - Add missing chip_id assignment for w5100__ops Changes from v1 to v2 - Use a plain single pointer instead of SKB queue, spotted by David S. Miller - Correct timeout period in w5100_command - Use spi_write_then_read instead of spi_write which needs DMA-safe buffer - Support W5200 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:30:27 -04:00
Akinobu Mita	0c165ff2d8	net: w5100: support W5200 This adds support for W5200 chip. W5100 and W5200 have similar memory map although some of their offsets are different. The register access sequences between them are different but w5100 driver has abstraction layer for difference bus interface modes so it is easy to add W5200 support to w5100 and w5100-spi drivers. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:30:27 -04:00
Akinobu Mita	630cf09751	net: w5100: support SPI interface mode This adds new w5100-spi driver which shares the bus interface independent code with existing w5100 driver. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:30:27 -04:00
Akinobu Mita	bf2c6b90b3	net: w5100: enable to support sleepable register access interface SPI transfer routines are callable only from contexts that can sleep. This adds ability to tell the core driver that the interface mode cannot access w5100 register on atomic contexts. In this case, workqueue and threaded irq are required. This also corrects timeout period waiting for command register to be automatically cleared because the latency of the register access with SPI transfer can be interfered by other contexts. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:30:27 -04:00
Akinobu Mita	850576cfed	net: w5100: add ability to support other bus interface The w5100 driver currently only supports direct and indirect bus interface mode which use MMIO space for accessing w5100 registers. In order to support SPI interface mode which is supported by W5100 chip, this makes the bus interface abstraction layer more generic so that separated w5100-spi driver can use w5100 driver as core module. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:30:27 -04:00
Akinobu Mita	d6586d2ef4	net: w5100: move mmiowb into register access callbacks Instead of sprinkle mmiowb over the driver code, move it into primary register write callbacks. (w5100_write, w5100_write16, w5100_writebuf) This is a preparation for supporting SPI interface which doesn't use MMIO for accessing w5100 registers. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:30:26 -04:00
Hannes Frederic Sowa	544a773a01	vxlan: reduce usage of synchronize_net in ndo_stop We only need to do the synchronize_net dance once for both, ipv4 and ipv6 sockets, thus removing one synchronize_net in case both sockets get dismantled. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:23:23 -04:00
Hannes Frederic Sowa	0412bd931f	vxlan: synchronously and race-free destruction of vxlan sockets Due to the fact that the udp socket is destructed asynchronously in a work queue, we have some nondeterministic behavior during shutdown of vxlan tunnels and creating new ones. Fix this by keeping the destruction process synchronous in regards to the user space process so IFF_UP can be reliably set. udp_tunnel_sock_release destroys vs->sock->sk if reference counter indicates so. We expect to have the same lifetime of vxlan_sock and vxlan_sock->sock->sk even in fast paths with only rcu locks held. So only destruct the whole socket after we can be sure it cannot be found by searching vxlan_net->sock_list. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-16 18:23:01 -04:00
wangweidong	f48256efed	phy: make some bits preserved while setup forced mode When tested the PHY SGMII Loopback: 1.set the LOOPBACK bit, 2.set the autoneg to AUTONEG_DISABLE, it calls the genphy_setup_forced which will clear the bit. The BMCR_LOOPBACK bit should be preserved. As Florian pointed out that other bits should be preserved too. So I make the BMCR_ISOLATE and BMCR_PDOWN as well. Signed-off-by: Weidong Wang <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 20:10:00 -04:00
David S. Miller	4e811b1e11	Merge branch 'sctp-diag' Xin Long says: ==================== sctp: support sctp_diag in kernel This patchset will add sctp_diag module to implement diag interface on sctp in kernel. For a listening sctp endpoint, we will just dump it's ep info. For a sctp connection, we will the assoc info and it's ep info. The ss dump will looks like: [iproute2]# ./misc/ss --sctp -n -l State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 172.16.254.254:8888 : LISTEN 0 5 127.0.0.1:1234 : LISTEN 0 5 127.0.0.1:1234 : - ESTAB 0 0 127.0.0.1%lo:1234 127.0.0.1:4321 LISTEN 0 128 172.16.254.254:8888 : - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.253.253:8888 - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.1.1:8888 - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.1.2:8888 - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.2.1:8888 - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.2.2:8888 - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.3.1:8888 - ESTAB 0 0 172.16.254.254%eth1:8888 172.16.3.2:8888 LISTEN 0 0 127.0.0.1:4321 : - ESTAB 0 0 127.0.0.1%lo:4321 127.0.0.1:1234 The entries with '- ESTAB' are the assocs, some of them may belong to the same endpoint. So we will dump the parent endpoint first, like the entry with 'LISTEN'. then dump the assocs. ep and assocs entries will be dumped in right order so that ss can show them in tree format easily. Besides, this patchset also simplifies sctp proc codes, cause it has some similar codes with sctp diag in sctp transport traversal. v1->v2: 1. inet_diag_get_handler needs to return it as const. 2. merge 5/7 into 2/7 of v1. v2->v3: do some improvements and fixes in patch 1-4, see the details in each patch's comment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:37 -04:00
Xin Long	53fa10369c	sctp: fix some rhashtable functions using in sctp proc/diag When rhashtable_walk_init return err, no release function should be called, and when rhashtable_walk_start return err, we should only invoke rhashtable_walk_exit to release the source. But now when sctp_transport_walk_start return err, we just call rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init or start return err, which is so bad. We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start return err in sctp_transport_walk_start, and if sctp_transport_walk_start return err, we do not need to call sctp_transport_walk_stop any more. For sctp proc, we will use 'iter->start_fail' to decide if we will call rhashtable_walk_stop/exit. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:37 -04:00
Xin Long	b5e2f4e699	sctp: merge the seq_start/next/exits in remaddrs and assocs In sctp proc, these three functions in remaddrs and assocs are the same. we should merge them into one. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:36 -04:00
Xin Long	8f840e47f1	sctp: add the sctp_diag.c file This one will implement all the interface of inet_diag, inet_diag_handler. which includes sctp_diag_dump, sctp_diag_dump_one and sctp_diag_get_info. It will work as a module, and register inet_diag_handler when loading. v2->v3: - fix the mistake in inet_assoc_attr_size(). - change inet_diag_msg_laddrs_fill() name to inet_diag_msg_sctpladdrs_fill. - change inet_diag_msg_paddrs_fill() name to inet_diag_msg_sctpaddrs_fill. - add inet_diag_msg_sctpinfo_fill() to make asoc/ep fill code clearer. - add inet_diag_msg_sctpasoc_fill() to make asoc fill code clearer. - merge inet_asoc_diag_fill() and inet_ep_diag_fill() to inet_sctp_diag_fill(). - call sctp_diag_get_info() directly, instead by handler, cause the caller is in the same file with it. - call lock_sock in sctp_tsp_dump_one() to make sure we call get sctp info safely. - after lock_sock(sk), we should check sk != assoc->base.sk. - change mem[SK_MEMINFO_WMEM_ALLOC] to asoc->sndbuf_used for asoc dump when asoc->ep->sndbuf_policy is set. don't use INET_DIAG_MEMINFO attr any more. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:36 -04:00
Xin Long	cb2050a7b8	sctp: export some functions for sctp_diag in inet_diag inet_diag_msg_common_fill is used to fill the diag msg common info, we need to use it in sctp_diag as well, so export it. inet_diag_msg_attrs_fill is used to fill some common attrs info between sctp diag and tcp diag. v2->v3: - do not need to define and export inet_diag_get_handler any more. cause all the functions in it are in sctp_diag.ko, we just call them in sctp_diag.ko. - add inet_diag_msg_attrs_fill to make codes clear. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:36 -04:00
Xin Long	626d16f50f	sctp: export some apis or variables for sctp_diag and reuse some for proc For some main variables in sctp.ko, we couldn't export it to other modules, so we have to define some api to access them. It will include sctp transport and endpoint's traversal. There are some transport traversal functions for sctp_diag, we can also use it for sctp_proc. cause they have the similar situation to traversal transport. v2->v3: - rhashtable_walk_init need the parameter gfp, because of recent upstrem update Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:36 -04:00
Xin Long	52c52a61a3	sctp: add sctp_info dump api for sctp_diag sctp_diag will dump some important details of sctp's assoc or ep, we use sctp_info to describe them, sctp_get_sctp_info to get them, and export it to sctp_diag.ko. v2->v3: - we will not use list_for_each_safe in sctp_get_sctp_info, cause all the callers of it will use lock_sock. - fix the holes in struct sctp_info with __reserved* field. because sctp_diag is a new feature, and sctp_info is just for now, it may be changed in the future. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:29:35 -04:00
Marcelo Ricardo Leitner	311b21774f	sctp: simplify sk_receive_queue locking SCTP already serializes access to rcvbuf through its sock lock: sctp_recvmsg takes it right in the start and release at the end, while rx path will also take the lock before doing any socket processing. On sctp_rcv() it will check if there is an user using the socket and, if there is, it will queue incoming packets to the backlog. The backlog processing will do the same. Even timers will do such check and re-schedule if an user is using the socket. Simplifying this will allow us to remove sctp_skb_list_tail and get ride of some expensive lockings. The lists that it is used on are also mangled with functions like __skb_queue_tail and __skb_unlink in the same context, like on sctp_ulpq_tail_event() and sctp_clear_pd(). sctp_close() will also purge those while using only the sock lock. Therefore the lockings performed by sctp_skb_list_tail() are not necessary. This patch removes this function and replaces its calls with just skb_queue_splice_tail_init() instead. The biggest gain is at sctp_ulpq_tail_event(), because the events always contain a list, even if it's queueing a single skb and this was triggering expensive calls to spin_lock_irqsave/_irqrestore for every data chunk received. As SCTP will deliver each data chunk on a corresponding recvmsg, the more effective the change will be. Before this patch, with chunks with 30 bytes: netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000 400000 -s 400000 400000 on a 10Gbit link with 1500 MTU: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 137.45 7.34 7.36 52.504 52.608 With it: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 179.10 7.97 6.70 43.740 36.788 Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:22:20 -04:00
David S. Miller	936d4b41b0	Merge branch 'mlx5_ifc-updates' Saeed Mahameed says: ==================== mlx5_core: mlx5_ifc updates This series include mlx5_core updates for both net-next and rdma trees for 4.7 kernel cycle. This is the only shared code planned for 4.7 between rdma and net trees. Hopefully, this will prevent future conflicts when merging between ib-next and net-next once 4.7 cycle is over and merge window is opened. Both Mellanox rdma and net submissions will proceed once this series is applied into both trees. Future shared code will be sent to both maintainers as pull requests from Mellanox's kernel.org tree. We have included all the maintainers of respective drivers. Kindly review the change and let us know in case of any review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:21:10 -04:00
Saeed Mahameed	7d5e14237a	net/mlx5: Update mlx5_ifc hardware features Adding the needed mlx5_ifc hardware bits and structs for the following feature: * Add vport to steering commands for SRIOV ACL support * Add mlcr, pcmr and mcia registers for dump module EEPROM * Add support for FCS, baeacon led and disable_link bits to hca caps * Add CQE period mode bit in CQ context for CQE based CQ moderation support * Add umr SQ bit for fragmented memory registration * Add needed bits and caps for Striding RQ support Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:21:10 -04:00
Tariq Toukan	e1c9c62b9a	net/mlx5: Fix mlx5 ifc cmd_hca_cap bad offsets All reserved fields after early_vf_enable are off by 1, since early_vf_enable was not explicitly declared as array of size 1. Reserved field before cqe_zip had a wrong size, it should be 0x80 + 0x3f. Fixes: `b084444459` ("net/mlx5_core: Introduce access function to read internal timer ") Fixes: `b4ff3a36d3` ("net/mlx5: Use offset based reserved field names in the IFC header file") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:21:09 -04:00
David S. Miller	993feee979	Merge branch 'qed-tunneling-offload' Manish Chopra says: ==================== qed/qede: Add tunneling support This patch series adds support for VXLAN, GRE and GENEVE tunnels to be used over this driver. With this support, adapter can perform TSO offload, inner/outer checksums offloads on TX and RX for encapsulated packets. V1->V2 [ Comments from Jesse Gross incorporated ] * Drop general infrastructure change patch. "net: Make vxlan/geneve default udp ports public" * Remove by default Linux default UDP ports configurations in driver. Instead, use general registration APIs for UDP port configurations * Removing .ndo_features_check - we will add it later with proper change. Please consider applying this series to net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:08:09 -04:00
Manish Chopra	14db81defa	qede: Add fastpath support for tunneling This patch enables netdev tunneling features and adds TX/RX fastpath support for tunneling in driver. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:08:09 -04:00
Manish Chopra	f798586920	qed: Enable GRE tunnel slowpath configuration Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:08:09 -04:00
Manish Chopra	9a109dd073	qed/qede: Add GENEVE tunnel slowpath configuration support This patch enables GENEVE tunnel on the adapter and add support for driver hooks to configure UDP ports for GENEVE tunnel offload to be performed by the adapter. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:08:08 -04:00
Manish Chopra	b18e170cac	qed/qede: Add VXLAN tunnel slowpath configuration support This patch enables VXLAN tunnel on the adapter and add support for driver hooks to configure UDP ports for VXLAN tunnel offload to be performed by the adapter. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:08:08 -04:00
Manish Chopra	464f664501	qed: Add infrastructure support for tunneling This patch adds various structure/APIs needed to configure/enable different tunnel [VXLAN/GRE/GENEVE] parameters on the adapter. Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:08:08 -04:00
Peter Heise	ee1c279772	net/hsr: Added support for HSR v1 This patch adds support for the newer version 1 of the HSR networking standard. Version 0 is still default and the new version has to be selected via iproute2. Main changes are in the supervision frame handling and its ethertype field. Signed-off-by: Peter Heise <peter.heise@airbus.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 17:06:48 -04:00
David S. Miller	125c8d1233	Merge branch 'tcp-synflood-perf' Eric Dumazet says: ==================== tcp: final work on SYNFLOOD behavior In the first patch, I remove the costly association of SYNACK+COOKIES to a listener. I believe other parts of the stack should be ready. The second patch removes a useless write into listener socket in tcp_rcv_state_process(), incurring false sharing in tcp_conn_request() Performance under SYNFLOOD goes from 3.2 Mpps to 6 Mpps. Test was using a single TCP listener, on a host with 8 RX queues on the NIC, and 24 cores (48 ht) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:45:45 -04:00
Eric Dumazet	8804b2722d	tcp: remove false sharing in tcp_rcv_state_process() Last known hot point during SYNFLOOD attack is the clearing of rx_opt.saw_tstamp in tcp_rcv_state_process() It is not needed for a listener, so we move it where it matters. Performance while a SYNFLOOD hits a single listener socket went from 5 Mpps to 6 Mpps on my test server (24 cores, 8 NIC RX queues) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:45:44 -04:00
Eric Dumazet	b3d051477c	tcp: do not mess with listener sk_wmem_alloc When removing sk_refcnt manipulation on synflood, I missed that using skb_set_owner_w() was racy, if sk->sk_wmem_alloc had already transitioned to 0. We should hold sk_refcnt instead, but this is a big deal under attack. (Doing so increase performance from 3.2 Mpps to 3.8 Mpps only) In this patch, I chose to not attach a socket to syncookies skb. Performance is now 5 Mpps instead of 3.2 Mpps. Following patch will remove last known false sharing in tcp_rcv_state_process() Fixes: `3b24d854cb` ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:45:44 -04:00
Amitoj Kaur Chawla	ac18dd9e84	qlge: Replace create_singlethread_workqueue with alloc_ordered_workqueue Replace deprecated create_singlethread_workqueue with alloc_ordered_workqueue. Work items include getting tx/rx frame sizes, resetting MPI processor, setting asic recovery bit so ordering seems necessary as only one work item should be in queue/executing at any given time, hence the use of alloc_ordered_workqueue. WQ_MEM_RECLAIM flag has been set since ethernet devices seem to sit in memory reclaim path, so to guarantee forward progress regardless of memory pressure. Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:42:10 -04:00
David S. Miller	0818556cd0	Merge branch 'tipc-link-setup-improvements' Jon Maloy says: ==================== tipc: improvements to the link setup algorithm This series addresses some smaller issues regarding the link setup algorithm. The first commit fixes a rare bug we have discovered during testing; the second one may have some future impact on cluster scalabilty, while remaining ones can be regarded as cosmetic in a wider sense of the word. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:09:07 -04:00
Jon Paul Maloy	34b9cd64c8	tipc: let first message on link be a state message According to the link FSM, a received traffic packet can take a link from state ESTABLISHING to ESTABLISHED, but the link can still not be fully set up in one atomic operation. This means that even if the the very first packet on the link is a traffic packet with sequence number 1 (one), it has to be dropped and retransmitted. This can be avoided if we let the mentioned packet be preceded by a LINK_PROTOCOL/STATE message, which takes up the endpoint before the arrival of the traffic. We add this small feature in this commit. This is a fully compatible change. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:09:06 -04:00
Jon Paul Maloy	de7e07f9ee	tipc: ensure that first packets on link are sent in order In some link establishment scenarios we see that packet #2 may be sent out before packet #1, forcing the receiver to demand retransmission of the missing packet. This is harmless, but may cause confusion among people tracing the packet flow. Since this is extremely easy to fix, we do so by adding en extra send call to the bearer immediately after the link has come up. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:09:06 -04:00
Jon Paul Maloy	42b18f605f	tipc: refactor function tipc_link_timeout() The function tipc_link_timeout() is unnecessary complex, and can easily be made more readable. We do that with this commit. The only functional change is that we remove a redundant test for whether the broadcast link is up or not. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:09:06 -04:00
Jon Paul Maloy	88e8ac7000	tipc: reduce transmission rate of reset messages when link is down When a link is down, it will continuously try to re-establish contact with the peer by sending out a RESET or an ACTIVATE message at each timeout interval. The default value for this interval is currently 375 ms. This is wasteful, and may become a problem in very large clusters with dozens or hundreds of nodes being down simultaneously. We now introduce a simple backoff algorithm for these cases. The first five messages are sent at default rate; thereafter a message is sent only each 16th timer interval. This will cover the vast majority of link recycling cases, since the endpoint starting last will transmit at the higher speed, and the link should normally be established well be before the rate needs to be reduced. The only case where we will see a degradation of link re-establishment times is when the endpoints remain intact, and a glitch in the transmission media is causing the link reset. We will then experience a worst-case re-establishing time of 6 seconds, something we deem acceptable. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:09:05 -04:00
Jon Paul Maloy	634696b197	tipc: guarantee peer bearer id exchange after reboot When a link endpoint is going down locally, e.g., because its interface is being stopped, it will spontaneously send out a RESET message to its peer, informing it about this fact. This saves the peer from detecting the failure via probing, and hence gives both speedier and less resource consuming failure detection on the peer side. According to the link FSM, a receiver of a RESET message, ignoring the reason for it, must now consider the sender ready to come back up, and starts periodically sending out ACTIVATE messages to the peer in order to re-establish the link. Also, according to the FSM, the receiver of an ACTIVATE message can now go directly to state ESTABLISHED and start sending regular traffic packets. This is a well-proven and robust FSM. However, in the case of a reboot, there is a small possibilty that link endpoint on the rebooted node may have been re-created with a new bearer identity between the moment it sent its (pre-boot) RESET and the moment it receives the ACTIVATE from the peer. The new bearer identity cannot be known by the peer according to this scenario, since traffic headers don't convey such information. This is a problem, because both endpoints need to know the correct value of the peer's bearer id at any moment in time in order to be able to produce correct link events for their users. The only way to guarantee this is to enforce a full setup message exchange (RESET + ACTIVATE) even after the reboot, since those messages carry the bearer idientity in their header. In this commit we do this by introducing and setting a "stopping" bit in the header of the spontaneously generated RESET messages, informing the peer that the sender will not be immediately ready to re-establish the link. A receiver seeing this bit must act as if this were a locally detected connectivity failure, and hence has to go through a full two- way setup message exchange before any link can be re-established. Although never reported, this problem seems to have always been around. This protocol addition is fully backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 16:09:05 -04:00
David S. Miller	25fb0b6c73	Merge branch 'mlxsw-next' Jiri Pirko says: ==================== mlxsw: spectrum_buffers: couple of cosmetic patches As suggested by David Laight ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:43 -04:00
Jiri Pirko	b94cdabbf1	mlxsw: spectrum_buffers: Use MLXSW_SP_PB_UNUSED define for unused pb Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:43 -04:00
Jiri Pirko	ce78f02042	mlxsw: spectrum_buffers: Use designated initializers for mlxsw_sp_pbs Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:42 -04:00
Jiri Pirko	de33efd0fb	devlink: fix sb register stub in case devlink is disabled Reported-by: kbuild test robot <fengguang.wu@intel.com> Fixes: `bf7974710a` ("devlink: add shared buffer configuration") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 12:57:29 -04:00
Paolo Abeni	608b997726	tun: use per cpu variables for stats accounting Currently the tun device accounting uses dev->stats without applying any kind of protection, regardless that accounting happens in preemptible process context. This patch move the tun stats to a per cpu data structure, and protect the updates with u64_stats_update_begin()/u64_stats_update_end() or this_cpu_inc according to the stat type. The per cpu stats are aggregated by the newly added ndo_get_stats64 ops. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 22:55:25 -04:00
David S. Miller	548aacdd74	Merge branch 'bpf-ARG_PTR_TO_RAW_STACK' Merge branch 'bpf-ARG_PTR_TO_RAW_STACK' Daniel Borkmann says: ==================== BPF updates This series adds a new verifier argument type called ARG_PTR_TO_RAW_STACK and converts related helpers to make use of it. Basic idea is that we can save init of stack memory when the helper function is guaranteed to fully fill out the passed buffer in every path. Series also adds test cases and converts samples. For more details, please see individual patches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:40:53 -04:00
Daniel Borkmann	3f2050e20e	bpf, samples: add test cases for raw stack This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the verifier behaviour. [...] #84 raw_stack: no skb_load_bytes OK #85 raw_stack: skb_load_bytes, no init OK #86 raw_stack: skb_load_bytes, init OK #87 raw_stack: skb_load_bytes, spilled regs around bounds OK #88 raw_stack: skb_load_bytes, spilled regs corruption OK #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK #90 raw_stack: skb_load_bytes, spilled regs + data OK #91 raw_stack: skb_load_bytes, invalid access 1 OK #92 raw_stack: skb_load_bytes, invalid access 2 OK #93 raw_stack: skb_load_bytes, invalid access 3 OK #94 raw_stack: skb_load_bytes, invalid access 4 OK #95 raw_stack: skb_load_bytes, invalid access 5 OK #96 raw_stack: skb_load_bytes, invalid access 6 OK #97 raw_stack: skb_load_bytes, large access OK Summary: 98 PASSED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:40:42 -04:00
Daniel Borkmann	02413cabd6	bpf, samples: don't zero data when not needed Remove the zero initialization in the sample programs where appropriate. Note that this is an optimization which is now possible, old programs still doing the zero initialization are just fine as well. Also, make sure we don't have padding issues when we don't memset() the entire struct anymore. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 21:40:42 -04:00

1 2 3 4 5 ...

589827 Commits