linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-27 15:21:41 +07:00

Author	SHA1	Message	Date
Pravin B Shelar	2fdb957d63	openvswitch: Refactor action alloc and copy api. There are two separate API to allocate and copy actions list. Anytime OVS needs to copy action list, it needs to call both functions. Following patch moves action allocation to copy function to avoid code duplication. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>	2014-11-05 23:52:35 -08:00
Joe Stringer	41af73e9c1	openvswitch: Move key_attr_size() to flow_netlink.h. flow-netlink has netlink related code. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Lorand Jakab	d98612b8c1	openvswitch: Remove flow member from struct ovs_skb_cb The 'flow' memeber was chosen for removal because it's only used in ovs_execute_actions() we can pass it as argument to this function. Signed-off-by: Lorand Jakab <lojakab@cisco.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Jarno Rajahalme	1a4e96a0e9	openvswitch: Fix the type of struct ovs_key_nd nd_target field. Should be the same as other IPv6 address fields. Current master produces sparse warnings without this change. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Chunhe Li	e1f9c356d2	openvswitch: Drop packets when interdev is not up If the internal device is not up, it should drop received packets. Sometimes it receive the broadcast or multicast packets, and the ip protocol stack will casue more cpu usage wasted. Signed-off-by: Chunhe Li <lichunhe@huawei.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:35 -08:00
Andy Zhou	cc3a5ae6f2	openvswitch: Refactor get_dp() function into multiple access APIs. Avoid recursive read_rcu_lock() by using the lighter weight get_dp_rcu() API. Add proper locking assertions to get_dp(). Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Joe Stringer	ca7105f278	openvswitch: Refactor ovs_flow_cmd_fill_info(). Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a dump reply. This will be used to streamline flow_dump in a future patch. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Andy Zhou	738967b8bf	openvswitch: refactor do_output() to move NULL check out of fast path skb_clone() NULL check is implemented in do_output(), as past of the common (fast) path. Refactoring so that NULL check is done in the slow path, immediately after skb_clone() is called. Besides optimization, this change also improves code readability by making the skb_clone() NULL check consistent within OVS datapath module. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Jesse Gross	426cda5cc1	openvswitch: Additional logging for -EINVAL on flow setups. There are many possible ways that a flow can be invalid so we've added logging for most of them. This adds logs for the remaining possible cases so there isn't any ambiguity while debugging. CC: Federico Iezzi <fiezzi@enter.it> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Joe Stringer	1b760fb9a8	openvswitch: Remove redundant tcp_flags code. These two cases used to be treated differently for IPv4/IPv6, but they are now identical. Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:34 -08:00
Pravin B Shelar	9b996e544a	openvswitch: Move table destroy to dp-rcu callback. Ths simplifies flow-table-destroy API. No need to pass explicit parameter about context. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com>	2014-11-05 23:52:34 -08:00
Simon Horman	25cd9ba0ab	openvswitch: Add basic MPLS support to kernel Allow datapath to recognize and extract MPLS labels into flow keys and execute actions which push, pop, and set labels on packets. Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer. Cc: Ravi K <rkerur@gmail.com> Cc: Leo Alterman <lalterman@nicira.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:33 -08:00
Pravin B Shelar	59b93b41e7	net: Remove MPLS GSO feature. Device can export MPLS GSO support in dev->mpls_features same way it export vlan features in dev->vlan_features. So it is safe to remove NETIF_F_GSO_MPLS redundant flag. Signed-off-by: Pravin B Shelar <pshelar@nicira.com>	2014-11-05 23:52:33 -08:00
Tom Herbert	e1b2cb6550	fou: Fix typo in returning flags in netlink When filling netlink info, dport is being returned as flags. Fix instances to return correct value. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:18:20 -05:00
hayeswang	93ffbeab77	r8152: disable the tasklet by default Let the tasklet only be enabled after open(), and be disabled for the other situation. The tasklet is only necessary after open() for tx/rx, so it could be disabled by default. Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:17:10 -05:00
Daniel Borkmann	4c672e4b42	ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs It has been reported that generating an MLD listener report on devices with large MTUs (e.g. 9000) and a high number of IPv6 addresses can trigger a skb_over_panic(): skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20 head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0 dev:port1 ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:100! invalid opcode: 0000 [#1] SMP Modules linked in: ixgbe(O) CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4 [...] Call Trace: <IRQ> [<ffffffff80578226>] ? skb_put+0x3a/0x3b [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4 [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45 [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68 [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182 [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3 [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46 [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70 mld_newpack() skb allocations are usually requested with dev->mtu in size, since commit `72e09ad107` ("ipv6: avoid high order allocations") we have changed the limit in order to be less likely to fail. However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb) macros, which determine if we may end up doing an skb_put() for adding another record. To avoid possible fragmentation, we check the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong assumption as the actual max allocation size can be much smaller. The IGMP case doesn't have this issue as commit `57e1ab6ead` ("igmp: refine skb allocations") stores the allocation size in the cb[]. Set a reserved_tailroom to make it fit into the MTU and use skb_availroom() helper instead. This also allows to get rid of igmp_skb_size(). Reported-by: Wei Liu <lw1a2.jing@gmail.com> Fixes: `72e09ad107` ("ipv6: avoid high order allocations") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: David L Stevens <david.stevens@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:12:30 -05:00
Joe Perches	1744bea1fa	net: Convert SEQ_START_TOKEN/seq_printf to seq_puts Using a single fixed string is smaller code size than using a format and many string arguments. Reduces overall code size a little. $ size net/ipv4/igmp.o* net/ipv6/mcast.o* net/ipv6/ip6_flowlabel.o* text data bss dec hex filename 34269 7012 14824 56105 db29 net/ipv4/igmp.o.new 34315 7012 14824 56151 db57 net/ipv4/igmp.o.old 30078 7869 13200 51147 c7cb net/ipv6/mcast.o.new 30105 7869 13200 51174 c7e6 net/ipv6/mcast.o.old 11434 3748 8580 23762 5cd2 net/ipv6/ip6_flowlabel.o.new 11491 3748 8580 23819 5d0b net/ipv6/ip6_flowlabel.o.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:04:55 -05:00
Hannes Frederic Sowa	e5a2c89995	fast_hash: avoid indirect function calls By default the arch_fast_hash hashing function pointers are initialized to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get updated to the CRC32 ones. This dispatching scheme incurs a function pointer lookup and indirect call for every hashing operation. rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing functions in its structure, too, causing two indirect branches per hashing operation. Using alternative_call we can get away with one of those indirect branches. Acked-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 22:01:21 -05:00
David S. Miller	2c99cd914d	Merge branch 'amd-xgbe-next' Tom Lendacky says: ==================== amd-xgbe: AMD XGBE driver updates 2014-11-04 The following series of patches includes functional updates to the driver as well as some trivial changes for function renaming and spelling fixes. - Move channel and ring structure allocation into the device open path - Rename the pre_xmit function to dev_xmit - Explicitly use the u32 data type for the device descriptors - Use page allocation for the receive buffers - Add support for split header/payload receive - Add support for per DMA channel interrupts - Add support for receive side scaling (RSS) - Add support for ethtool receive side scaling commands - Fix the spelling of descriptors - After a PCS reset, sync the PCS and PHY modes - Add dependency on HAS_IOMEM to both the amd-xgbe and amd-xgbe-phy drivers This patch series is based on net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:43 -05:00
Lendacky, Thomas	5cdec67967	amd-xgbe-phy: Let AMD_XGBE_PHY depend on HAS_IOMEM The amd-xgbe-phy driver needs to perform ioremap calls, so add HAS_IOMEM to its build dependency. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:13 -05:00
Lendacky, Thomas	474809b9e1	amd-xgbe: Let AMD_XGBE depend on HAS_IOMEM The amd-xgbe driver needs to perform ioremap calls, so add HAS_IOMEM to its build dependency. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:13 -05:00
Lendacky, Thomas	0c95a1faaa	amd-xgbe-phy: Sync PCS and PHY modes after reset This patch adds support to sync the states of the PCS and the PHY after a reset is performed. If the PCS and the PHY are not in the same state after reset an extra mode change would be performed. This extra mode change might not be needed if the PCS and the PHY are synced up after reset. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	a7beaf2300	amd-xgbe: Fix a spelling error This patch fixes the spelling of the word "descriptor" in a couple of locations. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	f6ac862845	amd-xgbe: Add receive side scaling ethtool support This patch adds support for ethtool receive side scaling (RSS) commands. Support is added to get/set the RSS hash key and the RSS lookup table. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	5b9dfe299e	amd-xgbe: Provide support for receive side scaling This patch provides support for receive side scaling (RSS). RSS allows for spreading incoming network packets across the Rx queues. When used in conjunction with the per DMA channel interrupt support, this allows the receive processing to be spread across multiple processors. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	9227dc5e57	amd-xgbe: Add support for per DMA channel interrupts This patch provides support for interrupts that are generated by the Tx/Rx DMA channel pairs of the device. This allows for Tx and Rx processing to run across multiple processsors. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	174fd2597b	amd-xgbe: Implement split header receive support Provide support for splitting IP packets so that the header and payload can be sent to different DMA addresses. This will allow the IP header to be put into the linear part of the skb while the payload can be added as frags. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	08dcc47c06	amd-xgbe: Use page allocations for Rx buffers Use page allocations for Rx buffers instead of pre-allocating skbs of a set size. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	aa96bd3c9f	amd-xgbe: Use the u32 data type for descriptors The Tx and Rx descriptors are unsigned 32 bit values. Use the u32 type, rather than unsigned int, to map these descriptors. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	a9d41981e9	amd-xgbe: Rename pre_xmit function to dev_xmit The pre_xmit function name implies that it performs operations prior to transmitting the packet when in fact it is responsible for setting up the descriptors and initiating the transmit. Rename this to function from pre_xmit to dev_xmit, which is consistent with the name used during receive processing - dev_read. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:12 -05:00
Lendacky, Thomas	4780b7cae6	amd-xgbe: Move ring allocation to device open Move the channel and ring tracking structures allocation to device open. This will allow for future support to vary the number of Tx/Rx queues without unloading the module. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 21:50:11 -05:00
Gregory Fong	66f1c44887	bridge: include in6.h in if_bridge.h for struct in6_addr if_bridge.h uses struct in6_addr ip6, but wasn't including the in6.h header. Thomas Backlund originally sent a patch to do this, but this revealed a redefinition issue: https://lkml.org/lkml/2013/1/13/116 The redefinition issue should have been fixed by the following Linux commits: `ee262ad827` inet: defines IPPROTO_* needed for module alias generation `cfd280c912` net: sync some IP headers with glibc and the following glibc commit: 6c82a2f8d7c8e21e39237225c819f182ae438db3 Coordinate IPv6 definitions for Linux and glibc so actually include the header now. Reported-by: Colin Guthrie <colin@mageia.org> Reported-by: Christiaan Welvaart <cjw@daneel.dyndns.org> Reported-by: Thomas Backlund <tmb@mageia.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Gregory Fong <gregory.0xf0@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 17:13:34 -05:00
Marcelo Leitner	1f37bf87aa	tcp: zero retrans_stamp if all retrans were acked Ueki Kohei reported that when we are using NewReno with connections that have a very low traffic, we may timeout the connection too early if a second loss occurs after the first one was successfully acked but no data was transfered later. Below is his description of it: When SACK is disabled, and a socket suffers multiple separate TCP retransmissions, that socket's ETIMEDOUT value is calculated from the time of the first retransmission instead of the latest retransmission. This happens because the tcp_sock's retrans_stamp is set once then never cleared. Take the following connection: Linux remote-machine \| \| send#1---->(1)\|--------> data#1 --------->\| \| \| \| RTO : : \| \| \| ---(2)\|----> data#1(retrans) ---->\| \| (3)\|<---------- ACK <----------\| \| \| \| \| : : \| : : \| : : 16 minutes (or more) : \| : : \| : : \| : : \| \| \| send#2---->(4)\|--------> data#2 --------->\| \| \| \| RTO : : \| \| \| ---(5)\|----> data#2(retrans) ---->\| \| \| \| \| \| \| RTO2 : : \| \| \| \| \| \| ETIMEDOUT<----(6)\| \| (1) One data packet sent. (2) Because no ACK packet is received, the packet is retransmitted. (3) The ACK packet is received. The transmitted packet is acknowledged. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". (4) After 16 minutes (to correspond with retries2=15), a new data packet is sent. Note: No data is transmitted between (3) and (4). The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago. (5) Because no ACK packet is received, the packet is retransmitted. (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT. Therefore, now we clear retrans_stamp as soon as all data during the loss window is fully acked. Reported-by: Ueki Kohei Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:59:49 -05:00
WANG Cong	25de4668d0	ipv6: move INET6_MATCH() to include/net/inet6_hashtables.h It is only used in net/ipv6/inet6_hashtables.c. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:59:04 -05:00
David S. Miller	51f3d02b98	net: Add and use skb_copy_datagram_msg() helper. This encapsulates all of the skb_copy_datagram_iovec() callers with call argument signature "skb, offset, msghdr->msg_iov, length". When we move to iov_iters in the networking, the iov_iter object will sit in the msghdr. Having a helper like this means there will be less places to touch during that transformation. Based upon descriptions and patch from Al Viro. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:46:40 -05:00
David S. Miller	1d76c1d028	Merge branch 'gue-next' Tom Herbert says: ==================== gue: Remote checksum offload This patch set implements remote checksum offload for GUE, which is a mechanism that provides checksum offload of encapsulated packets using rudimentary offload capabilities found in most Network Interface Card (NIC) devices. The outer header checksum for UDP is enabled in packets and, with some additional meta information in the GUE header, a receiver is able to deduce the checksum to be set for an inner encapsulated packet. Effectively this offloads the computation of the inner checksum. Enabling the outer checksum in encapsulation has the additional advantage that it covers more of the packet than the inner checksum including the encapsulation headers. Remote checksum offload is described in: http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01 The GUE transmit and receive paths are modified to support the remote checksum offload option. The option contains a checksum offset and checksum start which are directly derived from values set in stack when doing CHECKSUM_PARTIAL. On receipt of the option, the operation is to calculate the packet checksum from "start" to end of the packet (normally derived for checksum complete), and then set the resultant value at checksum "offset" (the checksum field has already been primed with the pseudo header). This emulates a NIC that implements NETIF_F_HW_CSUM. The primary purpose of this feature is to eliminate cost of performing checksum calculation over a packet when encpasulating. In this patch set: - Move fou_build_header into fou.c and split it into a couple of functions - Enable offloading of outer UDP checksum in encapsulation - Change udp_offload to support remote checksum offload, includes new GSO type and ensuring encapsulated layers (TCP) doesn't try to set a checksum covered by RCO - TX support for RCO with GUE. This is configured through ip_tunnel and set the option on transmit when packet being encapsulated is CHECKSUM_PARTIAL - RX support for RCO with GUE for normal and GRO paths. Includes resolving the offloaded checksum v2: Address comments from davem: Move accounting for private option field in gue_encap_hlen to patch in which we add the remote checksum offload option. Testing: I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200 streams, comparing GUE with and without remote checksum offload (doing checksum-unnecessary to complete conversion in both cases). These were run on mlnx4 and bnx2x. Some mlnx4 results are below. GRE/GUE TCP_STREAM IPv4, with remote checksum offload 9.71% TX CPU utilization 7.42% RX CPU utilization 36380 Mbps IPv4, without remote checksum offload 12.40% TX CPU utilization 7.36% RX CPU utilization 36591 Mbps TCP_RR IPv4, with remote checksum offload 77.79% CPU utilization 91/144/216 90/95/99% latencies 1.95127e+06 tps IPv4, without remote checksum offload 78.70% CPU utilization 89/152/297 90/95/99% latencies 1.95458e+06 tps IPIP/GUE TCP_STREAM With remote checksum offload 10.30% TX CPU utilization 7.43% RX CPU utilization 36486 Mbps Without remote checksum offload 12.47% TX CPU utilization 7.49% RX CPU utilization 36694 Mbps TCP_RR With remote checksum offload 77.80% CPU utilization 87/153/270 90/95/99% latencies 1.98735e+06 tps Without remote checksum offload 77.98% CPU utilization 87/150/287 90/95/99% latencies 1.98737e+06 tps SIT/GUE TCP_STREAM With remote checksum offload 9.68% TX CPU utilization 7.36% RX CPU utilization 35971 Mbps Without remote checksum offload 12.95% TX CPU utilization 8.04% RX CPU utilization 36177 Mbps TCP_RR With remote checksum offload 79.32% CPU utilization 94/158/295 90/95/99% latencies 1.88842e+06 tps Without remote checksum offload 80.23% CPU utilization 94/149/226 90/95/99% latencies 1.90338e+06 tps VXLAN TCP_STREAM 35.03% TX CPU utilization 20.85% RX CPU utilization 36230 Mbps TCP_RR 77.36% CPU utilization 84/146/270 90/95/99% latencies 2.08063e+06 tps We can also look at CPU time in csum_partial using perf (with bnx2x setup). For GRE with TCP_STREAM I see: With remote checksum offload 0.33% TX 1.81% RX Without remote checksum offload 6.00% TX 0.51% RX I suspect the fact that time in csum_partial noticably increases with remote checksum offload for RX is due to taking the cache miss on the encapsulated header in that function. By similar reasoning, if on the TX side the packet were not in cache (say we did a splice from a file whose data was never touched by the CPU) the CPU savings for TX would probably be more pronounced. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:34:47 -05:00
Tom Herbert	a8d31c128b	gue: Receive side of remote checksum offload Add processing of the remote checksum offload option in both the normal path as well as the GRO path. The implements patching the affected checksum to derive the offloaded checksum. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:04 -05:00
Tom Herbert	b17f709a24	gue: TX support for using remote checksum offload option Add if_tunnel flag TUNNEL_ENCAP_FLAG_REMCSUM to configure remote checksum offload on an IP tunnel. Add logic in gue_build_header to insert remote checksum offload option. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	c1aa8347e7	gue: Protocol constants for remote checksum offload Define a private flag for remote checksun offload as well as a length for the option. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	e585f23636	udp: Changes to udp_offload to support remote checksum offload Add a new GSO type, SKB_GSO_TUNNEL_REMCSUM, which indicates remote checksum offload being done (in this case inner checksum must not be offloaded to the NIC). Added logic in __skb_udp_tunnel_segment to handle remote checksum offload case. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	5024c33ac3	gue: Add infrastructure for flags and options Add functions and basic definitions for processing standard flags, private flags, and control messages. This includes definitions to compute length of optional fields corresponding to a set of flags. Flag validation is in validate_gue_flags function. This checks for unknown flags, and that length of optional fields is <= length in guehdr hlen. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	4bcb877d25	udp: Offload outer UDP tunnel csum if available In __skb_udp_tunnel_segment if outer UDP checksums are enabled and ip_summed is not already CHECKSUM_PARTIAL, set up checksum offload if device features allow it. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:03 -05:00
Tom Herbert	63487babf0	net: Move fou_build_header into fou.c and refactor Move fou_build_header out of ip_tunnel.c and into fou.c splitting it up into fou_build_header, gue_build_header, and fou_build_udp. This allows for other users for TX of FOU or GUE. Change ip_tunnel_encap to call fou_build_header or gue_build_header based on the tunnel encapsulation type. Similarly, added fou_encap_hlen and gue_encap_hlen functions which are called by ip_encap_hlen. New net/fou.h has prototypes and defines for this. Added NET_FOU_IP_TUNNELS configuration. When this is set, IP tunnels can use FOU/GUE and fou module is also selected. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:30:02 -05:00
David S. Miller	46d3802627	Merge branch 'stmmac-net' Giuseppe Cavallaro says: ==================== stmmac: review and fix lock and atomicity Recently some issues have been reported for the driver for locking mechanism and atomicity. In fact, enabling DEBUG support to prove lock and to verify if sleeping while atomic context some warnings occur at runtime. I have reproduced all on STi platforms. Concerning the tx path, I had provided a patch time ago but I discarded the idea to completely remove locks; in this patch-set we can have some useful fixes instead of. This patch-set is to fix the atomicity in the PM stuff where I tried to collect all the points and advice reported in the past weeks. As final result, on my side no warnings and no problem when suspend/resume the driver on STi boxes. I also added a patch that fixes the locks for the EEE. As pointed in some thread there was a design problem behind the eee initialization and I have tried to fix that before. As final result no issues when proving locks too. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:23:09 -05:00
Giuseppe CAVALLARO	777da230c5	stmmac: fix atomicity in pm routines This patch is to fix the atomicity when suspend and resume the driver. The clk api have been changed (as reported by Hao Liang) and the skb allocation is done out of the hw setup function and taking care about the GFP flags. Reported-by: Hao Liang <hliang1025@gmail.com> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Hao Liang <hliang1025@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:22:57 -05:00
Giuseppe CAVALLARO	4741cf9cec	stmmac: fix concurrency in eee initialization. This patch aims to fix the concurrency in eee initialization inside the stmmac driver and related warnings when enable DEBUG_ATOMIC_SLEEP. Prior this patch, the stmmac_eee_init could be called in several places as shown below: stmmac_open stmmac_resume PHY Layer \| \| \| stmmac_hw_setup stmmac_adjust_link \| \| stmmac ethtool \|__________________________\|______________\| \| stmmac_eee_init The patch removes the stmmac_eee_init call inside the stmmac_hw_setup that is unnecessary. It is sufficient to call it in the adjust_link to always guarantee that EEE is always configured at mac level too. Fixing the lock protection now it is covered another case (not considered before). The stmmac_eee_init could be called by the ethtool so critical sections must be protected inside this function too. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:22:57 -05:00
Giuseppe CAVALLARO	b9d73704aa	stmmac: fix lock in stmmac_set_rx_mode When compile with CONFIG_PROVE_LOCKING the following warnings happen: [snip] HARDIRQ-ON-W at: [<c0480c1c>] _raw_spin_lock+0x3c/0x4c [<c02c2828>] stmmac_set_rx_mode+0x18/0x3c [<c038b2cc>] dev_set_rx_mode+0x1c/0x28 [<c038b38c>] __dev_open+0xb4/0xf8 [<c038b5a8>] __dev_change_flags+0x94/0x128 [<c038b6a8>] dev_change_flags+0x10/0x48 [<c062afe0>] ip_auto_config+0x1d4/0x1084 [<c000873c>] do_one_initcall+0x108/0x15c [<c060ec50>] kernel_init_freeable+0x1a8/0x248 [<c0472cc0>] kernel_init+0x8/0x160 [<c000dfc8>] ret_from_fork+0x14/0x2c INITIAL USE at: [<c0480c1c>] _raw_spin_lock+0x3c/0x4c [<c02c2828>] stmmac_set_rx_mode+0x18/0x3c [<c038b2cc>] dev_set_rx_mode+0x1c/0x28 [<c038b38c>] __dev_open+0xb4/0xf8 [<c038b5a8>] __dev_change_flags+0x94/0x128 [<c038b6a8>] dev_change_flags+0x10/0x48 [<c062afe0>] ip_auto_config+0x1d4/0x1084 [<c000873c>] do_one_initcall+0x108/0x15c [<c060ec50>] kernel_init_freeable+0x1a8/0x248 [<c0472cc0>] kernel_init+0x8/0x160 [<c000dfc8>] ret_from_fork+0x14/0x2c so the patch just removes the lock protection in the stmmac_set_rx_mode Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Emilio Lopez <emilio@elopez.com.ar> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:22:56 -05:00
Fabrice Gasnier	758a0ab59b	stmmac: release tx lock, in case of dma mapping error. Add missing spin_unlock when tx frames gets dropped. Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:22:56 -05:00
Fabrice Gasnier	16ee817e43	stmmac: fix stmmac_tx_avail should be called with TX locked stmmac_tx_avail() may lie if used unprotected. It's using cur_tx and dirty_tx index. These index may be already in use by tx_clean when entering xmit routine. So, this should be called locked. This can cause transmit queue to be stuck, with following message: NETDEV WATCHDOG: eth0 (stmmaceth): transmit queue 0 timed out Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:22:56 -05:00
David S. Miller	890b7916d0	Merge branch 'stmmac-next' Giuseppe Cavallaro says: ==================== stmmac: review driver Koptions Recently many Koption options have been added to have new glue logic on several platforms. The main goal behind this work is to guarantee that the driver built fine on all the branches where it is present independently of which glue logic is selected. IMHO, it is better to remove all the not necessary Koption(s) that can hide build problems when something changes in the driver and especially when the DT compatibility allows us to manage all the platform data. I compiled the driver w/o any issue on net-next Git for: x86, arm and sh4. In case of there are build problems on some repos now it will be easy to catch them and cherry-pick patches from mainstream. For sure, do not hesitate to contact me in case of issue. Also this set removes STMMAC_DEBUG_FS and BUS_MODE_DA. The latter is useless and the former can be replaced by DEBUG_FS (always to make safe the build). V2: patch-set re-based on top of the latest updates for net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-05 16:14:54 -05:00

... 2 3 4 5 6 ...

481959 Commits