linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
françois romieu	222e4d0b13	pch_gbe: replace private tx ring lock with common netif_tx_lock pch_gbe_tx_ring.tx_lock is only used in the hard_xmit handler and in the transmit completion reaper called from NAPI context. Compile-tested only. Potential victims Cced. Someone more knowledgeable may check if pch_gbe_tx_queue could have some use for a mmiowb. Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Andy Cress <andy.cress@us.kontron.com> Cc: bryan@fossetcon.org Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 17:19:58 -04:00
Florian Fainelli	badf3ada60	net: dsa: Provide CPU port statistics to master netdev This patch overloads the DSA master netdev, aka CPU Ethernet MAC to also include switch-side statistics, which is useful for debugging purposes, when the switch is not properly connected to the Ethernet MAC (duplex mismatch, (RG)MII electrical issues etc.). We accomplish this by retaining the original copy of the master netdev's ethtool_ops, and just overload the 3 operations we care about: get_sset_count, get_strings and get_ethtool_stats so as to intercept these calls and call into the original master_netdev ethtool_ops, plus our own. We take this approach as opposed to providing a set of DSA helper functions that would retrive the CPU port's statistics, because the entire purpose of DSA is to allow unmodified Ethernet MAC drivers to be used as CPU conduit interfaces, therefore, statistics overlay in such drivers would simply not scale. The new ethtool -S <iface> output would therefore look like this now: <iface> statistics p<2 digits cpu port number>_<switch MIB counter names> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 17:16:17 -04:00
Eric Dumazet	0cef6a4c34	tcp: give prequeue mode some care TCP prequeue goal is to defer processing of incoming packets to user space thread currently blocked in a recvmsg() system call. Intent is to spend less time processing these packets on behalf of softirq handler, as softirq handler is unfair to normal process scheduler decisions, as it might interrupt threads that do not even use networking. Current prequeue implementation has following issues : 1) It only checks size of the prequeue against sk_rcvbuf It was fine 15 years ago when sk_rcvbuf was in the 64KB vicinity. But we now have ~8MB values to cope with modern networking needs. We have to add sk_rmem_alloc in the equation, since out of order packets can definitely use up to sk_rcvbuf memory themselves. 2) Even with a fixed memory truesize check, prequeue can be filled by thousands of packets. When prequeue needs to be flushed, either from sofirq context (in tcp_prequeue() or timer code), or process context (in tcp_prequeue_process()), this adds a latency spike which is often not desirable. I added a fixed limit of 32 packets, as this translated to a max flush time of 60 us on my test hosts. Also note that all packets in prequeue are not accounted for tcp_mem, since they are not charged against sk_forward_alloc at this point. This is probably not a big deal. Note that this might increase LINUX_MIB_TCPPREQUEUEDROPPED counts, which is misnamed, as packets are not dropped at all, but rather pushed to the stack (where they can be either consumed or dropped) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 17:14:35 -04:00
Michal Kazior	b43e7199a9	fq: split out backlog update logic mac80211 (which will be the first user of the fq.h) recently started to support software A-MSDU aggregation. It glues skbuffs together into a single one so the backlog accounting needs to be more fine-grained. To avoid backlog sorting logic duplication split it up for re-use. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 17:03:38 -04:00
Dan Carpenter	b43586576e	tipc: remove an unnecessary NULL check This is never called with a NULL "buf" and anyway, we dereference 's' on the lines before so it would Oops before we reach the check. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:54:12 -04:00
Arnd Bergmann	6b87663fbe	net/mlx5e: avoid stack overflow in mlx5e_open_channels struct mlx5e_channel_param is a large structure that is allocated on the stack of mlx5e_open_channels, and with a recent change it has grown beyond the warning size for the maximum stack that a single function should use: mellanox/mlx5/core/en_main.c: In function 'mlx5e_open_channels': mellanox/mlx5/core/en_main.c:1325:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] The function is already using dynamic allocation and is not in a fast path, so the easiest workaround is to use another kzalloc for allocating the channel parameters. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `d3c9bc2743` ("net/mlx5e: Added ICO SQs") Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:46:59 -04:00
Jason Wang	3df97ba830	tuntap: calculate rps hash only when needed There's no need to calculate rps hash if it was not enabled. So this patch export rps_needed and check it before trying to get rps hash. Tests (using pktgen to inject packets to guest) shows this can improve pps about 13% (when rps is disabled). Before: ~1150000 pps After: ~1300000 pps Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> ---- Changes from V1: - Fix build when CONFIG_RPS is not set Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:38:54 -04:00
David S. Miller	f345c9a572	Merge branch 'tcp-eor' Martin KaFai Lau says: ==================== tcp: Make use of MSG_EOR in tcp_sendmsg v4: ~ Do not set eor bit in do_tcp_sendpages() since there is no way to pass MSG_EOR from the userland now. ~ Avoid rmw by testing MSG_EOR first in tcp_sendmsg(). ~ Move TCP_SKB_CB(skb)->eor test to a new helper tcp_skb_can_collapse_to() (suggested by Soheil). ~ Add some packetdrill tests. v3: ~ Separate EOR marking from the SKBTX_ANY_TSTAMP logic. ~ Move the eor bit test back to the loop in tcp_sendmsg and tcp_sendpage because there could be >1 threads doing sendmsg. ~ Thanks to Eric Dumazet's suggestions on v2. ~ The TCP timestamp bug fixes are separated into other threads. v2: ~ Rework based on the recent work "add TX timestamping via cmsg" by Soheil Hassas Yeganeh <soheil.kdev@gmail.com> ~ This version takes the MSG_EOR bit as a signal of end-of-response-message and leave the selective timestamping job to the cmsg ~ Changes based on the v1 feedback (like avoid unlikely check in a loop and adding tcp_sendpage support) ~ The first 3 patches are bug fixes. The fixes in this series depend on the newly introduced txstamp_ack in net-next. I will make relevant patches against net after getting some feedback. ~ The test results are based on the recently posted net fix: "tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks" One potential use case is to use MSG_EOR with SOF_TIMESTAMPING_TX_ACK to get a more accurate TCP ack timestamping on application protocol with multiple outgoing response messages (e.g. HTTP2). One of our use case is at the webserver. The webserver tracks the HTTP2 response latency by measuring when the webserver sends the first byte to the socket till the TCP ACK of the last byte is received. In the cases where we don't have client side measurement, measuring from the server side is the only option. In the cases we have the client side measurement, the server side data can also be used to justify/cross-check-with the client side data. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:14:20 -04:00
Martin KaFai Lau	a166140e81	tcp: Handle eor bit when fragmenting a skb When fragmenting a skb, the next_skb should carry the eor from prev_skb. The eor of prev_skb should also be reset. Packetdrill script for testing: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 sendto(4, ..., 15330, MSG_EOR, ..., ...) = 15330 0.200 sendto(4, ..., 730, 0, ..., ...) = 730 0.200 > . 1:7301(7300) ack 1 0.200 > . 7301:14601(7300) ack 1 0.300 < . 1:1(0) ack 14601 win 257 0.300 > P. 14601:15331(730) ack 1 0.300 > P. 15331:16061(730) ack 1 0.400 < . 1:1(0) ack 16061 win 257 0.400 close(4) = 0 0.400 > F. 16061:16061(0) ack 1 0.400 < F. 1:1(0) ack 16062 win 257 0.400 > . 16062:16062(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:14:19 -04:00
Martin KaFai Lau	a643b5d41c	tcp: Handle eor bit when coalescing skb This patch: 1. Prevent next_skb from coalescing to the prev_skb if TCP_SKB_CB(prev_skb)->eor is set 2. Update the TCP_SKB_CB(prev_skb)->eor if coalescing is allowed Packetdrill script for testing: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730 0.200 write(4, ..., 11680) = 11680 0.200 > P. 1:731(730) ack 1 0.200 > P. 731:1461(730) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:13141(4380) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:13141,nop,nop> 0.300 > P. 1:731(730) ack 1 0.300 > P. 731:1461(730) ack 1 0.400 < . 1:1(0) ack 13141 win 257 0.400 close(4) = 0 0.400 > F. 13141:13141(0) ack 1 0.500 < F. 1:1(0) ack 13142 win 257 0.500 > . 13142:13142(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:14:19 -04:00
Martin KaFai Lau	c134ecb878	tcp: Make use of MSG_EOR in tcp_sendmsg This patch adds an eor bit to the TCP_SKB_CB. When MSG_EOR is passed to tcp_sendmsg, the eor bit will be set at the skb containing the last byte of the userland's msg. The eor bit will prevent data from appending to that skb in the future. The change in do_tcp_sendpages is to honor the eor set during the previous tcp_sendmsg(MSG_EOR) call. This patch handles the tcp_sendmsg case. The followup patches will handle other skb coalescing and fragment cases. One potential use case is to use MSG_EOR with SOF_TIMESTAMPING_TX_ACK to get a more accurate TCP ack timestamping on application protocol with multiple outgoing response messages (e.g. HTTP2). Packetdrill script for testing: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 14600) = 14600 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730 0.200 > . 1:7301(7300) ack 1 0.200 > P. 7301:14601(7300) ack 1 0.300 < . 1:1(0) ack 14601 win 257 0.300 > P. 14601:15331(730) ack 1 0.300 > P. 15331:16061(730) ack 1 0.400 < . 1:1(0) ack 16061 win 257 0.400 close(4) = 0 0.400 > F. 16061:16061(0) ack 1 0.400 < F. 1:1(0) ack 16062 win 257 0.400 > . 16062:16062(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:14:18 -04:00
David S. Miller	2a9e8438a2	Merge branch 'tcp-redundant-checks' Soheil Hassas Yeganeh says: ==================== tcp: simplify ack tx timestamps v2: - Fully remove SKBTX_ACK_TSTAMP, as suggested by Willem de Bruijn. This patch series aims at removing redundant checks and fields for ack timestamps for TCP. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:06:11 -04:00
Soheil Hassas Yeganeh	0a2cf20c3f	tcp: remove SKBTX_ACK_TSTAMP since it is redundant The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when the timestamp of the TCP acknowledgement should be reported on error queue. Since accessing skb_shinfo is likely to incur a cache-line miss at the time of receiving the ack, the txstamp_ack bit was added in tcp_skb_cb, which is set iff the SKBTX_ACK_TSTAMP flag is set for an skb. This makes SKBTX_ACK_TSTAMP flag redundant. Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit everywhere. Note that this frees one bit in shinfo->tx_flags. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Suggested-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:06:10 -04:00
Soheil Hassas Yeganeh	863c1fd981	tcp: remove an unnecessary check in tcp_tx_timestamp Remove the redundant check for sk->sk_tsflags in tcp_tx_timestamp. tcp_tx_timestamp() receives the tsflags as a parameter. As a result the "sk->sk_tsflags \|\| tsflags" is redundant, since tsflags already includes sk->sk_tsflags plus overrides from control messages. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:06:10 -04:00
Eric Dumazet	ba7863f4d3	net: snmp: fix 64bit stats on 32bit arches I accidentally replaced BH disabling by preemption disabling in SNMP_ADD_STATS64() and SNMP_UPD_PO_STATS64() on 32bit builds. For 64bit stats on 32bit arch, we really need to disable BH, since the "struct u64_stats_sync syncp" might be manipulated both from process and BH contexts. Fixes: `6aef70a851` ("net: snmp: kill various STATS_USER() helpers") Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 11:49:45 -04:00
David S. Miller	8be2748a40	Merge branch 'socket-space-optimizations' Eric Dumazet says: ==================== net: avoid some atomic ops when FASYNC is not used We can avoid some atomic operations on sockets not using FASYNC ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 23:08:41 -04:00
Eric Dumazet	4be735225f	net: SOCKWQ_ASYNC_WAITDATA optimizations SOCKWQ_ASYNC_WAITDATA is set/cleared in sk_wait_data() and equivalent functions, so that sock_wake_async() can send a SIGIO only when necessary. Since these atomic operations are really not needed unless socket expressed interest in FASYNC, we can omit them in most cases. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 23:08:40 -04:00
Eric Dumazet	9317bb6982	net: SOCKWQ_ASYNC_NOSPACE optimizations SOCKWQ_ASYNC_NOSPACE is tested in sock_wake_async() so that a SIGIO signal is sent when needed. tcp_sendmsg() clears the bit. tcp_poll() sets the bit when stream is not writeable. We can avoid two atomic operations by first checking if socket is actually interested in the FASYNC business (most sockets in real applications do not use AIO, but select()/poll()/epoll()) This also removes one cache line miss to access sk->sk_wq->flags in tcp_sendmsg() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 23:08:40 -04:00
David S. Miller	210732d16d	Merge branch 'snmp-stats-update' Eric Dumazet says: ==================== net: snmp: update SNMP methods In the old days (before linux-3.0), SNMP counters were duplicated, one set for user context, and anther one for BH context. After commit `8f0ea0fe3a` ("snmp: reduce percpu needs by 50%") we have a single copy, and what really matters is preemption being enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc() respectively. This patch series kills the obsolete STATS_USER() helpers, and rename all XXX_BH() helpers to __XXX() ones, to more closely match conventions used to update per cpu variables. This is probably going to hurt maintainers job for a while, since cherry-picks will not be clean, but this had to be cleaned at one point. I am so sorry guys. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:25 -04:00
Eric Dumazet	13415e46c5	net: snmp: kill STATS_BH macros There is nothing related to BH in SNMP counters anymore, since linux-3.0. Rename helpers to use __ prefix instead of _BH prefix, for contexts where preemption is disabled. This more closely matches convention used to update percpu variables. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:25 -04:00
Eric Dumazet	f3832ed2c2	ipv6: kill ICMP6MSGIN_INC_STATS_BH() IPv6 ICMP stats are atomics anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:25 -04:00
Eric Dumazet	c2005eb010	ipv6: rename IP6_UPD_PO_STATS_BH() Rename IP6_UPD_PO_STATS_BH() to __IP6_UPD_PO_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:25 -04:00
Eric Dumazet	1d01550359	ipv6: rename IP6_INC_STATS_BH() Rename IP6_INC_STATS_BH() to __IP6_INC_STATS() and IP6_ADD_STATS_BH() to __IP6_ADD_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Eric Dumazet	02a1d6e7a6	net: rename NET_{ADD\|INC}_STATS_BH() Rename NET_INC_STATS_BH() to __NET_INC_STATS() and NET_ADD_STATS_BH() to __NET_ADD_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Eric Dumazet	b15084ec7d	net: rename IP_UPD_PO_STATS_BH() Rename IP_UPD_PO_STATS_BH() to __IP_UPD_PO_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Eric Dumazet	98f619957e	net: rename IP_ADD_STATS_BH() Rename IP_ADD_STATS_BH() to __IP_ADD_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Eric Dumazet	a16292a0f0	net: rename ICMP6_INC_STATS_BH() Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Eric Dumazet	b45386efa2	net: rename IP_INC_STATS_BH() Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to better express this is used in non preemptible context. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Eric Dumazet	08e3baef65	net: sctp: rename SCTP_INC_STATS_BH() Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Eric Dumazet	214d3f1f87	net: icmp: rename ICMPMSGIN_INC_STATS_BH() Remove misleading _BH suffix. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Eric Dumazet	90bbcc6083	net: tcp: rename TCP_INC_STATS_BH Rename TCP_INC_STATS_BH() to __TCP_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Eric Dumazet	b540f9d702	net: xfrm: kill XFRM_INC_STATS_BH() Not used anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Eric Dumazet	02c223470c	net: udp: rename UDP_INC_STATS_BH() Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(), and UDP6_INC_STATS_BH() to __UDP6_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
Eric Dumazet	5d3848bc33	net: rename ICMP_INC_STATS_BH() Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:22 -04:00
Eric Dumazet	aa62d76b6e	dccp: rename DCCP_INC_STATS_BH() Rename DCCP_INC_STATS_BH() to __DCCP_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:22 -04:00
Eric Dumazet	6aef70a851	net: snmp: kill various STATS_USER() helpers In the old days (before linux-3.0), SNMP counters were duplicated, one for user context, and one for BH context. After commit `8f0ea0fe3a` ("snmp: reduce percpu needs by 50%") we have a single copy, and what really matters is preemption being enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc() respectively. We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(), NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(), SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(), UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER() Following patches will rename __BH helpers to make clear their usage is not tied to BH being disabled. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:22 -04:00
David S. Miller	2995aea5b6	Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== 40GbE Intel Wired LAN Driver Updates 2016-04-27 This series contains updates to i40e and i40evf. Alex Duyck cleans up the feature flags since they are becoming pretty "massive", the primary change being that we now build our features list around hw_encap_features. Added support for IPIP and SIT offloads, which should improvement in throughput for IPIP and SIT tunnels with the offload enabled. Mitch adds support for configuring RSS on behalf of the VFs, which removes the burden of dealing with different hardware interfaces from the VF drivers and improves future compatibility. Fix to ensure that we do not panic by checking that the vsi_res pointer is valid before dereferencing it, after which we can drink beer and eat peanuts. Shannon does come housekeeping in i40e_add_fdir_ethtool() in preparation for more cloud filter work. Added flexibility to the nvmupdate facility by adding the ability to specify an AQ event opcode to wait on after Exec_AQ request. Michal adds device capability which defines if an update is available and if a security check is needed during the update process. Kamil just adds a device id to support X722 QSFP+ device. Greg fixes an issue where a mirror rule ID may be zero, so do not return invalid parameter when the user passes in a zero for a rule ID. Adds support to steer packets to VSIs by VLAN tag alone while being in promiscuous mode for multicast and unicast MAC addresses. Jesse fixes the driver from offloading the VLAN tag into the skb any time there was a VLAN tag and the hardware stripping was enabled, to making sure it is enabled before put_tag. v2: Dropped patch 8 ("i40e: Allow user to change input set mask for flow director") while Kiran reworks a more generalized solution based on feedback from David Miller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 21:59:08 -04:00
Eric Dumazet	501e7ef569	net-rfs: fix false sharing accessing sd->input_queue_head sd->input_queue_head is incremented for each processed packet in process_backlog(), and read from other cpus performing Out Of Order avoidance in get_rps_cpu() Moving this field in a separate cache line keeps it mostly hot for the cpu in process_backlog(), as other cpus will only read it. In a stress test, process_backlog() was consuming 6.80 % of cpu cycles, and the patch reduced the cost to 0.65 % Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 21:55:45 -04:00
Akinobu Mita	35ef7d689d	net: w5100: support W5500 This adds support for W5500 chip. W5500 has similar register and memory organization with W5100 and W5200. There are a few important differences listed below but it is still possible to share common code with W5100 and W5200. * W5500 register and memory are organized by multiple blocks. Each one is selected by 16bits offset address and 5bits block select bits. But the existing register access operations take u16 address. This change extends the addess by u32 address and put offset address to lower 16bits and block select bits to upper 16bits. This change also adds the offset addresses for socket register and TX/RX memory blocks to the driver private data structure in order to reduce conditional switches for each chip. * W5500 has the different register offset for socket interrupt mask register. Newly added internal functions w5100_enable_intr() and w5100_disable_intr() take care of the diffrence. * W5500 has the different register offset for retry time-value register. But this register is only used to verify that the reset value is correctly read at initialization. So move the verification to w5100_hw_reset() which already does different things for different chips. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 21:55:45 -04:00
Anjali Singhai Jain	47d3483988	i40evf: Add driver support for promiscuous mode Add necessary Linux Ethernet driver support for promiscuous mode operation. Add a flag so the VF knows it is in promiscuous mode and two state flags to discreetly track multicast and unicast promiscuous states. Change-Id: Ib2f2dc7a7582304fec90fc917ebb7ded21ba1de4 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:06:01 -07:00
Anjali Singhai Jain	5676a8b9cd	i40e: Add VF promiscuous mode driver support Add infrastructure for Network Function Virtualization VLAN tagged packet steering feature. Change-Id: I9b873d8fcc253858e6baba65ac68ec5b9363944e Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:05:57 -07:00
Greg Rose	6c41a76069	i40e: Add promiscuous on VLAN support NFV use cases require the ability to steer packets to VSIs by VLAN tag alone while being in promiscuous mode for multicast and unicast MAC addresses. These two new functions support that ability. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:05:53 -07:00
Jesse Brandeburg	a149f2c323	i40e/i40evf: Only offload VLAN tag if enabled The driver was offloading the VLAN tag into the skb any time there was a VLAN tag and the hardware stripping was enabled. Just check to make sure it's enabled before put_tag. Change-Id: Ife95290c06edd9a616393b38679923938b382241 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:05:47 -07:00
Greg Rose	db0772782f	i40e: Remove zero check A mirror rule ID may be zero so do not return invalid parameter when the user passes in a zero value for a rule ID. Change-ID: I261b8c24725ce2c6ed32f859da81093dfcbe2970 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:05:41 -07:00
Kamil Krawczyk	bccf474435	i40e: Add DeviceID for X722 QSFP+ Change-ID: I1370fbc7774e815ac1ad56561e97488e829592fc Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:05:29 -07:00
Michal Kosiarz	68a1c5a777	i40e: Add device capability which defines if update is available Add device capability which defines if update is available and security check is needed during update process. Change-ID: I380787c878275e1df18b39198df3ee3666342282 Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2016-04-27 13:01:48 -07:00
David S. Miller	c0cc53162a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 15:43:10 -04:00
David Ahern	8c14586fc3	net: ipv6: Use passed in table for nexthop lookups Similar to `3bfd847203` ("net: Use passed in table for nexthop lookups") for IPv4, if the route spec contains a table id use that to lookup the next hop first and fall back to a full lookup if it fails (per the fix `4c9bcd1179` ("net: Fix nexthop lookups")). Example: root@kenny:~# ip -6 ro ls table red local 2100:1::1 dev lo proto none metric 0 pref medium 2100:1::/120 dev eth1 proto kernel metric 256 pref medium local 2100:2::1 dev lo proto none metric 0 pref medium 2100:2::/120 dev eth2 proto kernel metric 256 pref medium local fe80::e0:f9ff:fe09:3cac dev lo proto none metric 0 pref medium local fe80::e0:f9ff:fe1c:b974 dev lo proto none metric 0 pref medium fe80::/64 dev eth1 proto kernel metric 256 pref medium fe80::/64 dev eth2 proto kernel metric 256 pref medium ff00::/8 dev red metric 256 pref medium ff00::/8 dev eth1 metric 256 pref medium ff00::/8 dev eth2 metric 256 pref medium unreachable default dev lo metric 240 error -113 pref medium root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64 RTNETLINK answers: No route to host Route add fails even though 2100:1::64 is a reachable next hop: root@kenny:~# ping6 -I red 2100:1::64 ping6: Warning: source address might be selected on device other than red. PING 2100:1::64(2100:1::64) from 2100:1::1 red: 56 data bytes 64 bytes from 2100:1::64: icmp_seq=1 ttl=64 time=1.33 ms With this patch: root@kenny:~# ip -6 ro add table red 2100:3::/64 via 2100:1::64 root@kenny:~# ip -6 ro ls table red local 2100:1::1 dev lo proto none metric 0 pref medium 2100:1::/120 dev eth1 proto kernel metric 256 pref medium local 2100:2::1 dev lo proto none metric 0 pref medium 2100:2::/120 dev eth2 proto kernel metric 256 pref medium 2100:3::/64 via 2100:1::64 dev eth1 metric 1024 pref medium local fe80::e0:f9ff:fe09:3cac dev lo proto none metric 0 pref medium local fe80::e0:f9ff:fe1c:b974 dev lo proto none metric 0 pref medium fe80::/64 dev eth1 proto kernel metric 256 pref medium fe80::/64 dev eth2 proto kernel metric 256 pref medium ff00::/8 dev red metric 256 pref medium ff00::/8 dev eth1 metric 256 pref medium ff00::/8 dev eth2 metric 256 pref medium unreachable default dev lo metric 240 error -113 pref medium Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 15:34:42 -04:00
Nicolas Dichtel	570d8e9398	taskstats: fix nl parsing in accounting/getdelays.c The type TASKSTATS_TYPE_NULL should always be ignored. When jumping to the next attribute, only the length of the current attribute should be added, not the length of all nested attributes. This last bug was not visible before commit `80df554275`, because the kernel didn't put more than two nested attributes. Fixes: `a3baf649ca` ("[PATCH] per-task-delay-accounting: documentation") Fixes: `80df554275` ("taskstats: use the libnl API to align nlattr on 64-bit") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 12:50:14 -04:00
Linus Torvalds	f28f20da70	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig Gallak. 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks, missing locking around netdev list traversal, etc.) from Sabrina Dubroca. 3) Fix handling of host routes on ifdown in ipv6, from David Ahern. 4) Fix double-fdput in bpf verifier. From Jann Horn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits) bpf: fix double-fdput in replace_map_fd_with_map_ptr() net: ipv6: Delete host routes on an ifdown Revert "ipv6: Revert optional address flusing on ifdown." net/mlx4_en: fix spurious timestamping callbacks net: dummy: remove note about being Y by default cxgbi: fix uninitialized flowi6 ipv6: Revert optional address flusing on ifdown. ipv4/fib: don't warn when primary address is missing if in_dev is dead net/mlx5: Add pci shutdown callback net/mlx5_core: Remove static from local variable net/mlx5e: Use vport MTU rather than physical port MTU net/mlx5e: Fix minimum MTU net/mlx5e: Device's mtu field is u16 and not int net/mlx5_core: Add ConnectX-5 to list of supported devices net/mlx5e: Fix MLX5E_100BASE_T define net/mlx5_core: Fix soft lockup in steering error flow qlcnic: Update version to 5.3.64 net: stmmac: socfpga: Remove re-registration of reset controller macsec: fix netlink attribute validation macsec: add missing macsec prefix in uapi ...	2016-04-26 16:25:51 -07:00

1 2 3 4 5 ...

590600 Commits