linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-13 09:06:43 +07:00

Author	SHA1	Message	Date
Eric Dumazet	5e91f6ce4c	sock: relax WARN_ON() in sock_owned_by_user() Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user() This test needs to be relaxed if/when lockdep disables itself. Note that other lockdep_sock_is_held() callers are all from rcu_dereference_protected() sections which already are disabled if/when lockdep has been disabled. Fixes: `fafc4e1ea1` ("sock: tigthen lockdep checks for sock_owned_by_user") Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-25 11:49:53 -04:00
Haiyang Zhang	15cfd40771	hv_netvsc: Fix the list processing for network change event RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" -- media disconnect & connect. The second half should be added to the list head, not to the tail. So all events are processed in normal order. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 23:27:58 -04:00
Eric Dumazet	10d3be5692	tcp-tso: do not split TSO packets at retransmit time Linux TCP stack painfully segments all TSO/GSO packets before retransmits. This was fine back in the days when TSO/GSO were emerging, with their bugs, but we believe the dark age is over. Keeping big packets in write queues, but also in stack traversal has a lot of benefits. - Less memory overhead, because write queues have less skbs - Less cpu overhead at ACK processing. - Better SACK processing, as lot of studies mentioned how awful linux was at this ;) - Less cpu overhead to send the rtx packets (IP stack traversal, netfilter traversal, drivers...) - Better latencies in presence of losses. - Smaller spikes in fq like packet schedulers, as retransmits are not constrained by TCP Small Queues. 1 % packet losses are common today, and at 100Gbit speeds, this translates to ~80,000 losses per second. Losses are often correlated, and we see many retransmit events leading to 1-MSS train of packets, at the time hosts are already under stress. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:43:59 -04:00
Parthasarathy Bhuvaragan	8cee83dd29	tipc: fix stale links after re-enabling bearer Commit `42b18f605f` ("tipc: refactor function tipc_link_timeout()"), introduced a bug which prevents sending of probe messages during link synchronization phase. This leads to hanging links, if the bearer is disabled/enabled after links are up. In this commit, we send the probe messages correctly. Fixes: `42b18f605f` ("tipc: refactor function tipc_link_timeout()") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:35:07 -04:00
David S. Miller	6a74c1965a	Merge branch 'tcp-tcstamp_ack-frag-coalesce' Martin KaFai Lau says: ==================== tcp: Handle txstamp_ack when fragmenting/coalescing skbs This patchset is to handle the txstamp-ack bit when fragmenting/coalescing skbs. The second patch depends on the recently posted series for the net branch: "tcp: Merge timestamp info when coalescing skbs" A BPF prog is used to kprobe to sock_queue_err_skb() and print out the value of serr->ee.ee_data. The BPF prog (run-able from bcc) is attached here: BPF prog used for testing: ~~~~~ from __future__ import print_function from bcc import BPF bpf_text = """ int trace_err_skb(struct pt_regs ctx) { struct sk_buff skb = (struct sk_buff )ctx->si; struct sock sk = (struct sock )ctx->di; struct sock_exterr_skb serr; u32 ee_data = 0; if (!sk \|\| !skb) return 0; serr = SKB_EXT_ERR(skb); bpf_probe_read(&ee_data, sizeof(ee_data), &serr->ee.ee_data); bpf_trace_printk("ee_data:%u\\n", ee_data); return 0; }; """ b = BPF(text=bpf_text) b.attach_kprobe(event="sock_queue_err_skb", fn_name="trace_err_skb") print("Attached to kprobe") b.trace_print() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:06:56 -04:00
Martin KaFai Lau	2de8023e7b	tcp: Merge txstamp_ack in tcp_skb_collapse_tstamp When collapsing skbs, txstamp_ack also needs to be merged. Retrans Collapse Test: ~~~~~~ 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 730) = 730 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 write(4, ..., 11680) = 11680 0.200 > P. 1:731(730) ack 1 0.200 > P. 731:1461(730) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:13141(4380) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:2921,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:4381,nop,nop> 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:5841,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 13141 win 257 BPF Output Before: ~~~~~ <No output due to missing SCM_TSTAMP_ACK timestamp> BPF Output After: ~~~~~ <...>-2027 [007] d.s. 79.765921: : ee_data:1459 Sacks Collapse Test: ~~~~~ 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 1460) = 1460 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 13140) = 13140 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 BPF Output Before: ~~~~~ <No output due to missing SCM_TSTAMP_ACK timestamp> BPF Output After: ~~~~~ <...>-2049 [007] d.s. 89.185538: : ee_data:14599 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:06:43 -04:00
Martin KaFai Lau	b51e13faf7	tcp: Carry txstamp_ack in tcp_fragment_tstamp When a tcp skb is sliced into two smaller skbs (e.g. in tcp_fragment() and tso_fragment()), it does not carry the txstamp_ack bit to the newly created skb if it is needed. The end result is a timestamping event (SCM_TSTAMP_ACK) will be missing from the sk->sk_error_queue. This patch carries this bit to the new skb2 in tcp_fragment_tstamp(). BPF Output Before: ~~~~~~ <No output due to missing SCM_TSTAMP_ACK timestamp> BPF Output After: ~~~~~~ <...>-2050 [000] d.s. 100.928763: : ee_data:14599 Packetdrill Script: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 14600) = 14600 +0 setsockopt(4, SOL_SOCKET, 37, [2176], 4) = 0 0.200 > . 1:7301(7300) ack 1 0.200 > P. 7301:14601(7300) ack 1 0.300 < . 1:1(0) ack 14601 win 257 0.300 close(4) = 0 0.300 > F. 14601:14601(0) ack 1 0.400 < F. 1:1(0) ack 16062 win 257 0.400 > . 14602:14602(0) ack 2 Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Tested-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:06:43 -04:00
David S. Miller	11afbff861	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, mostly from Florian Westphal to sort out the lack of sufficient validation in x_tables and connlabel preparation patches to add nf_tables support. They are: 1) Ensure we don't go over the ruleset blob boundaries in mark_source_chains(). 2) Validate that target jumps land on an existing xt_entry. This extra sanitization comes with a performance penalty when loading the ruleset. 3) Introduce xt_check_entry_offsets() and use it from {arp,ip,ip6}tables. 4) Get rid of the smallish check_entry() functions in {arp,ip,ip6}tables. 5) Make sure the minimal possible target size in x_tables. 6) Similar to #3, add xt_compat_check_entry_offsets() for compat code. 7) Check that standard target size is valid. 8) More sanitization to ensure that the target_offset field is correct. 9) Add xt_check_entry_match() to validate that matches are well-formed. 10-12) Three patch to reduce the number of parameters in translate_compat_table() for {arp,ip,ip6}tables by using a container structure. 13) No need to return value from xt_compat_match_from_user(), so make it void. 14) Consolidate translate_table() so it can be used by compat code too. 15) Remove obsolete check for compat code, so we keep consistent with what was already removed in the native layout code (back in 2007). 16) Get rid of target jump validation from mark_source_chains(), obsoleted by #2. 17) Introduce xt_copy_counters_from_user() to consolidate counter copying, and use it from {arp,ip,ip6}tables. 18,22) Get rid of unnecessary explicit inlining in ctnetlink for dump functions. 19) Move nf_connlabel_match() to xt_connlabel. 20) Skip event notification if connlabel did not change. 21) Update of nf_connlabels_get() to make the upcoming nft connlabel support easier. 23) Remove spinlock to read protocol state field in conntrack. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 00:12:08 -04:00
David S. Miller	8d9ea1606f	Merge branch 'nla_align-more' Nicolas Dichtel says: ==================== netlink: align attributes when needed (patchset #1) This is the continuation of the work done to align netlink attributes when these attributes contain some 64-bit fields. David, if the third patch is too big (or maybe the series), I can split it. Just tell me what you prefer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:29 -04:00
Nicolas Dichtel	80df554275	taskstats: use the libnl API to align nlattr on 64-bit Goal of this patch is to use the new libnl API to align netlink attribute when needed. The layout of the netlink message will be a bit different after the patch, because the padattr (TASKSTATS_TYPE_STATS) will be inside the nested attribute instead of before it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:25 -04:00
Nicolas Dichtel	de95c4a46a	xfrm: align nlattr properly when needed Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:25 -04:00
Nicolas Dichtel	73520786b0	libnl: add nla_put_u64_64bit() helper With this function, nla_data() is aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:25 -04:00
Nicolas Dichtel	2175d87cc3	libnl: nla_put_msecs(): align on a 64-bit area nla_data() is now aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
Nicolas Dichtel	756a2f59b7	libnl: nla_put_s64(): align on a 64-bit area nla_data() is now aligned on a 64-bit area. In fact, there is no user of this function. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
Nicolas Dichtel	e9bbe898cb	libnl: nla_put_net64(): align on a 64-bit area nla_data() is now aligned on a 64-bit area. The temporary function nla_put_be64_32bit() is removed in this patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
Nicolas Dichtel	b46f6ded90	libnl: nla_put_be64(): align on a 64-bit area nla_data() is now aligned on a 64-bit area. A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
Nicolas Dichtel	e7479122be	libnl: nla_put_le64(): align on a 64-bit area nla_data() is now aligned on a 64-bit area. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
Nicolas Dichtel	11a9957307	libnl: fix help of _64bit functions Fix typo and describe 'padattr'. Fixes: `089bf1a6a9` ("libnl: add more helpers to align attributes on 64-bit") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 20:13:24 -04:00
David S. Miller	1602f49b58	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 18:51:33 -04:00
Linus Torvalds	5f44abd041	RTC fixes for 4.6 A documentation fix for s3c and two fixes for the ds1307. -----BEGIN PGP SIGNATURE----- iQIcBAABCgAGBQJXGVFgAAoJENiigzvaE+LCOVAQAIMHfxfhcKPv05RXgOuxrIJ5 0YdQZG9/CTRr93KgfeeDznI1Zdsg+YdZQxuSzTmlxMt5KBmW1zqs0iAxLQXjwNVO Sjb+kkNh7p2d5UkKfEqrSCsz4PRYMR5GJ3PINN9M2b5txOQjiQ45TizrGGNoi893 AYvvlUrGs8aVOOyV2IaoKh5BB+H56TAPSQBA3i1hJ6epjirchL4ySbmYTMT3bIWL c9Hkr9TLH8zltnkyfYWs1pjUgheOS5VerB+oRXB+I46Vg6Ob4i+MxdYdbe11c4uO ew/xbztiJF2/iHB7/V4CFAxkUl/f+z13ufueo39oVxAdvuQaX+kZbUYdBYtu+tNL 5nM1gcDDl87/MhbPogjhpj3WyJB1XCq2ziQ7//ZkEJ7ILWEx4bVoWaLEIbNrsqW2 zp2X+lQMSKx8gHRxO+PnCA93WzKKy7tGZ7G7ct7E4M0UdTR9/LrCnCo4OwGp0K2e QHVWVyJSeH1xJnhHvC2hk5QU/g9JQbyjBTH7WVL/CYbUIihe0i+Vlw+8NLF6tA/1 BhRkuoQC9gCq0Qjt304WT62y0U5eBDjsgp1VN6i/B7ovIqLoBQIVP+KnEzClHMmM 7wouYhEU7TlU3ugYgmnFiS2zH2544qB+p2sU0uCHBHXah7wUyagnBwvw4/zuS2Tf 390v+/CwnuKm1hGBsOSV =tU6V -----END PGP SIGNATURE----- Merge tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC fixes from Alexandre Belloni: "A few fixes for the RTC subsystem. The documentation fix already missed 4.5 so I think it is worth taking it now: A documentation fix for s3c and two fixes for the ds1307" * tag 'rtc-4.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: ds1307: Use irq when available for wakeup-source device rtc: ds1307: ds3231 temperature s16 overflow rtc: s3c: Document in binding that only s3c6410 needs a src clk	2016-04-21 15:41:13 -07:00
Linus Torvalds	f78fe0817a	Power management fixes for v4.6-rc5 - Fix an intel_pstate driver problem causing CPUs to get stuck in the highest P-state when completely idle uncovered by the recent switch over from using timers (Rafael Wysocki). - Avoid attempts to get the current CPU frequency when all devices (like I2C controllers that may be nedded for that purpose) have been suspended during system suspend/resume (Rafael Wysocki). / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJXGS4tAAoJEILEb/54YlRxOgYQAJfjMJdift90vjXW1rvptx1E mHDweIky3ZhsUifBiOHbxkkIH0zOD4gpi761h/0AefK48wwA/4vflrJar5p/ZnyC 1GnMm/lZ66n2K6kCKnYyhs/CFRqYDDZerhhp88qvA1G9s++wDlmcwqDOfAXF1hFi O7N6bkXsElKowTJtJYRwt477NhMOCzSFJBfoUbCFqi5S4t5aksgJurZgdfGxAMiP oJ9E8mYzkUlLMxyceZJTk/FRe2x0zS9OibpJB3QfChFJ3EMp6BzT5r8G931yFd53 YdV7irw6vZiIM6L04BRzKEODciV71pznBTq2aKGYYx1BdO/2urQ27WhfRdRcVngW OaVDBH7vDHO3BNt8JRJ3C9Veg2fj2rRTKwyoEc5fbXUf2Af5Wx0XxbzuZ38YEqY1 KTBKgnYfzpyvLpjIsF0V8Wdzkx4QZC/XLtKicR3O16Dl/MCMTmHvv5s0fQ8b+hJS GF8BnTtLROjE4tZo19xbqv5uW1hRO7O3V7RqW7rqstmAbCvWzTkZMRsEiCJFw4aU 6Sh2Ds8weQjhMG/NHYLLyMNWKfvF8bj7OKalo0CnVbb3RFrRVbP0m95VHDL7MWgS JeZDF5rsA0K6vcDdnZW1qUg983W63w0uG2uvG9UGnT8bN5fHQEAdESy1QfeydXIl NvvIFzfo0ISKuahL5LdE =SNbz -----END PGP SIGNATURE----- Merge tag 'pm+acpi-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Two fixes for issues introduced recently, one for an intel_pstate driver problem uncovered by the recent switch over from using timers and the other one for a potential cpufreq core problem related to system suspend/resume. Specifics: - Fix an intel_pstate driver problem causing CPUs to get stuck in the highest P-state when completely idle uncovered by the recent switch over from using timers (Rafael Wysocki). - Avoid attempts to get the current CPU frequency when all devices (like I2C controllers that may be nedded for that purpose) have been suspended during system suspend/resume (Rafael Wysocki)" * tag 'pm+acpi-4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set intel_pstate: Avoid getting stuck in high P-states when idle	2016-04-21 14:29:34 -07:00
Nishanth Menon	38a7a73e8e	rtc: ds1307: Use irq when available for wakeup-source device With commit `8bc2a40730` ("rtc: ds1307: add support for the DT property 'wakeup-source'") we lost the ability for rtc irq functionality for devices that are actually hooked on a real IRQ line and have capability to wakeup as well. This is not an expected behavior. So, instead of just not requesting IRQ, skip the IRQ requirement only if interrupts are not defined for the device. Fixes: `8bc2a40730` ("rtc: ds1307: add support for the DT property 'wakeup-source'") Reported-by: Tony Lindgren <tony@atomide.com> Cc: Michael Lange <linuxstuff@milaw.biz> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Nishanth Menon <nm@ti.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2016-04-21 23:21:00 +02:00
Zhuang Yuyao	9a3dce62cc	rtc: ds1307: ds3231 temperature s16 overflow while retrieving temperature from ds3231, the result may be overflow since s16 is too small for a multiplication with 250. ie. if temp_buf[0] == 0x2d, the result (s16 temp) will be negative. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Michael Tatarinov <kukabu@gmail.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>	2016-04-21 23:20:59 +02:00
Linus Torvalds	c5edde3a81	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix memory leak in iwlwifi, from Matti Gottlieb. 2) Add missing registration of netfilter arp_tables into initial namespace, from Florian Westphal. 3) Fix potential NULL deref in DecNET routing code. 4) Restrict NETLINK_URELEASE to truly bound sockets only, from Dmitry Ivanov. 5) Fix dst ref counting in VRF, from David Ahern. 6) Fix TSO segmenting limits in i40e driver, from Alexander Duyck. 7) Fix heap leak in PACKET_DIAG_MCLIST, from Mathias Krause. 8) Ravalidate IPV6 datagram socket cached routes properly, particularly with UDP, from Martin KaFai Lau. 9) Fix endian bug in RDS dp_ack_seq handling, from Qing Huang. 10) Fix stats typing in bcmgenet driver, from Eric Dumazet. 11) Openvswitch needs to orphan SKBs before ipv6 fragmentation handing, from Joe Stringer. 12) SPI device reference leak in spi_ks8895 PHY driver, from Mark Brown. 13) atl2 doesn't actually support scatter-gather, so don't advertise the feature. From Ben Hucthings. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits) openvswitch: use flow protocol when recalculating ipv6 checksums Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets atl2: Disable unimplemented scatter/gather feature net/mlx4_en: Split SW RX dropped counter per RX ring net/mlx4_core: Don't allow to VF change global pause settings net/mlx4_core: Avoid repeated calls to pci enable/disable net/mlx4_core: Implement pci_resume callback net: phy: spi_ks8895: Don't leak references to SPI devices net: ethernet: davinci_emac: Fix platform_data overwrite net: ethernet: davinci_emac: Fix Unbalanced pm_runtime_enable qede: Fix single MTU sized packet from firmware GRO flow qede: Fix setting Skb network header qede: Fix various memory allocation error flows for fastpath tcp: Merge tx_flags and tskey in tcp_shifted_skb tcp: Merge tx_flags and tskey in tcp_collapse_retrans drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks openvswitch: Orphan skbs before IPv6 defrag Revert "Prevent NUll pointer dereference with two PHYs on cpsw" VSOCK: Only check error on skb_recv_datagram when skb is NULL ...	2016-04-21 12:57:34 -07:00
David S. Miller	22d37b6b00	Merge branch 'geneve-vxlan-deps' Hannes Frederic Sowa says: ==================== net: network drivers should not depend on geneve/vxlan This patchset removes the dependency of network drivers on vxlan or geneve, so those don't get autoloaded when the nic driver is loaded. Also audited the code such that vxlan_get_rx_port and geneve_get_rx_port are not called without rtnl lock. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:36:04 -04:00
Hannes Frederic Sowa	681e683ff3	geneve: break dependency with netdev drivers Equivalent to "vxlan: break dependency with netdev drivers", don't autoload geneve module in case the driver is loaded. Instead make the coupling weaker by using netdevice notifiers as proxy. Cc: Jesse Gross <jesse@kernel.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:44 -04:00
Hannes Frederic Sowa	b7aade1548	vxlan: break dependency with netdev drivers Currently all drivers depend and autoload the vxlan module because how vxlan_get_rx_port is linked into them. Remove this dependency: By using a new event type in the netdevice notifier call chain we proxy the request from the drivers to flush and resetup the vxlan ports not directly via function call but by the already existing netdevice notifier call chain. I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so. We don't need to save those ids, as the event type field is an unsigned long and using specialized event types for this purpose seemed to be a more elegant way. This also comes in beneficial if in future we want to add offloading knobs for vxlan. Cc: Jesse Gross <jesse@kernel.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:44 -04:00
Hannes Frederic Sowa	50d65d7889	qlcnic: protect qlicnic_attach_func with rtnl_lock qlcnic_attach_func requires rtnl_lock to be held. Cc: Dept-GELinuxNICDev@qlogic.com Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:44 -04:00
Hannes Frederic Sowa	b1f99a787e	ixgbe: protect vxlan_get_rx_port in ixgbe_service_task with rtnl_lock vxlan_get_rx_port requires rtnl_lock to be held. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Shannon Nelson <shannon.nelson@intel.com> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:43 -04:00
Hannes Frederic Sowa	0c5c3252c4	mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock mlx4_en_start_port requires rtnl_lock to be held. Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:43 -04:00
Hannes Frederic Sowa	41419b9303	fm10k: protect fm10k_open in fm10k_io_resume with rtnl_lock fm10k_open requires rtnl_lock to be held. Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: Shannon Nelson <shannon.nelson@intel.com> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com> Cc: Don Skidmore <donald.c.skidmore@intel.com> Cc: Bruce Allan <bruce.w.allan@intel.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:43 -04:00
Hannes Frederic Sowa	08d9910c34	benet: be_resume needs to protect be_open with rtnl_lock be_open calls down to functions which expects rtnl lock to be held. Cc: Sathya Perla <sathya.perla@broadcom.com> Cc: Ajit Khaparde <ajit.khaparde@broadcom.com> Cc: Padmanabh Ratnakar <padmanabh.ratnakar@broadcom.com> Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Cc: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:07 -04:00
Simon Horman	b4f70527f0	openvswitch: use flow protocol when recalculating ipv6 checksums When using masked actions the ipv6_proto field of an action to set IPv6 fields may be zero rather than the prevailing protocol which will result in skipping checksum recalculation. This patch resolves the problem by relying on the protocol in the flow key rather than that in the set field action. Fixes: `83d2b9ba1a` ("net: openvswitch: Support masked set actions.") Cc: Jarno Rajahalme <jrajahalme@nicira.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:28:47 -04:00
Shrikrishna Khare	f0d437809d	Driver: Vmxnet3: set CHECKSUM_UNNECESSARY for IPv6 packets For IPv6, if the device indicates that the checksum is correct, set CHECKSUM_UNNECESSARY. Reported-by: Subbarao Narahari <snarahari@vmware.com> Signed-off-by: Shrikrishna Khare <skhare@vmware.com> Signed-off-by: Jin Heo <heoj@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:28:05 -04:00
Ben Hutchings	f43bfaeddc	atl2: Disable unimplemented scatter/gather feature atl2 includes NETIF_F_SG in hw_features even though it has no support for non-linear skbs. This bug was originally harmless since the driver does not claim to implement checksum offload and that used to be a requirement for SG. Now that SG and checksum offload are independent features, if you explicitly enable SG and use one of the rare protocols that can use SG without checkusm offload, this potentially leaks sensitive information (before you notice that it just isn't working). Therefore this obscure bug has been designated CVE-2016-2117. Reported-by: Justin Yackoski <jyackoski@crypto-nite.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: `ec5f061564` ("net: Kill link between CSUM and SG features.") Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:12:23 -04:00
Alexander Duyck	7f348a6076	net: Add support for IP ID mangling TSO in cases that require encapsulation This patch adds support for NETIF_F_TSO_MANGLEID if a given tunnel supports NETIF_F_TSO. This way if needed a device can then later enable the TSO with IP ID mangling and the tunnels on top of that device can then also make use of the IP ID mangling as well. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:11:07 -04:00
David S. Miller	1df845be65	Merge branch 'mlx5-next' Saeed Mahameed says: ==================== Mellanox 100G mlx5 driver receive path optimizations Changes from V2: - Rebased to `46e7b8d8d5` ("net: dsa: kill circular reference with slave priv") - Updated: ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") * Per Eric Dumazet comment we changed the driver memory handling scheme to work with order-0 pages rather than order-5 via split_page(). * This means that now a mlx5e rx skb can hold one or (more in case of HW LRO) skb frag each pointing to a 4K order-0 page rather than one frag with order-5 page. - Updated: ("net/mlx5e: Add fragmented memory support for RX multi packet WQE") * Code refactoring and code reuse due the split_page() mechanism, now the MPWQE and fragmented MPWQE handling almost look the same, and share most of the code. - In some cases we see 2%-3% packet rate degradation in comparison to the order-5 pages approach, due to split_page() cpu consumption, but still we do see 3%-10% improvement in comparison to the current linear SKB approach. - We do believe that now the driver memory scheme is significantly less vulnerable to the memory DOS attack Eric pointed at. Changes from V1: - Rebased to `efde611b0a` ("Merge branch 'nfp-next'") - Dropped: ("net/mlx5: Refactor mlx5_core_mr to mkey") Already merged into 4.6 from rdma tree. - Dropped: ("net/mlx5_core: Add ConnectX-5 to list of supported devices") Will be pushed to net as we want it in 4.6 release. - Dropped: ("net/mlx5e: Change RX moderation period to be based on CQE") Will be pushed in a later series with full software based adaptive moderation. - Added: ("net/mlx5e: Delay skb->data access") Small trivial optimization. - Updated: ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)") Changed Striding RQ defaults to: > NUM WQEs = 16 > Strides Per WQE = 1024 > Stride Size = 128 - Updated: ("net/mlx5e: Use napi_alloc_skb for RX SKB allocations") Consider the IP packet alignment already done in napi_alloc_skb. Changes from V0: - Fixed a typo in commit message reported by Sergei - Align SKB fragments truesize to stride size - Use skb_add_rx_frag and remove the use of SKB_TRUESIZE - Fix: # MTTs alignment on Power PC - Fix: Free original (unaligned) pointer of MTT array - Use dev_alloc_pages and dev_alloc_page - Extend the stats.buff_alloc_err counter - Reform the copying of packet header into skb linear data - Add compiler hints for conditional statements - Prefetch skd->data prior to copying packet header into it - Rework: mlx5e_complete_rx_fragmented_mpwqe - Handle SKB fragments before linear data - Dropped ("net/mlx5e: Prefetch next RX CQE") for now - Added a small patch that Adds ConnectX-5 devices to the list of supported devices - Rebased to `1cdba55055` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next") This series includes Some RX modifications and optimizations for the mlx5 Ethernet driver. From Rana, we have one patch that adds the support for Connectx-4 queue counters. From Tariq, several patches that are centralized around improving RX path message rate, CPU and Memory utilization, in each patch commit message you will find the performance improvements numbers related to that specific patch. In the 2nd patch we used a queue counter to report "out of buffer" dropped packet count, "Dropped packets due to lack of software resources" 3rd patch modifies the driver's to RSS default value to be spread along the close NUMA node cores only for better out of the box experience. In the 4th and 5th patches we utilized the use of RX multi-packet WQE (Striding RQ) for better memory utilization especially in case of hardware LRO is enabled and for better message rate for small packets. In the 6th and 7th patches we added a fallback mechanism to use fragmented memory when allocating large WQE strides fails, using UMR (User Memory Registration) and ICO (Internal Control Operations) SQs. In the 8th to 11th patches we did some small modification which show some small extra improvements. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Tariq Toukan	5498440756	net/mlx5e: Add ethtool counter for RX buffer allocation failures Counts the number of RX buffer allocation failures and shows it in ethtool statistics. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Saeed Mahameed	e20a0db304	net/mlx5e: Delay skb->data access Move mlx5e_handle_csum and eth_type_trans to the end of mlx5e_build_rx_skb to gain some more time before accessing skb->data, to reduce cache misses. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Tariq Toukan	1bfec31627	net/mlx5e: Remove redundant barrier The bit-op operation one line before is an explicit barrier by itself. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	c5adb96f6c	net/mlx5e: Use napi_alloc_skb for RX SKB allocations Instead of netdev_alloc_skb, we use the napi_alloc_skb function which is designated to allocate skbuff's for RX in a channel-specific NAPI instance, and implies the IP packet alignment. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	bc77b240b3	net/mlx5e: Add fragmented memory support for RX multi packet WQE If the allocation of a linear (physically continuous) MPWQE fails, we allocate a fragmented MPWQE. This is implemented via device's UMR (User Memory Registration) which allows to register multiple memory fragments into ConnectX hardware as a continuous buffer. UMR registration is an asynchronous operation and is done via ICO SQs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	d3c9bc2743	net/mlx5e: Added ICO SQs Added ICO (Internal Control Operations) SQ per channel to be used for driver internal operations such as memory registration for fragmented memory and nop requests upon ifconfig up. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	461017cb00	net/mlx5e: Support RX multi-packet WQE (Striding RQ) Introduce the feature of multi-packet WQE (RX Work Queue Element) referred to as (MPWQE or Striding RQ), in which WQEs are larger and serve multiple packets each. Every WQE consists of many strides of the same size, every received packet is aligned to a beginning of a stride and is written to consecutive strides within a WQE. In the regular approach, each regular WQE is big enough to be capable of serving one received packet of any size up to MTU or 64K in case of device LRO is enabled, making it very wasteful when dealing with small packets or device LRO is enabled. For its flexibility, MPWQE allows a better memory utilization (implying improvements in CPU utilization and packet rate) as packets consume strides according to their size, preserving the rest of the WQE to be available for other packets. MPWQE default configuration: Num of WQEs = 16 Strides Per WQE = 2048 Stride Size = 64 byte The default WQEs memory footprint went from 1024mtu (~1.5MB) to 16 2048 * 64 = 2MB per ring. However, HW LRO can now be supported at no additional cost in memory footprint, and hence we turn it on by default and get an even better performance. Performance tested on ConnectX4-Lx 50G. To isolate the feature under test, the numbers below were measured with HW LRO turned off. We verified that the performance just improves when LRO is turned back on. * Netperf single TCP stream: - BW raised by 10-15% for representative packet sizes: default, 64B, 1024B, 1478B, 65536B. * Netperf multi TCP stream: - No degradation, line rate reached. * Pktgen: packet rate raised by 2-10% for traffic of different message sizes: 64B, 128B, 256B, 1024B, and 1500B. * Pktgen: packet loss in bursts of small messages (64byte), single stream: - \| num packets \| packets loss before \| packets loss after \| 2K \| ~ 1K \| 0 \| 8K \| ~ 6K \| 0 \| 16K \| ~13K \| 0 \| 32K \| ~28K \| 0 \| 64K \| ~57K \| ~24K As expected as the driver can receive as many small packets (<=64B) as the number of total strides in the ring (default = 2048 * 16) vs. 1024 (default ring size regardless of packets size) before this feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	2f48af128d	net/mlx5e: Use function pointers for RX data path handling In preparation for Striding RQ feature, which will need its own RX handlers. This patch does not change any functionality. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Tariq Toukan	d8c9660dac	net/mlx5e: Use only close NUMA node for default RSS Distribute default RSS table uniformly over the rings of the close NUMA node, instead of all available channels. This way we enforce the preference of close rings over far ones. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Rana Shahout	593cf33829	net/mlx5e: Allocate set of queue counters per netdev Connect all netdev RQs to this set of queue counters. Also, add an "rx_out_of_buffer" counter to ethtool, which indicates RX packet drops due to lack of receive buffers. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Tariq Toukan	237cd21809	net/mlx5: Introduce device queue counters A queue counter can collect several statistics for one or more hardware queues (QPs, RQs, etc ..) that the counter is attached to. For Ethernet it will provide an "out of buffer" counter which collects the number of all packets that are dropped due to lack of software buffers. Here we add device commands to alloc/query/dealloc queue counters. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
David S. Miller	b8fd789afb	Merge branch 'bcmsysport-napi-updates' Florian Fainelli says: ==================== net: bcmsysport: utilize newer NAPI APIs These two patches are very analoguous to what was already submitted for BCMGENET and switch the SYSTEMPORT driver to utilizing __napi_schedule_irqoff() and napi_complete_done for the RX NAPI context. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:06:05 -04:00
Florian Fainelli	c82f47efa0	net: bcmsysport: use napi_complete_done() By using napi_complete_done(), we allow fine tuning of /sys/class/net/ethX/gro_flush_timeout for higher GRO aggregation efficiency for a Gbit NIC. Check commit `24d2e4a507` ("tg3: use napi_complete_done()") for details. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:06:04 -04:00

1 2 3 4 5 ...

590220 Commits