linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Arnd Bergmann	51bef926be	nfp: avoid using getnstimeofday64() getnstimeofday64 is deprecated in favor of the ktime_get() family of functions. The direct replacement would be ktime_get_real_ts64(), but I'm picking the basic ktime_get() instead: - using a ktime_t simplifies the code compared to timespec64 - using monotonic time instead of real time avoids issues caused by a concurrent settimeofday() or during a leap second adjustment. Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:55:39 -07:00
Arnd Bergmann	44c58899b0	liquidio: use ktime_get_real_ts64() instead of getnstimeofday64() The two do the same thing, but we want to have a consistent naming in the kernel. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:55:39 -07:00
David S. Miller	ccb06fba51	Merge branch 'net-sched-act_skbedit-lockless-data-path' Davide Caratti says: ==================== net/sched: act_skbedit: lockless data path the data path of act_skbedit can be faster if we avoid using spinlocks: - patch 1 converts act_skbedit statistics to use per-cpu counters - patch 2 lets act_skbedit use RCU to read/update its configuration test procedure (using pktgen from https://github.com/netoptimizer): # ip link add name eth1 type dummy # ip link set dev eth1 up # tc qdisc add dev eth1 clsact # tc filter add dev eth1 egress matchall action skbedit priority c1a0:c1a0 # for c in 1 2 4 ; do > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i eth1 > done test results (avg. pps/thread) $c \| before patch \| after patch \| improvement ----+--------------+--------------+------------ 1 \| 3917464 ± 3% \| `4000458` ± 3% \| irrelevant 2 \| 3455367 ± 4% \| 3953076 ± 1% \| +14% 4 \| 2496594 ± 2% \| 3801123 ± 3% \| +52% v2: rebased on latest net-next ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:54:12 -07:00
Davide Caratti	c749cdda90	net/sched: act_skbedit: don't use spinlock in the data path use RCU instead of spin_{,un}lock_bh, to protect concurrent read/write on act_skbedit configuration. This reduces the effects of contention in the data path, in case multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:54:12 -07:00
Davide Caratti	6f3dfb0dc8	net/sched: skbedit: use per-cpu counters use per-CPU counters, instead of sharing a single set of stats with all cores: this removes the need of spinlocks when stats are read/updated. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:54:12 -07:00
Arnd Bergmann	cca9bab1b7	tcp: use monotonic timestamps for PAWS Using get_seconds() for timestamps is deprecated since it can lead to overflows on 32-bit systems. While the interface generally doesn't overflow until year 2106, the specific implementation of the TCP PAWS algorithm breaks in 2038 when the intermediate signed 32-bit timestamps overflow. A related problem is that the local timestamps in CLOCK_REALTIME form lead to unexpected behavior when settimeofday is called to set the system clock backwards or forwards by more than 24 days. While the first problem could be solved by using an overflow-safe method of comparing the timestamps, a nicer solution is to use a monotonic clocksource with ktime_get_seconds() that simply doesn't overflow (at least not until 136 years after boot) and that doesn't change during settimeofday(). To make 32-bit and 64-bit architectures behave the same way here, and also save a few bytes in the tcp_options_received structure, I'm changing the type to a 32-bit integer, which is now safe on all architectures. Finally, the ts_recent_stamp field also (confusingly) gets used to store a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow(). This is currently safe, but changing the type to 32-bit requires some small changes there to keep it working. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:50:40 -07:00
Vakul Garg	d2bdd26812	net/tls: Use aead_request_alloc/free for request alloc/free Instead of kzalloc/free for aead_request allocation and free, use functions aead_request_alloc(), aead_request_free(). It ensures that any sensitive crypto material held in crypto transforms is securely erased from memory. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 14:44:11 -07:00
Pieter Jansen van Vuuren	cba54f9cf4	tc-testing: add geneve options in tunnel_key unit tests Extend tc tunnel_key action unit tests with geneve options. Tests include testing single and multiple geneve options, as well as testing geneve options that are expected to fail. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Acked-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 12:34:31 -07:00
David S. Miller	f1fbeada1b	Merge branch 'be2net-small-structures-clean-up' Ivan Vecera says: ==================== be2net: small structures clean-up The series: - removes unused / unneccessary fields in several be2net structures - re-order fields in some structures to eliminate holes, cache-lines crosses - as result reduces size of main struct be_adapter by 4kB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:31 -07:00
Ivan Vecera	28ace84b10	be2net: move rss_flags field in rss_info to ensure proper alignment The current position of .rss_flags field in struct rss_info causes that fields .rsstable and .rssqueue (both 128 bytes long) crosses cache-line boundaries. Moving it at the end properly align all fields. Before patch: struct rss_info { u64 rss_flags; /* 0 8 / u8 rsstable[128]; / 8 128 / / --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- / u8 rss_queue[128]; / 136 128 / / --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- / u8 rss_hkey[40]; / 264 40 / }; After patch: struct rss_info { u8 rsstable[128]; / 0 128 / / --- cacheline 2 boundary (128 bytes) --- / u8 rss_queue[128]; / 128 128 / / --- cacheline 4 boundary (256 bytes) --- / u8 rss_hkey[40]; / 256 40 / u64 rss_flags; / 296 8 */ }; Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:31 -07:00
Ivan Vecera	03d231a963	be2net: re-order fields in be_error_recovert to avoid hole - Unionize two u8 fields where only one of them is used depending on NIC chipset. - Move recovery_supported field after that union These changes eliminate 7-bytes hole in the struct and makes it smaller by 8 bytes. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:31 -07:00
Ivan Vecera	f9520b86dc	be2net: remove unused tx_jiffies field from be_tx_stats Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:30 -07:00
Ivan Vecera	646d2c10aa	be2net: move txcp field in be_tx_obj to eliminate holes in the struct Before patch: struct be_tx_obj { u32 db_offset; /* 0 4 / / XXX 4 bytes hole, try to pack / struct be_queue_info q; / 8 56 / / --- cacheline 1 boundary (64 bytes) --- / struct be_queue_info cq; / 64 56 / struct be_tx_compl_info txcp; / 120 4 / / XXX 4 bytes hole, try to pack / / --- cacheline 2 boundary (128 bytes) --- / struct sk_buff sent_skb_list[2048]; /* 128 16384 / ... }: After patch: struct be_tx_obj { u32 db_offset; / 0 4 / struct be_tx_compl_info txcp; / 4 4 / struct be_queue_info q; / 8 56 / / --- cacheline 1 boundary (64 bytes) --- / struct be_queue_info cq; / 64 56 / struct sk_buff sent_skb_list[2048]; /* 120 16384 */ ... }; Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:30 -07:00
Ivan Vecera	e9c74cd85c	be2net: reorder fields in be_eq_obj structure Re-order fields in struct be_eq_obj to ensure that .napi field begins at start of cache-line. Also the .adapter field is moved to the first cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr) and the 4-bytes hole to 3rd cache-line. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:30 -07:00
Ivan Vecera	d6d9704af8	be2net: remove desc field from be_eq_obj The event queue description (be_eq_obj.desc) field is used only to format string for IRQ name and it is not really needed to hold this value. Remove it and use local variable to format string for IRQ name. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:30 -07:00
Ivan Vecera	c1328a27bb	be2net: remove unused old custom busy-poll fields The commit `fb6113e688` ("be2net: get rid of custom busy poll code") replaced custom busy-poll code by the generic one but left several macros and fields in struct be_eq_obj that are currently unused. Remove this stuff. Fixes: `fb6113e688` ("be2net: get rid of custom busy poll code") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:30 -07:00
Ivan Vecera	a5d7fcb689	be2net: remove unused old AIC info The commit `2632bafd74` ("be2net: fix adaptive interrupt coalescing") introduced a separate struct be_aic_obj to hold AIC information but unfortunately left the old stuff in be_eq_obj. So remove it. Fixes: `2632bafd74` ("be2net: fix adaptive interrupt coalescing") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:03:30 -07:00
Ivan Khoronzhuk	d0c694fc7b	net: ethernet: ti: cpts: break cycle once late ts is matched The late ts queue can contain a bunch of skbs while hi rate testing, no need to check all of them if timestamp is already matched. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-12 00:00:07 -07:00
Petr Machata	4280129838	selftests: forwarding: mirror_gre_nh: Unset rp_filter on host VRF The mirrored packets arrive at $h3 encapsulated in GRE/IPv4, with IP address from 192.0.2.128/28 network. However the interface is configured as a member of 192.0.2.160/28 and there's no route directing traffic from the former network through that interface. Correspondingly, the RP filter on the VRF rejects it. Therefore turn off the VRF's RP filter. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:59:27 -07:00
David S. Miller	d90a5215c8	Merge branch 'mlxsw-ERSPAN-Take-LACP-state-into-consideration' Ido Schimmel says: ==================== mlxsw: ERSPAN: Take LACP state into consideration Petr says: When offloading mirror-to-gretap, mlxsw needs to preroute the path that the encapsulated packet will take. That path may include a LAG device above a front panel port. So far, mlxsw resolved the path to the first up front panel slave of the LAG interface, but that only reflects administrative state of the port. It neglects to consider whether the port actually has a carrier, and what the LACP state is. This patch set aims to address these problems. Patch #1 publishes team_port_get_rcu(). Then in patch #2, a new function is introduced, mlxsw_sp_port_dev_check(). That returns, for a given netdevice that is a slave of a LAG device, whether that device is "txable", i.e. whether the LAG master would send traffic through it. Since there's no good place to put LAG-wide helpers, introduce a new header include/net/lag.h. Finally in patch #3, fix the slave selection logic to take into consideration whether a given slave has a carrier and whether it is txable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:10:20 -07:00
Petr Machata	b5de82f3df	mlxsw: spectrum_span: Change LAG lower selection When offloading mirror-to-gretap, mlxsw needs to preroute the path that the encapsulated packet will take. That path may include a LAG device above a front panel port. So far, mlxsw resolved the path to the first up front panel slave of the LAG interface, but that only reflects administrative state of the port. It neglects to consider whether the port actually has a carrier, and what the LACP state is. So instead of checking upness of the device, check carrier state and txability. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:10:19 -07:00
Petr Machata	eeed992b77	net: Add lag.h, net_lag_port_dev_txable() LAG devices (team or bond) recognize for each one of their slave devices whether LAG traffic is going to be sent through that device. Bond calls such devices "active", team calls them "txable". When this state changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together with a netdev_notifier_changelowerstate_info structure that for LAG devices includes a tx_enabled flag that refers to the new state. The notification thus makes it possible to react to the changes in txability in drivers. However there's no way to query txability from the outside on demand. That is problematic namely for mlxsw, which when resolving ERSPAN packet path, may encounter a LAG device, and needs to determine which of the slaves it should choose. To that end, introduce a new function, net_lag_port_dev_txable(), which determines whether a given slave device is "active" or "txable" (depending on the flavor of the LAG device). That function then dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() resp. team_port_dev_txable(). Because there currently is no good place where net_lag_port_dev_txable() should be added, introduce a new header file, lag.h, which should from now on hold any logic common to both team and bond. (But keep netif_is_lag_master() together with the rest of netif_is_*_master() functions). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:10:19 -07:00
Petr Machata	3443b00e07	team: Publish team_port_get_rcu() A follow-up patch adds a new entry point, team_port_dev_txable(). Making it an ordinary exported function would mean that any module that may need the service in one of the supported configurations also unconditionally needs to pull in the team module, whether or not the user actually intends to create team interfaces. To prevent that, team_port_dev_txable() is defined in if_team.h, and therefore all dependencies of that function also need to be publicly-visible. Therefore move team_port_get_rcu() from team.c to if_team.h. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:10:19 -07:00
Travis Brown	80fd2d6ca5	macvlan: Change status when lower device goes down Today macvlan ignores the notification when a lower device goes administratively down, preventing the lack of connectivity from bubbling up. Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN with NO-CARRIER which should be easy to interpret in userspace. 2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 Signed-off-by: Suresh Krishnan <skrishnan@arista.com> Signed-off-by: Travis Brown <travisb@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:07:22 -07:00
David S. Miller	0e97c4fb18	Merge branch 'tipc-make-link-protocol-more-resilient' Jon Maloy says: ==================== tipc: make link protocol more resilient These two commits make the link ptotocol more resilient to infrastructures with frequent packet duplication and long delays. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:06:14 -07:00
Jon Maloy	7ea817f4e8	tipc: check session number before accepting link protocol messages In some virtual environments we observe a significant higher number of packet reordering and delays than we have been used to traditionally. This makes it necessary with stricter checks on incoming link protocol messages' session number, which until now only has been validated for RESET messages. Since the other two message types, ACTIVATE and STATE messages also carry this number, it is easy to extend the validation check to those messages. We also introduce a flag indicating if a link has a valid peer session number or not. This eliminates the mixing of 32- and 16-bit arithmethics we are currently using to achieve this. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:06:14 -07:00
Jon Maloy	9012de5089	tipc: add sequence number check for link STATE messages Some switch infrastructures produce huge amounts of packet duplicates. This becomes a problem if those messages are STATE/NACK protocol messages, causing unnecessary retransmissions of already accepted packets. We now introduce a unique sequence number per STATE protocol message so that duplicates can be identified and ignored. This will also be useful when tracing such cases, and to avert replay attacks when TIPC is encrypted. For compatibility reasons we have to introduce a new capability flag TIPC_LINK_PROTO_SEQNO to handle this new feature. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:06:14 -07:00
David S. Miller	e32f55f373	Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09 This patch series is meant to allow support for the L2 forward offload, aka MACVLAN offload without the need for using ndo_select_queue. The existing solution currently requires that we use ndo_select_queue in the transmit path if we want to associate specific Tx queues with a given MACVLAN interface. In order to get away from this we need to repurpose the tc_to_txq array and XPS pointer for the MACVLAN interface and use those as a means of accessing the queues on the lower device. As a result we cannot offload a device that is configured as multiqueue, however it doesn't really make sense to configure a macvlan interfaced as being multiqueue anyway since it doesn't really have a qdisc of its own in the first place. The big changes in this set are: Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN Disable XPS for single queue devices Replace accel_priv with sb_dev in ndo_select_queue Add sb_dev parameter to fallback function for ndo_select_queue Consolidated ndo_select_queue functions that appeared to be duplicates ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:03:32 -07:00
Deepti Raghavan	4929c9428a	tcp: expose both send and receive intervals for rate sample Congestion control algorithms, which access the rate sample through the tcp_cong_control function, only have access to the maximum of the send and receive interval, for cases where the acknowledgment rate may be inaccurate due to ACK compression or decimation. Algorithms may want to use send rates and receive rates as separate signals. Signed-off-by: Deepti Raghavan <deeptir@mit.edu> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:01:56 -07:00
Vlad Buslov	e0479b670d	net: sched: fix unprotected access to rcu cookie pointer Fix action attribute size calculation function to take rcu read lock and access act_cookie pointer with rcu dereference. Fixes: `eec94fdb04` ("net: sched: use rcu for action cookie update") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 23:01:02 -07:00
David S. Miller	2368957ab5	Merge branch 'cxgb4-move-stats-fetched-from-firmware-to-debugfs' Rahul Lakkireddy says: ==================== cxgb4: move stats fetched from firmware to debugfs Some stats are fetched via slow firmware mailbox, which can cause packet drops under heavy load. So, this series removes these stats from ethtool -S and expose them via debugfs. Patch 1 removes stats fetched via firmware from ethtool -S. Patch 2 exposes stats removed in Patch 1 via debugfs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:59:39 -07:00
Rahul Lakkireddy	31e5f5c3e9	cxgb4: expose stats fetched from firmware via debugfs Expose stats obtained from firmware via debugfs. These stats can't be part of ethtool -S because the slow firmware mailbox can cause packet drops under heavy load. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:59:38 -07:00
Rahul Lakkireddy	b351b16d8a	cxgb4: remove stats fetched from firmware When running ethtool -S, some stats are requested from firmware. Since getting these stats via firmware mailbox is slow, some packets get dropped under heavy load while running ethtool -S. So, remove these stats from ethtool -S. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:59:38 -07:00
Antoine Tenart	b32b088181	net: mvpp2: explicitly include linux/interrupt.h The Marvell PPv2 driver uses interrupts and tasklet but does not explicitly include linux/interrupt.h, relying on implicit includes. This one particularly is included by chance after a long unlogical chain of inclusions. Fix this so we do not get future build breaks. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:56:52 -07:00
Jan Dakinevich	fb8ed3af74	cnic: use kvzalloc to allocate memory for csk_tbl Size of csk_tbl is about 58K, which means 3rd order page allocation. kvzalloc provides a fallback if no high order memory is available. Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:55:52 -07:00
Colin Ian King	3c546728df	wimax/i2400m: remove redundant variables ack_status, bcf and protocol Variables ack_status, bcf and protocol are being assigned but are never used hence they are redundant and can be removed. Also declare ack_type as unsigned int rather than unsigned to clean up a checkpatch warning. Cleans up clang warnings: warning: variable 'ack_status' set but not used [-Wunused-but-set-variable] warning: variable 'bcf' set but not used [-Wunused-but-set-variable] warning: variable 'protocol' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:54:25 -07:00
Vlad Buslov	01e866bf07	net: sched: act_ife: fix memory leak in ife init Free params if tcf_idr_check_alloc() returned error. Fixes: `0190c1d452` ("net: sched: atomically check-allocate action") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:53:00 -07:00
Arjun Vynipadath	8dce04f1fd	cxgb4: specify IQTYPE in fw_iq_cmd congestion argument passed to t4_sge_alloc_rxq() is used to differentiate between nic/ofld queues. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:52:10 -07:00
David S. Miller	57cd07fbf7	Merge branch 'net-ipv6-addr_gen_mode-fixes' Sabrina Dubroca says: ==================== net/ipv6: addr_gen_mode fixes This series fixes bugs in handling of the addr_gen_mode option, mainly related to the sysctl. A minor netlink issue was also present in the initial commit introducing the option on a per-netdevice basis. v2: add patch 4, requested by David Ahern during review of v1 add patch 5, missing documentation for the sysctl patches 1, 2, 3 are unchanged ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:50:46 -07:00
Sabrina Dubroca	f168db5e25	Documentation: ip-sysctl.txt: document addr_gen_mode addr_gen_mode was introduced in without documentation, add it now. Fixes: `d35a00b8e3` ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:50:45 -07:00
Sabrina Dubroca	f24c5987dd	net/ipv6: propagate net.ipv6.conf.all.addr_gen_mode to devices This aligns the addr_gen_mode sysctl with the expected behavior of the "all" variant. Fixes: `d35a00b8e3` ("net/ipv6: allow sysctl to change link-local address generation mode") Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:50:45 -07:00
Sabrina Dubroca	bdd72f4133	net/ipv6: reserve room for IFLA_INET6_ADDR_GEN_MODE inet6_ifla6_size() is called to check how much space is needed by inet6_fill_link_af() and inet6_fill_ifinfo(), both of which include the IFLA_INET6_ADDR_GEN_MODE attribute. Reserve some room for it. Fixes: `bc91b0f07a` ("ipv6: addrconf: implement address generation modes") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:50:45 -07:00
Sabrina Dubroca	70c30d76e5	net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_dev The value has already been copied from this netns's devconf_dflt, it shouldn't be reset to the global kernel default. Fixes: `d35a00b8e3` ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:50:45 -07:00
Sabrina Dubroca	c6dbf7aaa4	net/ipv6: fix addrconf_sysctl_addr_gen_mode addrconf_sysctl_addr_gen_mode() has multiple problems. First, it ignores the errors returned by proc_dointvec(). addrconf_sysctl_addr_gen_mode() calls proc_dointvec() directly, which writes the value to memory, and then checks if it's valid and may return EINVAL. If a bad value is given, the value displayed when reading net.ipv6.conf.foo.addr_gen_mode next time will be invalid. In case the value provided by the user was valid, addrconf_dev_config() won't be called since idev->cnf.addr_gen_mode has already been updated. Fix this in the usual way we deal with values that need to be checked after the proc_do*() helper has returned: define a local ctl_table and storage, call proc_dointvec() on that temporary area, then check and store. addrconf_sysctl_addr_gen_mode() also writes the new value to the global ipv6_devconf_dflt, when we're writing to some netns's default, so that new netns will inherit the value that was set by the change occuring in any netns. That doesn't make any sense, so let's drop this assignment. Finally, since addr_gen_mode is a __u32, switch to proc_douintvec(). Fixes: `d35a00b8e3` ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:50:45 -07:00
Jianbo Liu	5e9a0fe492	net/sched: flower: Fix null pointer dereference when run tc vlan command Zahari issued tc vlan command without setting vlan_ethtype, which will crash kernel. To avoid this, we must check tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE] is not null before use it. Also we don't need to dump vlan_ethtype or cvlan_ethtype in this case. Fixes: `d64efd0926` ('net/sched: flower: Add supprt for matching on QinQ vlan headers') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reported-by: Zahari Doychev <zahari.doychev@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-11 22:48:13 -07:00
Petr Machata	db560d1612	selftests: forwarding: mirror_lib: Tighten up VLAN capture The function do_test_span_vlan_dir_ips() is used for testing whether mirrored packets are VLAN-encapsulated. But since it only considers VLAN encapsulation, it may end up matching unmirrored ARP traffic as well. One consequence is a rare failure of mirror_gre_vlan_bridge_1q's test_gretap_untagged_egress. Decreasing ping cadence in mirror_test() makes the problem easily reproducible. Therefore tighten up the match criterion to only count those 802.1q packets where the next header is IP. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-10 22:58:10 -07:00
David S. Miller	5025b99c96	Merge branch 'cake-qdisc' Toke Høiland-Jørgensen says: ==================== sched: Add Common Applications Kept Enhanced (cake) qdisc This patch series adds the CAKE qdisc, and has been split up to ease review. I have attempted to split out each configurable feature into its own patch. The first commit adds the base shaper and packet scheduler, while subsequent commits add the optional features. The full userspace API and most data structures are included in this commit, but options not understood in the base version will be ignored. The result of applying the entire series is identical to the out of tree version that have seen extensive testing in previous deployments, most notably as an out of tree patch to OpenWrt. However, note that I have only compile tested the individual patches; so the whole series should be considered as a unit. --- Changelog v19: - Rebase to current net-next. - Don't rely on the value of sch->q.qlen to break loops; fixes possible infinite loop on multi-queue devices. - Don't overwrite NAT flag when setting flow mode. v18: - Rework classification logic in the diffserv case to always hash if filter doesn't select a queue, and to run TC filters before selecting the diffserv tin (allowing filter to influence this). - Make sure we always call qdisc_watchdog_init() in cake_init(), so we don't crash in cake_destroy(). v17: - Rebase to newest net-next and move the conntrack callback to nf_ct_hook - Fix a compile error when NF_CONNTRACK is unset. v16: - Move conntrack lookup function into conntrack core and read it via RCU so it is only active when the nf_conntrack module is loaded. This avoids the module dependency on conntrack for NAT mode. Thanks to Pablo for the idea. v15: - Handle ECN flags in ACK filter v14: - Handle seqno wraps and DSACKs in ACK filter v13: - Avoid ktime_t to scalar compares - Add class dumping and basic stats - Fail with ENOTSUPP when requesting NAT mode and conntrack is not available. - Parse all TCP options in ACK filter and make sure to only drop safe ones. Also handle SACK ranges properly. v12: - Get rid of custom time typedefs. Use ktime_t for time and u64 for duration instead. v11: - Fix overhead compensation calculation for GSO packets - Change configured rate to be u64 (I ran out of bits before I ran out of CPU when testing the effects of the above) v10: - Christmas tree gardening (fix variable declarations to be in reverse line length order) v9: - Remove duplicated checks around kvfree() and just call it unconditionally. - Don't pass __GFP_NOWARN when allocating memory - Move options in cake_dump() that are related to optional features to later patches implementing the features. - Support attaching filters to the qdisc and use the classification result to select flow queue. - Support overriding diffserv priority tin from skb->priority v8: - Remove inline keyword from function definitions - Simplify ACK filter; remove the complex state handling to make the logic easier to follow. This will potentially be a bit less efficient, but I have not been able to measure a difference. v7: - Split up patch into a series to ease review. - Constify the ACK filter. v6: - Fix 6in4 encapsulation checks in ACK filter code - Checkpatch fixes v5: - Refactor ACK filter code and hopefully fix the safety issues properly this time. v4: - Only split GSO packets if shaping at speeds <= 1Gbps - Fix overhead calculation code to also work for GSO packets - Don't re-implement kvzalloc() - Remove local header include from out-of-tree build (fixes kbuild-bot complaint). - Several fixes to the ACK filter: - Check pskb_may_pull() before deref of transport headers. - Don't run ACK filter logic on split GSO packets - Fix TCP sequence number compare to deal with wraparounds v3: - Use IS_REACHABLE() macro to fix compilation when sch_cake is built-in and conntrack is a module. - Switch the stats output to use nested netlink attributes instead of a versioned struct. - Remove GPL boilerplate. - Fix array initialisation style. v2: - Fix kbuild test bot complaint - Clean up the netlink ABI - Fix checkpatch complaints - A few tweaks to the behaviour of cake based on testing carried out while writing the paper. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-10 20:06:35 -07:00
Toke Høiland-Jørgensen	0c850344d3	sch_cake: Conditionally split GSO segments At lower bandwidths, the transmission time of a single GSO segment can add an unacceptable amount of latency due to HOL blocking. Furthermore, with a software shaper, any tuning mechanism employed by the kernel to control the maximum size of GSO segments is thrown off by the artificial limit on bandwidth. For this reason, we split GSO segments into their individual packets iff the shaper is active and configured to a bandwidth <= 1 Gbps. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-10 20:06:34 -07:00
Toke Høiland-Jørgensen	a729b7f0bd	sch_cake: Add overhead compensation support to the rate shaper This commit adds configurable overhead compensation support to the rate shaper. With this feature, userspace can configure the actual bottleneck link overhead and encapsulation mode used, which will be used by the shaper to calculate the precise duration of each packet on the wire. This feature is needed because CAKE is often deployed one or two hops upstream of the actual bottleneck (which can be, e.g., inside a DSL or cable modem). In this case, the link layer characteristics and overhead reported by the kernel does not match the actual bottleneck. Being able to set the actual values in use makes it possible to configure the shaper rate much closer to the actual bottleneck rate (our experience shows it is possible to get with 0.1% of the actual physical bottleneck rate), thus keeping latency low without sacrificing bandwidth. The overhead compensation has three tunables: A fixed per-packet overhead size (which, if set, will be accounted from the IP packet header), a minimum packet size (MPU) and a framing mode supporting either ATM or PTM framing. We include a set of common keywords in TC to help users configure the right parameters. If no overhead value is set, the value reported by the kernel is used. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-10 20:06:34 -07:00
Toke Høiland-Jørgensen	83f8fd69af	sch_cake: Add DiffServ handling This adds support for DiffServ-based priority queueing to CAKE. If the shaper is in use, each priority tier gets its own virtual clock, which limits that tier's rate to a fraction of the overall shaped rate, to discourage trying to game the priority mechanism. CAKE defaults to a simple, three-tier mode that interprets most code points as "best effort", but places CS1 traffic into a low-priority "bulk" tier which is assigned 1/16 of the total rate, and a few code points indicating latency-sensitive or control traffic (specifically TOS4, VA, EF, CS6, CS7) into a "latency sensitive" high-priority tier, which is assigned 1/4 rate. The other supported DiffServ modes are a 4-tier mode matching the 802.11e precedence rules, as well as two 8-tier modes, one of which implements strict precedence of the eight priority levels. This commit also adds an optional DiffServ 'wash' mode, which will zero out the DSCP fields of any packet passing through CAKE. While this can technically be done with other mechanisms in the kernel, having the feature available in CAKE significantly decreases configuration complexity; and the implementation cost is low on top of the other DiffServ-handling code. Filters and applications can set the skb->priority field to override the DSCP-based classification into tiers. If TC_H_MAJ(skb->priority) matches CAKE's qdisc handle, the minor number will be interpreted as a priority tier if it is less than or equal to the number of configured priority tiers. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-07-10 20:06:34 -07:00

1 2 3 4 5 ...

767844 Commits