linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-14 10:46:46 +07:00

Author	SHA1	Message	Date
Jesse Brandeburg	4c66d227e4	ice: add helpers for virtchnl The virtchannel interface was repeating a lot of strings and wasting storage space in the kernel. There was also inconsistent messages for the same thing. Consolidate all those messages and bit checks into a couple of helper functions. Also, reduce stack space usage by simplifying getting the pointer to the pf using a helper. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Co-developed-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:15:21 -08:00
Brett Creeley	4015d11e4b	ice: Add ice_pf_to_dev(pf) macro We use &pf->dev->pdev all over the code. Add a simple macro to do this for us. When multiple de-references like this are being done add a local struct device variable. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:15:17 -08:00
Tony Nguyen	9efe35d0db	ice: Do not use devm* functions for local uses In situations where we alloc and free memory within the same function do not use the devm_* variants; use regular alloc and free functions. Remove any unused vars if there are no usages after these changes. Also, replace an allocate and copy with kmemdup() and remove an unnecessary memset() to 0 after a kzalloc(). Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:15:12 -08:00
Brett Creeley	1bc7a4ab85	ice: Refactor removal of VLAN promiscuous rules Currently ice_clear_vsi_promisc() detects if the VLAN ID sent is not 0 and sets the recipe_id to ICE_SW_LKUP_PROMISC_VLAN in that case and ICE_SW_LKUP_PROMISC if the VLAN_ID is 0. However this doesn't allow VLAN 0 promiscuous rules to be removed, but they can be added. Fix this by checking if the promisc_mask contains ICE_PROMISC_VLAN_RX or ICE_PROMISC_VLAN_TX. This change was made to match what is being done for ice_set_vsi_promisc(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:15:08 -08:00
Brett Creeley	e25f9152bc	ice: Fix setting coalesce to handle DCB configuration Currently there can be a case where a DCB map is applied and there are more interrupt vectors (vsi->num_q_vectors) than Rx queues (vsi->num_rxq) and Tx queues (vsi->num_txq). If we try to set coalesce settings in this case it will report a false failure. Fix this by checking if vector index is valid with respect to the number of Tx and Rx queues configured. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:15:04 -08:00
Akeem G Abodunrin	1f9639d2fb	ice: Only disable VF state when freeing each VF resources It is wrong to set PF disable state flag for all VFs when freeing VF resources - Instead, we should set VF disable state flag for each VF with its resources being returned to the device. Right now, all VF opcodes, mailbox communication to clear its resources as well fails - since we already indicate that PF is in disable state, with all VFs not active. In addition, we don't need to notify VF that PF is intending to reset it, if it is already in disabled state. Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:14:48 -08:00
Jesse Brandeburg	949375de94	ice: fix stack leakage In the case of an invalid virtchannel request the driver would return uninitialized data to the VF from the PF stack which is a bug. Fix by initializing the stack variable earlier in the function before any return paths can be taken. Fixes: `1071a8358a` ("ice: Implement virtchnl commands for AVF support") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:09:31 -08:00
Brett Creeley	2f9ec24198	ice: Don't modify stripping for add/del VLANs on VF Currently when adding/deleting vlans in ice_vc_process_vlan_msg() we are calling ice_vsi_manage_vlan_stripping() to enable/disable when adding and deleting a VLAN respectively. This is wrong because adding/deleting VLANs has nothing to do with configuring VLAN stripping. VLAN stripping is configured through the following VIRTCHNL operations: VIRTCHNL_OP_ENABLE_VLAN_STRIPPING VIRTCHNL_OP_DISABLE_VLAN_STRIPPING Unfortunately we can't just remove this because then stripping will never be configured on VF initialization. Fix this by adding a new function that initializes (disables/enables) VLAN stripping for the VF based on the device supported capabilities. This allows us to remove the call to ice_vsi_manage_vlan_stripping() in ice_vc_process_vlan_msg(). Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:06:34 -08:00
Brett Creeley	d4bc4e2d6b	ice: Disallow VF VLAN opcodes if VLAN offloads disabled Currently if the host disables VLAN offloads on the VF by not setting the VIRTCHNL_VF_OFFLOAD_VLAN capability bit we will still honor VF VLAN configuration messages over VIRTCHNL. These messages (i.e. enable/disable VLAN stripping and VLAN filtering) should be blocked when the feature is not supported. Fix that by adding a helper function to determine if the VF is allowed to do VLAN operations based on the host's VF configuration. Also, mirror the VF communicated capabilities in the host's VF configuration. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:06:34 -08:00
Bruce Allan	9164f761c9	ice: Correct capabilities reporting of max TCs Firmware always returns 8 as the max number of supported TCs. However on devices with more than 4 ports, the maximum number of TCs per port is limited to 4. Check and, if necessary, correct the reporting of capabilities for devices with more than 4 ports. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:06:34 -08:00
Bruce Allan	eae1bbb2a4	ice: Store number of functions for the device Store the number of functions the device has and use this number when setting safe mode capabilities instead of calculating it. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Co-developed-by: Kevin Scott <kevin.c.scott@intel.com> Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>	2019-11-22 13:06:34 -08:00
Chen Wandun	3243e04ab1	net: dsa: ocelot: fix "should it be static?" warnings Fix following sparse warnings: drivers/net/dsa/ocelot/felix.c:351:6: warning: symbol 'felix_txtstamp' was not declared. Should it be static? Signed-off-by: Chen Wandun <chenwandun@huawei.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-22 10:09:10 -08:00
Andrea Mayer	fd1fef0c45	seg6: allow local packet processing for SRv6 End.DT6 behavior End.DT6 behavior makes use of seg6_lookup_nexthop() function which drops all packets that are destined to be locally processed. However, DT* should be able to deliver decapsulated packets that are destined to local addresses. Function seg6_lookup_nexthop() is also used by DX6, so in order to maintain compatibility I created another routing helper function which is called seg6_lookup_any_nexthop(). This function is able to take into account both packets that have to be processed locally and the ones that are destined to be forwarded directly to another machine. Hence, seg6_lookup_any_nexthop() is used in DT6 rather than seg6_lookup_nexthop() to allow local delivery. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-22 09:45:55 -08:00
Petr Machata	d1746d1e80	net: flow_dissector: Wrap unionized VLAN fields in a struct In commit `a82055af59` ("netfilter: nft_payload: add VLAN offload support"), VLAN fields in struct flow_dissector_key_vlan were unionized with the intention of introducing another field that covered the whole TCI header. However without a wrapping struct the subfields end up sharing the same bits. As a result, "tc filter add ... flower vlan_id 14" specifies not only vlan_id, but also vlan_priority. Fix by wrapping the individual VLAN fields in a struct. Fixes: `a82055af59` ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-22 09:44:14 -08:00
David S. Miller	4bbb02f1a5	The interesting new thing here is AQL, the Airtime Queue Limit patchset from Kan Yan (Google) and Toke Høiland-Jørgensen (Redhat). The effect is intended to eventually be similar to BQL, but byte queue limits are not useful in wifi where the actual throughput can vary by around 4 orders of magnitude. There are more details in the patches themselves. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl3X3AcACgkQB8qZga/f l8Q8WQ/+M+KaxTsqlLCZFoQwegQ3Z2i6wZw0uhPEJ3vDWBdOEtopMzv0v69DQPV4 TQdXj+SoXLijvcUah6nc8Ve8am7wjoxf6YfHKvhbJK3xc3L25H+W5+0dZSzWXX1l ldhv4tBF5nBJAAhAN6DX8oOp6B6t7E5vTbwTcW6fr897g/ypXqM5zl39PQwOCznA SwRoQua5Wz/EIIpljK9Z9PSv/B2FIa3k6QgZGJizSKZd+wjiYJC0CM1hYbWqZlSx TL5Zy5QbJhsC7jpByVfJ/SrWuKT5uHVobhUY7uEpLTV2VuMTUSvshY0Naz/uD48+ E6rLkJWD/DiZijCnRuJyh7uFfoWsHOjav69vqzYwTYrtqGBoDbQ3jtYyyePyp1c4 h182yh7IcE7t8CSpgOGPDvYC3o4JYHZhXjyonXS5es4IOrTLLf26HOotvjuPCS4U KdrDuv/ayYW4C5suBs/E/TIfqCEW+glhJuoEL3ruFXVtvpjLfaAbFsP2OH7M3Vg+ PPOKGtgz0JkdanNuH2aEcEI6UrtHYnAwqpD8DXi2zxk7eKc/yWW8A/guPFVzNsH9 QSucdLMWccfEgQhnHilelEfGPamNGeANQs0uDsdTE9kJ9y9OofgncYsfMb9R5R3p ezFuWhPtX4DS13lvXLPxl8l6xmz/NKWSwWSqlIlm8u5xi9oyOss= =0uzN -----END PGP SIGNATURE----- Merge tag 'mac80211-next-for-net-next-2019-11-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== The interesting new thing here is AQL, the Airtime Queue Limit patchset from Kan Yan (Google) and Toke Høiland-Jørgensen (Redhat). The effect is intended to eventually be similar to BQL, but byte queue limits are not useful in wifi where the actual throughput can vary by around 4 orders of magnitude. There are more details in the patches themselves. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-22 09:40:52 -08:00
Tuong Lien	41b416f1fc	tipc: support in-order name publication events It is observed that TIPC service binding order will not be kept in the publication event report to user if the service is subscribed after the bindings. For example, services are bound by application in the following order: Server: bound port A to {18888,66,66} scope 2 Server: bound port A to {18888,33,33} scope 2 Now, if a client subscribes to the service range (e.g. {18888, 0-100}), it will get the 'TIPC_PUBLISHED' events in that binding order only when the subscription is started before the bindings. Otherwise, if started after the bindings, the events will arrive in the opposite order: Client: received event for published {18888,33,33} Client: received event for published {18888,66,66} For the latter case, it is clear that the bindings have existed in the name table already, so when reported, the events' order will follow the order of the rbtree binding nodes (- a node with lesser 'lower'/'upper' range value will be first). This is correct as we provide the tracking on a specific service status (available or not), not the relationship between multiple services. However, some users expect to see the same order of arriving events irrespective of when the subscription is issued. This turns out to be easy to fix. We now add functionality to ensure that publication events always are issued in the same temporal order as the corresponding bindings were performed. v2: replace the unnecessary macro - 'publication_after()' with inline function. v3: reuse 'time_after32()' instead of reinventing the same exact code. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-22 09:29:50 -08:00
Hoang Le	ba5f6a8617	tipc: update replicast capability for broadcast send link When setting up a cluster with non-replicast/replicast capability supported. This capability will be disabled for broadcast send link in order to be backwards compatible. However, when these non-support nodes left and be removed out the cluster. We don't update this capability on broadcast send link. Then, some of features that based on this capability will also disabling as unexpected. In this commit, we make sure the broadcast send link capabilities will be re-calculated as soon as a node removed/rejoined a cluster. Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-22 09:29:50 -08:00
Toke Høiland-Jørgensen	7a89233ac5	mac80211: Use Airtime-based Queue Limits (AQL) on packet dequeue The previous commit added the ability to throttle stations when they queue too much airtime in the hardware. This commit enables the functionality by calculating the expected airtime usage of each packet that is dequeued from the TXQs in mac80211, and accounting that as pending airtime. The estimated airtime for each skb is stored in the tx_info, so we can subtract the same amount from the running total when the skb is freed or recycled. The throttling mechanism relies on this accounting to be accurate (i.e., that we are not freeing skbs without subtracting any airtime they were accounted for), so we put the subtraction into ieee80211_report_used_skb(). As an optimisation, we also subtract the airtime on regular TX completion, zeroing out the value stored in the packet afterwards, to avoid having to do an expensive lookup of the station from the packet data on every packet. This patch does not include any mechanism to wake a throttled TXQ again, on the assumption that this will happen anyway as a side effect of whatever freed the skb (most commonly a TX completion). Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20191119060610.76681-5-kyan@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 13:36:25 +01:00
Kan Yan	3ace10f5b5	mac80211: Implement Airtime-based Queue Limit (AQL) In order for the Fq_CoDel algorithm integrated in mac80211 layer to operate effectively to control excessive queueing latency, the CoDel algorithm requires an accurate measure of how long packets stays in the queue, AKA sojourn time. The sojourn time measured at the mac80211 layer doesn't include queueing latency in the lower layer (firmware/hardware) and CoDel expects lower layer to have a short queue. However, most 802.11ac chipsets offload tasks such TX aggregation to firmware or hardware, thus have a deep lower layer queue. Without a mechanism to control the lower layer queue size, packets only stay in mac80211 layer transiently before being sent to firmware queue. As a result, the sojourn time measured by CoDel in the mac80211 layer is almost always lower than the CoDel latency target, hence CoDel does little to control the latency, even when the lower layer queue causes excessive latency. The Byte Queue Limits (BQL) mechanism is commonly used to address the similar issue with wired network interface. However, this method cannot be applied directly to the wireless network interface. "Bytes" is not a suitable measure of queue depth in the wireless network, as the data rate can vary dramatically from station to station in the same network, from a few Mbps to over Gbps. This patch implements an Airtime-based Queue Limit (AQL) to make CoDel work effectively with wireless drivers that utilized firmware/hardware offloading. AQL allows each txq to release just enough packets to the lower layer to form 1-2 large aggregations to keep hardware fully utilized and retains the rest of the frames in mac80211 layer to be controlled by the CoDel algorithm. Signed-off-by: Kan Yan <kyan@google.com> [ Toke: Keep API to set pending airtime internal, fix nits in commit msg ] Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20191119060610.76681-4-kyan@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 13:36:25 +01:00
Toke Høiland-Jørgensen	db3e1c40cf	mac80211: Import airtime calculation code from mt76 Felix recently added code to calculate airtime of packets to the mt76 driver. Import this into mac80211 so we can use it for airtime queue limit calculations. The airtime.c file is copied verbatim from the mt76 driver, and adjusted to be usable in mac80211. This involves: - Switching to mac80211 data structures. - Adding support for 160 MHz channels and HE mode. - Moving the symbol and duration calculations around a bit to avoid rounding with the higher rates and longer symbol times used for HE rates. The per-rate TX rate calculation is also split out to its own function so it can be used directly for the AQL calculations later. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20191119060610.76681-3-kyan@google.com [fix HE_GROUP_IDX() to use 3 * bw, since there are 3 _gi values] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 13:36:25 +01:00
Taehee Yoo	bc71d8b580	virt_wifi: fix use-after-free in virt_wifi_newlink() When virt_wifi interface is created, virt_wifi_newlink() is called and it calls register_netdevice(). if register_netdevice() fails, it internally would call ->priv_destructor(), which is virt_wifi_net_device_destructor() and it frees netdev. but virt_wifi_newlink() still use netdev. So, use-after-free would occur in virt_wifi_newlink(). Test commands: ip link add dummy0 type dummy modprobe bonding ip link add bonding_masters link dummy0 type virt_wifi Splat looks like: [ 202.220554] BUG: KASAN: use-after-free in virt_wifi_newlink+0x88b/0x9a0 [virt_wifi] [ 202.221659] Read of size 8 at addr ffff888061629cb8 by task ip/852 [ 202.222896] CPU: 1 PID: 852 Comm: ip Not tainted 5.4.0-rc5 #3 [ 202.223765] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 202.225073] Call Trace: [ 202.225532] dump_stack+0x7c/0xbb [ 202.226869] print_address_description.constprop.5+0x1be/0x360 [ 202.229362] __kasan_report+0x12a/0x16f [ 202.230714] kasan_report+0xe/0x20 [ 202.232595] virt_wifi_newlink+0x88b/0x9a0 [virt_wifi] [ 202.233370] __rtnl_newlink+0xb9f/0x11b0 [ 202.244909] rtnl_newlink+0x65/0x90 [ ... ] Cc: stable@vger.kernel.org Fixes: `c7cdba31ed` ("mac80211-next: rtnetlink wifi simulation device") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Link: https://lore.kernel.org/r/20191121122645.9355-1-ap420073@gmail.com [trim stack dump a bit] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 13:36:25 +01:00
Thomas Pedersen	08a5bdde38	mac80211: consider QoS Null frames for STA_NULLFUNC_ACKED Commit `7b6ddeaf27` ("mac80211: use QoS NDP for AP probing") let STAs send QoS Null frames as PS triggers if the AP was a QoS STA. However, the mac80211 PS stack relies on an interface flag IEEE80211_STA_NULLFUNC_ACKED for determining trigger frame ACK, which was not being set for acked non-QoS Null frames. The effect is an inability to trigger hardware sleep via IEEE80211_CONF_PS since the QoS Null frame was seemingly never acked. This bug only applies to drivers which set both IEEE80211_HW_REPORTS_TX_ACK_STATUS and IEEE80211_HW_PS_NULLFUNC_STACK. Detect the acked QoS Null frame to restore STA power save. Fixes: `7b6ddeaf27` ("mac80211: use QoS NDP for AP probing") Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20191119053538.25979-4-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 13:36:25 +01:00
Thomas Pedersen	c90142a518	mac80211: expose HW conf flags through debugfs This is useful during testing to eg. check the currently configured HW power save state. Signed-off-by: Thomas Pedersen <thomas@adapt-ip.com> Link: https://lore.kernel.org/r/20191119053538.25979-3-thomas@adapt-ip.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 13:36:25 +01:00
Toke Høiland-Jørgensen	5072f73cb6	mac80211: Add new sta_info getter by sta/vif addrs In ieee80211_tx_status() we don't have an sdata struct when looking up the destination sta. Instead, we just do a lookup by the vif addr that is the source of the packet being completed. Factor this out into a new sta_info getter helper, since we need to use it for accounting AQL as well. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20191112130835.382062-1-toke@redhat.com [remove internal rcu_read_lock(), document instead] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 12:53:53 +01:00
Johannes Berg	b226a826d8	mac80211: add a comment about monitor-to-dev injection Add a note with a use-case for the monitor-to-dev injection mechanism in mac80211, reported by Ben Greear. Change-Id: I6456997ef9bc40b24ede860b6ef2fed5af49cf44 Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2019-11-22 12:42:42 +01:00
Mao Wenan	13baf667fa	enetc: make enetc_setup_tc_mqprio static While using ARCH=mips CROSS_COMPILE=mips-linux-gnu- command to compile, make C=2 drivers/net/ethernet/freescale/enetc/enetc.o one warning can be found: drivers/net/ethernet/freescale/enetc/enetc.c:1439:5: warning: symbol 'enetc_setup_tc_mqprio' was not declared. Should it be static? This patch make symbol enetc_setup_tc_mqprio static. Fixes: `34c6adf197` ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 19:30:11 -08:00
David S. Miller	7d75c0cb22	Merge branch 'net-introduce-and-use-route-hint' Paolo Abeni says: ==================== net: introduce and use route hint This series leverages the listification infrastructure to avoid unnecessary route lookup on ingress packets. In absence of custom rules, packets with equal daddr will usually land on the same dst. When processing packet bursts (lists) we can easily reference the previous dst entry. When we hit the 'same destination' condition we can avoid the route lookup, coping the already available dst. Detailed performance numbers are available in the individual commit messages. v3 -> v4: - move helpers to their own patches (Eric D.) - enable hints for SUBTREE builds (David A.) - re-enable hints for ipv4 forward (David A.) v2 -> v3: - use fib_has_custom_rules() helpers (David A.) - add ip_extract_route_hint() helper (Edward C.) - use prev skb as hint instead of copying data (Willem ) v1 -> v2: - fix build issue with !CONFIG_IP*_MULTIPLE_TABLES - fix potential race in ip6_list_rcv_finish() ==================== Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:46:31 -08:00
Paolo Abeni	02b2494161	ipv4: use dst hint for ipv4 list receive This is alike the previous change, with some additional ipv4 specific quirk. Even when using the route hint we still have to do perform additional per packet checks about source address validity: a new helper is added to wrap them. Hints are explicitly disabled if the destination is a local broadcast, that keeps the code simple and local broadcast are a slower path anyway. UDP flood performances vs recvmmsg() receiver: vanilla patched delta Kpps Kpps % 1683 1871 +11 In the worst case scenario - each packet has a different destination address - the performance delta is within noise range. v3 -> v4: - re-enable hints for forward v2 -> v3: - really fix build (sic) and hint usage check - use fib4_has_custom_rules() helpers (David A.) - add ip_extract_route_hint() helper (Edward C.) - use prev skb as hint instead of copying data (Willem) v1 -> v2: - fix build issue with !CONFIG_IP_MULTIPLE_TABLES Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:45:55 -08:00
Paolo Abeni	c43c3d76c0	ipv4: move fib4_has_custom_rules() helper to public header So that we can use it in the next patch. Additionally constify the helper argument. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:45:55 -08:00
Paolo Abeni	197dbf24e3	ipv6: introduce and uses route look hints for list input. When doing RX batch packet processing, we currently always repeat the route lookup for each ingress packet. When no custom rules are in place, and there aren't routes depending on source addresses, we know that packets with the same destination address will use the same dst. This change tries to avoid per packet route lookup caching the destination address of the latest successful lookup, and reusing it for the next packet when the above conditions are in place. Ingress traffic for most servers should fit. The measured performance delta under UDP flood vs a recvmmsg receiver is as follow: vanilla patched delta Kpps Kpps % 1431 1674 +17 In the worst-case scenario - each packet has a different destination address - the performance delta is within noise range. v3 -> v4: - support hints for SUBFLOW build, too (David A.) - several style fixes (Eric) v2 -> v3: - add fib6_has_custom_rules() helpers (David A.) - add ip6_extract_route_hint() helper (Edward C.) - use hint directly in ip6_list_rcv_finish() (Willem) v1 -> v2: - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES - fix potential race when fib6_has_custom_rules is set while processing a packet batch Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:45:55 -08:00
Paolo Abeni	b9b33e7c24	ipv6: keep track of routes using src Use a per namespace counter, increment it on successful creation of any route using the source address, decrement it on deletion of such routes. This allows us to check easily if the routing decision in the current namespace depends on the packet source. Will be used by the next patch. Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:45:55 -08:00
Paolo Abeni	1f8ac57030	ipv6: add fib6_has_custom_rules() helper It wraps the namespace field with the same name, to easily access it regardless of build options. Suggested-by: David Ahern <dsahern@gmail.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:45:55 -08:00
David S. Miller	2c44713ed9	Merge branch 'DSA-Felix-PTP' Yangbo Lu says: ==================== Support PTP clock and hardware timestamping for DSA Felix driver This patch-set is to support PTP clock and hardware timestamping for DSA Felix driver. Some functions in ocelot.c/ocelot_board.c driver were reworked/exported, so that DSA Felix driver was able to reuse them as much as possible. On TX path, timestamping works on packet which requires timestamp. The injection header will be configured accordingly, and skb clone requires timestamp will be added into a list. The TX timestamp is final handled in threaded interrupt handler when PTP timestamp FIFO is ready. On RX path, timestamping is always working. The RX timestamp could be got from extraction header. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:39:03 -08:00
Yangbo Lu	c0bcf53766	net: dsa: ocelot: add hardware timestamping support for Felix This patch is to reuse ocelot functions as possible to enable PTP clock and to support hardware timestamping on Felix. On TX path, timestamping works on packet which requires timestamp. The injection header will be configured accordingly, and skb clone requires timestamp will be added into a list. The TX timestamp is final handled in threaded interrupt handler when PTP timestamp FIFO is ready. On RX path, timestamping is always working. The RX timestamp could be got from extraction header. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:39:02 -08:00
Yangbo Lu	5df66c48bc	net: dsa: ocelot: define PTP registers for felix_vsc9959 This patch is to define PTP registers for felix_vsc9959. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:39:02 -08:00
Yangbo Lu	400928bf92	net: mscc: ocelot: convert to use ocelot_port_add_txtstamp_skb() Convert to use ocelot_port_add_txtstamp_skb() for adding skbs which require TX timestamp into list. Export it so that DSA Felix driver could reuse it too. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:39:02 -08:00
Yangbo Lu	e23a7b3e8d	net: mscc: ocelot: convert to use ocelot_get_txtstamp() The method getting TX timestamp by reading timestamp FIFO and matching skbs list is common for DSA Felix driver too. So move code out of ocelot_board.c, convert to use ocelot_get_txtstamp() function and export it. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:39:02 -08:00
Yangbo Lu	f145922ddc	net: mscc: ocelot: export ocelot_hwstamp_get/set functions Export ocelot_hwstamp_get/set functions so that DSA driver is able to reuse them. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 14:39:02 -08:00
John Fastabend	8163999db4	bpf: skmsg, fix potential psock NULL pointer dereference Report from Dan Carpenter, net/core/skmsg.c:792 sk_psock_write_space() error: we previously assumed 'psock' could be null (see line 790) net/core/skmsg.c 789 psock = sk_psock(sk); 790 if (likely(psock && sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))) Check for NULL 791 schedule_work(&psock->work); 792 write_space = psock->saved_write_space; ^^^^^^^^^^^^^^^^^^^^^^^^ 793 rcu_read_unlock(); 794 write_space(sk); Ensure psock dereference on line 792 only occurs if psock is not null. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 13:43:45 -08:00
Jiri Olsa	7599a896f2	audit: Move audit_log_task declaration under CONFIG_AUDITSYSCALL The 0-DAY found that audit_log_task is not declared under CONFIG_AUDITSYSCALL which causes compilation error when it is not defined: kernel/bpf/syscall.o: In function `bpf_audit_prog.isra.30': >> syscall.c:(.text+0x860): undefined reference to `audit_log_task' Adding the audit_log_task declaration and stub within CONFIG_AUDITSYSCALL ifdef. Fixes: `91e6015b08` ("bpf: Emit audit messages upon successful prog load and unload") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 12:14:49 -08:00
Krzysztof Kozlowski	43da14110c	net: Fix Kconfig indentation, continued Adjust indentation from spaces to tab (+optional two spaces) as in coding style. This fixes various indentation mixups (seven spaces, tab+one space, etc). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 12:00:21 -08:00
Krzysztof Kozlowski	5421cf84af	drivers: net: Fix Kconfig indentation, continued Adjust indentation from spaces to tab (+optional two spaces) as in coding style. This fixes various indentation mixups (seven spaces, tab+one space, etc). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:54:09 -08:00
Xin Long	1841b98299	lwtunnel: check erspan options before allocating tun_info As Jakub suggested on another patch, it's better to do the check on erspan options before allocating memory. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:47:39 -08:00
Xin Long	7b6a70f737	lwtunnel: be STRICT to validate the new LWTUNNEL_IP(6)_OPTS LWTUNNEL_IP(6)_OPTS are the new items in ip(6)_tun_policy, which are parsed by nla_parse_nested_deprecated(). We should check it strictly by setting .strict_start_type = LWTUNNEL_IP(6)_OPTS. This patch also adds missing LWTUNNEL_IP6_OPTS in ip6_tun_policy. Fixes: `4ece477870` ("lwtunnel: add options setting and dumping for geneve") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:47:23 -08:00
Xin Long	f3bed7f8f9	net: remove the unnecessary strict_start_type in some policies ct_policy and mpls_policy are parsed with nla_parse_nested(), which does NL_VALIDATE_STRICT validation, strict_start_type is not needed to set as it is actually trying to make some attributes parsed with NL_VALIDATE_STRICT. This patch is to remove it, and do the same on rtm_nh_policy which is parsed by nlmsg_parse(). Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:46:18 -08:00
David S. Miller	ff998a80c3	Merge branch 'net-sched-support-vxlan-and-erspan-options' Xin Long says: ==================== net: sched: support vxlan and erspan options This patchset is to add vxlan and erspan options support in cls_flower and act_tunnel_key. The form is pretty much like geneve_opts in: https://patchwork.ozlabs.org/patch/935272/ https://patchwork.ozlabs.org/patch/954564/ but only one option is allowed for vxlan and erspan. v1->v2: - see each patch changelog. ==================== Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	79b1011cb3	net: sched: allow flower to match erspan options This patch is to allow matching options in erspan. The options can be described in the form: VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK. When ver is set to 1, index will be applied while dir and hwid will be ignored, and when ver is set to 2, dir and hwid will be used while index will be ignored. Different from geneve, only one option can be set. And also, geneve options, vxlan options or erspan options can't be set at the same time. # ip link add name erspan1 type erspan external # tc qdisc add dev erspan1 ingress # tc filter add dev erspan1 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ erspan_opts 1:12:0:0/1:ffff:0:0 \ ip_proto udp \ action mirred egress redirect dev eth0 v1->v2: - improve some err msgs of extack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	d8f9dfae49	net: sched: allow flower to match vxlan options This patch is to allow matching gbp option in vxlan. The options can be described in the form GBP/GBP_MASK, where GBP is represented as a 32bit hexadecimal value. Different from geneve, only one option can be set. And also, geneve options and vxlan options can't be set at the same time. # ip link add name vxlan0 type vxlan dstport 0 external # tc qdisc add dev vxlan0 ingress # tc filter add dev vxlan0 protocol ip parent ffff: \ flower \ enc_src_ip 10.0.99.192 \ enc_dst_ip 10.0.99.193 \ enc_key_id 11 \ vxlan_opts 01020304/ffffffff \ ip_proto udp \ action mirred egress redirect dev eth0 v1->v2: - add .strict_start_type for enc_opts_policy as Jakub noticed. - use Duplicate instead of Wrong in err msg for extack as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	e20d4ff2ac	net: sched: add erspan option support to act_tunnel_key This patch is to allow setting erspan options using the act_tunnel_key action. Different from geneve options, only one option can be set. And also, geneve options, vxlan options or erspan options can't be set at the same time. Options are expressed as ver:index:dir:hwid, when ver is set to 1, index will be applied while dir and hwid will be ignored, and when ver is set to 2, dir and hwid will be used while index will be ignored. # ip link add name erspan1 type erspan external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ erspan_opts 1:2:0:0 \ action mirred egress redirect dev erspan1 v1->v2: - do the validation when dst is not yet allocated as Jakub suggested. - use Duplicate instead of Wrong in err msg for extack. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00
Xin Long	fca3f91cc3	net: sched: add vxlan option support to act_tunnel_key This patch is to allow setting vxlan options using the act_tunnel_key action. Different from geneve options, only one option can be set. And also, geneve options and vxlan options can't be set at the same time. gbp is the only param for vxlan options: # ip link add name vxlan0 type vxlan dstport 0 external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ vxlan_opts 01020304 \ action mirred egress redirect dev vxlan0 v1->v2: - add .strict_start_type for enc_opts_policy as Jakub noticed. - use Duplicate instead of Wrong in err msg for extack as Jakub suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-21 11:44:06 -08:00

1 2 3 4 5 ...

875139 Commits