In case the MAIN table is flushed and its trie is shared with the LOCAL
table, then we might be flushing FIB aliases belonging to the latter.
This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
ID.
The above doesn't affect current listeners, as the table ID is ignored
during entry deletion, but this will change later in the patchset.
When flushing a particular table, skip any aliases belonging to a
different one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Patrick McHardy <kaber@trash.net>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct sw_flow_key has two 16-bit holes. Move the most matched
conntrack match fields there. In some typical cases this reduces the
size of the key that needs to be hashed into half and into one cache
line.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stateful network admission policy may allow connections to one
direction and reject connections initiated in the other direction.
After policy change it is possible that for a new connection an
overlapping conntrack entry already exists, where the original
direction of the existing connection is opposed to the new
connection's initial packet.
Most importantly, conntrack state relating to the current packet gets
the "reply" designation based on whether the original direction tuple
or the reply direction tuple matched. If this "directionality" is
wrong w.r.t. to the stateful network admission policy it may happen
that packets in neither direction are correctly admitted.
This patch adds a new "force commit" option to the OVS conntrack
action that checks the original direction of an existing conntrack
entry. If that direction is opposed to the current packet, the
existing conntrack entry is deleted and a new one is subsequently
created in the correct direction.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the fields of the conntrack original direction 5-tuple to struct
sw_flow_key. The new fields are initially marked as non-existent, and
are populated whenever a conntrack action is executed and either finds
or generates a conntrack entry. This means that these fields exist
for all packets that were not rejected by conntrack as untrackable.
The original tuple fields in the sw_flow_key are filled from the
original direction tuple of the conntrack entry relating to the
current packet, or from the original direction tuple of the master
conntrack entry, if the current conntrack entry has a master.
Generally, expected connections of connections having an assigned
helper (e.g., FTP), have a master conntrack entry.
The main purpose of the new conntrack original tuple fields is to
allow matching on them for policy decision purposes, with the premise
that the admissibility of tracked connections reply packets (as well
as original direction packets), and both direction packets of any
related connections may be based on ACL rules applying to the master
connection's original direction 5-tuple. This also makes it easier to
make policy decisions when the actual packet headers might have been
transformed by NAT, as the original direction 5-tuple represents the
packet headers before any such transformation.
When using the original direction 5-tuple the admissibility of return
and/or related packets need not be based on the mere existence of a
conntrack entry, allowing separation of admission policy from the
established conntrack state. While existence of a conntrack entry is
required for admission of the return or related packets, policy
changes can render connections that were initially admitted to be
rejected or dropped afterwards. If the admission of the return and
related packets was based on mere conntrack state (e.g., connection
being in an established state), a policy change that would make the
connection rejected or dropped would need to find and delete all
conntrack entries affected by such a change. When using the original
direction 5-tuple matching the affected conntrack entries can be
allowed to time out instead, as the established state of the
connection would not need to be the basis for packet admission any
more.
It should be noted that the directionality of related connections may
be the same or different than that of the master connection, and
neither the original direction 5-tuple nor the conntrack state bits
carry this information. If needed, the directionality of the master
connection can be stored in master's conntrack mark or labels, which
are automatically inherited by the expected related connections.
The fact that neither ARP nor ND packets are trackable by conntrack
allows mutual exclusion between ARP/ND and the new conntrack original
tuple fields. Hence, the IP addresses are overlaid in union with ARP
and ND fields. This allows the sw_flow_key to not grow much due to
this patch, but it also means that we must be careful to never use the
new key fields with ARP or ND packets. ARP is easy to distinguish and
keep mutually exclusive based on the ethernet type, but ND being an
ICMPv6 protocol requires a bit more attention.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We avoid calling into nf_conntrack_in() for expected connections, as
that would remove the expectation that we want to stick around until
we are ready to commit the connection. Instead, we do a lookup in the
expectation table directly. However, after a successful expectation
lookup we have set the flow key label field from the master
connection, whereas nf_conntrack_in() does not do this. This leads to
master's labels being inherited after an expectation lookup, but those
labels not being inherited after the corresponding conntrack action
with a commit flag.
This patch resolves the problem by changing the commit code path to
also inherit the master's labels to the expected connection.
Resolving this conflict in favor of inheriting the labels allows more
information be passed from the master connection to related
connections, which would otherwise be much harder if the 32 bits in
the connmark are not enough. Labels can still be set explicitly, so
this change only affects the default values of the labels in presense
of a master connection.
Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactoring conntrack labels initialization makes changes in later
patches easier to review.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 23014011ba ("netfilter: conntrack: support a fixed size of 128
distinct labels"), the size of conntrack labels extension has fixed to
128 bits, so we do not need to check for labels sizes shorter than 128
at run-time. This patch simplifies labels length logic accordingly,
but allows the conntrack labels size to be increased in the future
without breaking the build. In the event of conntrack labels
increasing in size OVS would still be able to deal with the 128 first
label bits.
Suggested-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the array of labels in struct ovs_key_ct_label an union, adding a
u32 array of the same byte size as the existing u8 array. It is
faster to loop through the labels 32 bits at the time, which is also
the alignment of netlink attributes.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Receiving change events before the 'new' event for the connection has
been received can be confusing. Avoid triggering change events for
setting conntrack mark or labels before the conntrack entry has been
confirmed.
Fixes: 182e3042e1 ("openvswitch: Allow matching on conntrack mark")
Fixes: c2ac667358 ("openvswitch: Allow matching on conntrack label")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conntrack lookup for existing connections fails to invert the
packet 5-tuple for NATted packets, and therefore fails to find the
existing conntrack entry. Conntrack only stores 5-tuples for incoming
packets, and there are various situations where a lookup on a packet
that has already been transformed by NAT needs to be made. Looking up
an existing conntrack entry upon executing packet received from the
userspace is one of them.
This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
for the conntrack lookup whenever the packet has already been
transformed by conntrack from its input form as evidenced by one of
the NAT flags being set in the conntrack state metadata.
Fixes: 05752523e5 ("openvswitch: Interface with NAT.")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
they are combined into '_nfct'.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netanel Belgazal says:
====================
Bug Fixes in ENA driver
Changes from V3:
* Rebase patchset to master and solve merge conflicts.
* Remove redundant bug fix (fix error handling when probe fails)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the host info config to be the first admin command that is executed.
This change require the driver to remove the 'feature check'
from host info configuration flow.
The check is removed since the supported features bitmask field
is retrieved only after calling ENA_ADMIN_DEVICE_ATTRIBUTES admin command.
If set host info is not supported an error will be returned by the device.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The timeouts were too agressive and sometimes cause false alarms.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Completion descriptors are accessed from the driver and from the device.
To avoid reading the old value, use READ_ONCE macro.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not unamsk interrupts if we are in busy poll mode.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the ena driver detects that the device is not behave as expected,
it tries to reset the device.
The reset flow calls ena_down, which will frees all the resources
the driver allocates and then it will reset the device.
This flow can cause memory corruption if the device is still writes
to the driver's memory space.
To overcome this potential race, move the reset before the device
resources are freed.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ndo_get_stat64() can be called from atomic context, but the current
implementation sends an admin command to retrieve the statistics from
the device. This admin command can sleep.
This patch re-factors the implementation of ena_get_stats64() to use
the {rx,tx}bytes/count from the driver's inner counters, and to obtain
the rx drop counter from the asynchronous keep alive (heart bit)
event.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENA default hash configures IPv4_frag hash twice instead of
configure non-IP packets.
The bug caused IPv4 fragmented packets to be calculated based on
L2 source and destination address instead of L3 source and destination.
IPv4 packets can reach to the wrong Rx queue.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type
treat the ena_flow_hash_to_flow_type enum as power of two values.
Change the values of ena_admin_flow_hash_fields to be power of two values.
This bug effect the ethtool set/get rxnfc.
ethtool will report wrong values hash fields for get and will
configure wrong hash fields in set.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ENA driver tries to open a queue per vCPU.
To determine how many vCPUs the instance have it uses num_possible_cpus()
while it should have use num_online_cpus() instead.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove NETIF_F_NTUPLE from netdev->features.
The ENA device driver does not support ntuple filtering.
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Govindarajulu Varadarajan says:
====================
enic: add vxlan offload support
This series adds vxlan offload support for enic driver. The first
patch adds vxlan devcmd for configuring vxland offload parameters.
Second patch adds ndo_udp_tunnel_add/del and offload on rx path.
There are to modes in which fw supports vxlan offload.
mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
if checksum of csum is ok. This bit is or of ip_csum_ok and
tcp_udp_csum_ok
mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
BIT(1) is set if outer_ip_csum_ok/
BIT(2) is set if outer_tcp_csum_ok
Some hw supports only mode 0, some support mode 0 and 2. Driver gets
the supported modes bitmap using get_supported_feature_ver devcmd
and selects the highest mode both driver and fw supports.
Third patch adds offload support on tx path by adding
enic_features_check().
v2: Order local variable declarations from longest to shortest line,
on all three patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Define ndo_features_check. Hw supports offload only for ipv4 inner and
ipv4 outer pkt.
Code refactor for setting inner tcp pseudo csum.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Defines enic_udp_tunnel_add/del for configuring vxlan tunnel offload.
enic supports offload of only one ipv4/udp port.
There are two modes that fw supports for vxlan offload.
mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
if checksum of csum is ok. This bit is or of ip_csum_ok and
tcp_udp_csum_ok
mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
BIT(1) is set if outer_ip_csum_ok/
BIT(2) is set if outer_tcp_csum_ok
tcp_udp_csum_ok/ipv4_csum_ok is set if inner csum is OK.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds devcmds needed for vxlan offload. Implement 3 new devcmd
overlay_offload_ctrl: enable/disable offload
overlay_offload_cfg: update offload udp port number
get_supported_feature_ver: get hw supported offload version. Each
version has different bitmap for csum_ok/encap
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move it out from the middle for the #defines to just before it is
needed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While adding switch.o to the list of DSA object files, we essentially
duplicated the previous obj-y line and just added switch.o, remove the
duplicate.
Fixes: f515f192ab ("net: dsa: add switch notifier")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Timur Tabi says:
====================
net: qcom/emac: add the last ethtool functions
These two patches implement the remaining two ethtool functions that
are of interest to the Qualcomm EMAC driver. These are the last
patches that will be submitted for the 4.11 merge window.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the set_ringparam method, which allows the user to specify
the size of the TX and RX descriptor rings. The values are constrained
to the limits of the hardware.
Since the driver does not use separate queues for mini or jumbo frames,
attempts to set those values are rejected.
If the interface is already running when the setting is changed, then
the interface is reset.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the get_regs_len and get_regs ethtool methods. The driver
returns the values of selected hardware registers.
The make the register offsets known to emac_ethtool, the the register
offset macros are all combined into one header file. They were
inexplicably and arbitrarily split between two files.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The other instances of this structure got removed along with the MDIO
device change, but this one was left behind and needs to be removed
as well:
arch/arm/mach-orion5x/wnr854t-setup.c:109:44: error: 'wnr854t_switch_plat_data' defined but not used [-Werror=unused-variable]
static struct dsa_platform_data __initdata wnr854t_switch_plat_data = {
Fixes: 575e93f7b5 ("ARM: orion: Register DSA switch as a MDIO device")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: add sender-side procedures for stream reconf asoc reset and add streams
Patch 4/6 is to implement sender-side procedures for the SSN/TSN Reset
Request Parameter described in rfc6525 section 5.1.4, patch 3/6 is
ahead of it to define a function to make the request chunk for it.
Patch 6/6 is to implement sender-side procedures for the Add Incoming
and Outgoing Streams Request Parameter Request Parameter described in
rfc6525 section 5.1.5 and 5.1.6, patch 5/6 is ahead of it to define a
function to make the request chunk for it.
Patch 2/6 is a fix to recover streams states when it fails to send
request and Patch 1/6 is to drop some unncessary __packed from some
old structures.
v1->v2:
- put these into a smaller group.
- rename some temporary variables in the codes.
- rename the titles of the commits and improve some changelogs.
v2->v3:
- re-split the patchset and make sure it has no dead codes for review.
- move some codes into stream.c from socket.c.
v3->v4:
- add one more patch to fix a send reset stream request issue.
- doing actual work only when request is sent successfully.
- reduce some indents in sctp_send_add_streams.
v4->v5:
- close streams before sending request and recover them when sending
fails in patch 1/5 and patch 3/5
v5->v6:
- add patch 1/6 to drop some unncessary __packed from some old structures.
- remove __packed from some new structures in patch 3/6 and 5/6.
- define unsigned int outcnt and incnt to make codes smaller in patch 6/6.
- use krealloc instead of kcalloc and remove ksize check in patch 6/6, as
ksize check is acutally used in krealloc already.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to implement Sender-Side Procedures for the Add
Outgoing and Incoming Streams Request Parameter described in
rfc6525 section 5.1.5-5.1.6.
It is also to add sockopt SCTP_ADD_STREAMS in rfc6525 section
6.3.4 for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to define Add Incoming/Outgoing Streams Request
Parameter described in rfc6525 section 4.5 and 4.6. They can
be in one same chunk trunk as rfc6525 section 3.1-7 describes,
so make them in one function.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to implement Sender-Side Procedures for the SSN/TSN
Reset Request Parameter descibed in rfc6525 section 5.1.4.
It is also to add sockopt SCTP_RESET_ASSOC in rfc6525 section 6.3.3
for users.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to define SSN/TSN Reset Request Parameter described
in rfc6525 section 4.3.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when sending stream reset request, it closes the streams to
block further xmit of data until this request is completed, then
calls sctp_send_reconf to send the chunk.
But if sctp_send_reconf returns err, and it doesn't recover the
streams' states back, which means the request chunk would not be
queued and sent, so the asoc will get stuck, streams are closed
and no packet is even queued.
This patch is to fix it by recovering the streams' states when
it fails to send the request, it is also to fix a return value.
Fixes: 7f9d68ac94 ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 85c727b594 ("sctp: drop __packed from almost all SCTP structures")
has removed __packed from almost all SCTP structures. But there still are
three structures where it should be dropped.
This patch is to remove it from some stream reconf structures.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree says:
====================
sfc: more encap offloads
This patch series adds support for RX checksum offload of encapsulated packets.
It also adds support for configuring the hardware's lists of UDP ports used for
VXLAN and GENEVE encapsulation offloads. Since changing these lists causes the
MC to reboot, the driver has been hardened against reboots, which used to be
considered an exceptional occurrence but are now normal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ndo_udp_tunnel_{add,del} to update the NIC's list of VXLAN and
GENEVE UDP ports. Also reset the port list to empty on driver load and
on driver unload, with appropriate flag set on the unload case.
These port numbers are used for RX inner checksum offload, and in future
will also be used for TX inner checksum offload and encapsulated TSO.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function wasn't being called in this particular case when the MC
reboots. This caused resource reallocations to not be handled properly
and often ended up disabling the interface.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is mainly to prepare for a future overlay networking patch that
could cause an MC reset at probe time if the UDP tunnel port list is
set immediately upon driver load.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the csum_level for encapsulated packets where the encapsulation
type, l3 class and l4 class are sets that need it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RX checksum offload of encapsulated packets. This
essentially just means paying attention to the inner checksum flags
in the RX event, and if *either* checksum flag indicates a fail then
don't tell the kernel that checksum offload was successful.
Also, count these checksum errors and export the counts to ethtool -S.
Test the most common "good" case of RX events with a single bitmask
instead of a series of ifs. Move the more specific error checking
in to a separate function for clarity, and don't use unlikely() there
since we know at least one of the bits is bad.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>