Commit Graph

619785 Commits

Author SHA1 Message Date
David Howells
dfc3da4404 rxrpc: Need to start the resend timer on initial transmission
When a DATA packet has its initial transmission, we may need to start or
adjust the resend timer.  Without this we end up relying on being sent a
NACK to initiate the resend.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23 14:05:12 +01:00
David Howells
98dafac569 rxrpc: Use before_eq() and friends to compare serial numbers
before_eq() and friends should be used to compare serial numbers (when not
checking for (non)equality) rather than casting to int, subtracting and
checking the result.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23 14:05:08 +01:00
David S. Miller
7a048d5adc Merge branch 'bpf-helper-improvements'
Daniel Borkmann says:

====================
Few minor BPF helper improvements

Just a few minor improvements around BPF helpers, first one is a
fix but given this late stage and that it's not really a critical
one, I think net-next is just fine. For details please see the
individual patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:36 -04:00
Daniel Borkmann
7a4b28c6cc bpf: add helper to invalidate hash
Add a small helper that complements 36bbef52c7 ("bpf: direct packet
write and access for helpers for clsact progs") for invalidating the
current skb->hash after mangling on headers via direct packet write.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:28 -04:00
Daniel Borkmann
669dc4d76d bpf: use bpf_get_smp_processor_id_proto instead of raw one
Same motivation as in commit 80b48c4457 ("bpf: don't use raw processor
id in generic helper"), but this time for XDP typed programs. Thus, allow
for preemption checks when we have DEBUG_PREEMPT enabled, and otherwise
use the raw variant.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:28 -04:00
Daniel Borkmann
2d48c5f933 bpf: use skb_to_full_sk helper in bpf_skb_under_cgroup
We need to use skb_to_full_sk() helper introduced in commit bd5eb35f16
("xfrm: take care of request sockets") as otherwise we miss tcp synack
messages, since ownership is on request socket and therefore it would
miss the sk_fullsock() check. Use skb_to_full_sk() as also done similarly
in the bpf_get_cgroup_classid() helper via 2309236c13 ("cls_cgroup:
get sk_classid only from full sockets") fix to not let this fall through.

Fixes: 4a482f34af ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:40:27 -04:00
David S. Miller
c14fec3969 Merge branch 'hv_netvsc-next'
Stephen Hemminger says:

====================
hv_netvsc changes

These are mostly about improving the handling of interaction between
the virtual network device (netvsc) and the SR-IOV VF network device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:54 -04:00
Stephen Hemminger
f7ad75b753 hv_netvsc: count multicast packets received
Useful for debugging issues with multicast and SR-IOV to keep track
of number of received multicast packets.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger
9cbcc42806 hv_netvsc: remove VF in flight counters
Since VF reference is now protected by RCU, no longer need the VF usage
counter and can use device flags to see whether to inject or not.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger
f207c10d98 hv_netvsc: use RCU to protect vf_netdev
The vf_netdev pointer in the netvsc device context can simply be protected
by RCU because network device destruction is already RCU synchronized.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger
e8ff40d4bf hv_netvsc: improve VF device matching
The code to associate netvsc and VF devices can be made less error prone
by using a better matching algorithms.

On registration, use the permanent address which avoids any possible
issues caused by device MAC address being changed. For all other callbacks,
search by the netdevice pointer value to ensure getting the correct
network device.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger
ee837a1373 hv_netvsc: simplify callback event code
The callback handler for netlink events can be simplified:
 * Consolidate check for netlink callback events about this driver itself.
 * Ignore non-Ethernet devices.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:49 -04:00
Stephen Hemminger
07d0f0008c hv_netvsc: dev hold/put reference to VF
The netvsc driver holds a pointer to the virtual function network device if
managing SR-IOV association. In order to ensure that the VF network device
does not disappear, it should be using dev_hold/dev_put to get a reference
count.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:48 -04:00
Stephen Hemminger
17db4bcef3 hv_netvsc: use consume_skb
Packets that are transmitted in normal path should use consume_skb
instead of kfree_skb. This allows for better tracing of packet drops.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:39:48 -04:00
David S. Miller
dd5a3005eb Merge branch 'dsa-port-fast-ageing'
Vivien Didelot says:

====================
net: dsa: add port fast ageing

Today the DSA drivers are in charge of flushing the MAC addresses
associated to a port when its STP state changes from Learning or
Forwarding, to Disabled or Blocking or Listening.

This makes the drivers more complex and hides this generic switch logic.

This patchset introduces a new optional port_fast_age operation to
dsa_switch_ops, to move this logic to the DSA layer and keep drivers
simple. b53 and mv88e6xxx are updated accordingly.
====================

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:59 -04:00
Vivien Didelot
749efcb814 net: dsa: mv88e6xxx: implement DSA port fast ageing
Now that the DSA layer handles port fast ageing on correct STP change,
simplify _mv88e6xxx_port_state and implement mv88e6xxx_port_fast_age.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Vivien Didelot
597698f1e0 net: dsa: b53: implement DSA port fast ageing
Remove the fast ageing logic from b53_br_set_stp_state and implement the
new DSA switch port_fast_age operation instead.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Vivien Didelot
732f794c1b net: dsa: add port fast ageing
Today the DSA drivers are in charge of flushing the MAC addresses
associated to a port when its STP state changes from Learning or
Forwarding, to Disabled or Blocking or Listening.

This makes the drivers more complex and hides the generic switch logic.
Introduce a new optional port_fast_age operation to dsa_switch_ops, to
move this logic to the DSA layer and keep drivers simple.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Vivien Didelot
4acfee8143 net: dsa: add port STP state helper
Add a void helper to set the STP state of a port, checking first if the
required routine is provided by the driver.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:50 -04:00
Iyappan Subramanian
e3978673f5 drivers: net: xgene: Fix MSS programming
Current driver programs static value of MSS in hardware register for TSO
offload engine to segment the TCP payload regardless the MSS value
provided by network stack.

This patch fixes this by programming hardware registers with the
stack provided MSS value.

Since the hardware has the limitation of having only 4 MSS registers,
this patch uses reference count of mss values being used.

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 08:38:38 -04:00
David Howells
90bd684ded rxrpc: Should be using ktime_add_ms() not ktime_add_ns()
ktime_add_ms() should be used to add the resend time (in ms) rather than
ktime_add_ns().

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23 13:23:09 +01:00
David Howells
c0d058c21c rxrpc: Make sure sendmsg() is woken on call completion
Make sure that sendmsg() gets woken up if the call it is waiting for
completes abnormally.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23 13:23:09 +01:00
David Howells
9aff212bd6 rxrpc: Don't send an ACK at the end of service call response transmission
Don't send an IDLE ACK at the end of the transmission of the response to a
service call.  The service end resends DATA packets until the client sends an
ACK that hard-acks all the send data.  At that point, the call is complete.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23 13:23:09 +01:00
David Howells
b24d2891cf rxrpc: Preset timestamp on Tx sk_buffs
Set the timestamp on sk_buffs holding packets to be transmitted before
queueing them because the moment the packet is on the queue it can be seen
by the retransmission algorithm - which may see a completely random
timestamp.

If the retransmission algorithm sees such a timestamp, it may retransmit
the packet and, in future, tell the congestion management algorithm that
the retransmit timer expired.

Signed-off-by: David Howells <dhowells@redhat.com>
2016-09-23 13:17:52 +01:00
Colin Ian King
e12934d980 cxgb4: fix signed wrap around when decrementing index idx
Change predecrement compare to post decrement compare to avoid an
unsigned integer wrap-around comparison when decrementing idx in
the while loop.

For example, when idx is zero, the current situation will
predecrement idx in the while loop, wrapping idx to the maximum
signed integer and cause out of bounds reads on rxq_info->msix_tbl[idx].

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:25:16 -04:00
David S. Miller
c7b9e63341 Merge branch 'mlx5-sriov-vlan-push-pop'
Saeed Mahameed says:

====================
Mellanox 100G SRIOV offloads vlan push/pop

From Or Gerlitz:

This series further enhances the SRIOV TC offloads of mlx5 to handle
the TC vlan push and pop actions. This serves a common use-case in
virtualization systems where the virtual switch add (push) vlan tags
to packets sent from VMs and removes (pop) vlan tags before the packet
is received by the VM. We use the new E-Switch switchdev mode and the
TC vlan action to achieve that also in SW defined SRIOV environments by
offloading TC rules that contain this action along with forwarding
(TC mirred/redirect action) the packet.

In the first patch we add some helpers to access the TC vlan action info
by offloading drivers. The next five patches don't add any new functionality,
they do some refactoring and cleanups in the current code to be used next.

The seventh patch deals with supporting vlans by the mlx5 e-switch in switchdev
mode. The eighth patch does the vlan action offload from TC and the last patch
adds matching for vlans as typically required by TC flows that involve vlan
pop action.

The series was applied on top of commit 524605e "cxgb4: Convert to use simple_open()"
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:19 -04:00
Or Gerlitz
095b6cfd69 net/mlx5e: Add TC vlan match parsing
Enhance the parsing of offloaded TC rules matches to handle vlans.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
8b32580df1 net/mlx5e: Add TC vlan action for SRIOV offloads
Parse TC vlan actions and set the required elements to allow offloading.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
f5f8247609 net/mlx5: E-Switch, Support VLAN actions in the offloads mode
Many virtualization systems use a policy under which a vlan tag is
pushed to packets sent by guests, and popped before the packet is
forwarded to the VM.

The current generation of the mlx5 HW doesn't fully support that on
a per flow level. As such, we are addressing the above common use
case with the SRIOV e-Switch abilities to push vlan into packets
sent by VFs and pop vlan from packets forwarded to VFs.

The HW can match on the correct vlan being present in packets
forwarded to VFs (eSwitch steering is done before stripping
the tag), so this part is offloaded as is.

A common practice for vlans is to avoid both push vlan and pop vlan
for inter-host VM/VM (east-west) communication because in this case,
push on egress cancels out with pop on ingress.

For supporting that, we use a global eswitch vlan pop policy, hence
allowing guest A to communicate with both remote VM B and local VM C.
This works since the HW pops the vlan only if it exists (e.g for
C --> A packets but not for B --> A packets).

On the slow path, when a VF vport has an offloaded flow which involves
pushing vlans, wheres another flow is not currently offloaded, the
packets from the 2nd flow seen by the VF representor on the host have
vlan. The VF rep driver removes such vlan before calling into the host
networking stack.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
8515c581df net/mlx5e: Refactor retrival of skb from rx completion element (cqe)
Factor the relevant code into a static inline helper (skb_from_cqe)
doing that.

Move the call to napi_gro_receive to be carried out just
after mlx5e_complete_rx_cqe returns.

Both changes are to be used for the VF representor as well
in the next commit.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
776b12b674 net/mlx5: Put elements related to offloaded TC rule in one struct
Put the representors related to the source and dest vports and the
action in struct mlx5_esw_flow_attr which is used while setting the FDB rule.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
e33dfe316c net/mlx5: E-Switch, Allow fine tuning of eswitch vport push/pop vlan
The HW can be programmed to push vlan, pop vlan or both.

A factorization step towards using the push/pop capabilties in the
eswitch offloads mode. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
bac9b6aa1d net/mlx5: E-Switch, Set vport representor fields explicitly on registration
The structure we use for the eswitch vport representor (mlx5_eswitch_rep)
has some fields which are set from upper layers in the driver when they
register the rep. Use explicit setting on registration time for them and
avoid global memcpy. This patch doesn't add new functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
9deb2241f1 net/mlx5: E-Switch, Set the vport when registering the uplink rep
Set the vport value in the PF entry to be that of the uplink so
we can use it blindly over the tc / eswitch offload code without
translating it each time we deal with the uplink representor.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:12 -04:00
Or Gerlitz
53e89941ba net_sched: act_vlan: add helper inlines to access tcf_vlan info
Needed e.g for offloading drivers to pick the relevant attributes.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:22:11 -04:00
Eric Dumazet
fefa569a9d net_sched: sch_fq: account for schedule/timers drifts
It looks like the following patch can make FQ very precise, even in VM
or stressed hosts. It matters at high pacing rates.

We take into account the difference between the time that was programmed
when last packet was sent, and current time (a drift of tens of usecs is
often observed)

Add an EWMA of the unthrottle latency to help diagnostics.

This latency is the difference between current time and oldest packet in
delayed RB-tree. This accounts for the high resolution timer latency,
but can be different under stress, as fq_check_throttled() can be
opportunistically be called from a dequeue() called after an enqueue()
for a different flow.

Tested:
// Start a 10Gbit flow
$ netperf --google-pacing-rate 1250000000 -H lpaa24 -l 10000 -- -K bbr &

Before patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17106.04 756876.84   1102.75 1119049.02      0.00      0.00      0.52

After patch :
$ sar -n DEV 10 5 | grep eth0 | grep Average
Average:         eth0  17867.00 800245.90   1151.77 1183172.12      0.00      0.00      0.52

A new iproute2 tc can output the 'unthrottle latency' :

$ tc -s qd sh dev eth0 | grep latency
  0 gc, 0 highprio, 32490767 throttled, 2382 ns latency

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:19:06 -04:00
Bert Kenward
429baa6f0e sfc: check async completer is !NULL before calling
Add a NULL check before calling asynchronous MCDI completion functions
during device removal.

Fixes: 7014d7f6 ("sfc: allow asynchronous MCDI without completion function")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 07:18:35 -04:00
David S. Miller
88e4f75900 Merge branch 'sctp-fix-gap-ack-blocks'
Marcelo Ricardo Leitner says:

====================
sctp: fix the handling of SACK Gap Ack blocks

sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte()
macros, which is weird and confusing.

Once the offset to ctsn is calculated, all wrapping is already handled
and thus to verify the Gap Ack blocks we can just use pure
less/big-or-equal than checks.

Also, rename gap variable to tsn_offset, so it's more meaningful, as
it doesn't point to any gap at all.

Even so, I don't think this discrepancy resulted in any practical bug.

This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:55:04 -04:00
Marcelo Ricardo Leitner
182691d099 sctp: improve how SSN, TSN and ASCONF serial are compared
Make it similar to time_before() macros:
- easier to understand
- make use of typecheck() to avoid working on unexpected variable types
  (made the issue on previous patch visible)
- for _[lg]te versions, slighly faster, as the compiler used to generate
  a sequence of cmp/je/cmp/js instructions and now it's sub/test/jle
  (for _lte):

Before, for sctp_outq_sack:
	if (primary->cacc.changeover_active) {
    1f01:	80 b9 84 02 00 00 00 	cmpb   $0x0,0x284(%rcx)
    1f08:	74 6e                	je     1f78 <sctp_outq_sack+0xe8>
		u8 clear_cycling = 0;

		if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) {
    1f0a:	8b 81 80 02 00 00    	mov    0x280(%rcx),%eax
	return ((s) - (t)) & TSN_SIGN_BIT;
}

static inline int TSN_lte(__u32 s, __u32 t)
{
	return ((s) == (t)) || (((s) - (t)) & TSN_SIGN_BIT);
    1f10:	8b 7d bc             	mov    -0x44(%rbp),%edi
    1f13:	39 c7                	cmp    %eax,%edi
    1f15:	74 25                	je     1f3c <sctp_outq_sack+0xac>
    1f17:	39 f8                	cmp    %edi,%eax
    1f19:	78 21                	js     1f3c <sctp_outq_sack+0xac>
			primary->cacc.changeover_active = 0;

After:
	if (primary->cacc.changeover_active) {
    1ee7:	80 b9 84 02 00 00 00 	cmpb   $0x0,0x284(%rcx)
    1eee:	74 73                	je     1f63 <sctp_outq_sack+0xf3>
		u8 clear_cycling = 0;

		if (TSN_lte(primary->cacc.next_tsn_at_change, sack_ctsn)) {
    1ef0:	8b 81 80 02 00 00    	mov    0x280(%rcx),%eax
    1ef6:	2b 45 b4             	sub    -0x4c(%rbp),%eax
    1ef9:	85 c0                	test   %eax,%eax
    1efb:	7e 26                	jle    1f23 <sctp_outq_sack+0xb3>
			primary->cacc.changeover_active = 0;

*_lt() generated pretty much the same code.
Tested with gcc (GCC) 6.1.1 20160621.

This patch also removes SSN_lte as it is not used and cleanups some
comments.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:54:58 -04:00
Marcelo Ricardo Leitner
a3007446e5 sctp: fix the handling of SACK Gap Ack blocks
sctp_acked() is using 32bit arithmetics on 16bits vars, via TSN_lte()
macros, which is weird and confusing.

Once the offset to ctsn is calculated, all wrapping is already handled
and thus to verify the Gap Ack blocks we can just use pure
less/big-or-equal than checks.

Also, rename gap variable to tsn_offset, so it's more meaningful, as
it doesn't point to any gap at all.

Even so, I don't think this discrepancy resulted in any practical bug.

This patch is a preparation for the next one, which will introduce
typecheck() for TSN_lte() macros and would cause a compile error here.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:54:58 -04:00
WANG Cong
21641c2e1f net_sched: check NULL on error path in route4_change()
On error path in route4_change(), 'f' could be NULL,
so we should check NULL before calling tcf_exts_destroy().

Fixes: b9a24bb76b ("net_sched: properly handle failure case of tcf_exts_init()")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-09-23 06:51:49 -04:00
David S. Miller
d6989d4bbe Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-09-23 06:46:57 -04:00
Pablo Neira Ayuso
4004d5c374 netfilter: nft_lookup: remove superfluous element found check
We already checked for !found just a bit before:

        if (!found) {
                regs->verdict.code = NFT_BREAK;
                return;
        }

        if (found && set->flags & NFT_SET_MAP)
            ^^^^^

So this redundant check can just go away.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:30:48 +02:00
Gao Feng
b9d80f83bf netfilter: xt_helper: Use sizeof(variable) instead of literal number
It's better to use sizeof(info->name)-1 as index to force set the string
tail instead of literal number '29'.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:30:43 +02:00
Gao Feng
7bdc66242d netfilter: Enhance the codes used to get random once
There are some codes which are used to get one random once in netfilter.
We could use net_get_random_once to simplify these codes.

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:30:36 +02:00
Liping Zhang
a20877b5ed netfilter: nf_tables: check tprot_set first when we use xt.thoff
pkt->xt.thoff is not always set properly, but we use it without any check.
For payload expr, it will cause wrong results. For nftrace, we may notify
the wrong network or transport header to the user space, furthermore,
input the following nft rules, warning message will be printed out:
  # nft add rule arp filter output meta nftrace set 1

  WARNING: CPU: 0 PID: 13428 at net/netfilter/nf_tables_trace.c:263
  nft_trace_notify+0x4a3/0x5e0 [nf_tables]
  Call Trace:
  [<ffffffff813d58ae>] dump_stack+0x63/0x85
  [<ffffffff810a4c0b>] __warn+0xcb/0xf0
  [<ffffffff810a4d3d>] warn_slowpath_null+0x1d/0x20
  [<ffffffffa0589703>] nft_trace_notify+0x4a3/0x5e0 [nf_tables]
  [ ... ]
  [<ffffffffa05690a8>] nft_do_chain_arp+0x78/0x90 [nf_tables_arp]
  [<ffffffff816f4aa2>] nf_iterate+0x62/0x80
  [<ffffffff816f4b33>] nf_hook_slow+0x73/0xd0
  [<ffffffff81732bbf>] arp_xmit+0x8f/0xb0
  [ ... ]
  [<ffffffff81732d36>] arp_solicit+0x106/0x2c0

So before we use pkt->xt.thoff, check the tprot_set first.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:30:26 +02:00
Liping Zhang
8dc3c2b86b netfilter: nf_tables: improve nft payload fast eval
There's an off-by-one issue in nft_payload_fast_eval, skb_tail_pointer
and ptr + priv->len all point to the last valid address plus 1. So if
they are equal, we can still fetch the valid data. It's unnecessary to
fall back to nft_payload_eval.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:30:16 +02:00
Liping Zhang
2462f3f4a7 netfilter: nf_queue: improve queue range support for bridge family
After commit ac28634456 ("netfilter: bridge: add nf_afinfo to enable
queuing to userspace"), we can queue packets to the user space in bridge
family. But when the user specify the queue range, packets will be only
delivered to the first queue num. Because in nfqueue_hash, we only support
ipv4 and ipv6 family. Now add support for bridge family too.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:30:01 +02:00
Liping Zhang
8061bb5443 netfilter: nft_queue: add _SREG_QNUM attr to select the queue number
Currently, the user can specify the queue numbers by _QUEUE_NUM and
_QUEUE_TOTAL attributes, this is enough in most situations.

But acctually, it is not very flexible, for example:
  tcp dport 80 mapped to queue0
  tcp dport 81 mapped to queue1
  tcp dport 82 mapped to queue2
In order to do this thing, we must add 3 nft rules, and more
mapping meant more rules ...

So take one register to select the queue number, then we can add one
simple rule to mapping queues, maybe like this:
  queue num tcp dport map { 80:0, 81:1, 82:2 ... }

Florian Westphal also proposed wider usage scenarios:
  queue num jhash ip saddr . ip daddr mod ...
  queue num meta cpu ...
  queue num meta mark ...

The last point is how to load a queue number from sreg, although we can
use *(u16*)&regs->data[reg] to load the queue number, just like nat expr
to load its l4port do.

But we will cooperate with hash expr, meta cpu, meta mark expr and so on.
They all store the result to u32 type, so cast it to u16 pointer and
dereference it will generate wrong result in the big endian system.

So just keep it simple, we treat queue number as u32 type, although u16
type is already enough.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:29:50 +02:00
Laura Garcia Liebana
36b701fae1 netfilter: nf_tables: validate maximum value of u32 netlink attributes
Fetch value and validate u32 netlink attribute. This validation is
usually required when the u32 netlink attributes are being stored in a
field whose size is smaller.

This patch revisits 4da449ae1d ("netfilter: nft_exthdr: Add size check
on u8 nft_exthdr attributes").

Fixes: 96518518cc ("netfilter: add nftables")
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-09-23 09:29:02 +02:00