We store the rule blob per (possible) cpu. Unfortunately this means we can
waste lot of memory on big smp machines. ipt_entry structure ('rule head')
is 112 byte, so e.g. with maxcpu=64 one single rule eats
close to 8k RAM.
Since previous patch made counters percpu it appears there is nothing
left in the rule blob that needs to be percpu.
On my test system (144 possible cpus, 400k dummy rules) this
change saves close to 9 Gigabyte of RAM.
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The binary arp/ip/ip6tables ruleset is stored per cpu.
The only reason left as to why we need percpu duplication are the rule
counters embedded into ipt_entry et al -- since each cpu has its own copy
of the rules, all counters can be lockless.
The downside is that the more cpus are supported, the more memory is
required. Rules are not just duplicated per online cpu but for each
possible cpu, i.e. if maxcpu is 144, then rule is duplicated 144 times,
not for the e.g. 64 cores present.
To save some memory and also improve utilization of shared caches it
would be preferable to only store the rule blob once.
So we first need to separate counters and the rule blob.
Instead of using entry->counters, allocate this percpu and store the
percpu address in entry->counters.pcnt on CONFIG_SMP.
This change makes no sense as-is; it is merely an intermediate step to
remove the percpu duplication of the rule set in a followup patch.
Suggested-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
If bridge netfilter is used with both
bridge-nf-call-iptables and bridge-nf-filter-vlan-tagged enabled
then ip fragments in VLAN frames are sent without the vlan header.
This has never worked reliably. Turns out this relied on pre-3.5
behaviour where skb frag_list was used to store ip fragments;
ip_fragment() then re-used these skbs.
But since commit 3cc4949269
("ipv4: use skb coalescing in defragmentation") this is no longer
the case. ip_do_fragment now needs to allocate new skbs, but these
don't contain the vlan tag information anymore.
Fix it by storing vlan information of the ressembled skb in the
br netfilter percpu frag area, and restore them for each of the
fragments.
Fixes: 3cc4949269 ("ipv4: use skb coalescing in defragmentation")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
since commit d6b915e29f
("ip_fragment: don't forward defragmented DF packet") the largest
fragment size is available in the IPCB.
Therefore we no longer need to care about 'encapsulation'
overhead of stripped PPPOE/VLAN headers since ip_do_fragment
doesn't use device mtu in such cases.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv6 fragmented packets are not forwarded on an ethernet bridge
with netfilter ip6_tables loaded. e.g. steps to reproduce
1) create a simple bridge like this
modprobe br_netfilter
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 eth2
ifconfig eth0 up
ifconfig eth2 up
ifconfig br0 up
2) place a host with an IPv6 address on each side of the bridge
set IPv6 address on host A:
ip -6 addr add fd01:2345:6789:1::1/64 dev eth0
set IPv6 address on host B:
ip -6 addr add fd01:2345:6789:1::2/64 dev eth0
3) run a simple ping command on host A with packets > MTU
ping6 -s 4000 fd01:2345:6789:1::2
4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge
IPv6 fragmented packets traverse the bridge cleanly until somebody runs.
"ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are
loaded) IPv6 fragmented packets do not traverse the bridge any more (you
see no more responses in ping's output).
After applying this patch IPv6 fragmented packets traverse the bridge
cleanly in above scenario.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
[pablo@netfilter.org: small changes to br_nf_dev_queue_xmit]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Prepare check_hbh_len() to be called from newly introduced
br_validate_ipv6() in next commit.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
br_parse_ip_options() does not parse any IP options, it validates IP
packets as a whole and the function name is misleading.
Rename br_parse_ip_options() to br_validate_ipv4() and remove unneeded
commments.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Currently frag_max_size is member of br_input_skb_cb and copied back and
forth using IPCB(skb) and BR_INPUT_SKB_CB(skb) each time it is changed or
used.
Attach frag_max_size to nf_bridge_info and set value in pre_routing and
forward functions. Use its value in forward and xmit functions.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
IPv4 iptables allows to REDIRECT/DNAT/SNAT any traffic over a bridge.
e.g. REDIRECT
$ sysctl -w net.bridge.bridge-nf-call-iptables=1
$ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
-j REDIRECT --to-ports 81
This does not work with ip6tables on a bridge in NAT66 scenario
because the REDIRECT/DNAT/SNAT is not correctly detected.
The bridge pre-routing (finish) netfilter hook has to check for a possible
redirect and then fix the destination mac address. This allows to use the
ip6tables rules for local REDIRECT/DNAT/SNAT REDIRECT similar to the IPv4
iptables version.
e.g. REDIRECT
$ sysctl -w net.bridge.bridge-nf-call-ip6tables=1
$ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
-j REDIRECT --to-ports 81
This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested
on a bridge with two interfaces using SNAT/DNAT NAT66 rules.
Reported-by: Artie Hamilton <artiemhamilton@yahoo.com>
Signed-off-by: Sven Eckelmann <sven@open-mesh.com>
[bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()]
[bernhard.thaler@wvnet.at: rebased, split into separate patches]
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Put br_nf_pre_routing_finish_ipv6() after daddr_was_changed() and
br_nf_pre_routing_finish_bridge() to prepare calling these functions
from there.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
use binary AND on complement of BRNF_NF_BRIDGE_PREROUTING to unset
bit in nf_bridge->mask.
Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
After db29a9508a ("netfilter: conntrack: disable generic tracking for
known protocols"), if the specific helper is built but not loaded
(a standard for most distributions) systems with a restrictive firewall
but weak configuration regarding netfilter modules to load, will
silently stop working.
This patch then puts a warning message so the sysadmin knows where to
start looking into. It's a pr_warn_once regardless of protocol itself
but it should be enough to give a hint on where to look.
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
We had various issues in the past when TCP stack was modifying
gso_size/gso_segs while clones were in flight.
Commit c52e2421f7 ("tcp: must unclone packets before mangling them")
fixed these bugs and added a WARN_ON_ONCE(skb_cloned(skb)); in
tcp_set_skb_tso_segs()
These bugs are now fixed, and because TCP stack now only sets
shinfo->gso_size|segs on the clone itself, the check can be removed.
As a result of this change, compiler inlines tcp_set_skb_tso_segs() in
tcp_init_tso_segs()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit cd7d8498c9 ("tcp: change tcp_skb_pcount() location") we stored
gso_segs in a temporary cache hot location.
This patch does the same for gso_size.
This allows to save 2 cache line misses in tcp xmit path for
the last packet that is considered but not sent because of
various conditions (cwnd, tso defer, receiver window, TSQ...)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_set_skb_tso_segs() & tcp_init_tso_segs() no longer
use the sock pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our goal is to touch skb_shinfo(skb) only when absolutely needed,
to avoid two cache line misses in TCP output path for last skb
that is considered but not sent because of various conditions
(cwnd, tso defer, receiver window, TSQ...)
A packet is GSO only when skb_shinfo(skb)->gso_size is not zero.
We can set skb_shinfo(skb)->gso_type to sk->sk_gso_type even for
non GSO packets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_gso_segment() and tcp_gro_receive() are not strictly
part of TCP stack. They should not assume tcp_skb_mss(skb)
is in fact skb_shinfo(skb)->gso_size.
This will allow us to change tcp_skb_mss() in following patches.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a BUG_ON() where CONFIG_NET_SWITCHDEV is set but the driver for a
bridged port does not support switchdev_port_attr_set op. Don't BUG_ON()
if -EOPNOTSUPP is returned.
Also change BUG_ON() to netdev_err since this is a normal error path and
does not warrant the use of BUG_ON(), which is reserved for unrecoverable
errs.
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Reported-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add strings array of the current supported tunable options.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CAIA Delay-Gradient (CDG) is a TCP congestion control that modifies
the TCP sender in order to [1]:
o Use the delay gradient as a congestion signal.
o Back off with an average probability that is independent of the RTT.
o Coexist with flows that use loss-based congestion control, i.e.,
flows that are unresponsive to the delay signal.
o Tolerate packet loss unrelated to congestion. (Disabled by default.)
Its FreeBSD implementation was presented for the ICCRG in July 2012;
slides are available at http://www.ietf.org/proceedings/84/iccrg.html
Running the experiment scenarios in [1] suggests that our implementation
achieves more goodput compared with FreeBSD 10.0 senders, although it also
causes more queueing delay for a given backoff factor.
The loss tolerance heuristic is disabled by default due to safety concerns
for its use in the Internet [2, p. 45-46].
We use a variant of the Hybrid Slow start algorithm in tcp_cubic to reduce
the probability of slow start overshoot.
[1] D.A. Hayes and G. Armitage. "Revisiting TCP congestion control using
delay gradients." In Networking 2011, pages 328-341. Springer, 2011.
[2] K.K. Jonassen. "Implementing CAIA Delay-Gradient in Linux."
MSc thesis. Department of Informatics, University of Oslo, 2015.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: David Hayes <davihay@ifi.uio.no>
Cc: Andreas Petlund <apetlund@simula.no>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upcoming tcp_cdg uses tcp_enter_cwr() to initiate PRR. Export this
function so that CDG can be compiled as a module.
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: David Hayes <davihay@ifi.uio.no>
Cc: Andreas Petlund <apetlund@simula.no>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Nicolas Kuhn <nicolas.kuhn@telecom-bretagne.eu>
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_NET_SWITCHDEV is enabled, but port driver does not implement
support for IPv4 FIB add/del ops, don't fail route add/del offload
operations. Route adds will not be marked as OFFLOAD. Routes will be
installed in the kernel FIB, as usual.
This was report/fixed by Florian when testing DSA driver with net-next on
devices with L2 offload support but no L3 offload support. What he reported
was an initial route installed from DHCP client would fail (route not
installed to kernel FIB). This was triggering the setting of
ipv4.fib_offload_disabled, which would disable route offloading after the
first failure. So subsequent attempts to install the route would succeed.
There is follow-on work/discussion to address the handling of route install
failures, but for now, let's differentiate between no support and failed
support.
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dctcp_alpha can be read by from dctcp_get_info() without
synchro, so use WRITE_ONCE() to prevent compiler from using
dctcp_alpha as a temporary variable.
Also, playing with small dctcp_shift_g (like 1), can expose
an overflow with 32bit values shifted 9 times before divide.
Use an u64 field to avoid this problem, and perform the divide
only if acked_bytes_ecn is not zero.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* mesh fixes from Alexis Green and Chun-Yeow Yeoh,
* a documentation fix from Jakub Kicinski,
* a missing channel release (from Michal Kazior),
* a fix for a signal strength reporting bug (from Sara Sharon),
* handle deauth while associating (myself),
* don't report mangled TX SKB back to userspace for status (myself),
* handle aggregation session timeouts properly in fast-xmit (myself)
However, there are also a few cleanups and one big change that
affects all drivers (and that required me to pull in your tree)
to change the mac80211 HW flags to use an unsigned long bitmap
so that we can extend them more easily - we're running out of
flags even with a cleanup to remove the two unused ones.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJVeEQ1AAoJEDBSmw7B7bqrebcP/3v7I2ZXAeHag2W4hdD4YH6W
tuKfs3JKW3GDh84l2AJs2JBpFxR6Tk0Z7zGKrPLzkBTkkJkSLgKuUKR0+YQU6PYH
VfZ2NkdIHEqouLgMWxGGlp6suqp2yYD9tiIUroICXZ6aFm5trQuZgzv5ePI+lhmX
cWUYCawE2tcpVdg0NJsFExeCJhw81e/Bet1LCGHo0asWNpIK7phMdltzD7e4tgQS
4q475FCIkWxbxKgJRrRkz8J7grsjK1wf2W3acOxKMaoVBeqJVW5BWDrTgo0aDPts
qQ8n8t1s9o/jKQIvaz3RyjkQgX8T4vCMqkouLF4jJOThKIsUSi3Fvm9oKcMg4YhA
Ju5QWfbCBFhpLZeBzWzKyePTnDru1XDFFVdIATLONKTVg1modzFAs3j5gb4Z3Wtg
VYLoLWWpRtHKd9pzfZMhyWq64Xb8C+qlyQHr4r4QRm9ADz0Jq+OCh0rTFt+/bncM
CHxnf0VS9hEOFk0+TxFqi2yXOnv2uMgcN+jnGkEs4QuLfv9ML1Eb23ZjDoHxd1uq
1Yd4R8IDEY/KU6UJMwksz+gV/ekoB32eAhw56pxehgAMuZL4OgNvmeAQHx7Jq9it
0/OfAK2BSNH8odqYQbpg89C8keqSInMwUhFyRhyMJAWSKiPRHypsDBWxMKGJIssI
3mB4d/go+RP1AvZnazeF
=2wTw
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
For this round we mostly have fixes:
* mesh fixes from Alexis Green and Chun-Yeow Yeoh,
* a documentation fix from Jakub Kicinski,
* a missing channel release (from Michal Kazior),
* a fix for a signal strength reporting bug (from Sara Sharon),
* handle deauth while associating (myself),
* don't report mangled TX SKB back to userspace for status (myself),
* handle aggregation session timeouts properly in fast-xmit (myself)
However, there are also a few cleanups and one big change that
affects all drivers (and that required me to pull in your tree)
to change the mac80211 HW flags to use an unsigned long bitmap
so that we can extend them more easily - we're running out of
flags even with a cleanup to remove the two unused ones.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SCM_SECURITY was originally only implemented for datagram sockets,
not for stream sockets. However, SCM_CREDENTIALS is supported on
Unix stream sockets. For consistency, implement Unix stream support
for SCM_SECURITY as well. Also clean up the existing code and get
rid of the superfluous UNIXSID macro.
Motivated by https://bugzilla.redhat.com/show_bug.cgi?id=1224211,
where systemd was using SCM_CREDENTIALS and assumed wrongly that
SCM_SECURITY was also supported on Unix stream sockets.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before this patch the user-specified bridge port was ignored when
deleting an fdb entry and thus one could delete an entry that belonged
to any port.
Example (eth0 and eth1 are br0 ports):
bridge fdb add 00:11:22:33:44:55 dev eth0 master
bridge fdb del 00:11:22:33:44:55 dev eth1 master
(succeeds)
after the patch:
bridge fdb add 00:11:22:33:44:55 dev eth0 master
bridge fdb del 00:11:22:33:44:55 dev eth1 master
RTNETLINK answers: No such file or directory
Based on a patch by Wilson Kok.
Reported-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABCgAGBQJVdphLAAoJEP5prqPJtc/HZR0H/j7+UFxotHs+ummTanP3Rbqi
lLBgjzXQDvQeEYhCHzUM4s0dKOQp3arHnO538hB5bONijxoNJkHBkz0urbspLR/f
i9BZRdLd8eiCrXYo4zXM1dZ4xA/xLKX/4sZU/r44wZlcQeLR0VwerZWWd3Zl9hi5
0RGWvENz2FajCxBHaf+TFbr6AVPg7ikWXRpk3s5//adfGYzZMLTpIecrOI3OdtB2
JYrX1FZW2NudtzNfC5vR7g8x4esGzTL7TupdNP0SpxYUTfAkEbLrG+VCbswkYVCs
TL7BElbz5JrfUweWjj/6LJKPFUlCHW3TygBbmHiIl7htyJPllCIpF2gDI+uhSYw=
=JtLU
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.2-20150609' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2015-05-06
this is a pull request of a two patches for net-next.
The first patch is by Tomas Krcka, he fixes the (currently unused)
register address for acceptance filters. Oliver Hartkopp contributes a
patch for the cangw, where an optional UID is added to reference
routing jobs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As we're running out of hardware capability flags pretty quickly,
convert them to use the regular test_bit() style unsigned long
bitmaps.
This introduces a number of helper functions/macros to set and to
test the bits, along with new debugfs code.
The occurrences of an explicit __clear_bit() are intentional, the
drivers were never supposed to change their supported bits on the
fly. We should investigate changing this to be a per-frame flag.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Merge back net-next to get wireless driver changes (from Kalle)
to be able to create the API change across all trees properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
This patch fixes a bug in hwmp_preq_frame_process where the wrong metric
can be used when forwarding a PREQ. This happens because the code uses
the same metric variable to record the value of the metric to the source
of the PREQ and the value of the metric to the target of the PREQ.
This comes into play when both reply and forward are set which happens
when IEEE80211_PREQ_PROACTIVE_PREP_FLAG is set and when MP_F_DO | MP_F_RF
is set. The original code had a special case to handle the first case
but not the second.
The patch uses distinct variables for the two metrics which makes the
code flow much clearer and removes the need to restore the original
value of metric when forwarding.
Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- use common Jenkins hash instead of private implementation
- extend internal routing API
- properly re-arrange header files inclusion
- clarify precedence between '&' and '?'
- remove unused ethhdr variable in batadv_gw_dhcp_recipient_get()
- ensure per-VLAN structs are updated upon MAC change
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJVdaxwAAoJEOb/4TMchkvffJsQAK33RTmcnFYVpS09AywTbxfC
kNo5SAbQJ1MsNui8TmyHsnLBW/l59WnP8CITVYDlwLyW4cEKWVooGsFPv+MCityM
3vOqghdi7jC1b0LT6kTcoSr7P04q+bBV9atj5Z5nnXUpvjmW0b/a6qCIZmqGKTtQ
WIIayEs0bVPJ6y1Z6CDpFQY6k9j938BaH3VgP7NL4zXRA3OJGGQKaZTRwVuYnVu4
Swt/zj+B7lutBSP2d4ngjU2CXS35OzDWl4GybQuIiNzdSXsZwm4inTyuNjA4Eqdg
vE87ZqbFqD4T2PqR1wcTPKMJegd+nPQrYDdeNfz91KdjVKBN+LrdSCX3IozEynpY
7584+eROCyCKlQGi7OcRze5THbZ+BNKxUd3Ds4ek2NTU7vG2rMWADFj3wZ0yzX6F
e/JunQ39BHXPSc6UlaTgdYiVpTNME9+A2I0AXMOpU6urpc4fAyFVGUORJqjM8BBm
jLZ8QakmXdfgBYi7zqefeCW9de9L1ekGQs/XB1AmKmx5xY7IKzKEZbkUV3nAxtqI
YKP2yQUhqjoke6bzllz72CPChO4O11fYPCDPL0sarn/UxY8SzwmHeKi8UeGaN6gt
FZhR17kQGm4c2WF7082HEIrOW137YogEOFWMixqpqix5ICBmmRMPqo8wqQBxllk3
mC+4jiwdn4kT1EVAqHba
=u+G3
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- use common Jenkins hash instead of private implementation
- extend internal routing API
- properly re-arrange header files inclusion
- clarify precedence between '&' and '?'
- remove unused ethhdr variable in batadv_gw_dhcp_recipient_get()
- ensure per-VLAN structs are updated upon MAC change
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In mesh mode there is a race between establishing links and processing
rates and capabilities in beacons. This is very noticeable with slow
beacons (e.g. beacon intervals of 1s) and manifested for us as stations
using minstrel when minstrel_ht should be used. Fixed by changing
mesh_sta_info_init so that it always checks rates and such if it has not
already done so.
Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The csa counter has moved from sdata to beacon/presp but
it is not updated accordingly for mesh and ibss. Fix this.
Fixes: af296bdb8d ("mac80211: move csa counters from sdata to beacon/presp")
Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The last hop metric should refer to link cost (this is how
hwmp_route_info_get uses it for example). But in mesh_rx_path_sel_frame
we are not dealing with link cost but with the total cost to the origin
of a PREQ or PREP.
Signed-off-by: Alexis Green <agreen@cococorp.com>
CC: Jesse Jones <jjones@cococorp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Channels in 2.4GHz band overlap, this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).
The scan result ignores those invalid values and the
station last signal should not be updated as well.
In case the scan determines the signal to be invalid turn on
the flag so the station last signal will not be updated with
the value and thus user space probing for NL80211_STA_INFO_SIGNAL
and NL80211_STA_INFO_SIGNAL_AVG will not get this invalid RSSI
value.
Signed-off-by: Sara Sharon <sara.sharon@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There were a few rare cases when upon
authentication failure channel wasn't released.
This could cause stale pointers to remain in
chanctx assigned_vifs after interface removal and
trigger general protection fault later.
This could be triggered, e.g. on ath10k with the
following steps:
1. start an AP
2. create 2 extra vifs on ath10k host
3. connect vif1 to the AP
4. connect vif2 to the AP
(auth fails because ath10k firmware isn't able
to maintain 2 peers with colliding AP mac
addresses across vifs and consequently
refuses sta_info_insert() in
ieee80211_prep_connection())
5. remove the 2 extra vifs
6. goto step 2; at step 3 kernel was crashing:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: ath10k_pci ath10k_core ath
...
Call Trace:
[<ffffffff81a2dabb>] ieee80211_check_combinations+0x22b/0x290
[<ffffffff819fb825>] ? ieee80211_check_concurrent_iface+0x125/0x220
[<ffffffff8180f664>] ? netpoll_poll_disable+0x84/0x100
[<ffffffff819fb833>] ieee80211_check_concurrent_iface+0x133/0x220
[<ffffffff81a0029e>] ieee80211_open+0x3e/0x80
[<ffffffff817f2d26>] __dev_open+0xb6/0x130
[<ffffffff817f3051>] __dev_change_flags+0xa1/0x170
...
RIP [<ffffffff81a23140>] ieee80211_chanctx_radar_detect+0xa0/0x170
(gdb) l * ieee80211_chanctx_radar_detect+0xa0
0xffffffff81a23140 is in ieee80211_chanctx_radar_detect (/devel/src/linux/net/mac80211/util.c:3182).
3177 */
3178 WARN_ON(ctx->replace_state == IEEE80211_CHANCTX_REPLACES_OTHER &&
3179 !list_empty(&ctx->assigned_vifs));
3180
3181 list_for_each_entry(sdata, &ctx->assigned_vifs, assigned_chanctx_list)
3182 if (sdata->radar_required)
3183 radar_detect |= BIT(sdata->vif.bss_conf.chandef.width);
3184
3185 return radar_detect;
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The conversion to the fast-xmit path lost proper aggregation session
timeout handling - the last_tx wasn't set on that path and the timer
would therefore incorrectly tear down the session periodically (with
those drivers/rate control algorithms that have a timeout.)
In case of iwlwifi, this was every 5 seconds and caused significant
throughput degradation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Similar to referencing iptables rules by their line number this UID allows to
reference created routing jobs, e.g. to alter configured data modifications.
The UID is an optional non-zero value which can be provided at routing job
creation time. When the UID is set the UID replaces the data modification
configuration as job identification attribute e.g. at job removal time.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Commit 70008aa50e ("skbuff: convert to skb_orphan_frags") replaced
open coded tests of SKBTX_DEV_ZEROCOPY and skb_copy_ubufs with calls
to helper function skb_orphan_frags. Apply that to the last remaining
open coded site.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP encapsulation is broken on IPv6. This is because the logic to resubmit
the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
the resubmit label is in the wrong position since we already get the
nexthdr value when performing decapsulation. In addition the skb pull is no
longer necessary either.
This changes the return value check to look for < 0, using it for the
nexthdr on the next iteration, and moves the resubmit label to the proper
location.
With these changes the v6 code now matches what we do in the v4 ip input
code wrt resubmitting when decapsulating.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Acked-by: "Tom Herbert" <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory pointed to by idev->stats.icmpv6msgdev,
idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
read context without taking a reference on idev. For example, through
IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
waiting for an RCU grace period to elapse. This could lead to the
memory being written to after it has been freed.
Fix this by using call_rcu to free the memory used for stats, as well
as idev after an RCU grace period has elapsed.
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Alexander Duyck pointed out that:
struct tnode {
...
struct key_vector kv[1];
}
The kv[1] member of struct tnode is an arry that refernced by
a null pointer will not crash the system, like this:
struct tnode *p = NULL;
struct key_vector *kv = p->kv;
As such p->kv doesn't actually dereference anything, it is simply a
means for getting the offset to the array from the pointer p.
This patch make the code more regular to avoid making people feel
odd when they look at the code.
Signed-off-by: Firo Yang <firogm@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to disable softirqs because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. The spin locks in br_fdb_update were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Using local_bh_disable/enable around br_fdb_update() allows us to keep
using the spin_lock/unlock in br_fdb_update for the fast-path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
The mpls device is used in an RCU read context without a lock being
held. As the memory is freed without waiting for the RCU grace period
to elapse, the freed memory could still be in use.
Address this by using kfree_rcu to free the memory for the mpls device
after the RCU grace period has elapsed.
Fixes: 03c57747a7 ("mpls: Per-device MPLS state")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_fdb_update() can be called in process context in the following way:
br_fdb_add() -> __br_fdb_add() -> br_fdb_update() (if NTF_USE flag is set)
so we need to use spin_lock_bh because there are softirq users of the
hash_lock. One easy way to reproduce this is to modify the bridge utility
to set NTF_USE, enable stp and then set maxageing to a low value so
br_fdb_cleanup() is called frequently and then just add new entries in
a loop. This happens because br_fdb_cleanup() is called from timer/softirq
context. These locks were _bh before commit f8ae737dee
("[BRIDGE]: forwarding remove unneeded preempt and bh diasables")
and at the time that commit was correct because br_fdb_update() couldn't be
called from process context, but that changed after commit:
292d139898 ("bridge: add NTF_USE support")
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: 292d139898 ("bridge: add NTF_USE support")
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv4 and IPv6 share same implementation of get_cookie_sock(),
and there is no point inlining it.
We add tcp_ prefix to the common helper name and export it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC address of the soft-interface is used to initialise
the "non-purge" TT entry of each existing VLAN. Therefore
when the user invokes ndo_set_mac_address() all the
"non-purge" TT entries have to be updated, not only the one
belonging to the non-tagged network.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>