Commit Graph

32385 Commits

Author SHA1 Message Date
david decotigny
2d3b479df4 net-sysfs: expose number of carrier on/off changes
This allows to monitor carrier on/off transitions and detect link
flapping issues:
 - new /sys/class/net/X/carrier_changes
 - new rtnetlink IFLA_CARRIER_CHANGES (getlink)

Tested:
  - grep . /sys/class/net/*/carrier_changes
    + ip link set dev X down/up
    + plug/unplug cable
  - updated iproute2: prints IFLA_CARRIER_CHANGES
  - iproute2 20121211-2 (debian): unchanged behavior

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:24:52 -04:00
Wang Yufen
9c76a114bb ipv6: tcp_ipv6 policy route issue
The issue raises when adding policy route, specify a particular
NIC as oif, the policy route did not take effect. The reason is
that fl6.oif is not set and route map failed. From the
tcp_v6_send_response function, if the binding address is linklocal,
fl6.oif is set, but not for global address.

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:16:17 -04:00
Wang Yufen
60ea37f7a5 ipv6: reuse rt6_need_strict
Move the whole rt6_need_strict as static inline into ip6_route.h,
so that it can be reused

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:16:16 -04:00
Wang Yufen
4aa956d801 ipv6: tcp_ipv6 do some cleanup
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:16:16 -04:00
Vlad Yasevich
f6367b4660 bridge: use is_skb_forwardable in forward path
Use existing function instead of trying to use our own.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:04:04 -04:00
Vlad Yasevich
1ee481fb4c net: Allow modules to use is_skb_forwardable
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:04:04 -04:00
John W. Linville
96da266e77 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-03-31 15:22:17 -04:00
Alexei Starovoitov
bd4cf0ed33 net: filter: rework/optimize internal BPF interpreter's instruction set
This patch replaces/reworks the kernel-internal BPF interpreter with
an optimized BPF instruction set format that is modelled closer to
mimic native instruction sets and is designed to be JITed with one to
one mapping. Thus, the new interpreter is noticeably faster than the
current implementation of sk_run_filter(); mainly for two reasons:

1. Fall-through jumps:

  BPF jump instructions are forced to go either 'true' or 'false'
  branch which causes branch-miss penalty. The new BPF jump
  instructions have only one branch and fall-through otherwise,
  which fits the CPU branch predictor logic better. `perf stat`
  shows drastic difference for branch-misses between the old and
  new code.

2. Jump-threaded implementation of interpreter vs switch
   statement:

  Instead of single table-jump at the top of 'switch' statement,
  gcc will now generate multiple table-jump instructions, which
  helps CPU branch predictor logic.

Note that the verification of filters is still being done through
sk_chk_filter() in classical BPF format, so filters from user- or
kernel space are verified in the same way as we do now, and same
restrictions/constraints hold as well.

We reuse current BPF JIT compilers in a way that this upgrade would
even be fine as is, but nevertheless allows for a successive upgrade
of BPF JIT compilers to the new format.

The internal instruction set migration is being done after the
probing for JIT compilation, so in case JIT compilers are able to
create a native opcode image, we're going to use that, and in all
other cases we're doing a follow-up migration of the BPF program's
instruction set, so that it can be transparently run in the new
interpreter.

In short, the *internal* format extends BPF in the following way (more
details can be taken from the appended documentation):

  - Number of registers increase from 2 to 10
  - Register width increases from 32-bit to 64-bit
  - Conditional jt/jf targets replaced with jt/fall-through
  - Adds signed > and >= insns
  - 16 4-byte stack slots for register spill-fill replaced
    with up to 512 bytes of multi-use stack space
  - Introduction of bpf_call insn and register passing convention
    for zero overhead calls from/to other kernel functions
  - Adds arithmetic right shift and endianness conversion insns
  - Adds atomic_add insn
  - Old tax/txa insns are replaced with 'mov dst,src' insn

Performance of two BPF filters generated by libpcap resp. bpf_asm
was measured on x86_64, i386 and arm32 (other libpcap programs
have similar performance differences):

fprog #1 is taken from Documentation/networking/filter.txt:
tcpdump -i eth0 port 22 -dd

fprog #2 is taken from 'man tcpdump':
tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) -
   ((tcp[12]&0xf0)>>2)) != 0)' -dd

Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the
same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call,
smaller is better:

--x86_64--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF      90       101        192       202
new BPF      31        71         47        97
old BPF jit  12        34         17        44
new BPF jit TBD

--i386--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     107       136        227       252
new BPF      40       119         69       172

--arm32--
         fprog #1  fprog #1   fprog #2  fprog #2
         cache-hit cache-miss cache-hit cache-miss
old BPF     202       300        475       540
new BPF     180       270        330       470
old BPF jit  26       182         37       202
new BPF jit TBD

Thus, without changing any userland BPF filters, applications on
top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf
classifier, netfilter's xt_bpf, team driver's load-balancing mode,
and many more will have better interpreter filtering performance.

While we are replacing the internal BPF interpreter, we also need
to convert seccomp BPF in the same step to make use of the new
internal structure since it makes use of lower-level API details
without being further decoupled through higher-level calls like
sk_unattached_filter_{create,destroy}(), for example.

Just as for normal socket filtering, also seccomp BPF experiences
a time-to-verdict speedup:

05-sim-long_jumps.c of libseccomp was used as micro-benchmark:

  seccomp_rule_add_exact(ctx,...
  seccomp_rule_add_exact(ctx,...

  rc = seccomp_load(ctx);

  for (i = 0; i < 10000000; i++)
     syscall(199, 100);

'short filter' has 2 rules
'large filter' has 200 rules

'short filter' performance is slightly better on x86_64/i386/arm32
'large filter' is much faster on x86_64 and i386 and shows no
               difference on arm32

--x86_64-- short filter
old BPF: 2.7 sec
 39.12%  bench  libc-2.15.so       [.] syscall
  8.10%  bench  [kernel.kallsyms]  [k] sk_run_filter
  6.31%  bench  [kernel.kallsyms]  [k] system_call
  5.59%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  4.37%  bench  [kernel.kallsyms]  [k] trace_hardirqs_off_caller
  3.70%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.67%  bench  [kernel.kallsyms]  [k] lock_is_held
  3.03%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
new BPF: 2.58 sec
 42.05%  bench  libc-2.15.so       [.] syscall
  6.91%  bench  [kernel.kallsyms]  [k] system_call
  6.25%  bench  [kernel.kallsyms]  [k] trace_hardirqs_on_caller
  6.07%  bench  [kernel.kallsyms]  [k] __secure_computing
  5.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp

--arm32-- short filter
old BPF: 4.0 sec
 39.92%  bench  [kernel.kallsyms]  [k] vector_swi
 16.60%  bench  [kernel.kallsyms]  [k] sk_run_filter
 14.66%  bench  libc-2.17.so       [.] syscall
  5.42%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  5.10%  bench  [kernel.kallsyms]  [k] __secure_computing
new BPF: 3.7 sec
 35.93%  bench  [kernel.kallsyms]  [k] vector_swi
 21.89%  bench  libc-2.17.so       [.] syscall
 13.45%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
  6.25%  bench  [kernel.kallsyms]  [k] __secure_computing
  3.96%  bench  [kernel.kallsyms]  [k] syscall_trace_exit

--x86_64-- large filter
old BPF: 8.6 seconds
    73.38%    bench  [kernel.kallsyms]  [k] sk_run_filter
    10.70%    bench  libc-2.15.so       [.] syscall
     5.09%    bench  [kernel.kallsyms]  [k] seccomp_bpf_load
     1.97%    bench  [kernel.kallsyms]  [k] system_call
new BPF: 5.7 seconds
    66.20%    bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
    16.75%    bench  libc-2.15.so       [.] syscall
     3.31%    bench  [kernel.kallsyms]  [k] system_call
     2.88%    bench  [kernel.kallsyms]  [k] __secure_computing

--i386-- large filter
old BPF: 5.4 sec
new BPF: 3.8 sec

--arm32-- large filter
old BPF: 13.5 sec
 73.88%  bench  [kernel.kallsyms]  [k] sk_run_filter
 10.29%  bench  [kernel.kallsyms]  [k] vector_swi
  6.46%  bench  libc-2.17.so       [.] syscall
  2.94%  bench  [kernel.kallsyms]  [k] seccomp_bpf_load
  1.19%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.87%  bench  [kernel.kallsyms]  [k] sys_getuid
new BPF: 13.5 sec
 76.08%  bench  [kernel.kallsyms]  [k] sk_run_filter_int_seccomp
 10.98%  bench  [kernel.kallsyms]  [k] vector_swi
  5.87%  bench  libc-2.17.so       [.] syscall
  1.77%  bench  [kernel.kallsyms]  [k] __secure_computing
  0.93%  bench  [kernel.kallsyms]  [k] sys_getuid

BPF filters generated by seccomp are very branchy, so the new
internal BPF performance is better than the old one. Performance
gains will be even higher when BPF JIT is committed for the
new structure, which is planned in future work (as successive
JIT migrations).

BPF has also been stress-tested with trinity's BPF fuzzer.

Joint work with Daniel Borkmann.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
164d8c6665 net: ptp: do not reimplement PTP/BPF classifier
There are currently pch_gbe, cpts, and ixp4xx_eth drivers that open-code
and reimplement a BPF classifier for the PTP protocol. Since all of them
effectively do the very same thing and load the very same PTP/BPF filter,
we can just consolidate that code by introducing ptp_classify_raw() in
the time-stamping core framework which can be used in drivers.

As drivers get initialized after bootstrapping the core networking
subsystem, they can make use of ptp_insns wrapped through
ptp_classify_raw(), which allows to simplify and remove PTP classifier
setup code in drivers.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
e62d2df084 net: ptp: use sk_unattached_filter_create() for BPF
This patch migrates an open-coded sk_run_filter() implementation with
proper use of the BPF API, that is, sk_unattached_filter_create(). This
migration is needed, as we will be internally transforming the filter
to a different representation, and therefore needs to be decoupled.

It is okay to do so as skb_timestamping_init() is called during
initialization of the network stack in core initcall via sock_init().
This would effectively also allow for PTP filters to be jit compiled if
bpf_jit_enable is set.

For better readability, there are also some newlines introduced, also
ptp_classify.h is only in kernel space.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Richard Cochran <richard.cochran@omicron.at>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
fbc907f0b1 net: filter: move filter accounting to filter core
This patch basically does two things, i) removes the extern keyword
from the include/linux/filter.h file to be more consistent with the
rest of Joe's changes, and ii) moves filter accounting into the filter
core framework.

Filter accounting mainly done through sk_filter_{un,}charge() take
care of the case when sockets are being cloned through sk_clone_lock()
so that removal of the filter on one socket won't result in eviction
as it's still referenced by the other.

These functions actually belong to net/core/filter.c and not
include/net/sock.h as we want to keep all that in a central place.
It's also not in fast-path so uninlining them is fine and even allows
us to get rd of sk_filter_release_rcu()'s EXPORT_SYMBOL and a forward
declaration.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
a3ea269b8b net: filter: keep original BPF program around
In order to open up the possibility to internally transform a BPF program
into an alternative and possibly non-trivial reversible representation, we
need to keep the original BPF program around, so that it can be passed back
to user space w/o the need of a complex decoder.

The reason for that use case resides in commit a8fc927780 ("sk-filter:
Add ability to get socket filter program (v2)"), that is, the ability
to retrieve the currently attached BPF filter from a given socket used
mainly by the checkpoint-restore project, for example.

Therefore, we add two helpers sk_{store,release}_orig_filter for taking
care of that. In the sk_unattached_filter_create() case, there's no such
possibility/requirement to retrieve a loaded BPF program. Therefore, we
can spare us the work in that case.

This approach will simplify and slightly speed up both, sk_get_filter()
and sock_diag_put_filterinfo() handlers as we won't need to successively
decode filters anymore through sk_decode_filter(). As we still need
sk_decode_filter() later on, we're keeping it around.

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:09 -04:00
Daniel Borkmann
f8bbbfc3b9 net: filter: add jited flag to indicate jit compiled filters
This patch adds a jited flag into sk_filter struct in order to indicate
whether a filter is currently jited or not. The size of sk_filter is
not being expanded as the 32 bit 'len' member allows upper bits to be
reused since a filter can currently only grow as large as BPF_MAXINSNS.

Therefore, there's enough room also for other in future needed flags to
reuse 'len' field if necessary. The jited flag also allows for having
alternative interpreter functions running as currently, we can only
detect jit compiled filters by testing fp->bpf_func to not equal the
address of sk_run_filter().

Joint work with Alexei Starovoitov.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 00:45:08 -04:00
Kinglong Mee
642aab58db SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed
Don't move the assign of args->bc_xprt->xpt_bc_xprt out of xs_setup_bc_tcp,
because rpc_ping (which is in rpc_create) will using it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30 10:47:37 -04:00
Kinglong Mee
d531c008d7 NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp
Besides checking rpc_xprt out of xs_setup_bc_tcp,
increase it's reference (it's important).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30 10:47:36 -04:00
Kinglong Mee
83ddfebdd2 SUNRPC: New helper for creating client with rpc_xprt
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30 10:47:36 -04:00
Kinglong Mee
47f72efa8f NFSD: Free backchannel xprt in bc_destroy
Backchannel xprt isn't freed right now.
Free it in bc_destroy, and put the reference of THIS_MODULE.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-30 10:47:35 -04:00
David S. Miller
64c27237a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/marvell/mvneta.c

The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:48:54 -04:00
Wang Yufen
437de07ced ipv6: fix checkpatch errors of "foo*" and "foo * bar"
ERROR: "(foo*)" should be "(foo *)"
ERROR: "foo * bar" should be "foo *bar"

Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:15:52 -04:00
Wang Yufen
49e253e399 ipv6: fix checkpatch errors of brace and trailing statements
ERROR: open brace '{' following enum go on the same line
ERROR: open brace '{' following struct go on the same line
ERROR: trailing statements should be on next line

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:15:52 -04:00
Wang Yufen
8db46f1d4c ipv6: fix checkpatch errors comments and space
WARNING: please, no space before tabs
WARNING: please, no spaces at the start of a line
ERROR: spaces required around that ':' (ctx:VxW)
ERROR: spaces required around that '>' (ctx:VxV)
ERROR: spaces required around that '>=' (ctx:VxV)

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:15:52 -04:00
Eric W. Biederman
5efeac44cf netpoll: Respect NETIF_F_LLTX
Stop taking the transmit lock when a network device has specified
NETIF_F_LLTX.

If no locks needed to trasnmit a packet this is the ideal scenario for
netpoll as all packets can be trasnmitted immediately.

Even if some locks are needed in ndo_start_xmit skipping any unnecessary
serialization is desirable for netpoll as it makes it more likely a
debugging packet may be trasnmitted immediately instead of being
deferred until later.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
080b3c19a4 netpoll: Remove strong unnecessary assumptions about skbs
Remove the assumption that the skbs that make it to
netpoll_send_skb_on_dev are allocated with find_skb, such that
skb->users == 1 and nothing is attached that would prevent the skbs from
being freed from hard irq context.

Remove this assumption by replacing __kfree_skb on error paths with
dev_kfree_skb_irq (in hard irq context) and kfree_skb (in process
context).

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
66b5552fc2 netpoll: Rename netpoll_rx_enable/disable to netpoll_poll_disable/enable
The netpoll_rx_enable and netpoll_rx_disable functions have always
controlled polling the network drivers transmit and receive queues.

Rename them to netpoll_poll_enable and netpoll_poll_disable to make
their functionality clear.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
3f4df2066b netpoll: Move rx enable/disable into __dev_close_many
Today netpoll_rx_enable and netpoll_rx_disable are called from
dev_close and and __dev_close, and not from dev_close_many.

Move the calls into __dev_close_many so that we have a single call
site to maintain, and so that dev_close_many gains this protection as
well.  Which importantly makes batched network device deletes safe.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
944e294857 netpoll: Only call ndo_start_xmit from a single place
Factor out the code that needs to surround ndo_start_xmit
from netpoll_send_skb_on_dev into netpoll_start_xmit.

It is an unfortunate fact that as the netpoll code has been maintained
the primary call site ndo_start_xmit learned how to handle vlans
and timestamps but the second call of ndo_start_xmit in queue_process
did not.

With the introduction of netpoll_start_xmit this associated logic now
happens at both call sites of ndo_start_xmit and should make it easy
for that to continue into the future.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
a8779ec1c5 netpoll: Remove gfp parameter from __netpoll_setup
The gfp parameter was added in:
commit 47be03a28c
Author: Amerigo Wang <amwang@redhat.com>
Date:   Fri Aug 10 01:24:37 2012 +0000

    netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

    slave_enable_netpoll() and __netpoll_setup() may be called
    with read_lock() held, so should use GFP_ATOMIC to allocate
    memory. Eric suggested to pass gfp flags to __netpoll_setup().

    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

The reason for the gfp parameter was removed in:
commit c4cdef9b71
Author: dingtianhong <dingtianhong@huawei.com>
Date:   Tue Jul 23 15:25:27 2013 +0800

    bonding: don't call slave_xxx_netpoll under spinlocks

    The slave_xxx_netpoll will call synchronize_rcu_bh(),
    so the function may schedule and sleep, it should't be
    called under spinlocks.

    bond_netpoll_setup() and bond_netpoll_cleanup() are always
    protected by rtnl lock, it is no need to take the read lock,
    as the slave list couldn't be changed outside rtnl lock.

    Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
    Cc: Jay Vosburgh <fubar@us.ibm.com>
    Cc: Andy Gospodarek <andy@greyhouse.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Nothing else that calls __netpoll_setup or ndo_netpoll_setup
requires a gfp paramter, so remove the gfp parameter from both
of these functions making the code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
ZhangZhen
59ff3eb6d6 workqueue: remove deprecated WQ_NON_REENTRANT
Tejun Heo has made WQ_NON_REENTRANT useless in the dbf2576e37
("workqueue: make all workqueues non-reentrant"). So remove its
usages and definition.

This patch doesn't introduce any behavior changes.

tj: minor description updates.

Signed-off-by: ZhangZhen <zhenzhang.zhang@huawei.com>
Sigend-off-by: Tejun Heo <tj@kernel.org>
Acked-by: James Chapman <jchapman@katalix.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
2014-03-29 09:33:03 -04:00
J. Bruce Fields
de4aee2e0a rpc: Allow xdr_buf_subsegment to operate in-place
Allow

	xdr_buf_subsegment(&buf, &buf, base, len)

to modify an xdr_buf in-place.

Also, none of the callers need the iov_base of head or tail to be zeroed
out.

Also add documentation.

(As it turns out, I'm not really using this new guarantee, but it seems
a simple way to make this function a bit more robust.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:24:49 -04:00
Kinglong Mee
315f3812db SUNRPC: fix memory leak of peer addresses in XPRT
Creating xprt failed after xs_format_peer_addresses,
sunrpc must free those memory of peer addresses in xprt.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 21:23:39 -04:00
Jeff Layton
3cbe01a94c svcrdma: fix offset calculation for non-page aligned sge entries
The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the
offset into the page to be mapped. This calculation does not correctly
take into account the case where the data starts at some offset into
the page. Increment the xdr_off by the page_base to ensure that it is
respected.

Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:02:13 -04:00
Jeff Layton
2e8c12e1b7 xprtrdma: add separate Kconfig options for NFSoRDMA client and server support
There are two entirely separate modules under xprtrdma/ and there's no
reason that enabling one should automatically enable the other. Add
config options for each one so they can be enabled/disabled separately.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:02:12 -04:00
Tom Tucker
7e4359e261 Fix regression in NFSRDMA server
The server regression was caused by the addition of rq_next_page
(afc59400d6). There were a few places that
were missed with the update of the rq_respages array.

Signed-off-by: Tom Tucker <tom@ogc.us>
Tested-by: Steve Wise <swise@ogc.us>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-28 18:02:11 -04:00
Vlad Yasevich
2adb956b08 vlan: Warn the user if lowerdev has bad vlan features.
Some drivers incorrectly assign vlan acceleration features to
vlan_features thus causing issues for Q-in-Q vlan configurations.
Warn the user of such cases.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:16:51 -04:00
Vlad Yasevich
fc92f745f8 bridge: Fix crash with vlan filtering and tcpdump
When the vlan filtering is enabled on the bridge, but
the filter is not configured on the bridge device itself,
running tcpdump on the bridge device will result in a
an Oops with NULL pointer dereference.  The reason
is that br_pass_frame_up() will bypass the vlan
check because promisc flag is set.  It will then try
to get the table pointer and process the packet based
on the table.  Since the table pointer is NULL, we oops.
Catch this special condition in br_handle_vlan().

Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:14:02 -04:00
Vlad Yasevich
53d6471cef net: Account for all vlan headers in skb_mac_gso_segment
skb_network_protocol() already accounts for multiple vlan
headers that may be present in the skb.  However, skb_mac_gso_segment()
doesn't know anything about it and assumes that skb->mac_len
is set correctly to skip all mac headers.  That may not
always be the case.  If we are simply forwarding the packet (via
bridge or macvtap), all vlan headers may not be accounted for.

A simple solution is to allow skb_network_protocol to return
the vlan depth it has calculated.  This way skb_mac_gso_segment
will correctly skip all mac headers.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:10:36 -04:00
Hannes Frederic Sowa
c15b1ccadb ipv6: move DAD and addrconf_verify processing to workqueue
addrconf_join_solict and addrconf_join_anycast may cause actions which
need rtnl locked, especially on first address creation.

A new DAD state is introduced which defers processing of the initial
DAD processing into a workqueue.

To get rtnl lock we need to push the code paths which depend on those
calls up to workqueues, specifically addrconf_verify and the DAD
processing.

(v2)
addrconf_dad_failure needs to be queued up to the workqueue, too. This
patch introduces a new DAD state and stop the DAD processing in the
workqueue (this is because of the possible ipv6_del_addr processing
which removes the solicited multicast address from the device).

addrconf_verify_lock is removed, too. After the transition it is not
needed any more.

As we are not processing in bottom half anymore we need to be a bit more
careful about disabling bottom half out when we lock spin_locks which are also
used in bh.

Relevant backtrace:
[  541.030090] RTNL: assertion failed at net/core/dev.c (4496)
[  541.031143] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.10.33-1-amd64-vyatta #1
[  541.031145] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  541.031146]  ffffffff8148a9f0 000000000000002f ffffffff813c98c1 ffff88007c4451f8
[  541.031148]  0000000000000000 0000000000000000 ffffffff813d3540 ffff88007fc03d18
[  541.031150]  0000880000000006 ffff88007c445000 ffffffffa0194160 0000000000000000
[  541.031152] Call Trace:
[  541.031153]  <IRQ>  [<ffffffff8148a9f0>] ? dump_stack+0xd/0x17
[  541.031180]  [<ffffffff813c98c1>] ? __dev_set_promiscuity+0x101/0x180
[  541.031183]  [<ffffffff813d3540>] ? __hw_addr_create_ex+0x60/0xc0
[  541.031185]  [<ffffffff813cfe1a>] ? __dev_set_rx_mode+0xaa/0xc0
[  541.031189]  [<ffffffff813d3a81>] ? __dev_mc_add+0x61/0x90
[  541.031198]  [<ffffffffa01dcf9c>] ? igmp6_group_added+0xfc/0x1a0 [ipv6]
[  541.031208]  [<ffffffff8111237b>] ? kmem_cache_alloc+0xcb/0xd0
[  541.031212]  [<ffffffffa01ddcd7>] ? ipv6_dev_mc_inc+0x267/0x300 [ipv6]
[  541.031216]  [<ffffffffa01c2fae>] ? addrconf_join_solict+0x2e/0x40 [ipv6]
[  541.031219]  [<ffffffffa01ba2e9>] ? ipv6_dev_ac_inc+0x159/0x1f0 [ipv6]
[  541.031223]  [<ffffffffa01c0772>] ? addrconf_join_anycast+0x92/0xa0 [ipv6]
[  541.031226]  [<ffffffffa01c311e>] ? __ipv6_ifa_notify+0x11e/0x1e0 [ipv6]
[  541.031229]  [<ffffffffa01c3213>] ? ipv6_ifa_notify+0x33/0x50 [ipv6]
[  541.031233]  [<ffffffffa01c36c8>] ? addrconf_dad_completed+0x28/0x100 [ipv6]
[  541.031241]  [<ffffffff81075c1d>] ? task_cputime+0x2d/0x50
[  541.031244]  [<ffffffffa01c38d6>] ? addrconf_dad_timer+0x136/0x150 [ipv6]
[  541.031247]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]
[  541.031255]  [<ffffffff8105313a>] ? call_timer_fn.isra.22+0x2a/0x90
[  541.031258]  [<ffffffffa01c37a0>] ? addrconf_dad_completed+0x100/0x100 [ipv6]

Hunks and backtrace stolen from a patch by Stephen Hemminger.

Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:54:50 -04:00
Eric Dumazet
015f0688f5 net: net: add a core netdev->tx_dropped counter
Dropping packets in __dev_queue_xmit() when transmit queue
is stopped (NIC TX ring buffer full or BQL limit reached) currently
outputs a syslog message.

It would be better to get a precise count of such events available in
netdevice stats so that monitoring tools can have a clue.

This extends the work done in caf586e5f2
("net: add a core netdev->rx_dropped counter")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:49:48 -04:00
Daniel Borkmann
43279500de packet: respect devices with LLTX flag in direct xmit
Quite often it can be useful to test with dummy or similar
devices as a blackhole sink for skbs. Such devices are only
equipped with a single txq, but marked as NETIF_F_LLTX as
they do not require locking their internal queues on xmit
(or implement locking themselves). Therefore, rather use
HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will be respected.

trafgen mmap/TX_RING example against dummy device with config
foo: { fill(0xff, 64) } results in the following performance
improvements for such scenarios on an ordinary Core i7/2.80GHz:

Before:

 Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):

   160,975,944,159 instructions:k            #    0.55  insns per cycle          ( +-  0.09% )
   293,319,390,278 cycles:k                  #    0.000 GHz                      ( +-  0.35% )
       192,501,104 branch-misses:k                                               ( +-  1.63% )
               831 context-switches:k                                            ( +-  9.18% )
                 7 cpu-migrations:k                                              ( +-  7.40% )
            69,382 cache-misses:k            #    0.010 % of all cache refs      ( +-  2.18% )
       671,552,021 cache-references:k                                            ( +-  1.29% )

      22.856401569 seconds time elapsed                                          ( +-  0.33% )

After:

 Performance counter stats for 'trafgen -i foo -o du0 -n100000000' (10 runs):

   133,788,739,692 instructions:k            #    0.92  insns per cycle          ( +-  0.06% )
   145,853,213,256 cycles:k                  #    0.000 GHz                      ( +-  0.17% )
        59,867,100 branch-misses:k                                               ( +-  4.72% )
               384 context-switches:k                                            ( +-  3.76% )
                 6 cpu-migrations:k                                              ( +-  6.28% )
            70,304 cache-misses:k            #    0.077 % of all cache refs      ( +-  1.73% )
        90,879,408 cache-references:k                                            ( +-  1.35% )

      11.719372413 seconds time elapsed                                          ( +-  0.24% )

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:49:48 -04:00
Eric Dumazet
e2a1d3e47b tcp: fix get_timewait4_sock() delay computation on 64bit
It seems I missed one change in get_timewait4_sock() to compute
the remaining time before deletion of IPV4 timewait socket.

This could result in wrong output in /proc/net/tcp for tm->when field.

Fixes: 96f817fede ("tcp: shrink tcp6_timewait_sock by one cache line")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:43:14 -04:00
Flavio Leitner
4f647e0a3c openvswitch: fix a possible deadlock and lockdep warning
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +->spin_lock remote CPU#1        ovs_flow_stats_get()
 |  <interrupted>                  stats_read()
 |  ...                       +-->  spin_lock remote CPU#0
 |                            |     <interrupted>
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <jesse@nicira.com>

=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
 [<ffffffff810f8963>] mark_lock+0x223/0x2b0
 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
 [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:41:53 -04:00
Toshiaki Makita
99b192da9c bridge: Fix handling stacked vlan tags
If a bridge with vlan_filtering enabled receives frames with stacked
vlan tags, i.e., they have two vlan tags, br_vlan_untag() strips not
only the outer tag but also the inner tag.

br_vlan_untag() is called only from br_handle_vlan(), and in this case,
it is enough to set skb->vlan_tci to 0 here, because vlan_tci has already
been set before calling br_handle_vlan().

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:33:09 -04:00
Toshiaki Makita
12464bb8de bridge: Fix inabillity to retrieve vlan tags when tx offload is disabled
Bridge vlan code (br_vlan_get_tag()) assumes that all frames have vlan_tci
if they are tagged, but if vlan tx offload is manually disabled on bridge
device and frames are sent from vlan device on the bridge device, the tags
are embedded in skb->data and they break this assumption.
Extract embedded vlan tags and move them to vlan_tci at ingress.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:33:09 -04:00
Erik Hugne
16470111ed tipc: make discovery domain a bearer attribute
The node discovery domain is assigned when a bearer is enabled.
In the previous commit we reflect this attribute directly in the
bearer structure since it's needed to reinitialize the node
discovery mechanism after a hardware address change.

There's no need to replicate this attribute anywhere else, so we
remove it from the tipc_link_req structure.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 14:46:29 -04:00
Erik Hugne
a21a584d67 tipc: fix neighbor detection problem after hw address change
If the hardware address of a underlying netdevice is changed, it is
not enough to simply reset the bearer/links over this device. We
also need to reflect this change in the TIPC bearer and node
discovery structures aswell.

This patch adds the necessary reinitialization of the node disovery
mechanism following a hardware address change so that the correct
originating media address is advertised in the discovery messages.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dong Liu <dliu.cn@gmail.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 14:46:29 -04:00
Rashika Kheria
4548120100 net: Mark functions as static in net/sunrpc/svc_xprt.c
Mark functions as static in net/sunrpc/svc_xprt.c because they are not
used outside this file.

This eliminates the following warning in net/sunrpc/svc_xprt.c:
net/sunrpc/svc_xprt.c:574:5: warning: no previous prototype for ‘svc_alloc_arg’ [-Wmissing-prototypes]
net/sunrpc/svc_xprt.c:615:18: warning: no previous prototype for ‘svc_get_next_xprt’ [-Wmissing-prototypes]
net/sunrpc/svc_xprt.c:694:6: warning: no previous prototype for ‘svc_add_new_temp_xprt’ [-Wmissing-prototypes]

Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:57 -04:00
Jeff Layton
c42a01eee7 svcrdma: fix printk when memory allocation fails
It retries in 1s, not 1000 jiffies.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-03-27 16:31:56 -04:00
Zoltan Kiss
36d5fe6a00 core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
skb_zerocopy can copy elements of the frags array between skbs, but it doesn't
orphan them. Also, it doesn't handle errors, so this patch takes care of that
as well, and modify the callers accordingly. skb_tx_error() is also added to
the callers so they will signal the failed delivery towards the creator of the
skb.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:29:38 -04:00
Julia Lawall
02f2d5a066 hsr: replace del_timer by del_timer_sync
Use del_timer_sync to ensure that the timer is stopped on all CPUs before
the driver exists.

This change was suggested by Thomas Gleixner.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
declarer name module_exit;
identifier ex;
@@

module_exit(ex);

@@
identifier r.ex;
@@

ex(...) {
  <...
- del_timer
+ del_timer_sync
    (...)
  ...>
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:28:06 -04:00
Julia Lawall
84275593ac atm: replace del_timer by del_timer_sync
Use del_timer_sync to ensure that the timer is stopped on all CPUs before
the driver exists.

This change was suggested by Thomas Gleixner.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@r@
declarer name module_exit;
identifier ex;
@@

module_exit(ex);

@@
identifier r.ex;
@@

ex(...) {
  <...
- del_timer
+ del_timer_sync
    (...)
  ...>
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:28:06 -04:00
Eric Dumazet
a0b8486caf tcp: tcp_make_synack() minor changes
There is no need to allocate 15 bytes in excess for a SYNACK packet,
as it contains no data, only headers.

SYNACK are always generated in softirq context, and contain a single
segment, we can use TCP_INC_STATS_BH()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:10:10 -04:00
Michal Kubeček
e5fd387ad5 ipv6: do not overwrite inetpeer metrics prematurely
If an IPv6 host route with metrics exists, an attempt to add a
new route for the same target with different metrics fails but
rewrites the metrics anyway:

12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  rto_min lock 1s
12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1500
RTNETLINK answers: File exists
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  rto_min lock 1.5s

This is caused by all IPv6 host routes using the metrics in
their inetpeer (or the shared default). This also holds for the
new route created in ip6_route_add() which shares the metrics
with the already existing route and thus ip6_route_add()
rewrites the metrics even if the new route ends up not being
used at all.

Another problem is that old metrics in inetpeer can reappear
unexpectedly for a new route, e.g.

12sp0:~ # ip route add fec0::1 dev eth0 rto_min 1000
12sp0:~ # ip route del fec0::1
12sp0:~ # ip route add fec0::1 dev eth0
12sp0:~ # ip route change fec0::1 dev eth0 hoplimit 10
12sp0:~ # ip -6 route show
fe80::/64 dev eth0  proto kernel  metric 256
fec0::1 dev eth0  metric 1024  hoplimit 10 rto_min lock 1s

Resolve the first problem by moving the setting of metrics down
into fib6_add_rt2node() to the point we are sure we are
inserting the new route into the tree. Second problem is
addressed by introducing new flag DST_METRICS_FORCE_OVERWRITE
which is set for a new host route in ip6_route_add() and makes
ipv6_cow_metrics() always overwrite the metrics in inetpeer
(even if they are not "new"); it is reset after that.

v5: use a flag in _metrics member rather than one in flags

v4: fix a typo making a condition always true (thanks to Hannes
Frederic Sowa)

v3: rewritten based on David Miller's idea to move setting the
metrics (and allocation in non-host case) down to the point we
already know the route is to be inserted. Also rebased to
net-next as it is quite late in the cycle.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:09:07 -04:00
Vlad Yasevich
fc0d48b8fb vlan: Set hard_header_len according to available acceleration
Currently, if the card supports CTAG acceleration we do not
account for the vlan header even if we are configuring an
8021AD vlan.  This may not be best since we'll do software
tagging for 8021AD which will cause data copy on skb head expansion
Configure the length based on available hw offload capabilities and
vlan protocol.

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 15:00:37 -04:00
John W. Linville
2de21e5899 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-03-27 14:16:50 -04:00
Ying Xue
dde2026608 tipc: use node list lock to protect tipc_num_links variable
Without properly implicit or explicit read memory barrier, it's
unsafe to read an atomic variable with atomic_read() from another
thread which is different with the thread of changing the atomic
variable with atomic_inc() or atomic_dec(). So a stale tipc_num_links
may be got with atomic_read() in tipc_node_get_links(). If the
tipc_num_links variable type is converted from atomic to unsigned
integer and node list lock is used to protect it, the issue would
be avoided.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:38 -04:00
Ying Xue
2220646a53 tipc: use node_list_lock to protect tipc_num_nodes variable
As tipc_node_list is protected by rcu read lock on read side, it's
unnecessary to hold node_list_lock to protect tipc_node_list in
tipc_node_get_links(). Instead, node_list_lock should just protects
tipc_num_nodes in the function.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
6c7a762e70 tipc: tipc: convert node list and node hlist to RCU lists
Convert tipc_node_list list and node_htable hash list to RCU lists.
On read side, the two lists are protected with RCU read lock, and
on update side, node_list_lock is applied to them.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
46651c59c4 tipc: rename node create lock to protect node list and hlist
When a node is created, tipc_net_lock read lock is first held and
then node_create_lock is grabbed in order to prevent the same node
from being created and inserted into both node list and hlist twice.
But when we query node from the two node lists, we only hold
tipc_net_lock read lock without grabbing node_create_lock. Obviously
this locking policy is unable to guarantee that the two node lists
are always synchronized especially when the operation of changing
and accessing them occurs in different contexts like currently doing.

Therefore, rename node_create_lock to node_list_lock to protect the
two node lists, that is, whenever node is inserted into them or node
is queried from them, the node_list_lock should be always held. As a
result, tipc_net_lock read lock becomes redundant and then can be
removed from the node query functions.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
987b58be37 tipc: make broadcast bearer store in bearer_list array
Now unicast bearer is dynamically allocated and placed into its
identity specified slot of bearer_list array. When we search
bearer_list array with a bearer identity, the corresponding bearer
instance can be found. But broadcast bearer is statically allocated
and it is not located in the bearer_list array yet. So we decide to
enlarge bearer_list array into MAX_BEARERS + 1 slots, and its last
slot stores the broadcast bearer so that the broadcast bearer can
be found from bearer_list array with MAX_BEARERS as index. The
change will help us reduce the complex relationship between bearer
and link in the future.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
f47de12b06 tipc: remove active flag from tipc_bearer structure
After the allocation of tipc_bearer structure instance is converted
from statical way to dynamical way, we identify whether a certain
tipc_bearer structure pointer is valid by checking whether the pointer
is NULL or not. So the active flag in tipc_bearer structure becomes
redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
3874ccbba8 tipc: convert tipc_bearers array to pointer list
As part of the effort to introduce RCU protection for the bearer
list, we first need to change it to a list of pointers.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:37 -04:00
Ying Xue
78dfb789b6 tipc: acquire necessary locks in named_cluster_distribute routine
The 'tipc_node_list' is guarded by tipc_net_lock and 'links' array
defined in 'tipc_node' structure is protected by node lock as well.
Without acquiring the two locks in named_cluster_distribute() a fatal
oops may happen in case that a destroyed link might be got and then
accessed. Therefore, above mentioned two locks must be held in
named_cluster_distribute() to prevent the issue from happening
accidentally.

As 'links' array in node struct must be protected by node lock,
we have to move the code of selecting an active link from
tipc_link_xmit() to named_cluster_distribute() and then call
__tipc_link_xmit() with the selected link to deliver name messages.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:36 -04:00
Ying Xue
5902385a24 tipc: obsolete the remote management feature
Due to the lacking of any credential, it's allowed to accept commands
requested from remote nodes to query the local node status, which is
prone to involve potential security risks. Instead, if we login to
a remote node with ssh command, this approach is not only more safe
than the remote management feature, but also it can give us more
permissions like changing the remote node configuration. So it's
reasonable for us to obsolete the remote management feature now.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:36 -04:00
Ying Xue
76d7882420 tipc: remove unnecessary checking for node object
tipc_node_create routine doesn't need to check whether a node
object specified with a node address exists or not because its
caller(ie, tipc_disc_recv_msg routine) has checked this before
calling it.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 13:08:36 -04:00
Monam Agarwal
fcb144b5df net/core: Use RCU_INIT_POINTER(x, NULL) in netpoll.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 00:18:09 -04:00
Monam Agarwal
cd18721e52 net/bridge: Use RCU_INIT_POINTER(x, NULL) in br_vlan.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27 00:18:09 -04:00
Eric Dumazet
de14439167 net: unix: non blocking recvmsg() should not return -EINTR
Some applications didn't expect recvmsg() on a non blocking socket
could return -EINTR. This possibility was added as a side effect
of commit b3ca9b02b0 ("net: fix multithreaded signal handling in
unix recv routines").

To hit this bug, you need to be a bit unlucky, as the u->readlock
mutex is usually held for very small periods.

Fixes: b3ca9b02b0 ("net: fix multithreaded signal handling in unix recv routines")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 17:05:40 -04:00
dingtianhong
71e415e44c vlan: make a new function vlan_dev_vlan_proto() and export
The vlan support 2 proto: 802.1q and 802.1ad, so make a new function
called vlan_dev_vlan_proto() which could return the vlan proto for
input dev.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 16:41:28 -04:00
Tom Herbert
61b905da33 net: Rename skb->rxhash to skb->hash
The packet hash can be considered a property of the packet, not just
on RX path.

This patch changes name of rxhash and l4_rxhash skbuff fields to be
hash and l4_hash respectively. This includes changing uses of the
field in the code which don't call the access functions.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:58:20 -04:00
Peter Pan(潘卫平)
cc93fc51f3 tcp: delete unused parameter in tcp_nagle_check()
After commit d4589926d7 (tcp: refine TSO splits), tcp_nagle_check() does
not use parameter mss_now anymore.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:43:40 -04:00
Pravin B Shelar
fbd02dd405 ip_tunnel: Fix dst ref-count.
Commit 10ddceb22b (ip_tunnel:multicast process cause panic due
to skb->_skb_refdst NULL pointer) removed dst-drop call from
ip-tunnel-recv.

Following commit reintroduce dst-drop and fix the original bug by
checking loopback packet before releasing dst.
Original bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-26 15:18:40 -04:00
Johan Hedberg
e8b1ab9e6d Bluetooth: Fix returning peer address in pending connect state
We should let user space request the peer address also in the pending
connect states, i.e. BT_CONNECT and BT_CONNECT2. There is existing user
space code that tries to do this and will fail without extending the set
of allowed states for the peer address information.

This patch adds the two states to the allowed ones in the L2CAP and
RFCOMM sock_getname functions, thereby preventing ENOTCONN from being
returned.

Reported-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Tested-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-26 09:31:33 -07:00
David S. Miller
04f58c8854 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	Documentation/devicetree/bindings/net/micrel-ks8851.txt
	net/core/netpoll.c

The net/core/netpoll.c conflict is a bug fix in 'net' happening
to code which is completely removed in 'net-next'.

In micrel-ks8851.txt we simply have overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 20:29:20 -04:00
David S. Miller
0fc3196603 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of wireless updates intended for 3.15!

For the mac80211 bits, Johannes says:

"This has a whole bunch of bugfixes for things that went into -next
previously as well as some other bugfixes I didn't want to rush into
3.14 at this point. The rest of it is some cleanups and a few small
features, the biggest of which is probably Janusz's regulatory DFS CAC
time code."

For the Bluetooth bits, Gustavo says:

"One more pull request to 3.15. This is mostly and bug fix pull request, it
contains several fixes and clean up all over the tree, plus some small new
features."

For the NFC bits, Samuel says:

"This is the NFC pull request for 3.15. With this one we have:

- Support for ISO 15693 a.k.a. NFC vicinity a.k.a. Type 5 tags. ISO
  15693 are long range (1 - 2 meters) vicinity tags/cards. The kernel
  now supports those through the NFC netlink and digital APIs.

- Support for TI's trf7970a chipset. This chipset relies on the NFC
  digital layer and the driver currently supports type 2, 4A and 5 tags.

- Support for NXP's pn544 secure firmare download. The pn544 C3 chipsets
  relies on a different firmware download protocal than the C2 one. We
  now support both and use the right one depending on the version we
  detect at runtime.

- Support for 4A tags from the NFC digital layer.

- A bunch of cleanups and minor fixes from Axel Lin and Thierry Escande."

For the iwlwifi bits, Emmanuel says:

"We were sending a host command while the mutex wasn't held. This
led to hard-to-catch races."

And...

"I have a fix for a "merge damage" which is not really a merge
damage: it enables scheduled scan which has been disabled in
wireless.git. Since you merged wireless.git into wireless-next.git,
this can now be fixed in wireless-next.git.

Besides this, Alex made a workaround for a hardware bug. This fix
allows us to consume less power in S3. Arik and Eliad continue to
work on D0i3 which is a run-time power saving feature. Eliad also
contributes a few bits to the rate scaling logic to which Eyal adds his
own contribution. Avri dives deep in the power code - newer firmware
will allow to enable power save in newer scenarios. Johannes made a few
clean-ups. I have the regular amount of BT Coex boring stuff. I disable
uAPSD since we identified firmware bugs that cause packet loss. One
thing that do stand out is the udev event that we now send when the
FW asserts. I hope it will allow us to debug the FW more easily."

Also included is one last iwlwifi pull for a build breakage fix...

For the Atheros bits, Kalle says:

"Michal now did some optimisations and was able to improve throughput by
100 Mbps on our MIPS based AP135 platform. Chun-Yeow added some
workarounds to be able to better use ad-hoc mode. Ben improved log
messages and added support for MSDU chaining. And, as usual, also some
smaller fixes."

Beyond that...

Andrea Merello continues his rtl8180 refactoring, in preparation for
a long-awaited rtl8187 driver.  We get a new driver (rsi) for the
RS9113 chip, from Fariya Fatima.  And, of course, we get the usual
round of updates for ath9k, brcmfmac, mwifiex, wil6210, etc. as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-25 19:25:39 -04:00
Simon Derr
8a5daf1e2c 9pnet_rdma: check token type before int conversion
When parsing options, make sure we have found a proper token before
doing a numeric conversion.

Without this check, the current code will end up following random
pointers that just happened to be on the stack when this function was
called, because match_token() will not touch the 'args' list unless a
valid token is found.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:19 -05:00
Simon Derr
263c582888 9pnet: trans_fd : allocate struct p9_trans_fd and struct p9_conn together.
There is no point in allocating these structs separately.
Changing this makes the code a little simpler and saves a few bytes of
memory.

Reported-by: Herve Vico
Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:17 -05:00
Simon Derr
0bfd6845c0 9P: Get rid of REQ_STATUS_FLSH
This request state is mostly useless, and properly implementing it
for RDMA would require an extra lock to be taken in handle_recv()
and in rdma_cancel() to avoid this race:

    handle_recv()           rdma_cancel()
        .                     .
        .                   if req->state == SENT
    req->state = RCVD         .
        .                           req->state = FLSH

So just get rid of it.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:15 -05:00
Simon Derr
931700d26b 9pnet_rdma: add cancelled()
Take into account posted recv buffers that will never receive their
reply.

The RDMA code posts a recv buffer for each request that it sends.
When a request is flushed, it is possible that this request will
never receive a reply, and that one recv buffer will stay unused on
the recv queue.

It is then possible, if this scenario happens several times, to have the
recv queue full, and have the 9pnet_rmda module unable to send new requests.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:13 -05:00
Simon Derr
3f9d5b8dfd 9pnet_rdma: update request status during send
This will be needed by the flush logic.

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:12 -05:00
Simon Derr
afd8d65411 9P: Add cancelled() to the transport functions.
And move transport-specific code out of net/9p/client.c

Signed-off-by: Simon Derr <simon.derr@bull.net>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:11 -05:00
Rashika
05a782d416 net: Mark function as static in 9p/client.c
Mark function as static in net/9p/client.c because it is not used
outside this file.

This eliminates the following warning in net/9p/client.c:
net/9p/client.c:207:18: warning: no previous prototype for ‘p9_fcall_alloc’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:38:09 -05:00
Dominique Martinet
2b6e72ed74 9P: Add memory barriers to protect request fields over cb/rpc threads handoff
We need barriers to guarantee this pattern works as intended:
[w] req->rc, 1		[r] req->status, 1
wmb			rmb
[w] req->status, 1	[r] req->rc

Where the wmb ensures that rc gets written before status,
and the rmb ensures that if you observe status == 1, rc is the new value.

Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2014-03-25 16:37:59 -05:00
Erik Hugne
a5d0e7c037 tipc: fix spinlock recursion bug for failed subscriptions
If a topology event subscription fails for any reason, such as out
of memory, max number reached or because we received an invalid
request the correct behavior is to terminate the subscribers
connection to the topology server. This is currently broken and
produces the following oops:

[27.953662] tipc: Subscription rejected, illegal request
[27.955329] BUG: spinlock recursion on CPU#1, kworker/u4:0/6
[27.957066]  lock: 0xffff88003c67f408, .magic: dead4ead, .owner: kworker/u4:0/6, .owner_cpu: 1
[27.958054] CPU: 1 PID: 6 Comm: kworker/u4:0 Not tainted 3.14.0-rc6+ #5
[27.960230] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[27.960874] Workqueue: tipc_rcv tipc_recv_work [tipc]
[27.961430]  ffff88003c67f408 ffff88003de27c18 ffffffff815c0207 ffff88003de1c050
[27.962292]  ffff88003de27c38 ffffffff815beec5 ffff88003c67f408 ffffffff817f0a8a
[27.963152]  ffff88003de27c58 ffffffff815beeeb ffff88003c67f408 ffffffffa0013520
[27.964023] Call Trace:
[27.964292]  [<ffffffff815c0207>] dump_stack+0x45/0x56
[27.964874]  [<ffffffff815beec5>] spin_dump+0x8c/0x91
[27.965420]  [<ffffffff815beeeb>] spin_bug+0x21/0x26
[27.965995]  [<ffffffff81083df6>] do_raw_spin_lock+0x116/0x140
[27.966631]  [<ffffffff815c6215>] _raw_spin_lock_bh+0x15/0x20
[27.967256]  [<ffffffffa0008540>] subscr_conn_shutdown_event+0x20/0xa0 [tipc]
[27.968051]  [<ffffffffa000fde4>] tipc_close_conn+0xa4/0xb0 [tipc]
[27.968722]  [<ffffffffa00101ba>] tipc_conn_terminate+0x1a/0x30 [tipc]
[27.969436]  [<ffffffffa00089a2>] subscr_conn_msg_event+0x1f2/0x2f0 [tipc]
[27.970209]  [<ffffffffa0010000>] tipc_receive_from_sock+0x90/0xf0 [tipc]
[27.970972]  [<ffffffffa000fa79>] tipc_recv_work+0x29/0x50 [tipc]
[27.971633]  [<ffffffff8105dbf5>] process_one_work+0x165/0x3e0
[27.972267]  [<ffffffff8105e869>] worker_thread+0x119/0x3a0
[27.972896]  [<ffffffff8105e750>] ? manage_workers.isra.25+0x2a0/0x2a0
[27.973622]  [<ffffffff810648af>] kthread+0xdf/0x100
[27.974168]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0
[27.974893]  [<ffffffff815ce13c>] ret_from_fork+0x7c/0xb0
[27.975466]  [<ffffffff810647d0>] ? kthread_create_on_node+0x1a0/0x1a0

The recursion occurs when subscr_terminate tries to grab the
subscriber lock, which is already taken by subscr_conn_msg_event.
We fix this by checking if the request to establish a new
subscription was successful, and if not we initiate termination of
the subscriber after we have released the subscriber lock.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 15:36:56 -04:00
Li RongQing
c27f0872a3 netpoll: fix the skb check in pkt_is_ns
Neighbor Solicitation is ipv6 protocol, so we should check
skb->protocol with ETH_P_IPV6

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Cc: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 15:08:40 -04:00
Li RongQing
0b8c7f6f2a ipv4: remove ip_rt_dump from route.c
ip_rt_dump do nothing after IPv4 route caches removal, so we can remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 12:45:01 -04:00
Johan Hedberg
8396215d48 Bluetooth: Remove unnecessary assignment in SMP
The smp variable in smp_conn_security is not used anywhere before the
smp = smp_chan_create() call in the smp_conn_security function so it
makes no sense to assign any other value to it before that.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 08:43:50 -07:00
Johan Hedberg
61b3b2b6f4 Bluetooth: Fix potential NULL pointer dereference in smp_conn_security
The smp pointer might not be initialized for jumps to the "done" label
in the smp_conn_security function. Furthermore doing the set_bit after
done might "overwrite" a previous value of the flag in case pairing was
already in progress. This patch moves the call to set_bit before the
label so that it is only done for a newly created smp context (as
returned by smp_chan_create).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 08:43:47 -07:00
Johan Hedberg
1d98bf4fda Bluetooth: Remove LTK re-encryption procedure
Due to several devices being unable to handle this procedure reliably
(resulting in forced disconnections before pairing completes) it's
better to remove it altogether.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 07:51:56 -07:00
Johan Hedberg
a82505c7bc Bluetooth: Don't try to confirm locally initiated SMP pairing
In the case that the just-works model would be triggered we only want to
confirm remotely initiated pairings (i.e. those triggered by a Security
Request or Pairing Request). This patch adds the necessary check to the
tk_request function to fall back to the JUST_WORKS method in the case of
a locally initiated pairing.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 07:51:56 -07:00
Johan Hedberg
edca792c03 Bluetooth: Add SMP flag to track which side is the initiator
For remotely initiated just-works pairings we want to show the user a
confirmation dialog for the pairing. However, we can only know which
side was the initiator by tracking which side sends the first Security
Request or Pairing Request PDU. This patch adds a new SMP flag to
indicate whether our side was the initiator for the pairing.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 07:51:56 -07:00
Johan Hedberg
4eb65e667b Bluetooth: Fix SMP confirmation callback handling
In the case that a local pairing confirmation (JUST_CFM) has been
selected as the method we need to use the user confirm request mgmt
event for it with the confirm_hint set to 1 (to indicate confirmation
without any specific passkey value). Without this (if passkey_notify was
used) the pairing would never proceed. This patch adds the necessary
call to mgmt_user_confirm_request in this scenario.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 07:51:56 -07:00
Johan Hedberg
81d0c8ad71 Bluetooth: Add missing cmd_status handler for LE_Start_Encryption
It is possible that the HCI_LE_Start_Encryption command fails in an
early stage and triggers a command status event with the failure code.
In such a case we need to properly notify the hci_conn object and
cleanly bring the connection down. This patch adds the missing command
status handler for this HCI command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 07:51:55 -07:00
Johan Hedberg
0a66cf2036 Bluetooth: Fix potential NULL pointer dereference in SMP
If a sudden disconnection happens the l2cap_conn pointer may already
have been cleaned up by the time hci_conn_security gets called,
resulting in the following oops if we don't have a proper NULL check:

BUG: unable to handle kernel NULL pointer dereference at 000000c8
IP: [<c132e2ed>] smp_conn_security+0x26/0x151
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU: 1 PID: 673 Comm: memcheck-x86-li Not tainted 3.14.0-rc2+ #437
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: f0ef0520 ti: f0d6a000 task.ti: f0d6a000
EIP: 0060:[<c132e2ed>] EFLAGS: 00010246 CPU: 1
EIP is at smp_conn_security+0x26/0x151
EAX: f0ec1770 EBX: f0ec1770 ECX: 00000002 EDX: 00000002
ESI: 00000002 EDI: 00000000 EBP: f0d6bdc0 ESP: f0d6bda0
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 80050033 CR2: 000000c8 CR3: 30f0f000 CR4: 00000690
Stack:
 f4f55000 00000002 f0d6bdcc c1097a2b c1319f40 f0ec1770 00000002 f0d6bdd0
 f0d6bde8 c1312a82 f0d6bdfc c1312a82 c1319f84 00000008 f4d81c20 f0e5fd86
 f0ec1770 f0d6bdfc f0d6be28 c131be3b c131bdc1 f0d25270 c131be3b 00000008
Call Trace:
 [<c1097a2b>] ? __kmalloc+0x118/0x128
 [<c1319f40>] ? mgmt_pending_add+0x49/0x9b
 [<c1312a82>] hci_conn_security+0x4a/0x1dd
 [<c1312a82>] ? hci_conn_security+0x4a/0x1dd
 [<c1319f84>] ? mgmt_pending_add+0x8d/0x9b
 [<c131be3b>] pair_device+0x1e1/0x206
 [<c131bdc1>] ? pair_device+0x167/0x206
 [<c131be3b>] ? pair_device+0x1e1/0x206
 [<c131ed44>] mgmt_control+0x275/0x2d6

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-24 07:51:55 -07:00
Li RongQing
4a4eb21fd6 ipv4: remove ipv4_ifdown_dst from route.c
ipv4_ifdown_dst does nothing after IPv4 route caches removal,
so we can remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-24 00:18:44 -04:00
Simon Wunderlich
3f2532fcde batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:59 +01:00
Antonio Quartulli
151dcb3c56 batman-adv: improve DAT documentation
Add missing documentation for BATADV_DAT_ADDR_MAX and
convert an existing documentation to kerneldoc

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22 09:18:59 +01:00
Antonio Quartulli
43e6f65a3d batman-adv: improve the TT flags documentation
Convert the current documentation for the TT flags in proper
kerneldoc and improve it by adding an explanation for each
of the flags.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22 09:18:59 +01:00
Linus Lüssing
4c8755d69c batman-adv: Send multicast packets to nodes with a WANT_ALL flag
With this patch a node sends IPv4 multicast packets to nodes which
have a BATADV_MCAST_WANT_ALL_IPV4 flag set and IPv6 multicast packets
to nodes which have a BATADV_MCAST_WANT_ALL_IPV6 flag set, too.

Why is this needed? There are scenarios involving bridges where
multicast report snooping and multicast TT announcements are not
sufficient, which would lead to packet loss for some nodes otherwise:

MLDv1 and IGMPv1/IGMPv2 have a suppression mechanism
for multicast listener reports. When we have an MLDv1/IGMPv1/IGMPv2
querier behind a bridge then our snooping bridge is potentially not
going to see any reports even though listeners exist because according
to RFC4541 such reports are only forwarded to multicast routers:

-----------------------------------------------------------
            ---------------
{Querier}---|Snoop. Switch|----{Listener}
            ---------------
                       \           ^
                      -------
                      | br0 |  <  ???
                      -------
                          \
                     _-~---~_
                 _-~/        ~-_
                ~   batman-adv  \-----{Sender}
                \~_   cloud    ~/
                   -~~__-__-~_/

I)  MLDv1 Query:  {Querier}  -> flooded
II) MLDv1 Report: {Listener} -> {Querier}

-> br0 cannot detect the {Listener}
=> Packets from {Sender} need to be forwarded to all
   detected listeners and MLDv1/IGMPv1/IGMPv2 queriers.

-----------------------------------------------------------

Note that we do not need to explicitly forward to MLDv2/IGMPv3 queriers,
because these protocols have no report suppression: A bridge has no
trouble detecting MLDv2/IGMPv3 listeners.

Even though we do not support bridges yet we need to provide the
according infrastructure already to not break compatibility later.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:58 +01:00
Linus Lüssing
ab49886e3d batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support
With this patch a node may additionally perform the dropping or
unicasting behaviour for a link-local IPv4 and link-local-all-nodes
IPv6 multicast packet, too.

The extra counter and BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag is needed
because with a future bridge snooping support integration a node with a
bridge on top of its soft interface is not able to reliably detect its
multicast listeners for IPv4 link-local and the IPv6
link-local-all-nodes addresses anymore (see RFC4541, section 2.1.2.2
and section 3).

Even though this new flag does make "no difference" now, it'll ensure
a seamless integration of multicast bridge support without needing to
break compatibility later.

Also note, that even with multicast bridge support it won't be possible
to optimize 224.0.0.x and ff02::1 towards nodes with bridges, they will
always receive these ranges.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:58 +01:00
Linus Lüssing
1d8ab8d3c1 batman-adv: Modified forwarding behaviour for multicast packets
With this patch a multicast packet is not always simply flooded anymore,
the behaviour for the following cases is changed to reduce
unnecessary overhead:

If all nodes within the horizon of a certain node have signalized
multicast listener announcement capability then an IPv6 multicast packet
with a destination of IPv6 link-local scope (excluding ff02::1) coming
from the upstream of this node...

* ...is dropped if there is no according multicast listener in the
  translation table,
* ...is forwarded via unicast if there is a single node with interested
  multicast listeners
* ...and otherwise still gets flooded.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:57 +01:00
Linus Lüssing
60432d756c batman-adv: Announce new capability via multicast TVLV
If the soft interface of a node is not part of a bridge then a node
announces a new multicast TVLV: The existence of this TVLV
signalizes that this node is announcing all of its multicast listeners
via the translation table infrastructure.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:57 +01:00
Linus Lüssing
e17931d1a6 batman-adv: introduce capability initialization bitfield
The new bitfield allows us to keep track whether capability subsets of
an originator have gone through their initialization phase yet.

The translation table is the only user right now, but a new one will be
added soon.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:56 +01:00
Linus Lüssing
c5caf4ef34 batman-adv: Multicast Listener Announcements via Translation Table
With this patch a node which has no bridge interface on top of its soft
interface announces its local multicast listeners via the translation
table.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 09:18:56 +01:00
Antonio Quartulli
c5d3a652a3 batman-adv: add kerneldoc for dst_hint argument
Some helper functions used along the TX path have now a new
"dst_hint" argument but the kerneldoc was missing.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22 08:50:26 +01:00
Marek Lindner
af63fde503 batman-adv: call unregister_netdev() to have it handle the locking for us
Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 08:50:26 +01:00
Simon Wunderlich
1a321b0deb batman-adv: fix a few kerneldoc inconsistencies
Reported-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 08:50:26 +01:00
Antonio Quartulli
8fdd01530c batman-adv: prefer ether_addr_copy to memcpy
On some architectures ether_addr_copy() is slightly faster
than memcpy() therefore use the former when possible.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-03-22 08:50:26 +01:00
Linus Lüssing
e88b617d84 batman-adv: remove obsolete skb_reset_mac_header() in batadv_bla_tx()
Our .ndo_start_xmit handler (batadv_interface_tx()) can rely on having
the skb mac header pointer set correctly since the following commit
present in kernels >= 3.9:

"net: reset mac header in dev_start_xmit()" (6d1ccff627)

Therefore this commit removes the according, now redundant,
skb_reset_mac_header() call in batadv_bla_tx().

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 08:50:26 +01:00
Linus Lüssing
927c2ed7e5 batman-adv: use vlan_/eth_hdr() instead of skb->data in interface_tx path
Our .ndo_start_xmit handler (batadv_interface_tx()) can rely on having
the skb mac header pointer set correctly since the following commit
present in kernels >= 3.9:

"net: reset mac header in dev_start_xmit()" (6d1ccff627)

Therefore we can safely use eth_hdr() and vlan_eth_hdr() instead of
skb->data now, which spares us some ugly type casts.

At the same time set the mac_header in batadv_dat_snoop_incoming_arp_request()
before sending the skb along the TX path.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 08:50:26 +01:00
Fengguang Wu
abae9479ca batman-adv: fix coccinelle warnings
net/batman-adv/network-coding.c:1535:1-7: Replace memcpy with struct assignment

Generated by: coccinelle/misc/memcpy-assign.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-03-22 08:50:26 +01:00
Marcel Holtmann
533553f873 Bluetooth: Track current configured LE scan type parameter
The LE scan type paramter defines if active scanning or passive scanning
is in use. Track the currently set value so it can be used for decision
making from other pieces in the core.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-21 22:02:12 +02:00
John W. Linville
49c0ca17ee Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-03-21 14:02:04 -04:00
David S. Miller
b74d3feccc Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
Open vSwitch

Four small fixes for net/3.14. I realize that these are late in the
cycle - just got back from vacation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 17:29:02 -04:00
Alexander Aring
54af36e713 ieee802154: dgram: cleanup set of broadcast panid
This patch is only a cleanup to use the right define for a panid field.
The broadcast address and panid broadcast is still the same value.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 17:19:45 -04:00
Alexander Aring
06324f2f7c af_ieee802154: fix check on broadcast address
This patch fixes an issue which was introduced by commit
b70ab2e87f ("ieee802154: enforce
consistent endianness in the 802.15.4 stack").

The correct behaviour should be a check on the broadcast address field
which is 0xffff.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Reported-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 17:19:45 -04:00
Nicolas Dichtel
f518338b16 ip6mr: fix mfc notification flags
Commit 812e44dd18 ("ip6mr: advertise new mfc entries via rtnl") reuses the
function ip6mr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.

Libraries like libnl will wait forever for NLMSG_DONE.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:24:28 -04:00
Nicolas Dichtel
65886f439a ipmr: fix mfc notification flags
Commit 8cd3ac9f9b ("ipmr: advertise new mfc entries via rtnl") reuses the
function ipmr_fill_mroute() to notify mfc events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.

Libraries like libnl will wait forever for NLMSG_DONE.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:24:28 -04:00
Nicolas Dichtel
1c104a6beb rtnetlink: fix fdb notification flags
Commit 3ff661c38c ("net: rtnetlink notify events for FDB NTF_SELF adds and
deletes") reuses the function nlmsg_populate_fdb_fill() to notify fdb events.
But this function was used only for dump and thus was always setting the
flag NLM_F_MULTI, which is wrong in case of a single notification.

Libraries like libnl will wait forever for NLMSG_DONE.

CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:24:28 -04:00
Daniel Baluta
e35bad5d87 net: remove empty lines from tcp_syn_flood_action
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 16:17:22 -04:00
Ben Pfaff
f9b8c4c8ba openvswitch: Correctly report flow used times for first 5 minutes after boot.
The kernel starts out its "jiffies" timer as 5 minutes below zero, as
shown in include/linux/jiffies.h:

  /*
   * Have the 32 bit jiffies value wrap 5 minutes after boot
   * so jiffies wrap bugs show up earlier.
   */
  #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))

The loop in ovs_flow_stats_get() starts out with 'used' set to 0, then
takes any "later" time.  This means that for the first five minutes after
boot, flows will always be reported as never used, since 0 is greater than
any time already seen.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-03-20 10:45:21 -07:00
Trond Myklebust
494314c415 SUNRPC: rpc_restart_call/rpc_restart_call_prepare should clear task->tk_status
When restarting an rpc call, we should not be carrying over data from the
previous call.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-20 13:38:44 -04:00
Trond Myklebust
6bd144160a SUNRPC: Don't let rpc_delay() clobber non-timeout errors
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-20 13:38:43 -04:00
Johan Hedberg
61b1a7fbda Bluetooth: Fix address value for early disconnection events
We need to ensure that we do not send events to user space with the
identity address if we have not yet notified user space of the IRK. The
code was previously trying to handle this for the mgmt_pair_device
response (which worked well enough) but this is not the only connection
related event that might be sent to user space before pairing is
successful: another important event is Device Disconnected.

The issue can actually be solved more simply than the solution
previously used for mgmt_pair_device. Since we do have the identity
address tracked as part of the remote IRK struct we can just copy it
over from there to the hci_conn struct once we've for real sent the mgmt
event for the new IRK.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-20 09:14:26 -07:00
John W. Linville
370c5acef0 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-03-20 11:54:22 -04:00
John W. Linville
7eb2450a51 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-03-20 11:53:20 -04:00
Steve Dickson
1fa3e2eb9d SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
Don't schedule an rpc_delay before checking to see if the task
is a SOFTCONN because the tk_callback from the delay (__rpc_atrun)
clears the task status before the rpc_exit_task can be run.

Signed-off-by: Steve Dickson <steved@redhat.com>
Fixes: 561ec16031 (SUNRPC: call_connect_status should recheck...)
Link: http://lkml.kernel.org/r/5329CF7C.7090308@RedHat.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-20 11:46:52 -04:00
Srivatsa S. Bhat
a0e247a805 net/iucv/iucv.c: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the code in net/iucv/iucv.c by using this latter form of callback
registration. Also, provide helper functions to perform the common memory
allocations and frees, to condense repetitive code.

Cc: Ursula Braun <ursula.braun@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:48 +01:00
Srivatsa S. Bhat
e30a293e8a net/core/flow.c: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:

	get_online_cpus();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	register_cpu_notifier(&foobar_cpu_notifier);

	put_online_cpus();

This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).

Instead, the correct and race-free way of performing the callback
registration is:

	cpu_notifier_register_begin();

	for_each_online_cpu(cpu)
		init_cpu(cpu);

	/* Note the use of the double underscored version of the API */
	__register_cpu_notifier(&foobar_cpu_notifier);

	cpu_notifier_register_done();

Fix the code in net/core/flow.c by using this latter form of callback
registration.

Cc: Li RongQing <roy.qing.li@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 13:43:48 +01:00
Johan Hedberg
39adbffe4b Bluetooth: Fix passkey endianess in user_confirm and notify_passkey
The passkey_notify and user_confirm functions in mgmt.c were expecting
different endianess for the passkey, leading to a big endian bug and
sparse warning in recently added SMP code. This patch converts both
functions to expect host endianess and do the conversion to little
endian only when assigning to the mgmt event struct.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-19 23:22:07 -07:00
Ursula Braun
2f139a5d82 af_iucv: recvmsg problem for SOCK_STREAM sockets
Commit f9c41a62bb introduced
a problem for SOCK_STREAM sockets, when only part of the
incoming iucv message is received by user space. In this
case the remaining data of the iucv message is lost.
This patch makes sure an incompletely received iucv message
is queued back to the receive queue.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Reported-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-20 00:06:55 -04:00
Marcel Holtmann
40b552aa5a Bluetooth: Enforce strict Secure Connections Only mode security
In Secure Connections Only mode, it is required that Secure Connections
is used for pairing and that the link key is encrypted with AES-CCM using
a P-256 authenticated combination key. If this is not the case, then new
connection shall be refused or existing connections shall be dropped.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-03-19 23:30:32 +02:00
Trond Myklebust
9455e3f43b SUNRPC: Ensure call_status() deals correctly with SOFTCONN tasks
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-03-19 17:19:42 -04:00
Johan Hedberg
4e7b2030c4 Bluetooth: Fix Pair Device response parameters for pairing failure
It is possible that pairing fails after we've already received remote
identity information. One example of such a situation is when
re-encryption using the LTK fails. In this case the hci_conn object has
already been updated with the identity address but user space does not
yet know about it (since we didn't notify it of the new IRK yet).

To ensure user space doesn't get a Pair Device command response with an
unknown address always use the same address in the response as was used
for the original command.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-19 13:56:30 -07:00
Johan Hedberg
01ad34d267 Bluetooth: Fix SMP user passkey notification mgmt event
When performing SMP pairing with MITM protection one side needs to
enter the passkey while the other side displays to the user what needs
to be entered. Nowhere in the SMP specification does it say that the
displaying side needs to any kind of confirmation of the passkey, even
though a code comment in smp.c implies this.

This patch removes the misleading comment and converts the code to use
the passkey notification mgmt event instead of the passkey confirmation
mgmt event.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-19 13:55:06 -07:00
Johan Hedberg
5ed884d765 Bluetooth: Increase SMP re-encryption delay to 500ms
In some cases the current 250ms delay is not enough for the remote to
receive the keys, as can be witnessed by the following log:

> ACL Data RX: Handle 64 flags 0x02 dlen 21               [hci1] 231.414217
      SMP: Signing Information (0x0a) len 16
        Signature key: 555bb66b7ab3abc9d5c287c97fe6eb29
< ACL Data TX: Handle 64 flags 0x00 dlen 21               [hci1] 231.414414
      SMP: Encryption Information (0x06) len 16
        Long term key: 2a7cdc233c9a4b1f3ed31dd9843fea29
< ACL Data TX: Handle 64 flags 0x00 dlen 15               [hci1] 231.414466
      SMP: Master Identification (0x07) len 10
        EDIV: 0xeccc
        Rand: 0x322e0ef50bd9308a
< ACL Data TX: Handle 64 flags 0x00 dlen 21               [hci1] 231.414505
      SMP: Signing Information (0x0a) len 16
        Signature key: bbda1b2076e2325aa66fbcdd5388f745
> HCI Event: Number of Completed Packets (0x13) plen 5    [hci1] 231.483130
        Num handles: 1
        Handle: 64
        Count: 2
< HCI Command: LE Start Encryption (0x08|0x0019) plen 28  [hci1] 231.664211
        Handle: 64
        Random number: 0x5052ad2b75fed54b
        Encrypted diversifier: 0xb7c2
        Long term key: a336ede66711b49a84bde9b41426692e
> HCI Event: Command Status (0x0f) plen 4                 [hci1] 231.666937
      LE Start Encryption (0x08|0x0019) ncmd 1
        Status: Success (0x00)
> HCI Event: Number of Completed Packets (0x13) plen 5    [hci1] 231.712646
        Num handles: 1
        Handle: 64
        Count: 1
> HCI Event: Disconnect Complete (0x05) plen 4            [hci1] 232.562587
        Status: Success (0x00)
        Handle: 64
        Reason: Remote User Terminated Connection (0x13)

As can be seen, the last key (Signing Information) is sent at 231.414505
but the completed packets event for it comes only at 231.712646,
i.e. roughly 298ms later.

To have a better margin of error this patch increases the delay to
500ms.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-19 13:55:06 -07:00
Johan Hedberg
18e4aeb9b8 Bluetooth: Simplify logic when checking SMP_FLAG_TK_VALID
This is a trivial coding style simplification by instead of having an
extra early return to instead revert the if condition and do the single
needed queue_work() call there.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-03-19 13:55:05 -07:00
Zhao, Gang
73fb08e24a cfg80211: remove macro ASSERT_RDEV_LOCK(rdev)
Macro ASSERT_RDEV_LOCK(rdev) is equal to ASSERT_RTNL(), so replace it
with ASSERT_RTNL() and remove it.

Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:58 +01:00
Zhao, Gang
4da6462213 cfg80211: remove unnecessary check
RCU pointer bss->pub.beacon_ies is checked before in previous
statement:

if (rcu_access_pointer(bss->pub.beacon_ies))
	continue;

There is no need to check it twice(and in the wrong way :) ).

Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:57 +01:00
Emmanuel Grumbach
fb378c231d mac80211: set beamforming bit in radiotap
Add a bit in rx_status.vht_flags to let the low level driver
notify mac80211 about a beamformed packet. Propagate this
to the radiotap header.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:57 +01:00
Emmanuel Grumbach
3afc2167f6 cfg80211/mac80211: ignore signal if the frame was heard on wrong channel
On 2.4Ghz band, the channels overlap since the delta
between different channels is 5Mhz while the width of the
receiver is 20Mhz (at least).

This means that we can hear beacons or probe responses from
adjacent channels. These frames will have a significant
lower RSSI which will feed all kinds of logic with inaccurate
data. An obvious example is the roaming algorithm that will
think our AP is getting weak and will try to move to another
AP.

In order to avoid this, update the signal only if the frame
has been heard on the same channel as the one advertised by
the AP in its DS / HT IEs.
We refrain from updating the values only if the AP is
already in the BSS list so that we will still have a valid
(but inaccurate) value if the AP was heard on an adjacent
channel only.

To achieve this, stop taking the channel from DS / HT IEs
in mac80211. The DS / HT IEs is taken into account to
discard the frame if it was received on a disabled channel.
This can happen due to the same phenomenon: the frame is
sent on channel 12, but heard on channel 11 while channel
12 can be disabled on certain devices. Since this check
is done in cfg80211, stop even checking this in mac80211.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[remove unused rx_freq variable]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:56 +01:00
Zhao, Gang
2316d7b054 cfg80211: make __cfg80211_join_ibss() static
Function __cfg80211_join_ibss() is only used in net/wireless/ibss.c,
so make it static.

Signed-off-by: Zhao, Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:56 +01:00
Alexander Bondar
71228a1eab mac80211: release sched_scan_sdata when stopping sched scan
Assuming sched_scan_stop operation is synchronous the driver may not
necessary call ieee80211_sched_scan_stopped_work. Since this work is
the only place where sched_scan_sdata is released we can possibly run
into situation when it is never released. Fix this by releasing it
just after calling drv_sched_scan_stop.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:55 +01:00
Michael Braun
112c44b2df mac80211: fix WPA with VLAN on AP side with ps-sta again
commit de74a1d903
  "mac80211: fix WPA with VLAN on AP side with ps-sta"
fixed an issue where queued multicast packets would
be sent out encrypted with the key of an other bss.

commit "7cbf9d017dbb5e3276de7d527925d42d4c11e732"
  "mac80211: fix oops on mesh PS broadcast forwarding"
essentially reverted it, because vif.type cannot be AP_VLAN
due to the check to vif.type in ieee80211_get_buffered_bc before.

As the later commit intended to fix the MESH case, fix it
by checking for IFTYPE_AP instead of IFTYPE_AP_VLAN.

Cc: stable@vger.kernel.org
Fixes: 7cbf9d017d ("mac80211: fix oops on mesh PS broadcast forwarding")
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:54 +01:00
Johannes Berg
1a1cb744de mac80211: fix suspend vs. authentication race
Since Stanislaw's patch removing the quiescing code, mac80211 had
a race regarding suspend vs. authentication: as cfg80211 doesn't
track authentication attempts, it can't abort them. Therefore the
attempts may be kept running while suspending, which can lead to
all kinds of issues, in at least some cases causing an error in
iwlmvm firmware.

Fix this by aborting the authentication attempt when suspending.

Cc: stable@vger.kernel.org
Fixes: 12e7f51702 ("mac80211: cleanup generic suspend/resume procedures")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:53 +01:00
Johannes Berg
c9c3a06046 mac80211: verify deauthentication and return error on failure
When still authenticating the mac80211 code handling a deauthentication
requests from userspace doesn't verify that the request is valid in any
way, fix that. Additionally, it never returns an error, even if there's
no connection or authentication attempt, fix that as well.

While at it, move the message to not print a message in the error case
and to distinguish between the two cases.

Also simplify the code by duplicating the cfg80211 call.

Reviewed-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:53 +01:00
Johannes Berg
d2722f8b87 mac80211: fix potential use-after-free
The bss struct might be freed in ieee80211_rx_bss_put(),
so we shouldn't use it afterwards.

Cc: stable@vger.kernel.org (3.10+)
Fixes: 817cee7675 ("mac80211: track AP's beacon rate and give it to the driver")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-03-19 21:29:52 +01:00
Tejun Heo
4d3bb511b5 cgroup: drop const from @buffer of cftype->write_string()
cftype->write_string() just passes on the writeable buffer from kernfs
and there's no reason to add const restriction on the buffer.  The
only thing const achieves is unnecessarily complicating parsing of the
buffer.  Drop const from @buffer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>                                           
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
2014-03-19 10:23:54 -04:00
David S. Miller
3ab428a4c5 netfilter: Add missing vmalloc.h include to nft_hash.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18 23:12:02 -04:00
Phoebe Buckheister
8cfad496c4 ieee802154: properly unshare skbs in ieee802154 *_rcv functions
ieee802154 sockets do not properly unshare received skbs, which leads to
panics (at least) when they are used in conjunction with 6lowpan, so
run skb_share_check on received skbs.
6lowpan also contains a use-after-free, which is trivially fixed by
replacing the inlined skb_share_check with the explicit call.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18 15:59:25 -04:00
lucien
e367c2d03d ipv6: ip6_append_data_mtu do not handle the mtu of the second fragment properly
In ip6_append_data_mtu(), when the xfrm mode is not tunnel(such as
transport),the ipsec header need to be added in the first fragment, so the mtu
will decrease to reserve space for it, then the second fragment come, the mtu
should be turn back, as the commit 0c1833797a
said.  however, in the commit a493e60ac4bbe2e977e7129d6d8cbb0dd236be, it use
*mtu = min(*mtu, ...) to change the mtu, which lead to the new mtu is alway
equal with the first fragment's. and cannot turn back.

when I test through  ping6 -c1 -s5000 $ip (mtu=1280):
...frag (0|1232) ESP(spi=0x00002000,seq=0xb), length 1232
...frag (1232|1216)
...frag (2448|1216)
...frag (3664|1216)
...frag (4880|164)

which should be:
...frag (0|1232) ESP(spi=0x00001000,seq=0x1), length 1232
...frag (1232|1232)
...frag (2464|1232)
...frag (3696|1232)
...frag (4928|116)

so delete the min() when change back the mtu.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Fixes: 75a493e60a ("ipv6: ip6_append_data_mtu did not care about pmtudisc and frag_size")
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-18 15:17:53 -04:00