33333 Commits

Author SHA1 Message Date
stephen hemminger
f647944995 ceph: remove bogus extern
Sparse complained about this bogus extern on definition of
a function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:39:19 -07:00
Eric Dumazet
9709674e68 ipv4: fix a race in ip4_datagram_release_cb()
Alexey gave an AddressSanitizer[1] report that finally gave a good hint
at the origin of various problems already reported by Dormando
in the past [2].

The problem comes from the fact that UDP can have a lockless TX path,
so concurrent threads can manipulate sk_dst_cache while another thread
holds the socket lock and calls __sk_dst_set() in
ip4_datagram_release_cb() (this was added in linux-3.8).

It seems that all we need to do is to use sk_dst_check() and
sk_dst_set() so that all the writers hold the same spinlock
(sk->sk_dst_lock) to prevent corruptions.

The TCP stack does not need this protection, as all sk_dst_cache writers
hold the socket lock.

[1]
https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel

AddressSanitizer: heap-use-after-free in ipv4_dst_check
Read of size 2 by thread T15453:
 [<ffffffff817daa3a>] ipv4_dst_check+0x1a/0x90 ./net/ipv4/route.c:1116
 [<ffffffff8175b789>] __sk_dst_check+0x89/0xe0 ./net/core/sock.c:531
 [<ffffffff81830a36>] ip4_datagram_release_cb+0x46/0x390 ??:0
 [<ffffffff8175eaea>] release_sock+0x17a/0x230 ./net/core/sock.c:2413
 [<ffffffff81830882>] ip4_datagram_connect+0x462/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Freed by thread T15455:
 [<ffffffff8178d9b8>] dst_destroy+0xa8/0x160 ./net/core/dst.c:251
 [<ffffffff8178de25>] dst_release+0x45/0x80 ./net/core/dst.c:280
 [<ffffffff818304c1>] ip4_datagram_connect+0xa1/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

Allocated by thread T15453:
 [<ffffffff8178d291>] dst_alloc+0x81/0x2b0 ./net/core/dst.c:171
 [<ffffffff817db3b7>] rt_dst_alloc+0x47/0x50 ./net/ipv4/route.c:1406
 [<     inlined    >] __ip_route_output_key+0x3e8/0xf70
__mkroute_output ./net/ipv4/route.c:1939
 [<ffffffff817dde08>] __ip_route_output_key+0x3e8/0xf70 ./net/ipv4/route.c:2161
 [<ffffffff817deb34>] ip_route_output_flow+0x14/0x30 ./net/ipv4/route.c:2249
 [<ffffffff81830737>] ip4_datagram_connect+0x317/0x5d0 ??:0
 [<ffffffff81846d06>] inet_dgram_connect+0x76/0xd0 ./net/ipv4/af_inet.c:534
 [<ffffffff817580ac>] SYSC_connect+0x15c/0x1c0 ./net/socket.c:1701
 [<ffffffff817596ce>] SyS_connect+0xe/0x10 ./net/socket.c:1682
 [<ffffffff818b0a29>] system_call_fastpath+0x16/0x1b
./arch/x86/kernel/entry_64.S:629

[2]
<4>[196727.311203] general protection fault: 0000 [#1] SMP
<4>[196727.311224] Modules linked in: xt_TEE xt_dscp xt_DSCP macvlan bridge coretemp crc32_pclmul ghash_clmulni_intel gpio_ich microcode ipmi_watchdog ipmi_devintf sb_edac edac_core lpc_ich mfd_core tpm_tis tpm tpm_bios ipmi_si ipmi_msghandler isci igb libsas i2c_algo_bit ixgbe ptp pps_core mdio
<4>[196727.311333] CPU: 17 PID: 0 Comm: swapper/17 Not tainted 3.10.26 #1
<4>[196727.311344] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0 07/05/2013
<4>[196727.311364] task: ffff885e6f069700 ti: ffff885e6f072000 task.ti: ffff885e6f072000
<4>[196727.311377] RIP: 0010:[<ffffffff815f8c7f>]  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.311399] RSP: 0018:ffff885effd23a70  EFLAGS: 00010282
<4>[196727.311409] RAX: dead000000200200 RBX: ffff8854c398ecc0 RCX: 0000000000000040
<4>[196727.311423] RDX: dead000000100100 RSI: dead000000100100 RDI: dead000000200200
<4>[196727.311437] RBP: ffff885effd23a80 R08: ffffffff815fd9e0 R09: ffff885d5a590800
<4>[196727.311451] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[196727.311464] R13: ffffffff81c8c280 R14: 0000000000000000 R15: ffff880e85ee16ce
<4>[196727.311510] FS:  0000000000000000(0000) GS:ffff885effd20000(0000) knlGS:0000000000000000
<4>[196727.311554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[196727.311581] CR2: 00007a46751eb000 CR3: 0000005e65688000 CR4: 00000000000407e0
<4>[196727.311625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[196727.311669] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[196727.311713] Stack:
<4>[196727.311733]  ffff8854c398ecc0 ffff8854c398ecc0 ffff885effd23ab0 ffffffff815b7f42
<4>[196727.311784]  ffff88be6595bc00 ffff8854c398ecc0 0000000000000000 ffff8854c398ecc0
<4>[196727.311834]  ffff885effd23ad0 ffffffff815b86c6 ffff885d5a590800 ffff8816827821c0
<4>[196727.311885] Call Trace:
<4>[196727.311907]  <IRQ>
<4>[196727.311912]  [<ffffffff815b7f42>] dst_destroy+0x32/0xe0
<4>[196727.311959]  [<ffffffff815b86c6>] dst_release+0x56/0x80
<4>[196727.311986]  [<ffffffff81620bd5>] tcp_v4_do_rcv+0x2a5/0x4a0
<4>[196727.312013]  [<ffffffff81622b5a>] tcp_v4_rcv+0x7da/0x820
<4>[196727.312041]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312070]  [<ffffffff815de02d>] ? nf_hook_slow+0x7d/0x150
<4>[196727.312097]  [<ffffffff815fd9e0>] ? ip_rcv_finish+0x360/0x360
<4>[196727.312125]  [<ffffffff815fda92>] ip_local_deliver_finish+0xb2/0x230
<4>[196727.312154]  [<ffffffff815fdd9a>] ip_local_deliver+0x4a/0x90
<4>[196727.312183]  [<ffffffff815fd799>] ip_rcv_finish+0x119/0x360
<4>[196727.312212]  [<ffffffff815fe00b>] ip_rcv+0x22b/0x340
<4>[196727.312242]  [<ffffffffa0339680>] ? macvlan_broadcast+0x160/0x160 [macvlan]
<4>[196727.312275]  [<ffffffff815b0c62>] __netif_receive_skb_core+0x512/0x640
<4>[196727.312308]  [<ffffffff811427fb>] ? kmem_cache_alloc+0x13b/0x150
<4>[196727.312338]  [<ffffffff815b0db1>] __netif_receive_skb+0x21/0x70
<4>[196727.312368]  [<ffffffff815b0fa1>] netif_receive_skb+0x31/0xa0
<4>[196727.312397]  [<ffffffff815b1ae8>] napi_gro_receive+0xe8/0x140
<4>[196727.312433]  [<ffffffffa00274f1>] ixgbe_poll+0x551/0x11f0 [ixgbe]
<4>[196727.312463]  [<ffffffff815fe00b>] ? ip_rcv+0x22b/0x340
<4>[196727.312491]  [<ffffffff815b1691>] net_rx_action+0x111/0x210
<4>[196727.312521]  [<ffffffff815b0db1>] ? __netif_receive_skb+0x21/0x70
<4>[196727.312552]  [<ffffffff810519d0>] __do_softirq+0xd0/0x270
<4>[196727.312583]  [<ffffffff816cef3c>] call_softirq+0x1c/0x30
<4>[196727.312613]  [<ffffffff81004205>] do_softirq+0x55/0x90
<4>[196727.312640]  [<ffffffff81051c85>] irq_exit+0x55/0x60
<4>[196727.312668]  [<ffffffff816cf5c3>] do_IRQ+0x63/0xe0
<4>[196727.312696]  [<ffffffff816c5aaa>] common_interrupt+0x6a/0x6a
<4>[196727.312722]  <EOI>
<1>[196727.313071] RIP  [<ffffffff815f8c7f>] ipv4_dst_destroy+0x4f/0x80
<4>[196727.313100]  RSP <ffff885effd23a70>
<4>[196727.313377] ---[ end trace 64b3f14fae0f2e29 ]---
<0>[196727.380908] Kernel panic - not syncing: Fatal exception in interrupt

Reported-by: Alexey Preobrazhensky <preobr@google.com>
Reported-by: dormando <dormando@rydia.ne>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 8141ed9fce ("ipv4: Add a socket release callback for datagram sockets")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:39:18 -07:00
Octavian Purdila
bad93e9d4e net: add __pskb_copy_fclone and pskb_copy_for_clone
There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.

Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.

Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:38:02 -07:00
Toshiaki Makita
204177f3f3 bridge: Support 802.1ad vlan filtering
This enables us to change the vlan protocol for vlan filtering, which
makes it possible to filter frames on the basis of 802.1ad vlan tags
through a bridge.

This also changes br->group_addr if it has not been set by the user.
This is needed for an 802.1ad bridge.
(See IEEE 802.1Q-2011 8.13.5.)

Furthermore, this sets br->group_fwd_mask_required so that an 802.1ad
bridge can forward the Nearest Customer Bridge group addresses except
for br->group_addr, which should be passed to a higher layer.

To change the vlan protocol, write a protocol in sysfs:
# echo 0x88a8 > /sys/class/net/br0/bridge/vlan_protocol

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:22:53 -07:00
Toshiaki Makita
f2808d226f bridge: Prepare for forwarding another bridge group addresses
If a bridge is an 802.1ad bridge, it must forward another set of bridge
group addresses (the Nearest Customer Bridge group addresses).
(For details, see IEEE 802.1Q-2011 8.6.3.)

As the user might not want group_fwd_mask to be modified by enabling
802.1ad, introduce a new mask, group_fwd_mask_required, which indicates
the addresses the bridge wants to forward. This will be set by enabling
802.1ad.
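
The way the two masks combine can be sketched in user-space C as follows.
This is only an illustration: it assumes that bit N of each mask stands for
the link-local group address 01:80:C2:00:00:0N, and the function and
parameter names are hypothetical, not taken from the bridge code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a frame sent to the link-local
 * group address 01:80:C2:00:00:0N should be forwarded.
 *
 * user_mask     - mask configured by the user (group_fwd_mask)
 * required_mask - mask the bridge itself needs (group_fwd_mask_required)
 * last_octet    - N, the last octet of the destination address
 */
static bool should_forward_group_addr(uint16_t user_mask,
                                      uint16_t required_mask,
                                      uint8_t last_octet)
{
    /* The user's setting is never modified; the two masks are only
     * combined when a frame is looked at. */
    uint16_t effective = user_mask | required_mask;

    if (last_octet > 15)
        return false;   /* outside the 16 addresses the masks cover */

    return (effective & (1u << last_octet)) != 0;
}
```

Keeping the required bits in a separate mask means that turning 802.1ad off
again only has to clear group_fwd_mask_required; whatever the user wrote to
group_fwd_mask stays intact.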

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:22:53 -07:00
Toshiaki Makita
8580e2117c bridge: Prepare for 802.1ad vlan filtering support
This enables a bridge to have vlan protocol information and allows vlan
tag manipulation (retrieve, insert and remove tags) according to the vlan
protocol.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:22:53 -07:00
Toshiaki Makita
1c5abb6c77 bridge: Add 802.1ad tx vlan acceleration
Bridge device doesn't need to embed S-tag into skb->data.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:22:53 -07:00
Alexei Starovoitov
61f83d0d57 net: filter: fix warning on 32-bit arch
Fix compiler warnings on 32-bit architectures:

net/core/filter.c: In function '__sk_run_filter':
net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:12:27 -07:00
Jon Paul Maloy
02c00c2ab0 tipc: fix potential bug in function tipc_backlog_rcv
In commit 4f4482dcd9 ("tipc: compensate
for double accounting in socket rcv buffer") we access 'truesize' of
a received buffer after it might have been released by the function
filter_rcv().

In this commit we correct this by reading the value of 'truesize' to
the stack before delivering the buffer to filter_rcv().
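
The pattern described here, copying a field to a local variable before
handing the buffer to a function that may free it, can be sketched in plain
C. The struct and function names below are hypothetical stand-ins, not the
TIPC code; consume_buf() plays the role that filter_rcv() has above.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a received buffer. */
struct rx_buf {
    unsigned int truesize;
};

/* Hypothetical consumer that takes ownership and may free the buffer. */
static void consume_buf(struct rx_buf *buf)
{
    free(buf);
}

/* Returns the number of bytes to account for the buffer. */
static unsigned int deliver_and_account(struct rx_buf *buf)
{
    /* Copy the field to the stack first; reading buf->truesize after
     * consume_buf() would be a use-after-free. */
    unsigned int truesize = buf->truesize;

    consume_buf(buf);
    return truesize;
}
```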

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:01:30 -07:00
Daniel Borkmann
9b87d46510 net: sctp: fix incorrect type in gfp initializer
This fixes the following sparse warning:

  net/sctp/associola.c:1556:29: warning: incorrect type in initializer (different base types)
  net/sctp/associola.c:1556:29:    expected bool [unsigned] [usertype] preload
  net/sctp/associola.c:1556:29:    got restricted gfp_t

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:23:17 -07:00
Daniel Borkmann
a7288c4dd5 net: sctp: improve sctp_select_active_and_retran_path selection
In function sctp_select_active_and_retran_path(), we walk the
transport list in order to look for the two most recently used
ACTIVE transports (trans_pri, trans_sec). In case we didn't find
anything ACTIVE, we currently just camp on a possibly PF or
INACTIVE transport that is primary path; this behavior actually
dates back to linux-history tree of the very early days of
lksctp, and can yield a behavior that chooses suboptimal
transport paths.

Instead, be a bit more clever by reusing and extending the
recently introduced sctp_trans_elect_best() handler. In case
both transports are evaluated to have the same score resulting
from their states, break the tie by looking at: 1) transport
path error count 2) last_time_heard value from each transport.
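
A minimal user-space sketch of this kind of tie-breaking is shown below. It
is not the sctp_trans_elect_best() implementation: the struct, the enum
values and the rule that a more recently heard transport wins are
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified transport record; the score values are
 * illustrative and only their ordering matters. */
enum path_state { PATH_INACTIVE = 0, PATH_PF = 1, PATH_ACTIVE = 2 };

struct path {
    enum path_state state;
    unsigned int error_count;     /* consecutive errors on this path */
    int64_t last_time_heard_ns;   /* larger means heard more recently */
};

/* Return the better of two candidates; either may be NULL. */
static const struct path *elect_best(const struct path *curr,
                                     const struct path *best)
{
    if (best == NULL)
        return curr;
    if (curr == NULL)
        return best;

    /* The score derived from the state decides first. */
    if (curr->state != best->state)
        return curr->state > best->state ? curr : best;

    /* Tie-break 1: the path with fewer errors wins. */
    if (curr->error_count != best->error_count)
        return curr->error_count < best->error_count ? curr : best;

    /* Tie-break 2: the path heard from most recently wins. */
    return curr->last_time_heard_ns > best->last_time_heard_ns ? curr : best;
}
```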

This is analogous to Nishida's Quick Failover draft [1],
section 5.1, 3:

  The sender SHOULD avoid data transmission to PF destinations.
  When all destinations are in either PF or Inactive state,
  the sender MAY either move the destination from PF to active
  state (and transmit data to the active destination) or the
  sender MAY transmit data to a PF destination. In the former
  scenario, (i) the sender MUST NOT notify the ULP about the
  state transition, and (ii) MUST NOT clear the destination's
  error counter. It is recommended that the sender picks the
  PF destination with least error count (fewest consecutive
  timeouts) for data transmission. In case of a tie (multiple PF
  destinations with same error count), the sender MAY choose the
  last active destination.

Thus for sctp_select_active_and_retran_path(), we keep track of
the best, if any, transport that is in PF state and in case no
ACTIVE transport has been found (hence trans_{pri,sec} is NULL),
we select the best out of the three: current primary_path and
retran_path as well as a possible PF transport.

The secondary may still camp on the original primary_path as
before. The change in sctp_trans_elect_best() with a more fine
grained tie selection also improves at the same time path selection
for sctp_assoc_update_retran_path() in case of non-ACTIVE states.

  [1] http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:23:17 -07:00
Daniel Borkmann
e575235fc6 net: sctp: migrate most recently used transport to ktime
Be more precise in transport path selection and use ktime
helpers instead of jiffies to compare and pick the better
primary and secondary recently used transports. This also
avoids any side-effects during a possible roll-over, and
could lead to better path decision-making.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:23:17 -07:00
Daniel Borkmann
b82e8f31ac net: sctp: refactor active path selection
This patch just refactors and moves the code for the active
path selection into its own helper function outside of
sctp_assoc_control_transport() which is already big enough.
No functional changes here.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:23:17 -07:00
Daniel Borkmann
67cb9366ff ktime: add ktime_after and ktime_before helper
Add two minimal helper functions analogous to time_before() and
time_after() that will later on both be needed by SCTP code.
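
As a rough user-space illustration of what such helpers look like, the
sketch below models a timestamp as a signed 64-bit nanosecond count (an
assumption; the names ns_time_t, ns_time_after, ns_time_before and
tick_after are hypothetical). For contrast it also shows a wrap-safe
comparison on a 32-bit tick counter in the style of time_after().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a timestamp is a signed 64-bit nanosecond count. */
typedef int64_t ns_time_t;

/* True if a is strictly later than b. */
static inline bool ns_time_after(ns_time_t a, ns_time_t b)
{
    return a > b;
}

/* True if a is strictly earlier than b. */
static inline bool ns_time_before(ns_time_t a, ns_time_t b)
{
    return a < b;
}

/* Wrap-safe "after" on a 32-bit tick counter: the unsigned difference is
 * reinterpreted as signed, so the result stays correct across a
 * roll-over as long as the two values are less than 2^31 ticks apart. */
static inline bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

A 64-bit nanosecond count does not roll over in practice, so a plain
comparison is enough, whereas a narrow tick counter needs the
signed-difference trick.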

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:23:17 -07:00
Phoebe Buckheister
2d3b5b0a90 mac802154: don't deliver packets to devices that are down
Only one WPAN device can be active at any given time, so only deliver
packets to that one interface that is actually up. Multiple monitors may
be up at any given time, but we don't have to deliver to monitors that
are down either.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:10:19 -07:00
Phoebe Buckheister
a374eeb5e5 mac802154: properly free incoming skbs on decryption failure
mac802154 RX did not free skbs on decryption failure, assuming that the
caller would when the local rx handler returned _DROP. This was false.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 12:10:18 -07:00
Wei-Chun Chao
5882a07c72 net: fix UDP tunnel GSO of frag_list GRO packets
This patch fixes a kernel BUG_ON in skb_segment. It is hit when
testing two VMs on openvswitch with one VM acting as VXLAN gateway.

During VXLAN packet GSO, skb_segment is called with skb->data
pointing to inner TCP payload. skb_segment calls skb_network_protocol
to retrieve the inner protocol. skb_network_protocol actually expects
skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
This ends up pulling in ETH_HLEN data from header tail. As a result,
pskb_trim logic is skipped and BUG_ON is hit later.

Move skb_push in front of skb_network_protocol so that skb->data
lines up properly.

kernel BUG at net/core/skbuff.c:2999!
Call Trace:
[<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
[<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8109d742>] ? default_wake_function+0x12/0x20
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
[<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
[<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
[<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
[<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
[<ffffffff81687edb>] ip_finish_output+0x66b/0x890
[<ffffffff81688a58>] ip_output+0x58/0x90
[<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
[<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
[<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
[<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
[<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
[<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
[<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]

Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:48:47 -07:00
huizhang
f6c20c596f net: ipv6: Fixed up ipsec packet be re-routing issue
Bug report on https://bugzilla.kernel.org/show_bug.cgi?id=75781

When a locally generated ipsec packet matches a mangle table rule and
has a mark value set, the packet is routed again in
route_me_harder -> _session_decoder6

In this case, the nhoff in the CB of the skb still has the default
value 0, so the protocol match cannot succeed, the packet does not match
the correct SA rule, and it is then sent out in plaintext.

To fix the issue, CB->nhoff must be set.

Signed-off-by: Hui Zhang <huizhang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:47:31 -07:00
Dmitry Popov
5ce54af1fc ip_tunnel: fix i_key matching in ip_tunnel_find
Some tunnels (though only vti for now) can use i_key just for internal use:
for example vti uses it for fwmark'ing incoming packets. So raw i_key value
shouldn't be treated as a distinguisher for them. ip_tunnel_key_match exists for
cases when we want to compare two ip_tunnel_parms' i_keys.

Example bug:
ip link add type vti ikey 1 local 1.0.0.1 remote 2.0.0.2
ip link add type vti ikey 2 local 1.0.0.1 remote 2.0.0.2
spawned two tunnels, although it doesn't make sense.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:43:37 -07:00
Dmitry Popov
7c8e6b9c28 ip_vti: Fix 'ip tunnel add' with 'key' parameters
ip tunnel add remote 10.2.2.1 local 10.2.2.2 mode vti ikey 1 okey 2
translates to p->iflags = VTI_ISVTI|GRE_KEY and p->i_key = 1, but GRE_KEY !=
TUNNEL_KEY, so ip_tunnel_ioctl would set i_key to 0 (same story with o_key)
making us unable to create vti tunnels with [io]key via ip tunnel.

We cannot simply translate GRE_KEY to TUNNEL_KEY (as GRE module does) because
vti_tunnels with same local/remote addresses but different ikeys will be treated
as different then. So, imo the best option here is to move p->i_flags & *_KEY
check for vti tunnels from ip_tunnel.c to ip_vti.c and to think about [io]_mark
field for ip_tunnel_parm in the future.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:30:52 -07:00
Alexei Starovoitov
e430f34ee5 net: filter: cleanup A/X name usage
The macro 'A' used in internal BPF interpreter:
 #define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.

This patch is trying to clean up the naming and clarify its usage in the
following way:

- A and X are names of two classic BPF registers

- BPF_REG_A denotes internal BPF register R0 used to map classic register A
  in internal BPF programs generated from classic

- BPF_REG_X denotes internal BPF register R7 used to map classic register X
  in internal BPF programs generated from classic

- internal BPF instruction format:
struct sock_filter_int {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};

- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
  BPF_X - means use register X as source operand
  BPF_K - means use 32-bit immediate as source operand
In internal:
  BPF_X - means use 'src_reg' register as source operand
  BPF_K - means use 32-bit immediate as source operand
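
The internal-format reading of the BPF_X/BPF_K bit can be sketched as a
small stand-alone C function. The 0x08 source bit matches the classic BPF
opcode encoding (BPF_K = 0x00, BPF_X = 0x08); everything else is
hypothetical: the SK_ prefixed names, the struct with the bitfields
dropped, and the sign extension of the immediate.

```c
#include <stdint.h>

/* Source-operand bit of the opcode, as in the classic BPF encoding.
 * The names are prefixed to make clear this is a stand-alone sketch. */
#define SK_BPF_SRC(code) ((code) & 0x08)
#define SK_BPF_K 0x00
#define SK_BPF_X 0x08

/* Reduced form of the instruction quoted above (bitfields dropped). */
struct insn_sketch {
    uint8_t code;      /* opcode */
    uint8_t dst_reg;   /* dest register */
    uint8_t src_reg;   /* source register */
    int16_t off;       /* signed offset */
    int32_t imm;       /* signed immediate constant */
};

/* Pick the source operand: regs[src_reg] for BPF_X, the immediate for
 * BPF_K. Sign-extending the immediate is an assumption of this sketch. */
static uint64_t source_operand(const struct insn_sketch *insn,
                               const uint64_t *regs)
{
    if (SK_BPF_SRC(insn->code) == SK_BPF_X)
        return regs[insn->src_reg];
    return (uint64_t)(int64_t)insn->imm;
}
```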

Suggested-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:13:16 -07:00
Manuel Schölling
84a7c0b1db dns_resolver: assure that dns_query() result is null-terminated
dns_query() credulously assumes that keys are null-terminated and
returns a copy of a memory block that is off by one.
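
The general remedy for this class of bug, copying a length-delimited block
and appending the terminator explicitly, looks roughly like this in
user-space C. The helper name dup_as_cstring is hypothetical, and malloc()
stands in for whatever allocator the kernel code uses.

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes that are NOT guaranteed to be null-terminated into a
 * freshly allocated C string. Returns NULL on allocation failure; the
 * caller frees the result. */
static char *dup_as_cstring(const char *data, size_t len)
{
    char *out = malloc(len + 1);   /* one extra byte for the terminator */

    if (out == NULL)
        return NULL;
    memcpy(out, data, len);
    out[len] = '\0';               /* terminate explicitly */
    return out;
}
```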

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 00:12:04 -07:00
Linus Lüssing
2cd4143192 bridge: memorize and export selected IGMP/MLD querier port
Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in IGMP/MLD queriers
to be able to reliably serve any multicast listener behind this same
bridge.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 23:50:47 -07:00
Linus Lüssing
07f8ac4a1e bridge: add export of multicast database adjacent to net_dev
With this new, exported function br_multicast_list_adjacent(net_dev) a
list of IPv4/6 addresses is returned. This list contains all multicast
addresses sensed by the bridge multicast snooping feature on all bridge
ports of the bridge interface of net_dev, excluding addresses from the
specified net_device itself.

Adding bridge support to the batman-adv multicast optimization requires
batman-adv knowing about the existence of bridged-in multicast
listeners to be able to reliably serve them with multicast packets.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 23:50:47 -07:00
Linus Lüssing
dc4eb53a99 bridge: adhere to querier election mechanism specified by RFCs
MLDv1 (RFC2710 section 6), MLDv2 (RFC3810 section 7.6.2), IGMPv2
(RFC2236 section 3) and IGMPv3 (RFC3376 section 6.6.2) specify that the
querier with lowest source address shall become the selected
querier.

So far the bridge stopped its querier as soon as it heard another
querier regardless of its source address. This results in the "wrong"
querier potentially becoming the active querier or a potential,
unnecessary querying delay.

With this patch the bridge memorizes the source address of the currently
selected querier and ignores queries from queriers with a higher source
address than the currently selected one. This slight optimization is
supposed to make it more RFC compliant (but is rather uncritical and
therefore probably not necessary to be queued for stable kernels).
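
The election rule can be sketched for IPv4 as below. This is a simplified
illustration rather than the bridge code: the struct and function names are
hypothetical, addresses are assumed to be in host byte order, 0 is used to
mean that no querier has been selected yet, and querier timeouts are left
out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bridge state: the IPv4 source address of the
 * currently selected querier in host byte order; 0 means none. */
struct querier_state {
    uint32_t selected_addr;
};

/* Returns true if the query is accepted (the sender becomes or stays the
 * selected querier) and false if it is ignored. */
static bool accept_query(struct querier_state *st, uint32_t src_addr)
{
    if (st->selected_addr != 0 && src_addr > st->selected_addr)
        return false;          /* higher source address loses */

    st->selected_addr = src_addr;
    return true;
}
```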

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 23:50:47 -07:00
Linus Lüssing
90010b36eb bridge: rename struct bridge_mcast_query/querier
The current naming of these two structs is rather arbitrary, in that
reversing their naming would not make any semantic difference.

This patch tries to make the naming less confusing by giving them a more
specific, distinguishable naming.

This is also useful for the upcoming patches reintroducing the
"struct bridge_mcast_querier" but for storing information about the
selected querier (no matter if our own or a foreign querier).

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 23:50:46 -07:00
Dmitry Popov
2346829e64 ipip, sit: fix ipv4_{update_pmtu,redirect} calls
ipv4_{update_pmtu,redirect} were called with tunnel's ifindex (t->dev is a
tunnel netdevice). It caused wrong route lookup and failure of pmtu update or
redirect. We should use the same ifindex that we use in ip_route_output_* in
*tunnel_xmit code. It is t->parms.link .

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 23:35:52 -07:00
stephen hemminger
f8c1b7ce00 gre: allow changing mac address when device is up
There is no need to require forcing device down on a Ethernet GRE (gretap)
tunnel to change the MAC address.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 22:46:42 -07:00
Octavian Purdila
6cc55e096f tcp: add gfp parameter to tcp_fragment
tcp_fragment can be called from process context (from tso_fragment).
Add a new gfp parameter to allow it to preserve atomic memory if
possible.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Reviewed-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-10 22:30:58 -07:00
Linus Torvalds
d1e1cda862 NFS client updates for Linux 3.16

Merge tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

   - massive cleanup of the NFS read/write code by Anna and Dros
   - support multiple NFS read/write requests per page in order to deal
     with non-page aligned pNFS striping.  Also cleans up the r/wsize <
     page size code nicely.
   - stable fix for ensuring inode is declared uptodate only after all
     the attributes have been checked.
   - stable fix for a kernel Oops when remounting
   - NFS over RDMA client fixes
   - move the pNFS files layout driver into its own subdirectory"

* tag 'nfs-for-3.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (79 commits)
  NFS: populate ->net in mount data when remounting
  pnfs: fix lockup caused by pnfs_generic_pg_test
  NFSv4.1: Fix typo in dprintk
  NFSv4.1: Comment is now wrong and redundant to code
  NFS: Use raw_write_seqcount_begin/end int nfs4_reclaim_open_state
  xprtrdma: Disconnect on registration failure
  xprtrdma: Remove BUG_ON() call sites
  xprtrdma: Avoid deadlock when credit window is reset
  SUNRPC: Move congestion window constants to header file
  xprtrdma: Reset connection timeout after successful reconnect
  xprtrdma: Use macros for reconnection timeout constants
  xprtrdma: Allocate missing pagelist
  xprtrdma: Remove Tavor MTU setting
  xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
  xprtrdma: Reduce the number of hardway buffer allocations
  xprtrdma: Limit work done by completion handler
  xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
  xprtrmda: Reduce lock contention in completion handlers
  xprtrdma: Split the completion queue
  xprtrdma: Make rpcrdma_ep_destroy() return void
  ...
2014-06-10 15:02:42 -07:00
Linus Torvalds
5b174fd647 Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "The largest piece is a long-overdue rewrite of the xdr code to remove
  some annoying limitations: for example, there was no way to return
  ACLs larger than 4K, and readdir results were returned only in 4k
  chunks, limiting performance on large directories.

  Also:
        - part of Neil Brown's work to make NFS work reliably over the
          loopback interface (so client and server can run on the same
          machine without deadlocks).  The rest of it is coming through
          other trees.
        - cleanup and bugfixes for some of the server RDMA code, from
          Steve Wise.
        - Various cleanup of NFSv4 state code in preparation for an
          overhaul of the locking, from Jeff, Trond, and Benny.
        - smaller bugfixes and cleanup from Christoph Hellwig and
          Kinglong Mee.

  Thanks to everyone!

  This summer looks likely to be busier than usual for knfsd.  Hopefully
  we won't break it too badly; testing definitely welcomed"

* 'for-3.16' of git://linux-nfs.org/~bfields/linux: (100 commits)
  nfsd4: fix FREE_STATEID lockowner leak
  svcrdma: Fence LOCAL_INV work requests
  svcrdma: refactor marshalling logic
  nfsd: don't halt scanning the DRC LRU list when there's an RC_INPROG entry
  nfs4: remove unused CHANGE_SECURITY_LABEL
  nfsd4: kill READ64
  nfsd4: kill READ32
  nfsd4: simplify server xdr->next_page use
  nfsd4: hash deleg stateid only on successful nfs4_set_delegation
  nfsd4: rename recall_lock to state_lock
  nfsd: remove unneeded zeroing of fields in nfsd4_proc_compound
  nfsd: fix setting of NFS4_OO_CONFIRMED in nfsd4_open
  nfsd4: use recall_lock for delegation hashing
  nfsd: fix laundromat next-run-time calculation
  nfsd: make nfsd4_encode_fattr static
  SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
  nfsd: remove unused function nfsd_read_file
  nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer
  NFSD: Error out when getting more than one fsloc/secinfo/uuid
  NFSD: Using type of uint32_t for ex_nflavors instead of int
  ...
2014-06-10 11:50:57 -07:00
Linus Torvalds
14208b0ec5 Merge branch 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "A lot of activities on cgroup side.  Heavy restructuring including
  locking simplification took place to improve the code base and enable
  implementation of the unified hierarchy, which currently exists behind
  a __DEVEL__ mount option.  The core support is mostly complete but
  individual controllers need further work.  To explain the design and
  rationales of the unified hierarchy

        Documentation/cgroups/unified-hierarchy.txt

  is added.

  Another notable change is css (cgroup_subsys_state - what each
  controller uses to identify and interact with a cgroup) iteration
  update.  This is part of continuing updates on css object lifetime and
  visibility.  cgroup started with reference count draining on removal
  way back and is now reaching a point where csses behave and are
  iterated like normal refcnted objects albeit with some complexities to
  allow distinguishing the state where they're being deleted.  The css
  iteration update isn't taken advantage of yet but is planned to be
  used to simplify memcg significantly"

* 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (77 commits)
  cgroup: disallow disabled controllers on the default hierarchy
  cgroup: don't destroy the default root
  cgroup: disallow debug controller on the default hierarchy
  cgroup: clean up MAINTAINERS entries
  cgroup: implement css_tryget()
  device_cgroup: use css_has_online_children() instead of has_children()
  cgroup: convert cgroup_has_live_children() into css_has_online_children()
  cgroup: use CSS_ONLINE instead of CGRP_DEAD
  cgroup: iterate cgroup_subsys_states directly
  cgroup: introduce CSS_RELEASED and reduce css iteration fallback window
  cgroup: move cgroup->serial_nr into cgroup_subsys_state
  cgroup: link all cgroup_subsys_states in their sibling lists
  cgroup: move cgroup->sibling and ->children into cgroup_subsys_state
  cgroup: remove cgroup->parent
  device_cgroup: remove direct access to cgroup->children
  memcg: update memcg_has_children() to use css_next_child()
  memcg: remove tasks/children test from mem_cgroup_force_empty()
  cgroup: remove css_parent()
  cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css
  cgroup: use cgroup->self.refcnt for cgroup refcnting
  ...
2014-06-09 15:03:33 -07:00
David S. Miller
b78370c021 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-06-06

Please accept this batch of fixes intended for the 3.16 stream.

For the bluetooth bits, Gustavo says:

"Here are some more patches for 3.16. We know that Linus already opened the merge
window, but this is a fix-only pull request, and most of the patches here are
also tagged for stable."

Along with that, Andrea Merello provides a fix for the broken scanning
in the venerable at76c50x driver...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-08 14:17:39 -07:00
Eric Dumazet
87757a917b net: force a list_del() in unregister_netdevice_many()
unregister_netdevice_many() API is error prone and we had too
many bugs because of dangling LIST_HEAD on stacks.

See commit f87e6f4793 ("net: dont leave active on stack LIST_HEAD")

In fact, instead of making sure no caller leaves an active list_head,
just force a list_del() in the callee. No one seems to need to access
the list after unregister_netdevice_many()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-08 14:15:14 -07:00
Linus Torvalds
b20dcab9d4 LLVMLinux patches for v3.16
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlOTY+wACgkQuseO5dulBZXrIgCdFZyXRojufLLKikWEvHjZ3/k5
 KsQAnimtcge+62/IX7YwDjWS+xg9Wt3m
 =yPrI
 -----END PGP SIGNATURE-----

Merge tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel

Pull LLVM patches from Behan Webster:
 "Next set of patches to support compiling the kernel with clang.
  They've been soaking in linux-next since the last merge window.

  More still in the works for the next merge window..."

* tag 'llvmlinux-for-v3.16' of git://git.linuxfoundation.org/llvmlinux/kernel:
  arm, unwind, LLVMLinux: Enable clang to be used for unwinding the stack
  ARM: LLVMLinux: Change "extern inline" to "static inline" in glue-cache.h
  all: LLVMLinux: Change DWARF flag to support gcc and clang
  net: netfilter: LLVMLinux: vlais-netfilter
  crypto: LLVMLinux: aligned-attribute.patch
2014-06-08 12:27:44 -07:00
Linus Torvalds
3f17ea6dea Merge branch 'next' (accumulated 3.16 merge window patches) into master
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.

* accumulated work in next: (6809 commits)
  ufs: sb mutex merge + mutex_destroy
  powerpc: update comments for generic idle conversion
  cris: update comments for generic idle conversion
  idle: remove cpu_idle() forward declarations
  nbd: zero from and len fields in NBD_CMD_DISCONNECT.
  mm: convert some level-less printks to pr_*
  MAINTAINERS: adi-buildroot-devel is moderated
  MAINTAINERS: add linux-api for review of API/ABI changes
  mm/kmemleak-test.c: use pr_fmt for logging
  fs/dlm/debug_fs.c: replace seq_printf by seq_puts
  fs/dlm/lockspace.c: convert simple_str to kstr
  fs/dlm/config.c: convert simple_str to kstr
  mm: mark remap_file_pages() syscall as deprecated
  mm: memcontrol: remove unnecessary memcg argument from soft limit functions
  mm: memcontrol: clean up memcg zoneinfo lookup
  mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
  mm/mempool.c: update the kmemleak stack trace for mempool allocations
  lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
  mm: introduce kmemleak_update_trace()
  mm/kmemleak.c: use %u to print ->checksum
  ...
2014-06-08 11:31:16 -07:00
Mark Charlebois
066c6807f7 net: netfilter: LLVMLinux: vlais-netfilter
Replaced non-standard C use of Variable Length Arrays In Structs (VLAIS) in
xt_repldata.h with a C99 compliant flexible array member and then calculated
offsets to the other struct members. These other members aren't referenced by
name in this code, however this patch maintains the same memory layout and
padding as was previously accomplished using VLAIS.

Had the original structure been ordered differently, with the entries VLA at
the end, then it could have been a flexible member, and this patch would have
been a lot simpler. However since the data stored in this structure is
ultimately exported to userspace, the order of this structure can't be changed.

This patch makes no attempt to change the existing behavior, merely the way in
which the current layout is accomplished using standard C99 constructs. As such
the code can now be compiled with either gcc or clang.

This version of the patch removes the trailing alignment that the VLAIS
structure would allocate in order to simplify the patch.

Author: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Vinícius Tinti <viniciustinti@gmail.com>
2014-06-07 11:44:39 -07:00
Phoebe Buckheister
fff1f59b17 mac802154: llsec: add forgotten list_del_rcu in key removal
During key removal, the key object is freed, but not taken out of the
llsec key list properly. Fix that.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-06 16:25:37 -07:00
Steve Wise
83710fc753 svcrdma: Fence LOCAL_INV work requests
Fencing forces the invalidate to only happen after all prior send
work requests have been completed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reported-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06 19:22:51 -04:00
Steve Wise
0bf4828983 svcrdma: refactor marshalling logic
This patch refactors the NFSRDMA server marshalling logic to
remove the intermediary map structures.  It also fixes an existing bug
where the NFSRDMA server was not minding the device fast register page
list length limitations.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
2014-06-06 19:22:50 -04:00
J. Bruce Fields
05638dc73a nfsd4: simplify server xdr->next_page use
The rpc code makes available to the NFS server an array of pages to
encode into.  The server represents its reply as an xdr buf, with the
head pointing into the first page in that array, the pages ** array
starting just after that, and the tail (if any) sharing any leftover
space in the page used by the head.

While encoding, we use xdr_stream->page_ptr to keep track of which page
we're currently using.

Currently we set xdr_stream->page_ptr to buf->pages, which makes the
head a weird exception to the rule that page_ptr always points to the
page we're currently encoding into.  So, instead set it to buf->pages -
1 (the page actually containing the head), and remove the need for a
little unintuitive logic in xdr_get_next_encode_buffer() and
xdr_truncate_encode.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-06-06 19:22:46 -04:00
John W. Linville
c6ac68a612 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-06-06 11:59:11 -04:00
Dmitry Popov
586d5fc867 ip_tunnel: fix possible rtable leak
ip_rt_put(rt) is always called in "error" branches above, but was missed in
skb_cow_head branch. As rt is not yet bound to skb here we have to release it by
hand.

Signed-off-by: Dmitry Popov <ixaphire@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 18:44:44 -07:00
Ilya Dryomov
6044cde6f2 libceph: add ceph_monc_wait_osdmap()
Add ceph_monc_wait_osdmap(), which will block until the osdmap with the
specified epoch is received or timeout occurs.

Export both of these as they are going to be needed by rbd.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06 09:29:57 +08:00
Ilya Dryomov
513a8243d6 libceph: mon_get_version request infrastructure
Add support for mon_get_version requests to libceph.  This reuses much
of the ceph_mon_generic_request infrastructure, with one exception.
Older OSDs don't set mon_get_version reply hdr->tid even if the
original request had a non-zero tid, which makes it impossible to
lookup ceph_mon_generic_request contexts by tid in get_generic_reply()
for such replies.  As a workaround, we allocate a reply message on the
reply path.  This can probably interfere with revoke, but I don't see
a better way.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06 09:29:57 +08:00
Ilya Dryomov
002b36ba5e libceph: recognize poolop requests in debugfs
Recognize poolop requests in debugfs monc dump, fix printk format
specifiers - tid is unsigned.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-06-06 09:29:56 +08:00
Sven Wegener
9e89fd8b7d ipv6: Shrink udp_v6_mcast_next() to one socket variable
To avoid the confusion of having two variables, shrink the function to
only use the parameter variable for looping.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 16:23:08 -07:00
David S. Miller
f666f87b94 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netback/netback.c
	net/core/filter.c

A filter bug fix overlapped some cleanups and a conversion
over to some new insn generation macros.

A xen-netback bug fix overlapped the addition of multi-queue
support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 16:22:02 -07:00
Alexei Starovoitov
0dcceabb0c net: filter: fix SKF_AD_PKTTYPE extension on big-endian
BPF classic->internal converter broke SKF_AD_PKTTYPE extension, since
pkt_type_offset() was failing to find skb->pkt_type field which is defined as:
__u8 pkt_type:3,
     fclone:2,
     ipvs_property:1,
     peeked:1,
     nf_trace:1;

Fix it by searching for the 3 most significant bits and shifting them by 5 at run time.
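
The mask-and-shift step can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper name pkt_type_from_msb() is hypothetical, and it assumes the byte holding the bit-field has already been loaded and that, as on the big-endian layouts the message refers to, the first 3-bit field sits in the most significant bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract a 3-bit field stored in the three most
 * significant bits of a byte (mask 0xe0), then shift it down by 5 so
 * the result is in the range 0..7.
 */
static inline uint8_t pkt_type_from_msb(uint8_t flags_byte)
{
	return (uint8_t)((flags_byte & 0xe0) >> 5);
}
```

On a layout where the same field occupies the three least significant bits, masking with 0x07 and no shift would be the equivalent operation.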

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:40:38 -07:00
David S. Miller
6934e79ed1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/nf_tables fixes for net-next

This patchset contains fixes for recent updates available in your
net-next, they are:

1) Fix double memory allocation for accounting objects that results
   in a leak, this slipped through with the new quota extension,
   patch from Mathieu Poirier.

2) Fix broken ordering when adding set element transactions.

3) Make sure that objects are released in reverse order in the abort
   path, to avoid possible use-after-free when accessing dependencies.

4) Allow deleting several objects (as long as dependencies are
   fulfilled) by using one batch. This includes changes in the use
   counter semantics of the nf_tables objects.

5) Fix illegal sleeping allocation from rcu callback.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:35:04 -07:00
Toshiaki Makita
e0a47d1f78 bridge: Fix incorrect judgment of promisc
br_manage_promisc() incorrectly expects br_auto_port() to return only 0
or 1, while it actually returns flags, i.e., a subset of BR_AUTO_MASK.
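
The class of bug described here can be illustrated with a small sketch. The flag values and helper names below are hypothetical stand-ins, not the bridge code: the point is only that a masked flag word may be non-zero without being equal to 1, so it should be reduced to a boolean before being compared or used in arithmetic.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the ones covered by BR_AUTO_MASK. */
#define SKETCH_FLAG_LEARNING 0x20u
#define SKETCH_FLAG_FLOOD    0x40u
#define SKETCH_AUTO_MASK     (SKETCH_FLAG_LEARNING | SKETCH_FLAG_FLOOD)

/* Returns the masked flags themselves, not a 0/1 value. */
static unsigned int auto_port_flags(unsigned int port_flags)
{
	return port_flags & SKETCH_AUTO_MASK;
}

/* Normalize to a boolean before using the result in a comparison. */
static bool is_auto_port(unsigned int port_flags)
{
	return auto_port_flags(port_flags) != 0;
}
```

With both sketch flags set, auto_port_flags() yields 0x60, so a caller that treats the result as "0 or 1" would mis-evaluate; is_auto_port() gives the intended truth value.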

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:20:31 -07:00
Simon Horman
3b392ddba2 MPLS: Use mpls_features to activate software MPLS GSO segmentation
If an MPLS packet requires segmentation then use mpls_features
to determine if the software implementation should be used.

As no driver advertises MPLS GSO segmentation this will always be
the case.

I had not noticed that this was necessary before as software MPLS GSO
segmentation was already being used in my test environment. I believe that
the reason for that is the skbs in question always had fragments and the
driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the
case for most drivers). Thus software segmentation was activated by
skb_gso_ok().

This introduces the overhead of an extra call to skb_network_protocol()
in the case where CONFIG_NET_MPLS_GSO is set and
skb->ip_summed == CHECKSUM_NONE.

Thanks to Jesse Gross for prompting me to investigate this.

Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 15:05:09 -07:00
John W. Linville
67be1e4f4b Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-06-05 14:10:07 -04:00
WANG Cong
ebbe495f19 ipv4: use skb frags api in udp4_hwcsum()
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 00:51:47 -07:00
WANG Cong
4cb28970a2 net: use the new API kvfree()
It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 00:49:51 -07:00
Manuel Schölling
9638f6713f dns_resolver: Do not accept domain names longer than 255 chars
According to RFC1035 "[...] the total length of a domain name (i.e.,
label octets and label length octets) is restricted to 255 octets or
less."

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-05 00:05:53 -07:00
Tom Herbert
359a0ea987 vxlan: Add support for UDP checksums (v4 sending, v6 zero csums)
Added VXLAN link configuration for sending UDP checksums, and allowing
TX and RX of UDP6 checksums.

Also, call common iptunnel_handle_offloads and added GSO support for
checksums.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:39 -07:00
Tom Herbert
4749c09c37 gre: Call gso_make_checksum
Call gso_make_checksum. This should have the benefit of using a
checksum that may have been previously computed for the packet.

This also adds NETIF_F_GSO_GRE_CSUM to differentiate devices that
offload GRE GSO with and without the GRE checksum offloaded.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Tom Herbert
0f4f4ffa7b net: Add GSO support for UDP tunnels with checksum
Added a new netif feature for GSO_UDP_TUNNEL_CSUM. This indicates
that a device is capable of computing the UDP checksum in the
encapsulating header of a UDP tunnel.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Tom Herbert
e9c3a24b3a tcp: Call gso_make_checksum
Call common gso_make_checksum when calculating checksum for a
TCP GSO segment.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Tom Herbert
7e2b10c1e5 net: Support for multiple checksums with gso
When creating a GSO packet segment we may need to set more than
one checksum in the packet (for instance a TCP checksum and
UDP checksum for VXLAN encapsulation). To be efficient, we want
to do checksum calculation for any part of the packet at most once.

This patch adds csum_start offset to skb_gso_cb. This tracks the
starting offset for skb->csum which is initially set in skb_segment.
When a protocol needs to compute a transport checksum it calls
gso_make_checksum which computes the checksum value from the start
of transport header to csum_start and then adds in skb->csum to get
the full checksum. skb->csum and csum_start are then updated to reflect
the checksum of the resultant packet starting from the transport header.
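
The arithmetic behind this can be sketched in user-space C. This is an illustration of the one's-complement property being relied on, not the kernel implementation: all function names are hypothetical, and it assumes the transport header has an even length so the 16-bit word boundaries of the two parts line up.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate big-endian 16-bit words into a 32-bit sum; an odd trailing
 * byte is treated as the high byte of a zero-padded word. */
static uint32_t csum_partial_sketch(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (i < len)
		sum += (uint32_t)buf[i] << 8;
	return sum;
}

/* Fold the 32-bit sum to 16 bits and complement it. */
static uint16_t csum_fold_sketch(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Sum only the header bytes and add the already-known payload sum,
 * so the payload is not walked a second time. */
static uint16_t make_checksum_sketch(const uint8_t *transport_hdr,
				     size_t hdr_len, uint32_t payload_csum)
{
	return csum_fold_sketch(
		csum_partial_sketch(transport_hdr, hdr_len, payload_csum));
}
```

Because the unfolded sums simply add, checksumming the header on top of a previously computed payload sum gives the same result as checksumming the whole segment in one pass.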

This patch also adds a flag to skbuff, encap_hdr_csum, which is set
in *gso_segment functions to indicate that a tunnel protocol needs
checksum calculation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Tom Herbert
77157e1973 l2tp: call udp{6}_set_csum
Call common functions to set checksum for UDP tunnel.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Tom Herbert
af5fcba7f3 udp: Generic functions to set checksum
Added udp_set_csum and udp6_set_csum functions to set UDP checksums
in packets. These are for simple UDP packets such as those that might
be created in UDP tunnels.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 22:46:38 -07:00
Sven Wegener
3bfdc59a6c ipv6: Fix regression caused by efe4208 in udp_v6_mcast_next()
Commit efe4208 ("ipv6: make lookups simpler and faster") introduced a
regression in udp_v6_mcast_next(), resulting in multicast packets not
reaching the destination sockets under certain conditions.

The packet's IPv6 addresses are wrongly compared to the IPv6 addresses
from the function's socket argument, which indicates the starting point
for looping, instead of the loop variable. If the addresses from the
first socket do not match the packet's addresses, no socket in the list
will match.
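
A reduced sketch of the mistake, using a plain singly linked list rather than the kernel's socket lists. The struct and function names are hypothetical; an integer stands in for the IPv6 addresses being compared.

```c
#include <stddef.h>

struct sock_sketch {
	int addr;                  /* stand-in for the socket's bound address */
	struct sock_sketch *next;
};

/* Buggy: every iteration tests 'start', the loop's starting point. */
static struct sock_sketch *next_match_buggy(struct sock_sketch *start,
					    int pkt_addr)
{
	for (struct sock_sketch *s = start; s; s = s->next)
		if (start->addr == pkt_addr)
			return s;
	return NULL;
}

/* Fixed: each iteration tests the loop variable. */
static struct sock_sketch *next_match_fixed(struct sock_sketch *start,
					    int pkt_addr)
{
	for (struct sock_sketch *s = start; s; s = s->next)
		if (s->addr == pkt_addr)
			return s;
	return NULL;
}
```

With a list whose first entry does not match, the buggy variant never finds a later matching entry, which is the symptom described above.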

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 15:42:01 -07:00
Sasha Levin
f830b0223c net: Revert "fib_trie: use seq_file_net rather than seq->private"
This reverts commit 30f38d2fdd.

fib_triestat is surrounded by a big lie: while it claims that it's a
seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't:

	static const struct file_operations fib_triestat_fops = {
	        .owner  = THIS_MODULE,
	        .open   = fib_triestat_seq_open,
	        .read   = seq_read,
	        .llseek = seq_lseek,
	        .release = single_release_net,
	};

Yes, fib_triestat is just a regular file.

A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files
you could do seq_file_net() to get the net ptr, doing so for a regular
file would be wrong and would dereference an invalid pointer.

The fib_triestat lie claimed a victim, and trying to show the file would
be bad for the kernel. This patch just reverts the issue and fixes
fib_triestat, which still needs a rewrite to either be a seq_file or
stop claiming it is.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-04 15:11:41 -07:00
Chuck Lever
c93c62231c xprtrdma: Disconnect on registration failure
If rpcrdma_register_external() fails during request marshaling, the
current RPC request is killed. Instead, this RPC should be retried
after reconnecting the transport instance.

The most likely reason for registration failure with FRMR is a
failed post_send, which would be due to a remote transport
disconnect or memory exhaustion. These issues can be recovered
by a retry.

Problems encountered in the marshaling logic itself will not be
corrected by trying again, so these should still kill a request.

Now that we've added a clean exit for marshaling errors, take the
opportunity to defang some BUG_ON's.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:53 -04:00
Chuck Lever
c977dea227 xprtrdma: Remove BUG_ON() call sites
If an error occurs in the marshaling logic, fail the RPC request
being processed, but leave the client running.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:53 -04:00
Chuck Lever
e7ce710a88 xprtrdma: Avoid deadlock when credit window is reset
Update the cwnd while processing the server's reply.  Otherwise the
next task on the xprt_sending queue is still subject to the old
credit window. Currently, no task is awoken if the old congestion
window is still exceeded, even if the new window is larger, and a
deadlock results.

This is an issue during a transport reconnect. Servers don't
normally shrink the credit window, but the client does reset it to
1 when reconnecting so the server can safely grow it again.

As a minor optimization, remove the hack of grabbing the initial
cwnd size (which happens to be RPC_CWNDSCALE) and using that value
as the congestion scaling factor. The scaling value is invariant,
and we are better off without the multiplication operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:52 -04:00
Chuck Lever
4f4cf5ad6f SUNRPC: Move congestion window constants to header file
I would like to use one of the RPC client's congestion algorithm
constants in transport-specific code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:51 -04:00
Chuck Lever
18906972aa xprtrdma: Reset connection timeout after successful reconnect
If the new connection is able to make forward progress, reset the
re-establish timeout. Otherwise it keeps growing even if disconnect
events are rare.

The same behavior as TCP is adopted: reconnect immediately if the
transport instance has been able to make some forward progress.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:51 -04:00
Chuck Lever
bfaee096de xprtrdma: Use macros for reconnection timeout constants
Clean up: Ensure the same max and min constant values are used
everywhere when setting reconnect timeouts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:50 -04:00
Shirley Ma
196c69989d xprtrdma: Allocate missing pagelist
GETACL relies on the transport layer to allocate memory for the reply
buffer. However, xprtrdma assumes that the reply buffer (pagelist) has
been pre-allocated in the upper layer. This problem was reported by IOL
OFA lab test on PPC.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Edward Mossman <emossman@iol.unh.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:49 -04:00
Chuck Lever
5bc4bc7292 xprtrdma: Remove Tavor MTU setting
Clean up.  Remove HCA-specific clutter in xprtrdma, which is
supposed to be device-independent.

Hal Rosenstock <hal@dev.mellanox.co.il> observes:
> Note that there is OpenSM option (enable_quirks) to return 1K MTU
> in SA PathRecord responses for Tavor so that can be used for this.
> The default setting for enable_quirks is FALSE so that would need
> changing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:48 -04:00
Chuck Lever
ec62f40d35 xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a
disconnect, his HCA is failing to create a fresh QP, leaving
ia->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
wake up and post LOCAL_INV as they exit, causing an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's
generic layer. xprt_connect_status() does not recognize -EPERM, so
it kills pending RPC tasks immediately rather than retrying the
connect.

Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL.  If pending RPC
tasks wake and exit, LOCAL_INV work requests will flush rather than
oops.

On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp. But be sure not to try to destroy
a NULL QP when rpcrdma_ep_connect() is retried.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:47 -04:00
Chuck Lever
65866f8259 xprtrdma: Reduce the number of hardway buffer allocations
While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
settings determine whether an inline request is used, or whether
read or write chunks lists are built. The current default value of
these settings is 1024. Any RPC request smaller than 1024 bytes is
sent to the NFS server completely inline.

rpcrdma_buffer_create() allocates and pre-registers a set of RPC
buffers for each transport instance, also based on the inline rsize
and wsize settings.

RPC/RDMA requests and replies are built in these buffers. However,
if an RPC/RDMA request is expected to be larger than 1024, a buffer
has to be allocated and registered for that RPC, and deregistered
and released when the RPC is complete. This is known as a
"hardway allocation."

Since the introduction of NFSv4, the size of RPC requests has become
larger, and hardway allocations are thus more frequent. Hardway
allocations are significant overhead, and they waste the existing
RPC buffers pre-allocated by rpcrdma_buffer_create().

We'd like fewer hardway allocations.

Increasing the size of the pre-registered buffers is the most direct
way to do this. However, a blanket increase of the inline thresholds
has interoperability consequences.

On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
bytes for each RPC request buffer, using kmalloc(). Due to internal
fragmentation, this wastes nearly 1200 bytes because kmalloc()
already returns an 8192-byte piece of memory for a 7000-byte
allocation request, though the extra space remains unused.

So let's round up the size of the pre-allocated buffers, and make
use of the unused space in the kmalloc'd memory.
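
The slack calculation can be reproduced with a short sketch. It assumes, as the figures above imply for a request of this size, that the allocator hands back the next power-of-two size; the helper names are hypothetical and this is not the allocator's own logic.

```c
#include <stddef.h>

/* Round n up to the next power of two (n > 0). Assumed here as a model
 * of the size class an allocation of this magnitude falls into. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes handed back by the allocator but not asked for by the caller. */
static size_t slack_bytes(size_t requested)
{
	return round_up_pow2(requested) - requested;
}
```

For a 7000-byte request this model gives an 8192-byte allocation with 1192 bytes of slack, matching the "nearly 1200 bytes" quoted above; sizing the buffer to the rounded-up value puts that slack to use.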

This change reduces the amount of hardway allocated memory for an
NFSv4 general connectathon run from 1322092 to 9472 bytes (99%).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:46 -04:00
Chuck Lever
8301a2c047 xprtrdma: Limit work done by completion handler
Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady
stream of CQ events could starve other work because of the unbounded
polling loop in rpcrdma_{send,recv}_poll().

Instead of a (potentially infinite) while loop, return after
collecting a budgeted number of completions.
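
The budgeted loop can be sketched as below. The completion queue is modelled as a simple counter, and cq_sketch, poll_one() and poll_with_budget() are hypothetical names for illustration; the real handlers poll a hardware CQ through the verbs API.

```c
/* Hypothetical stand-in for a completion queue: a count of pending items. */
struct cq_sketch {
	int pending;
};

/* Poll one completion; returns 1 if one was consumed, 0 if the queue is empty. */
static int poll_one(struct cq_sketch *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Process at most 'budget' completions, then return so other work can run,
 * even if more completions are still pending. */
static int poll_with_budget(struct cq_sketch *cq, int budget)
{
	int done = 0;

	while (done < budget && poll_one(cq))
		done++;
	return done;
}
```

The loop ends either when the queue is empty or when the budget is spent, so a continuous stream of events can no longer keep the handler running indefinitely.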

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:45 -04:00
Chuck Lever
1c00dd0776 xprtrmda: Reduce calls to ib_poll_cq() in completion handlers
Change the completion handlers to grab up to 16 items per
ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16
items are returned.
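
A sketch of the batching idea, with the queue again modelled as a counter. poll_batch() is a hypothetical stand-in for a poll call that returns up to a requested number of items; the names and the call counter are for illustration only, and the budget from the previous patch is left out to keep the example small.

```c
#define BATCH_SKETCH 16

/* Hypothetical stand-in for a completion queue, with a call counter. */
struct batch_cq_sketch {
	int pending;
	int poll_calls;
};

/* Take up to 'max' items in one call and return how many were taken. */
static int poll_batch(struct batch_cq_sketch *cq, int max)
{
	int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	cq->poll_calls++;
	return n;
}

/* Keep polling only while the previous call returned a full batch;
 * a short batch means the queue was drained, so no extra call is made. */
static int drain_in_batches(struct batch_cq_sketch *cq)
{
	int total = 0;
	int n;

	do {
		n = poll_batch(cq, BATCH_SKETCH);
		total += n;
	} while (n == BATCH_SKETCH);
	return total;
}
```

With 20 pending items this model makes two poll calls (16, then 4) instead of 21 single-item calls; with 5 pending items it makes exactly one.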

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:44 -04:00
Chuck Lever
7f23f6f6e3 xprtrmda: Reduce lock contention in completion handlers
Skip the ib_poll_cq() after re-arming, if the provider knows there
are no additional items waiting. (Have a look at commit ed23a727 for
more details).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:43 -04:00
Chuck Lever
fc66448549 xprtrdma: Split the completion queue
The current CQ handler uses the ib_wc.opcode field to distinguish
between event types. However, the contents of that field are not
reliable if the completion status is not IB_WC_SUCCESS.

When an error completion occurs on a send event, the CQ handler
schedules a tasklet with something that is not a struct rpcrdma_rep.
This is never correct behavior, and sometimes it results in a panic.

To resolve this issue, split the completion queue into a send CQ and
a receive CQ. The send CQ handler now handles only struct rpcrdma_mw
wr_id's, and the receive CQ handler now handles only struct
rpcrdma_rep wr_id's.

Fix suggested by Shirley Ma <shirley.ma@oracle.com>

Reported-by: Rafael Reiter <rafael.reiter@ims.co.at>
Fixes: 5c635e09ce
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Klemens Senn <klemens.senn@ims.co.at>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:42 -04:00
Chuck Lever
7f1d54191e xprtrdma: Make rpcrdma_ep_destroy() return void
Clean up: rpcrdma_ep_destroy() returns a value that is used
only to print a debugging message. rpcrdma_ep_destroy() already
prints debugging messages in all error cases.

Make rpcrdma_ep_destroy() return void instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:41 -04:00
Chuck Lever
13c9ff8f67 xprtrdma: Simplify rpcrdma_deregister_external() synopsis
Clean up: All remaining callers of rpcrdma_deregister_external()
pass NULL as the last argument, so remove that argument.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:40 -04:00
Chuck Lever
cdd9ade711 xprtrdma: mount reports "Invalid mount option" if memreg mode not supported
If the selected memory registration mode is not supported by the
underlying provider/HCA, the NFS mount command reports that there was
an invalid mount option, and fails. This is misleading.

Reporting a problem allocating memory is a lot closer to the truth.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:39 -04:00
Chuck Lever
f10eafd3a6 xprtrdma: Fall back to MTHCAFMR when FRMR is not supported
An audit of in-kernel RDMA providers that do not support the FRMR
memory registration shows that several of them support MTHCAFMR.
Prefer MTHCAFMR when FRMR is not supported.

If MTHCAFMR is not supported, only then choose ALLPHYSICAL.
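
The fallback order can be sketched as follows. This is an illustration only: the mode names come from the text above, while struct dev_caps and pick_memreg() are hypothetical and the real code inspects device attributes instead of two booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode names come from the commit text; the enum values are arbitrary. */
enum memreg_mode { ALLPHYSICAL, MTHCAFMR, FRMR };

/* Hypothetical capability flags for the underlying provider. */
struct dev_caps {
    bool has_frmr;
    bool has_fmr;
};

/* Step down from the requested mode until a supported one is found. */
enum memreg_mode pick_memreg(enum memreg_mode wanted, const struct dev_caps *caps)
{
    if (wanted == FRMR && !caps->has_frmr)
        wanted = MTHCAFMR;      /* prefer MTHCAFMR over ALLPHYSICAL */
    if (wanted == MTHCAFMR && !caps->has_fmr)
        wanted = ALLPHYSICAL;   /* last resort */
    return wanted;
}
```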

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:39 -04:00
Chuck Lever
0ac531c183 xprtrdma: Remove REGISTER memory registration mode
All kernel RDMA providers except amso1100 support either MTHCAFMR
or FRMR, both of which are faster than REGISTER.  amso1100 can
continue to use ALLPHYSICAL.

The only other ULP consumer in the kernel that uses the reg_phys_mr
verb is Lustre.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:38 -04:00
Chuck Lever
b45ccfd25d xprtrdma: Remove MEMWINDOWS registration modes
The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were
intended as stop-gap modes before the introduction of FRMR. They
are now considered obsolete.

MEMWINDOWS_ASYNC is also considered unsafe because it can leave
client memory registered and exposed for an indeterminate time after
each I/O.

At this point, the MEMWINDOWS modes add needless complexity, so
remove them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:37 -04:00
Chuck Lever
03ff8821eb xprtrdma: Remove BOUNCEBUFFERS memory registration mode
Clean up: This memory registration mode is slow and was never
meant for use in production environments. Remove it to reduce
implementation complexity.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:37 -04:00
Chuck Lever
254f91e2fa xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context
An IB provider can invoke rpcrdma_conn_func() in an IRQ context,
thus rpcrdma_conn_func() cannot be allowed to directly invoke
generic RPC functions like xprt_wake_pending_tasks().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:36 -04:00
Allen Andrews
4034ba0423 nfs-rdma: Fix for FMR leaks
Two memory region leaks were found during testing:

1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's
ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is
called.  If ib_alloc_fast_reg_page_list returns an error it bails out of
the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a
memory leak.  Added code to dereg the last frmr if
ib_alloc_fast_reg_page_list fails.

2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free
the MR's on the rb_mws list if there are rb_send_bufs present.  However, in
rpcrdma_buffer_create while the rb_mws list is being built if one of the MR
allocation requests fail after some MR's have been allocated on the rb_mws
list the routine never gets to create any rb_send_bufs but instead jumps to
the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws
list because the rb_send_bufs were never created.   This leaks all the MR's
on the rb_mws list that were created prior to one of the MR allocations
failing.

Issue(2) was seen during testing. Our adapter had a finite number of MR's
available and we created enough connections to where we saw an MR
allocation failure on our Nth NFS connection request. After the kernel
cleaned up the resources it had allocated for the Nth connection we noticed
that FMR's had been leaked due to the coding error described above.

Issue(1) was seen during a code review while debugging issue(2).

Signed-off-by: Allen Andrews <allen.andrews@emulex.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:35 -04:00
Steve Wise
0fc6c4e7bb xprtrdma: mind the device's max fast register page list depth
Some rdma devices don't support a fast register page list depth of
at least RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast
register regions according to the minimum of the device max supported
depth or RPCRDMA_MAX_DATA_SEGS.
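
A minimal sketch of the depth calculation, assuming a stand-in constant MAX_DATA_SEGS for RPCRDMA_MAX_DATA_SEGS (the value 64 is an assumption made for the example) and hypothetical helper names:

```c
#include <assert.h>

/* Stand-in for RPCRDMA_MAX_DATA_SEGS; the value 64 is an assumption. */
#define MAX_DATA_SEGS 64u

/* Usable page list depth: the smaller of the device limit and the constant. */
unsigned int frmr_depth(unsigned int dev_max_depth)
{
    return dev_max_depth < MAX_DATA_SEGS ? dev_max_depth : MAX_DATA_SEGS;
}

/* Number of fast-register regions needed to cover nsegs segments. */
unsigned int frmr_chunks(unsigned int nsegs, unsigned int dev_max_depth)
{
    unsigned int depth = frmr_depth(dev_max_depth);

    assert(depth > 0);
    return (nsegs + depth - 1) / depth;   /* round up */
}
```

Under these assumptions, a device that supports a depth of only 16 needs four regions to cover 64 segments.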

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-06-04 08:56:33 -04:00
David S. Miller
c99f7abf0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/inetpeer.h
	net/ipv6/output_core.c

Changes in net were fixing bugs in code removed in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 23:32:12 -07:00
WANG Cong
92ff71b8fe net: remove some useless free on failure in alloc_netdev_mqs()
When we jump to free_pcpu on failure in alloc_netdev_mqs()
rx and tx queues are not yet allocated, so no need to free them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 19:18:58 -07:00
Cong Wang
e51fb15231 rtnetlink: fix a memory leak when ->newlink fails
It is possible that ->newlink() fails before registering
the device. In this case we should just free it; it is
safe to call free_netdev().

Fixes: commit 0e0eee2465 (net: correct error path in rtnl_newlink())
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 19:16:10 -07:00
Michal Kubecek
21ee543edc xfrm: fix race between netns cleanup and state expire notification
The xfrm_user module registers its pernet init/exit after xfrm
itself so that its net exit function xfrm_user_net_exit() is
executed before xfrm_net_exit() which calls xfrm_state_fini() to
cleanup the SA's (xfrm states). This opens a window between
zeroing net->xfrm.nlsk pointer and deleting all xfrm_state
instances which may access it (via the timer). If an xfrm state
expires in this window, xfrm_exp_state_notify() will pass null
pointer as socket to nlmsg_multicast().

As the notifications are called inside rcu_read_lock() block, it
is sufficient to retrieve the nlsk socket with rcu_dereference()
and check it for null.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-03 16:07:44 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
David S. Miller
014b20133b Merge branch 'ethtool-rssh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next
Ben Hutchings says:

====================
Pull request: Fixes for new ethtool RSS commands

This addresses several problems I previously identified with the new
ETHTOOL_{G,S}RSSH commands:

1. Missing validation of reserved parameters
2. Vague documentation
3. Use of unnamed magic number
4. No consolidation with existing driver operations

I don't currently have access to suitable network hardware, but have
tested these changes with a dummy driver that can support various
combinations of operations and sizes, together with (a) Debian's ethtool
3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH
and minor adjustment for fixes 1 and 3.

v2: Update RSS operations in vmxnet3 too
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 23:07:02 -07:00
Ben Hutchings
f062a38448 ethtool: Check that reserved fields of struct ethtool_rxfh are 0
We should fail rather than silently ignoring use of these extensions.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-06-03 02:43:16 +01:00
Ben Hutchings
fe62d00137 ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2014-06-03 02:42:44 +01:00
Roopa Prabhu
41c389d72c bridge: Add bridge ifindex to bridge fdb notify msgs
(This patch was previously posted as RFC at
http://patchwork.ozlabs.org/patch/352677/)

This patch adds NDA_MASTER attribute to neighbour attributes enum for
bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs.

Today bridge fdb notifications don't contain bridge information.
Userspace can derive it from the port information in the fdb
notification. However this is tricky in some scenarios.

For example, the bridge port delete notification comes before the bridge fdb
delete notifications. We have seen problems in userspace when using libnl,
where the bridge fdb delete notification handling code does not know which
bridge the fdb entry belongs to because the bridge and port association has
already been deleted. These notifications (port membership and fdb) are also
generated on separate rtnl groups.

Fixing the order of notifications could possibly solve the problem
for some cases (I can submit a separate patch for that).

This patch chooses to add NDA_MASTER to bridge fdb notify msgs
because it not only solves the problem described above, but also helps
userspace avoid another lookup into link msgs to derive the master index.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 17:58:55 -07:00
Leon Yu
418c96ac15 net: filter: fix possible memory leak in __sk_prepare_filter()
__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter:
rework/optimize internal BPF interpreter's instruction set) so that it should
have uncharged memory once things went wrong. However that work isn't complete.
Error is handled only in __sk_migrate_filter() while memory can still leak in
the error path right after sk_chk_filter().

Fixes: bd4cf0ed33 ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 17:49:45 -07:00
Yuchung Cheng
0cfa5c07d6 tcp: fix cwnd undo on DSACK in F-RTO
This bug was discovered through a recent F-RTO issue on the tcpm list
https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html

The bug is that currently F-RTO does not use DSACK to undo cwnd in
certain cases: upon receiving an ACK after the RTO retransmission in
F-RTO, and the ACK has DSACK indicating the retransmission is spurious,
the sender only calls tcp_try_undo_loss() if some never retransmitted
data is sacked (FLAG_ORIG_DATA_SACKED).

The correct behavior is to unconditionally call tcp_try_undo_loss so
the DSACK information is used properly to undo the cwnd reduction.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:50:49 -07:00
David Ahern
30f38d2fdd fib_trie: use seq_file_net rather than seq->private
Make fib_triestat_seq_show consistent with other /proc/net files and
use seq_file_net.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:41:38 -07:00
Eric W. Biederman
2d7a85f4b0 netlink: Only check file credentials for implicit destinations
It was possible to get a setuid root or setcap executable to write to
its stdout or stderr (which has been made a netlink socket) and
inadvertently reconfigure the networking stack.

To prevent this we check that both the creator of the socket and
the current application have permission to reconfigure the network
stack.

Unfortunately this breaks Zebra, which always uses sendto/sendmsg
and creates its socket without any privileges.

To keep Zebra working don't bother checking if the creator of the
socket has privilege when a destination address is specified.  Instead
rely exclusively on the privileges of the sender of the socket.
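
The resulting rule can be summarized in a small sketch. The function and parameter names are hypothetical; the kernel works from socket and file credentials, not from three booleans.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of the permission rule described above:
 * - the sender must always be privileged;
 * - the socket creator is checked only when no destination address
 *   was given (the implicit-destination case, e.g. write() on a
 *   netlink socket that was passed in as stdout).
 */
bool netlink_msg_allowed(bool sender_capable, bool creator_capable,
                         bool has_dest_addr)
{
    if (!sender_capable)
        return false;
    if (has_dest_addr)
        return true;            /* explicit destination: sender only */
    return creator_capable;     /* implicit destination: both checked */
}
```

An unprivileged creator with a privileged sender passes only when a destination address is specified, which is the pattern the text attributes to Zebra.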

Note from Andy: This is exactly Eric's code except for some comment
clarifications and formatting fixes.  Neither I nor, I think, anyone
else is thrilled with this approach, but I'm hesitant to wait on a
better fix since 3.15 is almost here.

Note to stable maintainers: This is a mess.  An earlier series of
patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
but they did so in a way that breaks Zebra.  The offending series
includes:

    commit aa4cf9452f
    Author: Eric W. Biederman <ebiederm@xmission.com>
    Date:   Wed Apr 23 14:28:03 2014 -0700

        net: Add variants of capable for use on netlink messages

If a given kernel version is missing that series of fixes, it's
probably worth backporting it and this patch.  If that series is
present, then this fix is critical if you care about Zebra.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 16:34:09 -07:00
Eric Dumazet
39c36094d7 net: fix inet_getid() and ipv6_select_ident() bugs
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.

06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)

It appears I introduced this bug in linux-3.1.

inet_getid() must return the old value of peer->ip_id_count,
not the new one.
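
As an illustration of the intended semantics (a userspace sketch using C11 atomics, not the kernel helper): the function reserves a run of IDs for one packet plus its extra segments and returns the first ID of the run, that is, the counter value from before the addition. The name getid() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Reserve (more + 1) consecutive IDs and return the first of them. */
uint16_t getid(atomic_uint *counter, unsigned int more)
{
    unsigned int n = more + 1;

    /* atomic_fetch_add() returns the value held before the addition. */
    return (uint16_t)atomic_fetch_add(counter, n);
}
```

Returning the value after the addition would shift every packet's ID by the size of its own segment run, which is consistent with the irregular jumps in the trace above.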

Let's revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.

Fixes: 87c48fa3b4 ("ipv6: make fragment identifications less predictable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 14:09:28 -07:00
Toshiaki Makita
e0d7968ab6 bridge: Prevent insertion of FDB entry with disallowed vlan
br_handle_local_finish() is allowing us to insert an FDB entry with
disallowed vlan. For example, when port 1 and 2 are communicating in
vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can
interfere with their communication by spoofed src mac address with
vlan id 10.

Note: Even if it is judged that a frame should not be learned, it should
not be dropped because it is destined for not forwarding layer but higher
layer. See IEEE 802.1Q-2011 8.13.10.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 13:38:23 -07:00
David S. Miller
31595de219 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
pull request: wireless-next 2014-06-02

Please pull this remaining batch of updates intended for the 3.16 stream...

For the mac80211 bits, Johannes says:

"The remainder for -next right now is mostly fixes, and a handful of
small new things like some CSA infrastructure, the regdb script mW/dBm
conversion change and sending wiphy notifications."

For the bluetooth bits, Gustavo says:

"Some more patches for 3.16. There is nothing really special here, just a
bunch of clean ups, fixes plus some small improvements. Please pull."

For the nfc bits, Samuel says:

"We have:

- Felica (Type3) tags support for trf7970a
- Type 4b tags support for port100
- st21nfca DTS typo fix
- A few sparse warning fixes"

For the atheros bits, Kalle says:

"Ben added support for setting antenna configurations. Michal improved
warm reset so that we would not need to fall back to cold reset that
often, an issue where ath10k stripped protected flag while in monitor
mode and made module initialisation asynchronous to fix the problems
with firmware loading when the driver is linked to the kernel.

Luca removed unused channel_switch_beacon callbacks both from ath9k and
ath10k. Marek fixed Protected Management Frames (PMF) when using Action
Frames. Also we had other small fixes everywhere in the driver."

Along with that, there are a handful of updates to a variety
of drivers.  This includes updates to at76c50x-usb, ath9k, b43,
brcmfmac, mwifiex, rsi, rtlwifi, and wil6210.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:17:35 -07:00
Eric Dumazet
73f156a6e8 inetpeer: get rid of ip_id_count
Ideally, we would need to generate IP ID using a per destination IP
generator.

Linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation does not have to be 'perfect'.

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkins hash using the dst IP
as a key.
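
A simplified sketch of the idea, under several assumptions: the table size is arbitrary, the hash is Bob Jenkins' one-at-a-time hash used as a stand-in for the kernel's own hash helpers, the generators start at zero, and the update is not atomic. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ID_GENERATORS 2048u   /* table size is an assumption for the sketch */

/* One counter per hash bucket; zero-initialized here for simplicity. */
uint32_t id_generators[ID_GENERATORS];

/* Jenkins one-at-a-time hash over the four bytes of the key. */
uint32_t oaat_hash(uint32_t key)
{
    uint32_t h = 0;

    for (int i = 0; i < 4; i++) {
        h += (key >> (8 * i)) & 0xff;
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}

/* Reserve segs consecutive IDs for daddr and return the first one. */
uint16_t select_ident(uint32_t daddr, uint32_t segs)
{
    uint32_t *gen = &id_generators[oaat_hash(daddr) % ID_GENERATORS];
    uint32_t id = *gen;

    *gen += segs;   /* not atomic: a real implementation must be */
    return (uint16_t)id;
}
```

Destinations that hash to the same bucket share a generator, which is acceptable given the stated goal of only avoiding duplicates over a short period.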

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 11:00:41 -07:00
Alexander Duyck
670e5b8eaf net: Add support for device specific address syncing
This change provides a function to be used in order to break the
ndo_set_rx_mode call into a set of address add and remove calls.  The code
is based on the implementation of dev_uc_sync/dev_mc_sync.  Since they
essentially do the same thing but with only one dev I simply named my
functions __dev_uc_sync/__dev_mc_sync.

I also implemented an unsync version of the functions as well to allow for
cleanup on close.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 10:40:54 -07:00
Alexander Aring
eb06481d69 6lowpan_rtnl: fix off by one while fragmentation
This patch fixes an off-by-one error during fragmentation. If the frag_cap
value is equal to the skb_unprocessed value, we need to stop the
fragmentation loop because the last fragment, which has a size of
skb_unprocessed, fits into the frag capability size.

This issue was introduced by commit d4b2816d67
("6lowpan: fix fragmentation").

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 10:39:42 -07:00
Alexander Aring
51263fffad 6lowpan_rtnl: fix fragmentation with two fragments
This patch fixes 6LoWPAN fragmentation for the case where we have exactly
two fragments. The problem is that the (skb_unprocessed >= frag_cap)
condition is always false on the second fragment after sending the first
fragment. Fragmentation with only one fragment doesn't make any sense.
The solution is to use a do-while loop here, which ensures that we always
send a minimum of two fragments if fragmentation is needed.
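
The loop shape can be sketched in userspace as follows. This is not the driver code: count_fragments() and its parameters are hypothetical, and the function only counts fragments without building them. The strict ">" comparison reflects the related off-by-one fix in this series.

```c
#include <assert.h>

/*
 * Count the fragments sent for a payload that is already known to need
 * fragmentation. first_len is the payload carried by the first fragment,
 * frag_cap the payload capacity of every later fragment.
 */
unsigned int count_fragments(unsigned int payload, unsigned int first_len,
                             unsigned int frag_cap)
{
    unsigned int unprocessed = payload;
    unsigned int frag_len = first_len;
    unsigned int sent = 1;              /* the first fragment */

    assert(first_len < payload && frag_cap > 0);

    /* do-while: at least one more fragment always follows the first */
    do {
        unprocessed -= frag_len;
        frag_len = unprocessed < frag_cap ? unprocessed : frag_cap;
        sent++;                         /* send frag_len bytes */
    } while (unprocessed > frag_cap);   /* '>' so an exact fit ends the loop */

    return sent;
}
```

With a capacity of 40 bytes per fragment, a 50-byte payload yields exactly two fragments, and an 80-byte payload also yields two; no empty trailing fragment is produced.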

This issue was introduced by commit d4b2816d67
("6lowpan: fix fragmentation").

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 10:39:42 -07:00
Denis ChengRq
2f91abd451 genetlink: remove superfluous assignment
The local variables ops and n_ops were just read out from family,
and not changed, hence no need to assign back.

Validation functions should operate on const parameters and not
change anything.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-02 10:36:18 -07:00
John W. Linville
fcb2c0d6cf Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2014-06-02 11:20:17 -04:00
Jukka Taimisto
8a96f3cd22 Bluetooth: Fix L2CAP deadlock
-[0x01 Introduction

We have found a programming error causing a deadlock in the Bluetooth subsystem
of the Linux kernel. The problem is caused by a missing release_sock() call when
L2CAP connection creation fails due to a full accept queue.

The issue can be reproduced with 3.15-rc5 kernel and is also present in
earlier kernels.

-[0x02 Details

The problem occurs when multiple L2CAP connections are created to a PSM which
has a listening socket (like SDP) and are left pending (for example, awaiting
configuration); the underlying ACL link is not disconnected between
connections.

When an L2CAP connection request is received and a listening socket is found, the
l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called.
This function locks the 'parent' socket and then checks if the accept queue
is full.

        lock_sock(parent);

        /* Check for backlog size */
        if (sk_acceptq_is_full(parent)) {
                BT_DBG("backlog full %d", parent->sk_ack_backlog);
                return NULL;
        }

In case the accept queue is full, NULL is returned, but the 'parent' socket
is not released. Thus when the next L2CAP connection request is received, the code
blocks on lock_sock() since the parent is still locked.

Also note that for connections already established and waiting for
configuration to complete, a timeout will occur and l2cap_chan_timeout()
(net/bluetooth/l2cap_core.c) will be called. All threads calling this
function will also be blocked waiting for the channel mutex, since the thread
which is waiting on lock_sock() already holds the channel mutex.

We were able to reproduce this by continuously sending an L2CAP connection
request followed by a disconnection request containing an invalid CID. This left
the created connections pending configuration.

After the deadlock occurs it is impossible to kill bluetoothd, btmon will not
get any more data, etc., and a reboot is required to recover.

-[0x03 Fix

Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL
seems to fix the issue.
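
The shape of the fix can be modeled in userspace. The sketch below is not the Bluetooth code: struct fake_sock and its helpers are hypothetical, and the lock is a plain flag whose assert fires where the real code would block forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a listening socket with an accept queue. */
struct fake_sock {
    bool locked;
    int backlog;
    int max_backlog;
};

void fake_lock(struct fake_sock *sk)
{
    assert(!sk->locked);    /* a second lock attempt would deadlock */
    sk->locked = true;
}

void fake_release(struct fake_sock *sk)
{
    assert(sk->locked);
    sk->locked = false;
}

/* Returns true if a new connection was queued, false if the queue is full. */
bool new_connection(struct fake_sock *parent)
{
    fake_lock(parent);

    if (parent->backlog >= parent->max_backlog) {
        fake_release(parent);   /* the release missing on the error path */
        return false;
    }

    parent->backlog++;
    fake_release(parent);
    return true;
}
```

Without the release on the error path, the next call to new_connection() after a full-queue failure would trip the assert in fake_lock(), mirroring the blocked lock_sock() described above.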

Signed-off-by: Jukka Taimisto <jtt@codenomicon.com>
Reported-by: Tommi Mäkilä <tmakila@codenomicon.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org
2014-06-02 13:38:19 +03:00
Pablo Neira Ayuso
31f8441c32 netfilter: nf_tables: atomic allocation in set notifications from rcu callback
Use GFP_ATOMIC allocations when sending removal notifications of
anonymous sets from rcu callback context. Sleeping in that context
is illegal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:38 +02:00
Pablo Neira Ayuso
4fefee570d netfilter: nf_tables: allow to delete several objects from a batch
Three changes to allow the deletion of several objects with dependencies
in one transaction, they are:

1) Introduce speculative counter increment/decrement that is undone in
   the abort path if required, thus we avoid hitting -EBUSY when deleting
   the chain. The counter updates are reverted in the abort path.

2) Increment/decrement table/chain use counter for each set/rule. We need
   this to fully rely on the use counters instead of the list content,
   e.g. !list_empty(&chain->rules), which evaluates to true in the middle of the
   transaction.

3) Decrement table use counter when an anonymous set is bound to the
   rule in the commit path. This avoids hitting -EBUSY when deleting
   the table that contains anonymous sets. The anonymous sets are released
   in the nf_tables_rule_destroy path. This should not be a problem since
   the rule already bumped the use counter of the chain, so the bound
   anonymous set reflects dependencies through the rule object, which
   already increases the chain use counter.

So the general assumption after this patch is that the use counters are
bumped by direct object dependencies.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:35 +02:00
Pablo Neira Ayuso
7632667d26 netfilter: nft_rbtree: introduce locking
There's no rbtree rcu version yet, so let's fall back on the spinlock
to protect the concurrent access of this structure both from user
(to update the set content) and kernel-space (in the packet path).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:31 +02:00
Pablo Neira Ayuso
a1cee076f4 netfilter: nf_tables: release objects in reverse order in the abort path
The patch c7c32e7 ("netfilter: nf_tables: defer all object release via
rcu") indicates that we always release deleted objects in the reverse
order, but that is only needed in the abort path. These are the two
possible scenarios when releasing objects:

1) Deletion scenario in the commit path: no need to release objects in
the reverse order since userspace already ensures that dependencies are
fulfilled, i.e. userspace tells us to delete rule -> ... -> rule ->
chain -> table. In this case, we have to release the objects in the
*same order* as userspace provided.

2) Deletion scenario in the abort path: we have to iterate in the reverse
order to undo what cannot be added, i.e. userspace sent us a batch
that includes: table -> chain -> rule -> ... -> rule, and that needs to
be partially undone. In this case, we have to release objects in the
reverse order to ensure that the set and chain objects point to valid
rule and table objects.
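
The two orderings can be illustrated with a small sketch. The enum and release_order() are hypothetical; the real code walks a transaction list, not an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Object kinds named in the text above; the values are arbitrary. */
enum obj_kind { OBJ_TABLE, OBJ_CHAIN, OBJ_RULE };

/*
 * Write the order in which the n objects of a batch are released.
 * Commit path: same order as userspace provided.
 * Abort path: reverse order, so dependents go before what they point to.
 */
void release_order(const enum obj_kind *batch, size_t n, bool abort_path,
                   enum obj_kind *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = abort_path ? batch[n - 1 - i] : batch[i];
}
```

For an aborted batch of table -> chain -> rule, the sketch releases the rule first and the table last.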

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:28 +02:00
Pablo Neira Ayuso
46bbafceb2 netfilter: nf_tables: fix wrong transaction ordering in set elements
The transaction needs to be placed at the end of the commit list,
otherwise event notifications are reordered and we may crash when
releasing object via call_rcu.

This problem was introduced in 60319eb ("netfilter: nf_tables: use new
transaction infrastructure to handle elements").

Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:54:25 +02:00
Mathieu Poirier
4c552a64df netfilter: nfnetlink_acct: Fix memory leak
Allocation of memory needs to happen only once, that is,
after the proper checks on the NFACCT_FLAGS have been
done.  Otherwise the code can return without freeing
already allocated memory.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-02 10:46:52 +02:00
Johan Hedberg
f3fb0b58c8 Bluetooth: Fix missing check for FIPS security level
When checking whether a legacy link key provides at least HIGH security
level we also need to check for FIPS level which is one step above HIGH.
This patch fixes a missing check in the hci_link_key_request_evt()
function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-06-02 00:34:36 -07:00
Daniel Borkmann
f8f6d679aa net: filter: improve filter block macros
Commit 9739eef13c ("net: filter: make BPF conversion more readable")
started to introduce helper macros similar to BPF_STMT()/BPF_JUMP()
macros from classic BPF.

However, quite some statements in the filter conversion functions
remained in the old style which gives a mixture of block macros and
non block macros in the code. This patch makes the block macros themselves
more readable by using explicit member initialization, and converts
the remaining ones where possible to remain in a more consistent state.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 22:16:58 -07:00
Daniel Borkmann
3480593131 net: filter: get rid of BPF_S_* enum
This patch finally allows us to get rid of the BPF_S_* enum.
Currently, the code performs unnecessary encode and decode
workarounds in seccomp and filter migration itself when a filter
is being attached in order to overcome BPF_S_* encoding which
is not used anymore by the new interpreter resp. JIT compilers.

Keeping it around would mean that also in future we would need
to extend and maintain this enum and related encoders/decoders.
We can get rid of all that and save us these operations during
filter attaching. Naturally, also JIT compilers need to be updated
by this.

Before JIT conversion is done, each compiler checks whether A
is loaded at startup to find out if it needs to
emit instructions to clear A first. Since BPF extensions are a
subset of BPF_LD | BPF_{W,H,B} | BPF_ABS variants, case statements
for extensions can be removed at that point. To ease and minimize
code changes in the classic JITs, we have introduced bpf_anc_helper().

Tested with test_bpf on x86_64 (JIT, int), s390x (JIT, int),
arm (JIT, int), i386 (int), ppc64 (JIT, int); for sparc we
unfortunately didn't have access, but changes are analogous to
the rest.

Joint work with Alexei Starovoitov.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Chema Gonzalez <chemag@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 22:16:58 -07:00
Jon Maxwell
c65c7a3066 bridge: notify user space after fdb update
There have been a number of incidents recently where customers running KVM have
reported that VM hosts on different hypervisors are unreachable. Based on
pcap traces we found that the bridge was broadcasting the ARP request out
onto the network. However, some NICs have a built-in switch which on occasion
was broadcasting the VM's ARP request back through the physical NIC on the
hypervisor. This resulted in the bridge changing ports and incorrectly learning
that the VM's MAC address was external. As a result the ARP reply was directed
back onto the external network and the VM never updated its ARP cache. This patch
notifies the bridge command after an fdb entry has been updated, so that such
port toggling can be identified.
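
The port toggling described above can be pictured with a toy model of a
learning bridge. This is only a sketch: FdbTable, learn() and the notify
callback are hypothetical names chosen for illustration and are not the
kernel's bridge API. It shows an entry moving from the VM's port to the
external port when the request is reflected back, and a notification being
emitted on that move.

```python
# Toy model of a learning bridge's forwarding database (FDB).
# FdbTable, learn() and the notify callback are hypothetical names used
# for illustration only; they are not the kernel's bridge API.

class FdbTable:
    def __init__(self, notify):
        self.entries = {}      # MAC address -> port name
        self.notify = notify   # called whenever an entry is added or moves

    def learn(self, mac, port):
        """Record that `mac` was seen as a source address on `port`."""
        old = self.entries.get(mac)
        if old == port:
            return False       # nothing changed, stay silent
        self.entries[mac] = port
        self.notify(mac, old, port)
        return True


events = []
fdb = FdbTable(lambda mac, old, new: events.append((mac, old, new)))

fdb.learn("52:54:00:12:34:56", "vnet0")  # VM sends its ARP request
fdb.learn("52:54:00:12:34:56", "eth0")   # NIC's switch reflects it back
fdb.learn("52:54:00:12:34:56", "eth0")   # repeated sighting, no event
```

With a notification on every update, a user-space monitor sees the second
event (vnet0 to eth0) and can flag the unexpected move.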

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 22:14:50 -07:00
wangweidong
019ee792d7 bridge: fix the unbalanced promiscuous count when add_if failed
Since commit 2796d0c648 ("bridge: Automatically manage port
promiscuous mode.") made add_if use dev_set_allmulti
instead of dev_set_promiscuity, when add_if fails we
should call dev_set_allmulti(dev, -1).
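
A minimal sketch of why the error path must undo the same counter it
incremented. Device and add_if here are hypothetical stand-ins, not the
kernel structures; the only assumption is that the flag is a counter where
every +1 must be matched by a -1.

```python
# Toy model of a reference-counted device flag. Device and add_if are
# hypothetical stand-ins for illustration, not the kernel structures.

class Device:
    def __init__(self):
        self.allmulti = 0

    def set_allmulti(self, inc):
        self.allmulti += inc


def add_if(dev, later_step_fails):
    dev.set_allmulti(1)          # taken early while adding the port
    if later_step_fails:
        dev.set_allmulti(-1)     # error path undoes the same counter
        return False
    return True


dev = Device()
add_if(dev, later_step_fails=True)
count_after_failure = dev.allmulti
add_if(dev, later_step_fails=False)
count_after_success = dev.allmulti
```

If the error path decremented a different counter instead, the allmulti
count would stay at 1 after a failed add and never return to zero.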

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Reviewed-by: Amos Kong <akong@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 22:05:16 -07:00
Nikolay Aleksandrov
4b9b1cdf83 net: fix wrong mac_len calculation for vlans
After 1e785f48d2 ("net: Start with correct mac_len in
skb_network_protocol") skb->mac_len is used as a start of the
calculation in skb_network_protocol() but that is not always correct. If
skb->protocol == 8021Q/AD, usually the vlan header is already inserted
in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters
dev_hard_xmit it has mac_len == 0 so we take 2 bytes from the
destination mac address (skb->data + VLAN_HLEN) as a type in
skb_network_protocol() and return vlan_depth == 4. In the case where TSO is
off, then the mac_len is set but it's == 18 (ETH_HLEN + VLAN_HLEN), so
skb_network_protocol() returns a type from inside the packet and
offset == 22. Also make vlan_depth unsigned as suggested before.
As suggested by Eric Dumazet, move the while() loop in the if() so we
can avoid additional testing in fast path.
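
The intended result can be sketched outside the kernel by walking the VLAN
tags from the start of the Ethernet header instead of trusting a
precomputed mac_len. This is an illustration under assumptions, not the
patched function: network_protocol() is a hypothetical helper, the frame is
assumed to start at the destination MAC, and "depth" here means the offset
of the network header.

```python
import struct

ETH_HLEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2)
VLAN_HLEN = 4          # TCI (2) + encapsulated EtherType (2)
ETH_P_IP = 0x0800
ETH_P_8021Q = 0x8100
ETH_P_8021AD = 0x88A8
VLAN_TYPES = (ETH_P_8021Q, ETH_P_8021AD)


def network_protocol(frame):
    """Return (ethertype, depth) for a frame starting at the dst MAC.

    depth is ETH_HLEN plus one VLAN_HLEN per tag found in the frame.
    """
    (proto,) = struct.unpack_from("!H", frame, ETH_HLEN - 2)
    depth = ETH_HLEN
    while proto in VLAN_TYPES:
        if len(frame) < depth + VLAN_HLEN:
            return None, depth   # truncated tag
        # The encapsulated EtherType is the last 2 bytes of the tag.
        (proto,) = struct.unpack_from("!H", frame, depth + 2)
        depth += VLAN_HLEN
    return proto, depth


dst = bytes.fromhex("001122334455")
src = bytes.fromhex("0a0b0c0d0e0f")
untagged = dst + src + struct.pack("!H", ETH_P_IP) + b"\x45"
tagged = dst + src + struct.pack("!HHH", ETH_P_8021Q, 100, ETH_P_IP) + b"\x45"
qinq = dst + src + struct.pack(
    "!HHHHH", ETH_P_8021AD, 1, ETH_P_8021Q, 2, ETH_P_IP) + b"\x45"
```

For the single-tagged frame the sketch yields type 0x800 at depth 18
(ETH_HLEN + VLAN_HLEN), which is the combination shown in the "after the
fix" trace below.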

Here are a few netperf tests + debug printk's to illustrate:
cat netperf.tso-on.reorder-on.bugged
- Vlan -> device (reorder on, default, this case is okay)
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7111.54
[   81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800
skb->mac_len 0 vlan_depth 0 type 0x800

- Vlan -> device (reorder off, bad)
cat netperf.tso-on.reorder-off.bugged
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00     241.35
[  204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100
skb->mac_len 0 vlan_depth 4 type 0x5301
0x5301 are the last two bytes of the destination mac.

And if we stop TSO, we may get even the following:
[   83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 18 vlan_depth 22 type 0xb84
Because mac_len already accounts for VLAN_HLEN.

After the fix:
cat netperf.tso-on.reorder-off.fixed
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.3.1 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.01    5001.46
[   81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100
skb->mac_len 0 vlan_depth 18 type 0x800

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Daniel Borkman <dborkman@redhat.com>
CC: David S. Miller <davem@davemloft.net>

Fixes:1e785f48d29a ("net: Start with correct mac_len in
skb_network_protocol")
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-01 19:39:13 -07:00
Johan Hedberg
79897d2097 Bluetooth: Fix requiring SMP MITM for outgoing connections
Due to recent changes to the way that the MITM requirement is set for
outgoing pairing attempts we can no longer rely on the hcon->auth_type
variable (which is actually good since it was formed from BR/EDR
concepts that don't really exist for SMP).

To match the logic that BR/EDR now uses simply rely on the local IO
capability and/or needed security level to set the MITM requirement for
outgoing pairing requests.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-05-31 23:51:12 -07:00
David S. Miller
6ce995c6f4 Included changes:
- prevent NULL dereference in multicast code

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- prevent NULL dereference in multicast code

Antonio Quartulli says:

====================
pull request net: batman-adv 20140527

here you have another very small fix intended for net/linux-3.15.
It prevents some multicast functions from dereferencing a NULL pointer.
(Actually it was nothing more than a typo)
I hope it is not too late for such a small patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-31 20:01:47 -07:00
Marek Lindner
af0a171c07 batman-adv: fix NULL pointer dereferences
Was introduced with 4c8755d69c
("batman-adv: Send multicast packets to nodes with a WANT_ALL flag")

Reported-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-05-31 10:07:14 +02:00
Jukka Rissanen
6a5e81650a Bluetooth: l2cap: Set more channel defaults
Default values for various channel settings were missing. This
way channel users do not need to set default values themselves.

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-30 21:38:37 -07:00
Jukka Rissanen
62bbd5b359 Bluetooth: 6LoWPAN: Fix MAC address universal/local bit handling
The universal/local bit handling was incorrectly done in the code.

So when setting EUI address from BD address we do this:
- If BD address type is PUBLIC, then we clear the universal bit
  in EUI address. If the address type is RANDOM, then the universal
  bit is set (BT 6lowpan draft chapter 3.2.2)
- After this we invert the universal/local bit according to RFC 2464

When figuring out BD address we do the reverse:
- Take EUI address from stateless IPv6 address, invert the
  universal/local bit according to RFC 2464
- If universal bit is 1 in this modified EUI address, then address
  type is set to RANDOM, otherwise it is PUBLIC

Note that 6lowpan_iphc.[ch] does the final toggling of U/L bit
before sending or receiving the network packet.
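
The two directions described above can be sketched as follows. This is an
illustration under assumptions, not the kernel code: the function names are
hypothetical, the BD address is assumed to be given most significant octet
first, and the EUI-64 is assumed to be formed by inserting FF:FE in the
middle of the 48-bit address.

```python
UL_BIT = 0x02   # universal/local bit in the first octet of an EUI-64


def eui64_from_bdaddr(bdaddr, addr_type):
    """Build the interface identifier from a 6-byte BD address."""
    eui = bytearray(bdaddr[:3] + b"\xff\xfe" + bdaddr[3:])
    if addr_type == "PUBLIC":
        eui[0] &= ~UL_BIT & 0xFF   # clear the universal bit
    else:                          # RANDOM
        eui[0] |= UL_BIT           # set the universal bit
    eui[0] ^= UL_BIT               # invert U/L as in RFC 2464
    return bytes(eui)


def addr_type_from_eui64(eui):
    """Recover the BD address type from an interface identifier."""
    first = eui[0] ^ UL_BIT        # undo the RFC 2464 inversion
    return "RANDOM" if first & UL_BIT else "PUBLIC"


bdaddr = bytes.fromhex("001122334455")
public_eui = eui64_from_bdaddr(bdaddr, "PUBLIC")
random_eui = eui64_from_bdaddr(bdaddr, "RANDOM")
```

Because the bit is overwritten according to the address type, the sketch
only recovers the type on the way back, not the original value of that bit.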

Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-05-30 21:28:21 -07:00
Johan Hedberg
7e3691e13a Bluetooth: Fix authentication check for FIPS security level
When checking whether we need to request authentication or not we should
include HCI_SECURITY_FIPS in the levels that always need authentication.
This patch fixes the check for it in the hci_outgoing_auth_needed()
function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-05-30 21:25:01 -07:00
Johan Hedberg
61b433579b Bluetooth: Fix properly ignoring LTKs of unknown types
In case there are new LTK types in the future we shouldn't just blindly
assume that != MGMT_LTK_UNAUTHENTICATED means that the key is
authenticated. This patch adds explicit checks for each allowed key type
in the form of a switch statement and skips any key which has an unknown
value.
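
A sketch of the difference between the "!= unauthenticated" shortcut and an
explicit check per key type. The numeric values and the helper names are
assumptions for illustration; only MGMT_LTK_UNAUTHENTICATED is named in the
text above.

```python
# Numeric values and helper names are assumptions for illustration.
MGMT_LTK_UNAUTHENTICATED = 0x00
MGMT_LTK_AUTHENTICATED = 0x01


def classify_ltk(key_type):
    """Return True/False for known types, None for unknown ones."""
    if key_type == MGMT_LTK_UNAUTHENTICATED:
        return False
    if key_type == MGMT_LTK_AUTHENTICATED:
        return True
    return None                    # unknown type: caller should skip it


def load_ltks(keys):
    """Keep only keys of known type, tagged with their auth status."""
    loaded = []
    for key_type, value in keys:
        authenticated = classify_ltk(key_type)
        if authenticated is None:
            continue               # do not guess about future key types
        loaded.append((value, authenticated))
    return loaded


keys = [(0x00, "k0"), (0x01, "k1"), (0x7F, "k-from-the-future")]
loaded = load_ltks(keys)
```

With the old shortcut, the third key would have been treated as
authenticated simply because its type is not the unauthenticated one.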

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org
2014-05-30 21:23:29 -07:00
David S. Miller
dbfc4b698a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains a late fix for IPVS:

* Fix crash when trying to remove the transport header with non-linear
  skbuffs, this was introduced in 3.6-rc. Patch from Peter Christensen
  via the IPVS folks.

I'll pass this to -stable once this hits mainstream.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:56:09 -07:00
David S. Miller
90d0e08e57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

This small patchset contains three accumulated Netfilter/IPVS updates,
they are:

1) Refactorize common NAT code by encapsulating it into a helper
   function, similarly to what we do in other conntrack extensions,
   from Florian Westphal.

2) A minor format string mismatch fix for IPVS, from Masanari Iida.

3) Add quota support to the netfilter accounting infrastructure, now
   you can add quotas to accounting objects via the nfnetlink interface
   and use them from iptables. You can also listen to quota
   notifications from userspace. This enhancement from Mathieu Poirier.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:54:47 -07:00
Himangi Saraogi
47162c0b7e af_key: Replace comma with semicolon
This patch replaces a comma between expression statements by a semicolon.

A simplified version of the semantic patch that performs this
transformation is as follows:

// <smpl>
@r@
expression e1,e2,e;
type T;
identifier i;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:48:58 -07:00
Himangi Saraogi
01728371dc rds/tcp_listen: Replace comma with semicolon
This patch replaces a comma between expression statements by a semicolon.

A simplified version of the semantic patch that performs this
transformation is as follows:

// <smpl>
@r@
expression e1,e2,e;
type T;
identifier i;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:48:58 -07:00
Himangi Saraogi
cc2afe9fe2 RDS/RDMA: Replace comma with semicolon
This patch replaces a comma between expression statements by a semicolon.

A simplified version of the semantic patch that performs this
transformation is as follows:

// <smpl>
@r@
expression e1,e2,e;
type T;
identifier i;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:48:58 -07:00
Himangi Saraogi
70cb4a4526 ipmr: Replace comma with semicolon
This patch replaces a comma between expression statements by a semicolon.

A simplified version of the semantic patch that performs this
transformation is as follows:

// <smpl>
@r@
expression e1,e2,e;
type T;
identifier i;
@@

 e1
-,
+;
 e2;
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:48:57 -07:00
Ursula Braun
4d520f62e0 af_iucv: correct cleanup if listen backlog is full
In case of transport HIPER a sock struct is allocated for an incoming
connect request. If the backlog queue is full this socket is not
needed, but is left in the list of af_iucv sockets. Final socket
release posts console message "Attempt to release alive iucv socket".
This patch makes sure the newly created socket is cleaned up correctly
if the backlog queue is full.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:35:23 -07:00
Philipp Hachtmann
53a4b4995e af_iucv: Add automatic (source) iucv_name to bind
If a socket is not bound to an address using bind before calling connect,
it is usual to leave it to the network system to choose an appropriate
outgoing application name or port address.
af_iucv on VM uses a counter and uses simple numbers as unique identifiers.
This behaviour was missing when af_iucv is used with HiperSockets.

This patch contains a simple approach to harmonize af_iucv's behaviour.

Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:35:23 -07:00
Kinglong Mee
a48fd0f9f7 SUNRPC/NFSD: Remove using of dprintk with KERN_WARNING
When debugging, rpc prints messages from dprintk(KERN_WARNING ...)
with "^A4" prefixed,

[ 2780.339988] ^A4nfsd: connect from unprivileged port: 127.0.0.1, port=35316

Trond says:
> dprintk != printk. We have NEVER supported dprintk(KERN_WARNING...)

This patch removes the use of dprintk with KERN_WARNING.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 20:25:28 -04:00
David S. Miller
4d1cdf1db6 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates intended for 3.16...

For the mac80211 bits, Johannes says:

"Here I just have Heikki's rfkill GPIO cleanups.

The ARM/tegra patch is OK with the maintainer (Stephen). Let me know of
any problems."

and;

"We have a whole bunch of work on CSA by Andrei, Luca and Michal, but
unfortunately it doesn't seem quite complete yet so it's still disabled.
There's some TDLS work from Arik, and the rest is mostly minor fixes and
cleanups."

For the NFC bits, Samuel says:

"This is the NFC pull request for 3.16. We have:

- STMicroelectronics st21nfca support. The st21nfca is an HCI chipset and
  thus relies on the HCI stack. This submission provides support for tag
  reader/writer mode (including Type 5) and device tree bindings.

- PM runtime support and a bunch of bug fixes for TI's trf7970a.

- Device tree support for NXP's pn544. Legacy platform data support is
  obviously kept intact.

- NFC Tag type 4B support to the NFC Digital stack.

- SOCK_RAW type support to the raw NFC socket, and allow NCI
  sniffing from that. This can be extended to report HCI frames and also
  proprietary ones like e.g. the pn533 ones."

For the iwlwifi bits, Emmanuel says:

"Eran continues to work on new devices, Eyal is still digging in
the rate control stuff, and Johannes added new functionality to the
debug system we have in place now along with a few cleanups he made
on the way.  That's pretty much it."

and;

"Avri continues to work on the power code and Eran is improving the
NVM handling as preparation for new devices on which he works
with Liad. Luca cleans up the code a bit while working on CSA. I have
the regular BT Coex stuff and a small lockdep fix. Johannes has his
regular amount of clean ups and improvements, the main one is the
ability to leave 2 chains open to improve diversity and hence the
throughput in high attenuation scenarios."

and;

"The regular amount of housekeeping here. I merged iwlwifi-fixes.git to
be able to add the patch you didn't want in wireless.git at that stage
of the -rc cycle.  Luca has a few preparations for CSA implementation
and also what seems to be a bugfix for P2P but hasn't caused issues
we could notice."

For the Atheros bits, Kalle says:

"For ath10k Michal did various small fixes on how we handle
hardware/firmware problems and he also fixed two memory leaks."

Also included are a couple of pulls from the wireless tree to
avoid/resolve merge issues...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:18:46 -07:00
Paul Bolle
391296c90c atm: remove commented out check
This preprocessor check has been commented out ever since this file was added
during the v2.3 development cycle. It is unclear what its purpose might
have been. Whatever it was, it can safely be removed now.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 17:17:04 -07:00
Sachin Kamat
484611e530 net: tso: Export symbols for modular build
Export the symbols to fix the below errors when built as modules:
ERROR: "tso_build_data" [drivers/net/ethernet/marvell/mvneta.ko] undefined!
ERROR: "tso_build_hdr" [drivers/net/ethernet/marvell/mvneta.ko] undefined!
ERROR: "tso_start" [drivers/net/ethernet/marvell/mvneta.ko] undefined!
ERROR: "tso_count_descs" [drivers/net/ethernet/marvell/mvneta.ko] undefined!
ERROR: "tso_build_data" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined!
ERROR: "tso_build_hdr" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined!
ERROR: "tso_start" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined!
ERROR: "tso_count_descs" [drivers/net/ethernet/marvell/mv643xx_eth.ko] undefined!

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-30 15:52:03 -07:00
J. Bruce Fields
a5cddc885b nfsd4: better reservation of head space for krb5
RPC_MAX_AUTH_SIZE is scattered around several places.  Better to set it
once in the auth code, where this kind of estimate should be made.  And
while we're at it we can leave it zero when we're not using krb5i or
krb5p.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:17 -04:00
J. Bruce Fields
db3f58a95b rpc: define xdr_restrict_buflen
With this xdr_reserve_space can help us enforce various limits.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:32:01 -04:00
J. Bruce Fields
2825a7f907 nfsd4: allow encoding across page boundaries
After this we can handle for example getattr of very large ACLs.

Read, readdir, readlink are still special cases with their own limits.

Also we can't handle a new operation starting close to the end of a
page.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:54 -04:00
J. Bruce Fields
3e19ce762b rpc: xdr_truncate_encode
This will be used in the server side in a few cases:
	- when certain operations (read, readdir, readlink) fail after
	  encoding a partial response.
	- when we run out of space after encoding a partial response.
	- in readlink, where we initially reserve PAGE_SIZE bytes for
	  data, then truncate to the actual size.
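
The reserve-then-truncate pattern from the readlink case can be sketched
with a toy buffer. EncodeBuffer and its methods are hypothetical and only
mimic the idea; they are not the sunrpc xdr API, and PAGE_SIZE is assumed
to be 4096 here.

```python
# Toy encode buffer; EncodeBuffer is a hypothetical illustration and not
# the sunrpc xdr API. PAGE_SIZE = 4096 is an assumption.
PAGE_SIZE = 4096


class EncodeBuffer:
    def __init__(self, limit):
        self.data = bytearray()
        self.limit = limit

    def reserve(self, nbytes):
        """Append nbytes of zeroed space; return its offset, or None
        if that would exceed the buffer limit."""
        if len(self.data) + nbytes > self.limit:
            return None
        start = len(self.data)
        self.data.extend(bytes(nbytes))
        return start

    def truncate(self, length):
        """Drop everything encoded after `length` bytes."""
        del self.data[length:]


buf = EncodeBuffer(limit=2 * PAGE_SIZE)
buf.data.extend(b"hdr!")                    # previously encoded reply
start = buf.reserve(PAGE_SIZE)              # worst case for the link
target = b"/srv/export/file"
buf.data[start:start + len(target)] = target
buf.truncate(start + len(target))           # shrink to the actual size
```

The same truncate call covers the failure cases: passing the offset saved
before an operation started discards its partial encoding.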

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-30 17:31:47 -04:00
John W. Linville
57afc62e94 NFC: 3.16: Second pull request
This is the 2nd NFC pull request for 3.16. We have:
 
 - Felica (Type3) tags support for trf7970a
 - Type 4b tags support for port100
 - st21nfca DTS typo fix
 - A few sparse warning check fixes

Merge tag 'nfc-next-3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.16: Second pull request

This is the 2nd NFC pull request for 3.16. We have:

- Felica (Type3) tags support for trf7970a
- Type 4b tags support for port100
- st21nfca DTS typo fix
- A few sparse warning check fixes"

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-05-30 13:41:40 -04:00
John W. Linville
a5eb1aeb25 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Conflicts:
	drivers/bluetooth/btusb.c
2014-05-29 13:03:47 -04:00
John W. Linville
737be10d8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2014-05-29 12:55:38 -04:00
David Rientjes
c6c8fe79a8 net, sunrpc: suppress allocation warning in rpc_malloc()
rpc_malloc() allocates with GFP_NOWAIT without making any attempt at
reclaim so it easily fails when low on memory.  This ends up spamming the
kernel log:

SLAB: Unable to allocate memory on node 0 (gfp=0x4000)
  cache: kmalloc-8192, object size: 8192, order: 1
  node 0: slabs: 207/207, objs: 207/207, free: 0
rekonq: page allocation failure: order:1, mode:0x204000
CPU: 2 PID: 14321 Comm: rekonq Tainted: G           O  3.15.0-rc3-12.gfc9498b-desktop+ #6
Hardware name: System manufacturer System Product Name/M4A785TD-V EVO, BIOS 2105    07/23/2010
 0000000000000000 ffff880010ff17d0 ffffffff815e693c 0000000000204000
 ffff880010ff1858 ffffffff81137bd2 0000000000000000 0000001000000000
 ffff88011ffebc38 0000000000000001 0000000000204000 ffff88011ffea000
Call Trace:
 [<ffffffff815e693c>] dump_stack+0x4d/0x6f
 [<ffffffff81137bd2>] warn_alloc_failed+0xd2/0x140
 [<ffffffff8113be19>] __alloc_pages_nodemask+0x7e9/0xa30
 [<ffffffff811824a8>] kmem_getpages+0x58/0x140
 [<ffffffff81183de6>] fallback_alloc+0x1d6/0x210
 [<ffffffff81183be3>] ____cache_alloc_node+0x123/0x150
 [<ffffffff81185953>] __kmalloc+0x203/0x490
 [<ffffffffa06b0ee2>] rpc_malloc+0x32/0xa0 [sunrpc]
 [<ffffffffa06a6999>] call_allocate+0xb9/0x170 [sunrpc]
 [<ffffffffa06b19d8>] __rpc_execute+0x88/0x460 [sunrpc]
 [<ffffffffa06b2da9>] rpc_execute+0x59/0xc0 [sunrpc]
 [<ffffffffa06a932b>] rpc_run_task+0x6b/0x90 [sunrpc]
 [<ffffffffa077b5c1>] nfs4_call_sync_sequence+0x51/0x80 [nfsv4]
 [<ffffffffa077d45d>] _nfs4_do_setattr+0x1ed/0x280 [nfsv4]
 [<ffffffffa0782a72>] nfs4_do_setattr+0x72/0x180 [nfsv4]
 [<ffffffffa078334c>] nfs4_proc_setattr+0xbc/0x140 [nfsv4]
 [<ffffffffa074a7e8>] nfs_setattr+0xd8/0x240 [nfs]
 [<ffffffff811baa71>] notify_change+0x231/0x380
 [<ffffffff8119cf5c>] chmod_common+0xfc/0x120
 [<ffffffff8119df80>] SyS_chmod+0x40/0x90
 [<ffffffff815f4cfd>] system_call_fastpath+0x1a/0x1f
...

If the allocation fails, simply return NULL and avoid spamming the kernel
log.

Reported-by: Marc Dietrich <marvin24@gmx.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-29 11:11:51 -04:00
Avraham Stern
d3a58df87a mac80211: set new interfaces as idle upon init
Mark new interfaces as idle to allow operations that require that
interfaces are idle to take place. Interface types that are always
not idle (like AP interfaces) will be set as not idle when they are
assigned a channel context.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-28 16:22:49 +02:00
Felix Fietkau
abd43a6a68 mac80211: reduce packet loss notifications under load
During strong signal fluctuations under high throughput, a few consecutive
failed A-MPDU transmissions can easily trigger a packet loss notification,
and thus (in AP mode) client disconnection.

Reduce the number of false positives by checking the A-MPDU status flag
and treating a failed A-MPDU as a single packet.
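
The effect of counting a failed A-MPDU once can be sketched with a toy loss
counter. LossTracker, the threshold of 50 and the 32-subframe aggregates
are assumptions made up for this illustration; they are not taken from
mac80211.

```python
# Toy consecutive-loss counter. LossTracker, the threshold and the
# aggregate size are hypothetical values for illustration only.

class LossTracker:
    def __init__(self, threshold, ampdu_as_single):
        self.threshold = threshold
        self.ampdu_as_single = ampdu_as_single
        self.lost = 0

    def report_tx(self, acked, subframes=1):
        """Return True when a packet-loss notification should fire."""
        if acked:
            self.lost = 0          # any success resets the streak
            return False
        # A failed aggregate counts once, or once per subframe.
        self.lost += 1 if self.ampdu_as_single else subframes
        return self.lost >= self.threshold


per_frame = LossTracker(threshold=50, ampdu_as_single=False)
per_ampdu = LossTracker(threshold=50, ampdu_as_single=True)

# Two consecutive failed aggregates of 32 subframes each.
results = [
    (per_frame.report_tx(False, 32), per_ampdu.report_tx(False, 32))
    for _ in range(2)
]
```

Counting per subframe crosses the threshold on the second failed aggregate,
while counting per aggregate is still far below it.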

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-28 16:22:48 +02:00
Arik Nemtsov
923eaf3672 mac80211: don't check netdev state for debugfs read/write
Doing so will lead to an oops for a p2p-dev interface, since it has
no netdev.

Cc: stable@vger.kernel.org
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-28 16:22:48 +02:00
Felix Fietkau
53d045258e mac80211: fix a memory leak on sta rate selection table
If the rate control algorithm uses a selection table, it
is leaked when the station is destroyed - fix that.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Reported-by: Christophe Prévotaux <cprevotaux@nltinc.com>
Fixes: 0d528d85c5 ("mac80211: improve the rate control API")
Cc: stable@vger.kernel.org # v3.10+
[add commit log entry, remove pointless NULL check]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-28 16:22:41 +02:00
John W. Linville
9db7cb6901 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2014-05-27 13:51:31 -04:00
John W. Linville
03c4444650 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
2014-05-27 13:47:27 -04:00
chaitanya.mgit@gmail.com
a9fb54169b regdb: Generalize the mW to dBm power conversion
Generalize the power conversion from mW to dBm
using log. This should fix the compilation error
below for country NO, which adds a new power value
of 2000 mW that was not handled before.
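
For reference, the general conversion is dBm = 10 * log10(mW). The sketch
below is an illustration in Python, not the regdb generator script itself;
mw_to_dbm() is a hypothetical helper, and the "%.0f" formatting mirrors the
rounding mentioned in the maintainer note further down.

```python
import math


def mw_to_dbm(milliwatts):
    """Convert a power in milliwatts to dBm."""
    return 10 * math.log10(milliwatts)


# Format with %.0f so the value is rounded rather than truncated.
table = {mw: "%.0f" % mw_to_dbm(mw) for mw in (100, 200, 1000, 2000)}
```

Here 2000 mW comes out as 33 dBm (10 * log10(2000) is about 33.01), so no
per-value lookup entry is needed for new power levels.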

 CC [M]  net/wireless/wext-sme.o
 CC [M]  net/wireless/regdb.o
net/wireless/regdb.c:1130:1: error: Unknown undeclared here (not in
a function)
net/wireless/regdb.c:1130:9: error: expected } before power
make[2]: *** [net/wireless/regdb.o] Error 1
make[1]: *** [net/wireless] Error 2
make: *** [net] Error 2

Reported-By:  John Walker <john@x109.net>
Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
[remove unneeded parentheses, fix rounding by using %.0f]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-27 17:58:58 +02:00
Krzysztof Hałasa
c7d37a66e3 mac80211: fix IBSS join by initializing last_scan_completed
Without this fix, a freshly rebooted Linux system creates a new IBSS
instead of joining an existing one. Only when the jiffies counter
overflows after 5 minutes can the IBSS be joined successfully.
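
The 5 minute delay follows from how wrap-safe time comparison treats a zero
timestamp on a 32-bit counter that starts 300 seconds before wrapping. The
sketch below models only that arithmetic; HZ = 100 and SCAN_INTERVAL are
assumptions, and the exact condition used in mac80211 is not reproduced.

```python
# Model of 32-bit jiffies arithmetic. HZ and SCAN_INTERVAL are assumed
# values; the real mac80211 check is not reproduced here.
HZ = 100
BITS = 32
MASK = (1 << BITS) - 1
INITIAL_JIFFIES = (-300 * HZ) & MASK   # counter wraps 5 minutes after boot
SCAN_INTERVAL = 2 * HZ


def time_after(a, b):
    """Wrap-safe 'a is later than b': signed (b - a) is negative."""
    return ((b - a) & MASK) >= 1 << (BITS - 1)


def jiffies_at(seconds_since_boot):
    return (INITIAL_JIFFIES + seconds_since_boot * HZ) & MASK


# Uninitialized timestamp (0): looks like it lies in the future.
stale_at_10s = time_after(jiffies_at(10), 0 + SCAN_INTERVAL)
stale_at_310s = time_after(jiffies_at(310), 0 + SCAN_INTERVAL)

# Timestamp initialized to the current jiffies value at setup time.
init = jiffies_at(0)
fixed_at_10s = time_after(jiffies_at(10), init + SCAN_INTERVAL)
```

With the zero timestamp the comparison only becomes true once the counter
has wrapped, which matches the 5 minute wait described above.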

Signed-off-by: Krzysztof Hałasa <khalasa@piap.pl>
[edit commit message slightly]
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-27 08:54:01 +02:00
Johannes Berg
3bb2055672 cfg80211: send events when devices are added/removed
We're currently sending NEW_WIPHY events for renames (which
is a bit odd, but now can't be changed), but also send them
for really new devices that register.

Also send DEL_WIPHY events when a device is removed, the
event ID for this was already reserved.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-26 13:52:25 +02:00
Emmanuel Grumbach
34171dc0d6 mac80211: fix virtual monitor interface addition
Since the commit below, cfg80211_chandef_dfs_required()
will warn if it gets an NL80211_IFTYPE_UNSPECIFIED iftype,
as explicitly written in the commit log.
When a virtual monitor interface is added, its type is set
in ieee80211_sub_if_data.vif.type, but not in
ieee80211_sub_if_data.wdev.iftype, which is passed to
cfg80211_chandef_dfs_required(), hence resulting in the
following warning:

WARNING: CPU: 1 PID: 21265 at net/wireless/chan.c:376 cfg80211_chandef_dfs_required+0xbc/0x130 [cfg80211]()
Modules linked in: [...]
CPU: 1 PID: 21265 Comm: ifconfig Tainted: G        W  O 3.13.11+ #12
Hardware name: Dell Inc. Latitude E6410/0667CC, BIOS A01 03/05/2010
 0000000000000009 ffff88008f5fdb08 ffffffff817d4219 ffff88008f5fdb50
 ffff88008f5fdb40 ffffffff8106f57d 0000000000000000 0000000000000000
 ffff880081062fb8 ffff8800810604e0 0000000000000001 ffff88008f5fdba0
Call Trace:
 [<ffffffff817d4219>] dump_stack+0x4d/0x66
 [<ffffffff8106f57d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8106f5ec>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffffa04ea4ec>] cfg80211_chandef_dfs_required+0xbc/0x130 [cfg80211]
 [<ffffffffa06b1024>] ieee80211_vif_use_channel+0x94/0x500 [mac80211]
 [<ffffffffa0684e6b>] ieee80211_add_virtual_monitor+0x1ab/0x5c0 [mac80211]
 [<ffffffffa0686ae5>] ieee80211_do_open+0xe75/0x1580 [mac80211]
 [<ffffffffa0687259>] ieee80211_open+0x69/0x70 [mac80211]
[snip]

Fixes: 00ec75fc5a ("cfg80211: pass the actual iftype when calling cfg80211_chandef_dfs_required()")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-26 11:04:42 +02:00
Luciano Coelho
1a5f0c13d1 mac80211: add a single-transaction driver op to switch contexts
In some cases, when the driver is already using all the channel
contexts it can handle at once, we have to do an in-place switch
(ie. we cannot afford using an extra context temporarily for the
transaction).  But some drivers may not support switching the channel
context assigned to a vif on the fly (ie. without unassigning and
assigning it) while others may only work if the context is changed on
the fly, without unassigning it first.

To allow these different scenarios, add a new driver operation that
lets the driver decide how to handle an in-place switch.

Signed-off-by: Luciano Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-26 11:04:41 +02:00
Pablo Neira
1708803ef2 netfilter: bridge: fix Kconfig unmet dependencies
Before f5efc69 ("netfilter: nf_tables: Add meta expression key for
bridge interface name"), the entire net/bridge/netfilter/ directory
depended on BRIDGE_NF_EBTABLES, ie. on ebtables. However, that
directory already contained the nf_tables bridge extension that
we should allow to compile separately. In f5efc69, we tried to
generalize this by using CONFIG_BRIDGE_NETFILTER which was not a good
idea since this option already existed and it is dedicated to enable
the Netfilter bridge IP/ARP filtering.

Let's try to fix this mess by:

1) making net/bridge/netfilter/ dependent on the toplevel
   CONFIG_NETFILTER option, just like we do with the net/netfilter and
   net/ipv{4,6}/netfilter/ directories.

2) Changing 'selects' to 'depends on' NETFILTER_XTABLES for
   BRIDGE_NF_EBTABLES. I believe this problem already existed before
   f5efc69:

warning: (BRIDGE_NF_EBTABLES) selects NETFILTER_XTABLES which has
unmet direct dependencies (NET && INET && NETFILTER)

3) Fix ebtables/nf_tables bridge dependencies by making NF_TABLES_BRIDGE
   and BRIDGE_NF_EBTABLES dependent on BRIDGE and NETFILTER:

warning: (NF_TABLES_BRIDGE && BRIDGE_NF_EBTABLES) selects
BRIDGE_NETFILTER which has unmet direct dependencies (NET && BRIDGE &&
NETFILTER && INET && NETFILTER_ADVANCED)

net/built-in.o: In function `br_parse_ip_options':
br_netfilter.c:(.text+0x4a5ba): undefined reference to `ip_options_compile'
br_netfilter.c:(.text+0x4a5ed): undefined reference to `ip_options_rcv_srr'
net/built-in.o: In function `br_nf_pre_routing_finish':
br_netfilter.c:(.text+0x4a8a4): undefined reference to `ip_route_input_noref'
br_netfilter.c:(.text+0x4a987): undefined reference to `ip_route_output_flow'
make: *** [vmlinux] Error 1

Reported-by: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-26 00:42:30 -04:00
Peter Christensen
f44a5f45f5 ipvs: Fix panic due to non-linear skb
Receiving an ICMP response to an IPIP packet in a non-linear skb could
cause a kernel panic in __skb_pull.

The problem was introduced in commit f2edb9f770 ("ipvs: implement
passive PMTUD for IPIP packets").

Signed-off-by: Peter Christensen <pch@ordbogen.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2014-05-26 10:22:46 +09:00
Fengguang Wu
db3287da34 NFC: nfc_sock_link() can be static
CC: Hiren Tandel <hirent@marvell.com>
CC: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-26 00:53:10 +02:00
Fengguang Wu
cb30caf027 NFC: digital: digital_in_send_attrib_req() can be static
CC: "Mark A. Greer" <mgreer@animalcreek.com>
CC: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-26 00:52:15 +02:00
Thierry Escande
9dc33705b2 NFC: digital: Randomize poll cycles
This change adds some entropy to polling cycles, choosing the next
polling rf technology randomly. This reflects the change done in the
pn533 driver, avoiding a possible infinite loop for devices that export 2
targets on 2 different modulations. If the first target is not
readable, we will stay in an error loop forever.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-26 00:42:02 +02:00
Thierry Escande
00e625df3e NFC: digital: Return proper error code when sending ATR_REQ
The error code returned by digital_in_send_cmd() was not returned by
digital_in_send_atr_req().

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-26 00:42:02 +02:00
Arnaldo Carvalho de Melo
85d3fc9418 tipc: Don't reset the timeout when restarting
As it may then take longer than what the user specified using
setsockopt(SO_RCVTIMEO).

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 14:11:41 -04:00
David S. Miller
8646224cdb Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-05-23

I have two more fixes intended for the 3.15 stream...

For the iwlwifi one, Emmanuel says:

"A race has been discovered in the beacon filtering code. Since the
fix is too big for 3.15, I disable here the feature."

For the bluetooth one, Gustavo says:

"This pull request contains a very important fix for 3.15. Here we fix the
permissions of a debugfs file that would otherwise allow unauthorized users
to write content to it."

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 14:06:19 -04:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Linus Torvalds
5fa6a683c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "It looks like a sizable collection but this is nearly 3 weeks of bug
  fixing while you were away.

   1) Fix crashes over IPSEC tunnels with NAT, the latter can reroute
      the packet through a non-IPSEC protected path and the code has to
      be able to handle SKBs attached to routes lacking an attached xfrm
      state.  From Steffen Klassert.

   2) Fix OOPSs in ipv4 and ipv6 ipsec layers for unsupported
      sub-protocols, also from Steffen Klassert.

   3) Set local_df on fragmented netfilter skbs otherwise we won't be
      able to forward successfully, from Florian Westphal.

   4) cdc_mbim ipv6 neighbour code does __vlan_find_dev_deep without
      holding RCU lock, from Bjorn Mork.

   5) local_df test in ip_may_fragment is inverted, from Florian
      Westphal.

   6) jme driver doesn't check for DMA mapping failures, from Neil
      Horman.

   7) qlogic driver doesn't calculate number of TX queues properly, from
      Shahed Shaikh.

   8) fib_info_cnt can drift irreversibly positive if we fail to
      allocate the fi->fib_metrics array, from Sergey Popovich.

   9) Fix use after free in ip6_route_me_harder(), also from Sergey
      Popovich.

  10) When SYSCTL is disabled, we don't handle local_port_range and
      ping_group_range defaults properly at all, from Cong Wang.

  11) Unaccelerated VLAN tagged frames improperly handled by cdc_mbim
      driver, fix from Bjorn Mork.

  12) cassini driver needs nested lock annotations for TX locking, from
      Emil Goode.

  13) On init error ipv6 VTI driver can unregister pernet ops twice,
      oops.  Fix from Mathias Krause.

  14) If macvlan device is down, don't propagate IFF_ALLMULTI changes,
      from Peter Christensen.

  15) Missing NULL pointer check while parsing netlink config options in
      ip6_tnl_validate().  From Susant Sahani.

  16) Fix handling of neighbour entries during ipv6 router reachability
      probing, from Duan Jiong.

  17) x86 and s390 JIT address randomization has some address
      calculation bugs leading to crashes, from Alexei Starovoitov and
      Heiko Carstens.

  18) Clear up those uglies with nop patching and net_get_random_once(),
      from Hannes Frederic Sowa.

  19) Option length miscalculated in ip6_append_data(), fix also from
      Hannes Frederic Sowa.

  20) A while ago we fixed a race during device unregistry when a
      namespace went down, turns out there is a second place that needs
      similar protection.  From Cong Wang.

  21) In the new Altera TSE driver multicast filtering isn't working,
      disable it and just use promisc mode until the cause is found.
      From Vince Bridgers.

  22) When we disable router enabling in ipv6 we have to flush the
      cached routes explicitly, from Duan Jiong.

  23) NBMA tunnels should not cache routes on the tunnel object because
      the key is variable, from Timo Teräs.

  24) With stacked devices GRO information in skb->cb[] can be not setup
      properly, make sure it is in all code paths.  From Eric Dumazet.

  25) Really fix stacked vlan locking, multiple levels of nesting with
      intervening non-vlan devices are possible.  From Vlad Yasevich.

  26) Fallback ipip tunnel device's mtu is not setup properly, from
      Steffen Klassert.

  27) The packet scheduler's tcindex filter can crash because we
      structure copy objects with list_head's inside, oops.  From Cong
      Wang.

  28) Fix CHECKSUM_COMPLETE handling for ipv6 GRE tunnels, from Eric
      Dumazet.

  29) In some configurations 'itag' in __mkroute_input() can end up
      being used uninitialized because of how fib_validate_source()
      works.  Fix it by explicitly initializing itag to zero like all the
      other fib_validate_source() callers do, from Li RongQing"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
  batman: fix a bogus warning from batadv_is_on_batman_iface()
  ipv4: initialise the itag variable in __mkroute_input
  bonding: Send ALB learning packets using the right source
  bonding: Don't assume 802.1Q when sending alb learning packets.
  net: doc: Update references to skb->rxhash
  stmmac: Remove unbalanced clk_disable call
  ipv6: gro: fix CHECKSUM_COMPLETE support
  net_sched: fix an oops in tcindex filter
  can: peak_pci: prevent use after free at netdev removal
  ip_tunnel: Initialize the fallback device properly
  vlan: Fix build error wth vlan_get_encap_level()
  can: c_can: remove obsolete STRICT_FRAME_ORDERING Kconfig option
  MAINTAINERS: Pravin Shelar is Open vSwitch maintainer.
  bnx2x: Convert return 0 to return rc
  bonding: Fix alb mode to only use first level vlans.
  bonding: Fix stacked device detection in arp monitoring
  macvlan: Fix lockdep warnings with stacked macvlan devices
  vlan: Fix lockdep warning with stacked vlan devices.
  net: Allow for more then a single subclass for netif_addr_lock
  net: Find the nesting level of a given device by type.
  ...
2014-05-23 15:29:43 -07:00
Daniel Borkmann
b1fcd35cf5 net: filter: let unattached filters use sock_fprog_kern
The sk_unattached_filter_create() API is used by BPF filters that
are not directly attached or related to sockets, and are used in
team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own
internal management of obtaining filter blocks and thus already
have them in kernel memory and set up before calling into
sk_unattached_filter_create(). As a result, due to __user annotation
in sock_fprog, sparse triggers false positives (incorrect type in
assignment [different address space]) when filters are set up before
passing them to sk_unattached_filter_create(). Therefore, let
sk_unattached_filter_create() API use sock_fprog_kern to overcome
this issue.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:48:05 -04:00
Daniel Borkmann
8556ce79d5 net: filter: remove DL macro
Let's get rid of this macro. After commit 5bcfedf06f ("net: filter:
simplify label names from jump-table"), labels have become more
readable due to omission of BPF_ prefix but at the same time more
generic, so that things like `git grep -n` would not find them. As
a middle path, let's get rid of the DL macro as it's not strictly
needed and would otherwise just hide the full name.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:48:05 -04:00
Tom Herbert
6b649feafe l2tp: Add support for zero IPv6 checksums
Added new L2TP configuration options to allow TX and RX of
zero checksums in IPv6. Default is not to use them.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Tom Herbert
1c19448c9b net: Make enabling of zero UDP6 csums more restrictive
RFC 6935 permits zero checksums to be used in IPv6; however, this is
recommended only for certain tunnel protocols. It does not make
checksums completely optional like they are in IPv4.

This patch restricts the use of IPv6 zero checksums that was previously
introduced. no_check6_tx and no_check6_rx have been added to control
the use of checksums in UDP6 RX and TX path. The normal
sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when
dealing with a dual stack socket).

A helper function has been added (udp_set_no_check6) which can be
called by tunnel implementations to allow zero checksums (send on the
socket, and accept them as valid).
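
A minimal sketch of the opt-in policy described above, assuming a
hypothetical Udp6SockSketch object in place of the kernel socket. The
names set_no_check6() and accept_rx() are illustrative only and are not
the kernel API; the real checks live in the UDP6 RX and TX paths.

```python
from dataclasses import dataclass


@dataclass
class Udp6SockSketch:
    """Hypothetical stand-in for the per-socket UDP6 checksum flags."""
    no_check6_tx: bool = False  # socket may send IPv6 UDP with a zero checksum
    no_check6_rx: bool = False  # socket may accept IPv6 UDP with a zero checksum


def set_no_check6(sock, enable):
    """Assumed behaviour of the helper: switch both directions together."""
    sock.no_check6_tx = enable
    sock.no_check6_rx = enable


def accept_rx(sock, checksum):
    """A zero checksum is only acceptable if the socket opted in."""
    if checksum == 0:
        return sock.no_check6_rx
    return True  # non-zero checksums go on to normal verification


plain = Udp6SockSketch()    # ordinary socket: zero checksums rejected
tunnel = Udp6SockSketch()   # tunnel socket that opts in
set_no_check6(tunnel, True)
```

With these assumptions an ordinary socket keeps rejecting zero-checksum
datagrams, and only a socket that opted in accepts them.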

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Tom Herbert
28448b8045 net: Split sk_no_check into sk_no_check_{rx,tx}
Define separate fields in the sock structure for disabling
checksums in TX and RX: sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Tom Herbert
b26ba202e0 net: Eliminate no_check from protosw
It doesn't seem like any protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Tom Herbert
0f8066bd48 sunrpc: Remove sk_no_check setting
Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 16:28:53 -04:00
Sucheta Chakraborty
ed616689a3 net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool.
o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
  to have a bandwidth of at least this value.
  max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
  of up to this value.

o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
  which takes 4 arguments:
  netdev, VF number, min_tx_rate, max_tx_rate

o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.

o Drivers that currently implement ndo_set_vf_tx_rate should now call
  ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
  greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
  implemented by driver.

o If user enters only one of either min_tx_rate or max_tx_rate, then,
  userland should read back the other value from driver and set both
  for IFLA_VF_RATE.
  Drivers that have not yet implemented IFLA_VF_RATE should always
  return min_tx_rate as 0 when read from ip tool.

o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
  IFLA_VF_RATE should override.

o Idea is to have consistent display of rate values to user.

o Usage example: -

  ./ip link set p4p1 vf 0 rate 900

  ./ip link show p4p1
  32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
  DEFAULT qlen 1000
    link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
    vf 1 MAC f6:c6:7c:3f:3d:6c
    vf 2 MAC 56:32:43:98:d7:71
    vf 3 MAC d6:be:c3:b5:85:ff
    vf 4 MAC ee:a9:9a:1e:19:14
    vf 5 MAC 4a:d0:4c:07:52:18
    vf 6 MAC 3a:76:44:93:62:f9
    vf 7 MAC 82:e9:e7:e3:15:1a

  ./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200

  ./ip link show p4p1
  32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
  DEFAULT qlen 1000
    link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
    min_tx_rate 200Mbps
    vf 1 MAC f6:c6:7c:3f:3d:6c
    vf 2 MAC 56:32:43:98:d7:71
    vf 3 MAC d6:be:c3:b5:85:ff
    vf 4 MAC ee:a9:9a:1e:19:14
    vf 5 MAC 4a:d0:4c:07:52:18
    vf 6 MAC 3a:76:44:93:62:f9
    vf 7 MAC 82:e9:e7:e3:15:1a

  ./ip link set p4p1 vf 0 max_tx_rate 600 rate 300

  ./ip link show p4p1
  32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
  DEFAULT qlen 1000
    link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 600 (Mbps), max_tx_rate 600Mbps,
    min_tx_rate 200Mbps
    vf 1 MAC f6:c6:7c:3f:3d:6c
    vf 2 MAC 56:32:43:98:d7:71
    vf 3 MAC d6:be:c3:b5:85:ff
    vf 4 MAC ee:a9:9a:1e:19:14
    vf 5 MAC 4a:d0:4c:07:52:18
    vf 6 MAC 3a:76:44:93:62:f9
    vf 7 MAC 82:e9:e7:e3:15:1a
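
The rules above can be sketched as a small resolver. This is an
illustration under stated assumptions, not the iproute2 or driver code:
resolve_vf_rate() is a hypothetical name, "current" stands for the
(min, max) pair read back from the driver, and "rate" stands for the
legacy IFLA_VF_TX_RATE value.

```python
def resolve_vf_rate(current, rate=None, min_tx_rate=None, max_tx_rate=None):
    """Return the (min_tx_rate, max_tx_rate) pair to program, in Mbps.

    current: (min, max) pair read back from the driver.
    rate:    legacy value; it only ever sets the maximum.
    """
    cur_min, cur_max = current
    # A value the user did not supply is taken from what the driver reports.
    new_min = cur_min if min_tx_rate is None else min_tx_rate
    if max_tx_rate is not None:
        new_max = max_tx_rate   # the new-style value overrides legacy "rate"
    elif rate is not None:
        new_max = rate
    else:
        new_max = cur_max
    return new_min, new_max


# Replay the three "ip link set" commands from the usage example.
state = (0, 0)
step1 = state = resolve_vf_rate(state, rate=900)
step2 = state = resolve_vf_rate(state, min_tx_rate=200, max_tx_rate=300)
step3 = state = resolve_vf_rate(state, rate=300, max_tx_rate=600)
```

Replaying the commands this way gives (0, 900), (200, 300) and
(200, 600), which matches the rates shown by "ip link show" above.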

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-23 15:04:02 -04:00
Johan Hedberg
d7b2545023 Bluetooth: Clearly distinguish mgmt LTK type from authenticated property
On the mgmt level we have a key type parameter which currently accepts
two possible values: 0x00 for unauthenticated and 0x01 for
authenticated. However, in the internal struct smp_ltk representation we
have an explicit "authenticated" boolean value.

To make this distinction clear, add defines for the possible mgmt values
and do conversion to and from the internal authenticated value.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-23 11:24:04 -07:00
John W. Linville
5ca2504ea3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2014-05-23 10:55:58 -04:00
Pravin B Shelar
0c200ef94c openvswitch: Simplify genetlink code.
The following patch gets rid of struct genl_family_and_ops which is
redundant due to changes to struct genl_family.

Signed-off-by: Kyle Mestery <mestery@noironetworks.com>
Acked-by: Kyle Mestery <mestery@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:37 -07:00
Jarno Rajahalme
893f139b9a openvswitch: Minimize ovs_flow_cmd_new|set critical sections.
Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:36 -07:00
Jarno Rajahalme
37bdc87ba0 openvswitch: Split ovs_flow_cmd_new_or_set().
Following patch will be easier to reason about with separate
ovs_flow_cmd_new() and ovs_flow_cmd_set() functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:36 -07:00
Jarno Rajahalme
aed067783e openvswitch: Minimize ovs_flow_cmd_del critical section.
ovs_flow_cmd_del() now allocates reply (if needed) after the flow has
already been removed from the flow table.  If the reply allocation
fails, a netlink error is signaled with netlink_set_err(), as is
already done in ovs_flow_cmd_new_or_set() in the similar situation.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:36 -07:00
Jarno Rajahalme
0e9796b4af openvswitch: Reduce locking requirements.
Reduce and clarify locking requirements for ovs_flow_cmd_alloc_info(),
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info().

A datapath pointer is available only when holding a lock.  Change
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info() to take a
dp_ifindex directly, rather than a datapath pointer that is then
(only) used to get the dp_ifindex.  This is useful, since the
dp_ifindex is available even when the datapath pointer is not, both
before and after taking a lock, which makes further critical section
reduction possible.

Make ovs_flow_cmd_alloc_info() take an 'acts' argument instead of a
'flow' pointer.  This allows some future patches to do the allocation
before acquiring the flow pointer.

The locking requirements after this patch are:

ovs_flow_cmd_alloc_info(): May be called without locking, must not be
called while holding the RCU read lock (due to memory allocation).
If 'acts' belongs to a flow in the flow table, however, then the
caller must hold ovs_mutex.

ovs_flow_cmd_fill_info(): Either ovs_mutex or RCU read lock must be held.

ovs_flow_cmd_build_info(): This calls both of the above, so the caller
must hold ovs_mutex.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:36 -07:00
Jarno Rajahalme
86ec8dbae2 openvswitch: Fix ovs_flow_stats_get/clear RCU dereference.
For ovs_flow_stats_get() using ovsl_dereference() was wrong, since
flow dumps call this with RCU read lock.

ovs_flow_stats_clear() is always called with ovs_mutex, so can use
ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:35 -07:00
Jarno Rajahalme
eb07265904 openvswitch: Fix typo.
Incorrect struct name was confusing, even though otherwise
inconsequential.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:35 -07:00
Jarno Rajahalme
6093ae9aba openvswitch: Minimize dp and vport critical sections.
Move most memory allocations away from the ovs_mutex critical
sections.  vport allocations still happen while the lock is taken, as
changing that would require major refactoring. Also, vports are
created very rarely so it should not matter.

ovs_dp_cmd_get() now only takes the rcu_read_lock(), rather
than ovs_lock(), as nothing needs to be changed.  This was done by
ovs_vport_cmd_get() already.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:35 -07:00
Jarno Rajahalme
56c19868e1 openvswitch: Make flow mask removal symmetric.
Masks are inserted when flows are inserted to the table, so it is
logical to correspondingly remove masks when flows are removed from
the table, in ovs_flow_table_remove().

This allows ovs_flow_free() to be called without locking, which will
be used by later patches.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:35 -07:00
Jarno Rajahalme
fb5d1e9e12 openvswitch: Build flow cmd netlink reply only if needed.
Use netlink_has_listeners() and NLM_F_ECHO flag to determine if a
reply is needed or not for OVS_FLOW_CMD_NEW, OVS_FLOW_CMD_SET, or
OVS_FLOW_CMD_DEL.  Currently, OVS userspace does not request a reply
for OVS_FLOW_CMD_NEW, but usually does for OVS_FLOW_CMD_DEL, as stats
may have changed.
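
The decision described above amounts to a single predicate. A minimal
sketch, assuming a hypothetical reply_needed() helper; has_listeners
stands for the result of netlink_has_listeners() on the relevant
multicast group, and NLM_F_ECHO carries its value from
<linux/netlink.h>.

```python
NLM_F_ECHO = 0x08  # netlink header flag: echo this request back to the sender


def reply_needed(nlmsg_flags, has_listeners):
    """Build a reply only if the sender asked for an echo or someone listens."""
    return bool(nlmsg_flags & NLM_F_ECHO) or has_listeners


# A request without NLM_F_ECHO and without listeners skips the reply.
skip = reply_needed(0x01, False)
# A request that sets NLM_F_ECHO always gets one.
echo = reply_needed(0x01 | NLM_F_ECHO, False)
```

Skipping the reply in the first case avoids allocating and filling a
message that nobody would read.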

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:34 -07:00
Jarno Rajahalme
bb6f9a708d openvswitch: Clarify locking.
Remove unnecessary locking from functions that are always called with
appropriate locking.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:34 -07:00
Jarno Rajahalme
be52c9e96a openvswitch: Avoid assigning a NULL pointer to flow actions.
Flow SET can accept an empty set of actions, with the intended
semantics of leaving existing actions unmodified.  This seems to have
been broken after OVS 1.7, as we have assigned the flow's actions
pointer to NULL in this case, but we never check for the NULL pointer
later on.  This patch restores the intended behavior and documents it
in the include/linux/openvswitch.h.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:34 -07:00
Jarno Rajahalme
1139e241ec openvswitch: Compact sw_flow_key.
Minimize padding in sw_flow_key and move 'tp' to the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and make the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes).  These changes also make the keys for IPv4
packets fit in one cache line.
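
As a rough illustration of how member order affects padding, the
following sketch uses ctypes with two hypothetical structs. PaddedKey
and CompactKey and their fields are invented for this example and are
not the real sw_flow_key layout; the exact sizes depend on the platform
ABI.

```python
import ctypes


class PaddedKey(ctypes.Structure):
    # A wide member between two narrow ones forces padding before and after it.
    _fields_ = [
        ("proto", ctypes.c_uint8),
        ("tun_id", ctypes.c_uint64),
        ("tos", ctypes.c_uint8),
    ]


class CompactKey(ctypes.Structure):
    # Same members, widest first: the narrow members share one tail slot.
    _fields_ = [
        ("tun_id", ctypes.c_uint64),
        ("proto", ctypes.c_uint8),
        ("tos", ctypes.c_uint8),
    ]


padded_size = ctypes.sizeof(PaddedKey)
compact_size = ctypes.sizeof(CompactKey)
tun_id_offset = CompactKey.tun_id.offset  # stays at the start of the struct
```

Both structs carry the same data, but the reordered one is smaller
because less space is lost to alignment padding.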

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is always
  64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force every
  second tun_key to be misaligned).
- We never take the address of the tun_id into a __be64 *.
- Wherever we use struct ovs_key_ipv4_tunnel outside the sw_flow_key,
  it is on the stack (in tunnel input functions), where the compiler has
  full control of the alignment.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-05-22 16:27:34 -07:00
Cong Wang
b6ed549860 batman: fix a bogus warning from batadv_is_on_batman_iface()
batman tries to search dev->iflink to check if it's a batman interface,
but ->iflink could be 0, which is not a valid ifindex. It should just
avoid iflink == 0 case.

Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Antonio Quartulli <antonio@open-mesh.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 17:23:00 -04:00
David S. Miller
65db611a5c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-05-22

This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?

I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.

1) Simplify the xfrm audit handling, from Tetsuo Handa.

2) Codingstyle cleanup for xfrm_output, from Fabian Frederick.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 16:00:00 -04:00
NeilBrown
ef11ce2487 SUNRPC: track whether a request is coming from a loop-back interface.
If an incoming NFS request is coming from the local host, then
nfsd will need to perform some special handling.  So detect that
possibility and make the source visible in rq_local.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:59:18 -04:00
Li RongQing
fbdc0ad095 ipv4: initialise the itag variable in __mkroute_input
The value of itag is a random value from the stack, and may not be initialised
by fib_validate_source(), which calls fib_combine_itag(), if
CONFIG_IP_ROUTE_CLASSID is not set.

This makes the cached dst uncertain.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:57:36 -04:00
Trond Myklebust
c789102c20 SUNRPC: Fix a module reference leak in svc_handle_xprt
If the accept() call fails, we need to put the module reference.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:57:22 -04:00
Chuck Lever
16e4d93f6d NFSD: Ignore client's source port on RDMA transports
An NFS/RDMA client's source port is meaningless for RDMA transports.
The transport layer typically sets the source port value on the
connection to a random ephemeral port.

Currently, NFS server administrators must specify the "insecure"
export option to enable clients to access exports via RDMA.

But this means NFS clients can access such an export via IP using an
ephemeral port, which may not be desirable.

This patch eliminates the need to specify the "insecure" export
option to allow NFS/RDMA clients access to an export.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-05-22 15:55:48 -04:00
Dan Carpenter
b3f7a7b48f ieee802154: missing put_dev() on error
We should call put_dev() on the error path here.

Fixes: 3e9c156e2c ('ieee802154: add netlink interfaces for llsec')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:54:45 -04:00
Cong Wang
b1282726d5 bridge: make br_device_notifier static
Merge net/bridge/br_notify.c into net/bridge/br.c,
since it has only br_device_event() and br.c is small.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:33:47 -04:00
Chen Gang
5c4a43b024 net/dccp/timer.c: use 'u64' instead of 's64' to avoid compiler's warning
'dccp_timestamp_seed' is initialized once by ktime_get_real() in
dccp_timestamping_init(). It is always less than ktime_get_real()
in dccp_timestamp().

Then, ktime_us_delta() in dccp_timestamp() will always return a positive
number, so we can use a manual type cast to let the compiler and do_div()
know about it and avoid the warning.

The related warning (with allmodconfig under unicore32):

    CC [M]  net/dccp/timer.o
  net/dccp/timer.c: In function ‘dccp_timestamp’:
  net/dccp/timer.c:285: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:31:45 -04:00
Phoebe Buckheister
53819a6ced mac802154: llsec: correctly lookup implicit-indexed keys
Key id comparison for type 1 keys (implicit source, with index) should
return true if mode and id are equal, not false.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:27:32 -04:00
Phoebe Buckheister
62e9c117ee mac802154: llsec: fold useless return value check
llsec_do_encrypt will never return a positive value, so the restriction
to 0-or-negative on return is useless.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:24:13 -04:00
Phoebe Buckheister
6f3eabcd04 mac802154: llsec: fix incorrect lock pairing
In encrypt, sec->lock is taken with read_lock_bh, so in the error path,
we must read_unlock_bh.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:24:13 -04:00
Michal Kubeček
da08143b85 vlan: more careful checksum features handling
When combining real_dev's features and vlan_features, simple
bitwise AND is used. This doesn't work well for checksum
offloading features as if one set has NETIF_F_HW_CSUM and the
other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
no checksum offloading. However, from the logical point of view
(how can_checksum_protocol() works), NETIF_F_HW_CSUM contains
the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so
that the result should be IP/IPV6.

Add helper function netdev_intersect_features() implementing
this logic and use it in vlan_dev_fix_features().
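
The intersection rule described above can be sketched as follows. The
bit values are illustrative only (the kernel defines its own), and
intersect_features() is a hypothetical stand-in for the new helper, not
a copy of it.

```python
# Illustrative bit values; assumptions made for this sketch.
NETIF_F_IP_CSUM = 1 << 0
NETIF_F_HW_CSUM = 1 << 1
NETIF_F_IPV6_CSUM = 1 << 2

PROTO_CSUM = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM


def intersect_features(f1, f2):
    """AND two feature sets, treating HW_CSUM as implying IP/IPV6 checksums."""
    if f1 & NETIF_F_HW_CSUM:
        f1 |= PROTO_CSUM
    if f2 & NETIF_F_HW_CSUM:
        f2 |= PROTO_CSUM
    return f1 & f2


# The plain bitwise AND loses checksum offloading entirely in this case.
naive = NETIF_F_HW_CSUM & PROTO_CSUM
# The helper keeps the protocol-specific checksum features.
fixed = intersect_features(NETIF_F_HW_CSUM, PROTO_CSUM)
```

Here the plain AND yields no checksum feature at all, while the helper
yields IP/IPV6 checksum offloading, as the commit message argues it
should.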

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 15:07:23 -04:00
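Note on da08143b85: the feature-intersection rule described in that commit can be sketched in userspace C as below. This is a minimal illustration, not the kernel's netdev_intersect_features(); the type features_t, the F_* bit values, and the helper names expand_csum() and intersect_features() are all hypothetical stand-ins for the NETIF_F_* flags named in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; they are not the
 * kernel's real NETIF_F_* constants. */
typedef uint64_t features_t;

#define F_IP_CSUM   ((features_t)1 << 0)  /* checksum offload, TCP/UDP over IPv4 */
#define F_HW_CSUM   ((features_t)1 << 1)  /* checksum offload, any protocol */
#define F_IPV6_CSUM ((features_t)1 << 2)  /* checksum offload, TCP/UDP over IPv6 */
#define F_SG        ((features_t)1 << 3)  /* an unrelated feature bit */

/* Treat the generic flag as implying both protocol-specific flags. */
static features_t expand_csum(features_t f)
{
    if (f & F_HW_CSUM)
        f |= F_IP_CSUM | F_IPV6_CSUM;
    return f;
}

/* Intersect two feature sets without losing checksum offload when one
 * side has the generic flag and the other the protocol-specific ones. */
static features_t intersect_features(features_t f1, features_t f2)
{
    return expand_csum(f1) & expand_csum(f2);
}
```

With a plain bitwise AND, F_HW_CSUM & (F_IP_CSUM | F_IPV6_CSUM) is zero; with the expansion step the result is F_IP_CSUM | F_IPV6_CSUM, which matches the outcome the commit message asks for.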
Ezequiel Garcia
e876f208af net: Add a software TSO helper API
Although the implementation probably needs a lot of work, this initial API
allows software TSO to be implemented in the mvneta and mv643xx_eth drivers
in a less intrusive way.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 14:57:15 -04:00
John W. Linville
40a10fd740 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-22 13:58:36 -04:00
John W. Linville
99abe65ff1 NFC: 3.16: First pull request
This is the NFC pull request for 3.16. We have:
 
 - STMicroelectronics st21nfca support. The st21nfca is an HCI chipset and
   thus relies on the HCI stack. This submission provides support for tag
   reader/writer mode (including Type 5) and device tree bindings.
 
 - PM runtime support and a bunch of bug fixes for TI's trf7970a.
 
 - Device tree support for NXP's pn544. Legacy platform data support is
   obviously kept intact.
 
 - NFC Tag type 4B support to the NFC Digital stack.
 
 - SOCK_RAW type support to the raw NFC socket, and allow NCI
   sniffing from that. This can be extended to report HCI frames and also
   proprietary ones, e.g. the pn533 ones.

Merge tag 'nfc-next-3.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"NFC: 3.16: First pull request

This is the NFC pull request for 3.16. We have:

- STMicroelectronics st21nfca support. The st21nfca is an HCI chipset and
  thus relies on the HCI stack. This submission provides support for tag
  reader/writer mode (including Type 5) and device tree bindings.

- PM runtime support and a bunch of bug fixes for TI's trf7970a.

- Device tree support for NXP's pn544. Legacy platform data support is
  obviously kept intact.

- NFC Tag type 4B support to the NFC Digital stack.

- SOCK_RAW type support to the raw NFC socket, and allow NCI
  sniffing from that. This can be extended to report HCI frames and also
  proprietary ones, e.g. the pn533 ones."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-05-22 13:56:46 -04:00
David S. Miller
8af750d739 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
Netfilter/nftables updates for net-next

The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:

1) Add set element update notification via netlink, from Arturo Borrero.

2) Put all object updates in one single message batch that is sent to
   kernel-space. Before this patch only rules were included in the batch.
   This series also introduces the generic transaction infrastructure so
   updates to all objects (tables, chains, rules and sets) are applied in
   an all-or-nothing fashion; this series is from me.

3) Defer release of objects via call_rcu to reduce the time required to
   commit changes. The assumption is that all objects are destroyed in
   reverse order to ensure that dependencies between them are fulfilled
   (i.e. rules and sets are destroyed first, then chains, and finally
   tables).

4) Allow matching by bridge port name, from Tomasz Bursztyka. This series
   includes two patches to prepare this new feature.

5) Implement the proper set selection based on the characteristics of the
   data. The new infrastructure also allows you to specify your preferences
   in terms of memory and computational complexity so the underlying set
   type is also selected according to your needs, from Patrick McHardy.

6) Several cleanup patches for nft expressions, including one minor possible
   compilation breakage due to missing mark support, also from Patrick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 12:06:23 -04:00
Neal Cardwell
ca8a226343 tcp: make cwnd-limited checks measurement-based, and gentler
Experience with the recent e114a710aa ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.

The main problems seemed to be:

(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.

(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).

This commit improves both of these, by:

(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.

(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-22 12:04:49 -04:00
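Note on ca8a226343: the two rules in that commit message can be sketched as a small decision function. This is a simplified userspace illustration under stated assumptions, not the kernel code: struct conn_state and cwnd_may_grow() are hypothetical names, and the slow-start test (cwnd <= ssthresh) is an assumption about how "in slow start" is decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state. Field names follow the
 * commit message, not necessarily the kernel's struct tcp_sock. */
struct conn_state {
    uint32_t cwnd;             /* congestion window, in packets */
    uint32_t ssthresh;         /* slow-start threshold, in packets */
    uint32_t max_packets_out;  /* largest flight seen in the past window */
    bool     was_cwnd_limited; /* cwnd-limited when that flight was sent? */
};

/* Decide whether cwnd is allowed to grow. */
static bool cwnd_may_grow(const struct conn_state *c)
{
    /* Slow start (assumed test): allow growth up to twice the
     * measured flight size. */
    if (c->cwnd <= c->ssthresh)
        return c->cwnd < 2 * c->max_packets_out;

    /* Congestion avoidance: grow only if cwnd was fully utilized. */
    return c->was_cwnd_limited;
}
```

The point of the measurement-based approach is that both inputs (max_packets_out and the cwnd-limited flag) record what actually happened in the last window rather than a prediction from the write queue.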
Emmanuel Grumbach
67af981153 cfg80211: allow RSSI compensation
Channels in the 2.4GHz band overlap; this means that if we
send a probe request on channel 1 and then move to channel
2, we will hear the probe response on channel 2. In this
case, the RSSI will be lower than if we had heard it on
the channel on which it was sent (1 in this case).

The firmware / low level driver can parse the channel in
the DS IE or HT IE and compensate the RSSI so that it will
still have a valid value even if we heard the frame on an
adjacent channel. This can be done up to a certain offset.

Add this offset as a configuration for the low level driver.
A low level driver that can compensate the low RSSI in this
case should assign the maximal offset for which the RSSI
value is still valid.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-22 09:58:49 +02:00
Eric Dumazet
4de462ab63 ipv6: gro: fix CHECKSUM_COMPLETE support
When GRE support was added in linux-3.14, CHECKSUM_COMPLETE handling
broke on GRE+IPv6 because we did not update/use the appropriate csum:

GRO layer is supposed to use/update NAPI_GRO_CB(skb)->csum instead of
skb->csum

Tested using a GRE tunnel and IPv6 traffic. GRO aggregation now happens
at the first level (ethernet device) instead of being done in gre
tunnel. Native IPv6+TCP is still properly aggregated.

Fixes: bf5a755f5e ("net-gre-gro: Add GRE support to the GRO stack")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 17:18:47 -04:00
Alexei Starovoitov
5fe821a9de net: filter: cleanup invocation of internal BPF
Kernel API for classic BPF socket filters is:

sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter

Clean up the internal BPF kernel API as follows:

sk_filter_select_runtime() - final step of internal BPF creation.
  Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program

Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.

Example of internal BPF create, run, destroy:

  struct sk_filter *fp;

  fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
  memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
  fp->len = prog_len;

  sk_filter_select_runtime(fp);

  SK_RUN_FILTER(fp, ctx);

  sk_filter_free(fp);

Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 17:07:17 -04:00
Cong Wang
bf63ac73b3 net_sched: fix an oops in tcindex filter
Kelly reported the following crash:

        IP: [<ffffffff817a993d>] tcf_action_exec+0x46/0x90
        PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
        Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
        CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
        RIP: 0010:[<ffffffff817a993d>]  [<ffffffff817a993d>] tcf_action_exec+0x46/0x90
        RSP: 0018:ffff8800d21b9b90  EFLAGS: 00010283
        RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
        RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
        RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
        R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
        R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
        FS:  00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
        Stack:
         ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
         ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
         ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
        Call Trace:
         [<ffffffff817c55bb>] tcindex_classify+0x88/0x9b
         [<ffffffff817a7f7d>] tc_classify_compat+0x3e/0x7b
         [<ffffffff817a7fdf>] tc_classify+0x25/0x9f
         [<ffffffff817b0e68>] htb_enqueue+0x55/0x27a
         [<ffffffff817b6c2e>] dsmark_enqueue+0x165/0x1a4
         [<ffffffff81775642>] __dev_queue_xmit+0x35e/0x536
         [<ffffffff8177582a>] dev_queue_xmit+0x10/0x12
         [<ffffffff818f8ecd>] packet_sendmsg+0xb26/0xb9a
         [<ffffffff810b1507>] ? __lock_acquire+0x3ae/0xdf3
         [<ffffffff8175cf08>] __sock_sendmsg_nosec+0x25/0x27
         [<ffffffff8175d916>] sock_aio_write+0xd0/0xe7
         [<ffffffff8117d6b8>] do_sync_write+0x59/0x78
         [<ffffffff8117d84d>] vfs_write+0xb5/0x10a
         [<ffffffff8117d96a>] SyS_write+0x49/0x7f
         [<ffffffff8198e212>] system_call_fastpath+0x16/0x1b

This is because we memcpy struct tcindex_filter_result, which contains
struct tcf_exts; obviously a struct list_head cannot simply be copied.
This is a regression introduced by commit 33be627159
(net_sched: act: use standard struct list_head).

It's not very easy to fix it as the code is a mess:

       if (old_r)
               memcpy(&cr, r, sizeof(cr));
       else {
               memset(&cr, 0, sizeof(cr));
               tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
       }
       ...
       tcf_exts_change(tp, &cr.exts, &e);
       ...
       memcpy(r, &cr, sizeof(cr));

The above code should be equivalent to:

        tcindex_filter_result_init(&cr);
        if (old_r)
               cr.res = r->res;
        ...
        if (old_r)
               tcf_exts_change(tp, &r->exts, &e);
        else
               tcf_exts_change(tp, &cr.exts, &e);
        ...
        r->res = cr.res;

After this change there is no need to copy struct tcf_exts.

It also fixes other places that zero structs containing struct tcf_exts.

Fixes: commit 33be627159 (net_sched: act: use standard struct list_head)
Reported-by: Kelly Anderson <kelly@xilka.com>
Tested-by: Kelly Anderson <kelly@xilka.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 16:47:13 -04:00
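Note on bf63ac73b3: why a struct that embeds a list head cannot be copied with memcpy can be shown with a small userspace sketch. The list type below is a minimal stand-in for the kernel's struct list_head, and struct result, copy_by_memcpy() and copy_fields() are hypothetical names used only for this illustration; the real fix operates on struct tcindex_filter_result and struct tcf_exts.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Minimal userspace stand-in for a circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list points at itself. */
static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Hypothetical container, standing in for a struct that embeds a list. */
struct result {
    int classid;
    struct list_head actions;
};

/* Broken: copies the embedded pointers, which still refer to src. */
static void copy_by_memcpy(struct result *dst, const struct result *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Safer: copy the plain fields and re-initialise the embedded list. */
static void copy_fields(struct result *dst, const struct result *src)
{
    dst->classid = src->classid;
    list_init(&dst->actions);
}
```

After copy_by_memcpy(), the copy's list head points into the source object, so the copy no longer looks empty and any list walk ends up in memory owned by the other struct. Copying only the plain fields, as the commit does with cr.res, avoids that.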
Li RongQing
1495664355 ipv6: slight optimization in ip6_dst_gc
entries is always greater than rt_max_size here, since if entries is less
than rt_max_size, the fib6_run_gc function will be skipped

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 15:52:23 -04:00
Tom Gundersen
f98f89a010 net: tunnels - enable module autoloading
Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 15:46:52 -04:00
Arik Nemtsov
4d3df547e8 cfg80211: don't set reg timeout for user-handled hint
Otherwise every "indoor" setting by usermode will cause a regdomain reset.

Acked-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-21 09:15:18 +02:00
Antonio Quartulli
7406353d43 cfg80211: implement cfg80211_get_station cfg80211 API
Implement and export the new cfg80211_get_station() API.
This utility can be used by other kernel modules to obtain
detailed information about a given wireless station.

It will be particularly useful to batman-adv, which will
implement a wireless rate based metric.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-21 09:15:17 +02:00
Antonio Quartulli
cca674d47e mac80211: export the expected throughput
Add get_expected_throughput() API to mac80211 so that each
driver can implement its own version based on the RC
algorithm they are using (might be using an HW RC algo).
The API returns a value expressed in Kbps.

Also, add the new get_expected_throughput() member
to the rate_control_ops structure in order to be
able to query the RC algorithm (this patch provides an
implementation of this API for both minstrel and
minstrel_ht).

The related member in the station_info object is now
filled accordingly when dumping a station.

Cc: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-21 09:15:16 +02:00
Steffen Klassert
78ff4be45a ip_tunnel: Initialize the fallback device properly
We need to initialize the fallback device to have a correct mtu
set on this device. Otherwise the mtu is set to null and the device
is unusable.

Fixes: fd58156e45 ("IPIP: Use ip-tunneling code.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 02:08:32 -04:00
David S. Miller
d050de607f Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter/nftables fixes for net

The following patchset contains nftables fixes for your net tree, they
are:

1) Fix crash when using the goto action in a rule by making sure that
   we always fall back on the base chain. Otherwise, this may try to
   access the counter memory area of non-base chains, which does not
   exist.

2) Fix several aspects of the rule tracing that are currently broken:

   * Reset rule number counter after goto/jump action, otherwise the
     tracing reports a bogus rule number.
   * Fix tracing of the goto action.
   * Fix bogus rule number counter after goto.
   * Fix missing return trace after finishing the walk through the
     non-base chain.
   * Fix missing trace when matching non-terminal rule.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 01:24:19 -04:00
Johan Hedberg
1cc6114402 Bluetooth: Update smp_confirm to return a response code
Now that smp_confirm() is called "inline" we can have it return a
response code and have the sending of it be done in the shared place for
command handlers. One exception is when we're entering smp.c from mgmt.c
when user space responds to authentication, in which case we still need
our own code to call smp_failure().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-20 08:44:14 -07:00
Johan Hedberg
861580a970 Bluetooth: Update smp_random to return a response code
Since we're now calling smp_random() "inline" we can have it directly
return a response code and have the shared command handler send the
response.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-20 08:44:14 -07:00
Johan Hedberg
4a74d65868 Bluetooth: Rename smp->smp_flags to smp->flags
There's no reason to have "smp" in this variable name since it is
already part of the SMP struct which provides sufficient context.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-20 08:44:14 -07:00
Johan Hedberg
9dd4dd275f Bluetooth: Remove unnecessary work structs from SMP code
When the SMP code was initially created (mid-2011) parts of the
Bluetooth subsystem were still not converted to use workqueues. This
meant that the crypto calls, which could sleep, couldn't be called
directly. Because of this the "confirm" and "random" work structs were
introduced.

These days the entire Bluetooth subsystem runs through workqueues which
makes these structs unnecessary. This patch removes them and converts
the calls to queue them to use direct function calls instead.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-20 08:44:13 -07:00
Johan Hedberg
1ef35827a9 Bluetooth: Fix setting initial local auth_req value
There is no reason to have the initial local value conditional on
whether the remote value has bonding set or not. We can either way start
off with the value we received.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-20 08:44:12 -07:00
Johan Hedberg
4bc58f51e1 Bluetooth: Make SMP context private to smp.c
There are no users of the smp_chan struct outside of smp.c so move it
away from smp.h. The addition of the l2cap.h include to hci_core.c,
hci_conn.c and mgmt.c is something that should have been there already
previously to avoid warnings of undeclared struct l2cap_conn, but the
compiler warning was apparently shadowed away by the mention of
l2cap_conn in the struct smp_chan definition.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-05-20 08:44:11 -07:00
Antonio Quartulli
867d849fc8 cfg80211: export expected throughput through get_station()
Users may need information about the expected throughput
towards a given peer.
This value is supposed to consider the size overhead
generated by the 802.11 header.

This value is exported in kbps through the get_station() API
by including it into the station_info object.
Moreover, it is sent to user space when replying to the
nl80211 GET_STATION command.

This information will be useful to the batman-adv module
which will use it for its new metric computation.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-20 15:13:32 +02:00
Hiren Tandel
0515829642 NFC: NCI: Send all NCI frames to raw sockets
So that anyone listening on SOCKPROTO_RAW for raw frames will get all
NCI frames, in both directions. This actually implements userspace NFC
NCI sniffing.
It's now up to userspace to decode those frames.

Signed-off-by: Hiren Tandel <hirent@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-20 00:23:59 +02:00
Hiren Tandel
57be1f3f3e NFC: Add RAW socket type support for SOCKPROTO_RAW
This allows for a more generic NFC sniffing by using SOCKPROTO_RAW
SOCK_RAW to read RAW NFC frames. This is for sniffing anything but LLCP
(HCI, NCI, etc...).

Signed-off-by: Hiren Tandel <hirent@marvell.com>
Signed-off-by: Rahul Tank <rahult@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-20 00:06:04 +02:00
Hiren Tandel
c79d9f9ef8 NFC: NCI: No need to reverse ATR_RES Response
ATR_RES response received within Activation Parameters is already
in correct order. Reversing it fails LLCP magic number check and
so P2P functionality fails.

Signed-off-by: Hiren Tandel <hirent@marvell.com>
Signed-off-by: Rahul Tank <rahult@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-19 23:58:08 +02:00
Mark A. Greer
4b8b6267be NFC: digital: Handle multiple SENSF_REQ frames
According to section 5.15.1.3 of the NFC Activity
Specification, multiple SENSF_REQ commands can be
received by a target before it receives an ATR_REQ
command.  To handle this, add a routine that checks
whether a SENSF_REQ or ATR_REQ has been received.
If it's a SENSF_REQ, respond appropriately and
continue waiting for an ATR_REQ.  If it's an ATR_REQ,
handle it as before.

CC: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-19 23:52:40 +02:00
Mark A. Greer
96e829b433 NFC: digital: SENSF_RES excludes RD when SENSF_REQ RC is zero
The check in digital_tg_send_sensf_res() that excludes
the 'RD' field from the SENSF_RES is inverted.  The 'RD'
field should be excluded when the SENSF_REQ 'RC' field
is equal to DIGITAL_SENSF_REQ_RC_NONE instead of when
it's not equal.  This is described in section 6.6.2.11
of the NFC Digital Specification.

CC: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-05-19 23:52:37 +02:00
John W. Linville
20b4f9c73f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2014-05-19 16:34:27 -04:00
Johannes Berg
922bd80fc3 cfg80211: constify wowlan/coalesce mask/pattern pointers
This requires changing the nl80211 parsing code a bit to use
intermediate pointers for the allocation, but clarifies the
API towards the drivers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 18:06:50 +02:00
Johannes Berg
c1e5f4714d cfg80211: constify more pointers in the cfg80211 API
This also propagates through the drivers.

The orinoco driver uses the cfg80211 API structs for internal
bookkeeping, and so needs a (void *) cast that removes the
const - but that's OK because it allocates those pointers.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 17:53:16 +02:00
Johannes Berg
3b3a0162fa cfg80211: constify MAC addresses in cfg80211 ops
This propagates through all the drivers and mac80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 17:34:42 +02:00
Johannes Berg
00591cea31 mac80211: minstrel-ht: small clarifications
Antonio and I were looking over this code and some things
didn't immediately make sense, so we came up with two small
clarifications.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-05-19 14:30:37 +02:00
Pablo Neira Ayuso
c7c32e72cb netfilter: nf_tables: defer all object release via rcu
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso
128ad3322b netfilter: nf_tables: remove skb and nlh from context structure
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:13 +02:00
Pablo Neira Ayuso
35151d840c netfilter: nf_tables: simplify nf_tables_*_notify
Now that all these functions are called from the commit path, we can
pass the context structure to reduce the amount of parameters in all
of the nf_tables_*_notify functions. This patch also removes unneeded
branches to check for skb, nlh and net that should be always set in
the context structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
60319eb1ca netfilter: nf_tables: use new transaction infrastructure to handle elements
Leave the set content in consistent state if we fail to load the
batch. Use the new generic transaction infrastructure to achieve
this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
55dd6f9307 netfilter: nf_tables: use new transaction infrastructure to handle table
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:12 +02:00
Pablo Neira Ayuso
e1aaca93ee netfilter: nf_tables: pass context to nf_tables_updtable()
So nf_tables_updtable() only takes one single parameter.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
f75edf5e9c netfilter: nf_tables: disabling table hooks always succeeds
nf_tables_table_disable() always succeeds, make this function void.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
91c7b38dc9 netfilter: nf_tables: use new transaction infrastructure to handle chain
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
ff3cd7b3c9 netfilter: nf_tables: refactor chain statistic routines
Add new routines to encapsulate chain statistics allocation and
replacement.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:11 +02:00
Pablo Neira Ayuso
958bee14d0 netfilter: nf_tables: use new transaction infrastructure to handle sets
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two in each direction), from:

 1) create the set and send netlink message to the kernel
 2) process the response from the kernel that contains the allocated name.
 3) add the set elements and send netlink message to the kernel.
 4) process the response from the kernel (to check for errors).

To:

 1) add the set to the batch.
 2) add the set elements to the batch.
 3) add the rule that points to the set.
 4) send batch to the kernel.

This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.

Backward compatibility has only been retained in userspace; this
means that new nft versions can talk to the kernel both in the new
and the old fashion.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
b380e5c733 netfilter: nf_tables: add message type to transactions
The patch adds message type to the transaction to simplify the
commit and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
37082f930b netfilter: nf_tables: relocate commit and abort routines in the source file
Move the commit and abort routines to the bottom of the source code
file. This change is required by the follow up patches that add the
set, chain and table transaction support.

This patch is just a cleanup to access several functions without
having to declare their prototypes.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
1081d11b08 netfilter: nf_tables: generalise transaction infrastructure
This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:10 +02:00
Pablo Neira Ayuso
7c95f6d866 netfilter: nf_tables: deconstify table and chain in context structure
The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-05-19 12:06:09 +02:00
Oliver Hartkopp
45c700291a can: add hash based access to single EFF frame filters
In contrast to the direct access to the single SFF frame filters (which are
indexed by the SFF CAN ID itself) the single EFF frame filters are arranged
in a single linked hlist. To reduce the hlist traversal in the case of many
filter subscriptions a hash based access is introduced for single EFF filters.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-05-19 09:38:24 +02:00
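Note on 45c700291a: one way to get hash-based access to single EFF filters is to fold the 29-bit extended identifier into a bucket index. The sketch below is an illustration under assumptions, not the code from the commit: the table size (2^10 buckets), the XOR-folding scheme, and the names EFF_HASH_BITS, EFF_HASH_SIZE, EFF_ID_MASK and eff_hash() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EFF_HASH_BITS 10                    /* assumed: 2^10 buckets */
#define EFF_HASH_SIZE (1u << EFF_HASH_BITS)
#define EFF_ID_MASK   0x1FFFFFFFu           /* 29-bit extended CAN identifier */

/* Fold a 29-bit EFF identifier into a bucket index by XOR-ing
 * successive EFF_HASH_BITS-wide slices together. */
static unsigned int eff_hash(uint32_t can_id)
{
    uint32_t id = can_id & EFF_ID_MASK;
    uint32_t hash = id;

    hash ^= id >> EFF_HASH_BITS;
    hash ^= id >> (2 * EFF_HASH_BITS);
    return hash & (EFF_HASH_SIZE - 1);
}
```

Each bucket would then hold its own short hlist, so a lookup walks only the filters whose identifiers fold to the same index instead of every single EFF subscription.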
Oliver Hartkopp
e3d3917f3d can: proc: make array printing function independent from sff frames
The can_rcvlist_sff_proc_show_one() function which prints the array of filters
for the single SFF CAN identifiers is prepared to be used by a second caller.
Therefore it is also renamed to properly describe its future functionality.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2014-05-19 09:38:24 +02:00
David S. Miller
b6052af61a Included changes:
- fix codestyle to respect new checkpatch warnings
- increase internal version number

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- fix codestyle to respect new checkpatch warnings
- increase internal version number
2014-05-18 21:27:09 -04:00
Manuel Schölling
71fd762f2e net: rds: Use time_after() for time comparison
To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of raw math.
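
The reason raw comparisons are fragile is that a tick counter eventually wraps around. A minimal userspace sketch of the signed-difference trick that time_after() relies on, using a fixed 32-bit counter and the hypothetical name ticks_after():

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if tick value a is later than b, even across counter wraparound.
 * The unsigned subtraction wraps modulo 2^32; reading the result as a
 * signed value gives the direction, as long as a and b are less than
 * 2^31 ticks apart. (The cast relies on the usual two's-complement
 * conversion.)
 */
bool ticks_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}
```

With a = 2 and b = 0xFFFFFFF0, a plain "a > b" says that a is earlier, while ticks_after(a, b) correctly reports that a lies 18 ticks after b.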

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 21:24:52 -04:00
stephen hemminger
614d056c8e ipv4: minor spelling fix
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 21:10:29 -04:00
stephen hemminger
025559eec8 bridge: fix spelling of promiscuous
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 21:10:08 -04:00
Ben Hutchings
61d88c6811 ethtool: Disallow ETHTOOL_SRSSH with both indir table and hash key unchanged
This would be a no-op, so there is no reason to request it.

This also allows conversion of the current implementations of
ethtool_ops::{get,set}_rxfh_indir to ethtool_ops::{get,set}_rxfh
with no change other than their parameters.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19 01:29:42 +01:00
Ben Hutchings
7455fa2422 ethtool: Name the 'no change' value for setting RSS hash key but not indir table
We usually allocate special values of u32 fields starting from the top
down, so also change the value to 0xffffffff.  As these operations
haven't been included in a stable release yet, it's not too late to
change.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19 01:18:19 +01:00
Ben Hutchings
fb95cd8d14 ethtool: Return immediately on error in ethtool_copy_validate_indir()
We must return -EFAULT immediately rather than continuing into
the loop.

Similarly, we may as well return -EINVAL directly.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-05-19 01:17:32 +01:00
Alexei Starovoitov
d4f0e0958d net: bridge: fix build
fix build when BRIDGE_VLAN_FILTERING is not set

Fixes: 2796d0c648 ("bridge: Automatically manage port promiscuous mode")

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-18 20:09:50 -04:00
Trond Myklebust
7a9a7b774f SUNRPC: Fix a module reference issue in rpcsec_gss
We're not taking a reference in the case where _gss_mech_get_by_pseudoflavor
loops without finding the correct rpcsec_gss flavour, so why are we
releasing it?

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-05-18 13:47:14 -04:00
Simon Wunderlich
871d3d9fdf batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-05-18 15:04:00 +02:00
Antonio Quartulli
2b64df2058 batman-adv: remove semi-colon after macro definition
Reported by checkpatch with the following warning:
"WARNING: macros should not use a trailing semicolon"

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-05-18 15:04:00 +02:00
Antonio Quartulli
f138694b15 batman-adv: add blank line between declarations and the rest of the code
Reported by checkpatch with the following message:
"WARNING: Missing a blank line after declarations"

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-05-18 15:03:52 +02:00
Vlad Yasevich
44a4085538 bonding: Fix stacked device detection in arp monitoring
Prior to commit fbd929f2dc
	bonding: support QinQ for bond arp interval

the arp monitoring code allowed for proper detection of devices
stacked on top of vlans.  Since the above commit, the
code can still detect a device stacked on top of a single
vlan, but not a device stacked on top of a Q-in-Q configuration.
The search will only set the inner vlan tag if the route
device is the vlan device.  However, this is not always the
case, as it is possible to extend the stacked configuration.

With this patch it is possible to provision devices on
top of a Q-in-Q vlan configuration that should be used as
a source of ARP monitoring information.

For example:
ip link add link bond0 vlan10 type vlan proto 802.1q id 10
ip link add link vlan10 vlan100 type vlan proto 802.1q id 100
ip link add link vlan100 type macvlan

Note:  This patch limits the number of stacked VLANs to 2,
just like before.  The original, however, had another issue
in that if we had more than 2 levels of VLANs, we would end
up generating incorrectly tagged traffic.  This is no longer
possible.

Fixes: fbd929f2dc (bonding: support QinQ for bond arp interval)
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: Patric McHardy <kaber@trash.net>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 22:29:05 -04:00
Vlad Yasevich
d38569ab2b vlan: Fix lockdep warning with stacked vlan devices.
This reverts commit dc8eaaa006.
	vlan: Fix lockdep warning when vlan dev handle notification

Instead we use the new API to find the lock subclass of
our vlan device.  This way we can support configurations where
vlans are interspersed with other devices:
  bond -> vlan -> macvlan -> vlan

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 22:14:49 -04:00
Vlad Yasevich
4085ebe8c3 net: Find the nesting level of a given device by type.
Multiple devices in the kernel can be stacked/nested and they
need to know their nesting level for the purposes of lockdep.
This patch provides a generic function that determines a nesting
level of a particular device by its type (ex: vlan, macvlan, etc).
We only care about nesting of the same type of devices.

For example:
  eth0 <- vlan0.10 <- macvlan0 <- vlan1.20

The nesting level of vlan1.20 would be 1, since there is another vlan
in the stack under it.
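
A minimal sketch of that counting rule, assuming each device has at most one lower device (real devices can have several). The struct, enum and function names are hypothetical and are not the API added by this patch.

```c
#include <stddef.h>

enum dev_type { DEV_ETH, DEV_VLAN, DEV_MACVLAN };

struct dev {
	enum dev_type type;
	const struct dev *lower;  /* device this one is stacked on, or NULL */
};

/* Count the devices of the same type that sit below dev in the stack. */
int nest_level(const struct dev *dev)
{
	const struct dev *d;
	int level = 0;

	for (d = dev->lower; d != NULL; d = d->lower)
		if (d->type == dev->type)
			level++;
	return level;
}
```

For the stack above, the walk from vlan1.20 passes macvlan0, vlan0.10 and eth0 and finds one other vlan, so the level is 1; macvlan0 and vlan0.10 both get level 0.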

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 22:14:49 -04:00
Thomas Graf
97dc48e220 pktgen: Use seq_puts() where seq_printf() is not needed
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:30:30 -04:00
Eric Dumazet
29e9824278 net: gro: make sure skb->cb[] initial content has not to be zero
Starting from linux-3.13, GRO attempts to build full size skbs.

Problem is the commit assumed one particular field in skb->cb[]
was clean, but it is not the case on some stacked devices.

Timo reported a crash in case traffic is decrypted before
reaching a GRE device.

Fix this by initializing NAPI_GRO_CB(skb)->last at the right place,
this also removes one conditional.

Thanks a lot to Timo for providing full reports and bisecting this.

Fixes: 8a29111c7c ("net: gro: allow to build full sized skb")
Bisected-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:24:54 -04:00
Phoebe Buckheister
f0f77dc6be ieee802154, mac802154: implement devkey record option
The 802.15.4-2011 standard states that for each key, a list of devices
that use this key shall be kept. Previous patches have only considered
two options:

 * a device "uses" (or may use) all keys, rendering the list useless
 * a device is restricted to a certain set of keys

Another option would be that a device *may* use all keys, but need not
do so, and we are interested in the actual set of keys the device uses.
Recording keys used by any given device may have a noticeable performance
impact and might not be needed as often. The common case, in which a
device will not switch keys too often, should still perform well.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:42 -04:00
Phoebe Buckheister
3e9c156e2c ieee802154: add netlink interfaces for llsec
This patch adds user-visible interfaces for the llsec infrastructure.
For the added methods, the only major difference between the add/remove
implementations lies in how the specific object is parsed, and for dump
requests, how objects are written into netlink messages.

To save on boilerplate code, table dumps are routed through a helper
function that handles netlink dump state, leaving the actual dumping
code to care only about iterating over the table to be dumped and
filling netlink messages. For add/remove methods, the boilerplate
required to work is not quite as large, but still enough to also move
into a local helper.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
9b0bb4a83f mac802154: propagate device address changes to llsec
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
29e023746a mac802154: add llsec configuration functions
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
af9eed5bbf ieee802154: add dgram sockopts for security control
Allow datagram sockets to override the security settings of the device
they send from on a per-socket basis. Requires CAP_NET_ADMIN or
CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
f30be4d53c mac802154: integrate llsec with wpan devices
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
4c14a2fb5d mac802154: add llsec decryption method
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:41 -04:00
Phoebe Buckheister
03556e4d0d mac802154: add llsec encryption method
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:40 -04:00
Phoebe Buckheister
5d637d5aab mac802154: add llsec structures and mutators
This patch adds containers and mutators for the major ieee802154_llsec
structures to mac802154. Most of the (rather simple) ieee802154_llsec
structs are wrapped only to provide an rcu_head for orderly disposal,
but some structs - llsec keys notably - require more complex
bookkeeping.

Since each llsec key may be referenced by a number of llsec key table
entries (with differing key ids, but the same actual key), we want to
save memory and not allocate crypto transforms for each entry in the
table. Thus, the mac802154 llsec key is reference-counted instead.
Further, each key will have four associated crypto transforms - three
CCM transforms for the authsizes 4/8/16 and one CTR transform for
unauthenticated encryption. If we had a CCM* transform that allowed
authsize 0, and authsize as part of requests instead of transforms, this
would not be necessary.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:40 -04:00
Phoebe Buckheister
87de726c9b mac802154: update Kconfig
Link-layer security requires AES CCM for authenticated modes and AES CTR
for the unauthenticated encryption mode.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:23:40 -04:00
David S. Miller
e54740e6d7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
A set of OVS changes for net-next/3.16.

The major change here is a switch from per-CPU to per-NUMA flow
statistics. This improves scalability by reducing kernel overhead
in flow setup and maintenance.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:21:51 -04:00
Vlad Yasevich
2796d0c648 bridge: Automatically manage port promiscuous mode.
There exist configurations where the administrator or another management
entity has the foreknowledge of all the mac addresses of end systems
that are being bridged together.

In these environments, the administrator can statically configure known
addresses in the bridge FDB and disable flooding and learning on ports.
This makes it possible to turn off promiscuous mode on the interfaces
connected to the bridge.

Here is why disabling flooding and learning allows us to control
promiscuity:
 Consider port X.  All traffic coming into this port from outside the
bridge (ingress) will be either forwarded through other ports of the
bridge (egress) or dropped.  Forwarding (egress) is defined by FDB
entries and by flooding in the event that no FDB entry exists.
In the event that flooding is disabled, only FDB entries define
the egress.  Once learning is disabled, only static FDB entries
provided by a management entity define the egress.  If we provide
information from these static FDBs to the ingress port X, then we'll
be able to accept all traffic that can be successfully forwarded and
drop all the other traffic sooner without spending CPU cycles to
process it.
 Another way to define the above is as the following equations:
    ingress = egress + drop
 expanding egress
    ingress = static FDB + learned FDB + flooding + drop
 disabling flooding and learning we are left with
    ingress = static FDB + drop

By adding addresses from the static FDB entries to the MAC address
filter of an ingress port X, we fully define what the bridge can
process without dropping and can thus turn off promiscuous mode,
thus dropping packets sooner.
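
The resulting ingress decision can be sketched as below. This is a simplified userspace model, not bridge code: the struct and function names are hypothetical, and a real NIC filter would also admit the port's own address, broadcast and subscribed multicast groups.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

struct port_filter {
	bool promisc;                           /* accept everything when set */
	const uint8_t (*static_fdb)[ETH_ALEN];  /* addresses taken from static FDB entries */
	size_t n_static;
};

/* ingress = static FDB + drop, unless the port is promiscuous. */
bool port_accepts(const struct port_filter *p, const uint8_t *dst)
{
	size_t i;

	if (p->promisc)
		return true;
	for (i = 0; i < p->n_static; i++)
		if (memcmp(p->static_fdb[i], dst, ETH_ALEN) == 0)
			return true;
	return false;  /* dropped at the port, before the bridge sees it */
}
```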

There have been suggestions that we may want to allow learning
and update the filters with learned addresses as well.  This
would require mac-level authentication similar to 802.1x to
prevent attacks against the hw filters as they are a limited
resource.

Additionally, if the user places the bridge device in promiscuous mode,
all ports are placed in promiscuous mode regardless of the changes
to flooding and learning.

Since the above functionality depends on full static configuration,
we also require that vlan filtering be enabled to take
advantage of this.  The reason is that the bridge has to be
able to receive and process VLAN-tagged frames and there
are only 2 ways to accomplish this right now: promiscuous mode
or vlan filtering.

Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:06:33 -04:00
Vlad Yasevich
145beee8d6 bridge: Add addresses from static fdbs to non-promisc ports
When a static fdb entry is created, add the mac address
from this fdb entry to any ports that are currently running
in non-promiscuous mode.  These ports need this data so that
they can receive traffic destined to these addresses.
By default ports start in promiscuous mode, so this feature
is disabled.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:06:33 -04:00
Vlad Yasevich
f3a6ddf152 bridge: Introduce BR_PROMISC flag
Introduce a BR_PROMISC per-port flag that will help us track if the
current port is supposed to be in promiscuous mode or not.  For now,
always start in promiscuous mode.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:06:33 -04:00
Vlad Yasevich
8db24af71b bridge: Add functionality to sync static fdb entries to hw
Add code that allows static fdb entries to be synced to the
hw list for a specified port.  This will be used later to
program ports that can function in non-promiscuous mode.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:06:33 -04:00
Vlad Yasevich
e028e4b8dc bridge: Keep track of ports capable of automatic discovery.
By default, ports on the bridge are capable of automatic
discovery of nodes located behind the port.  This is accomplished
via flooding of unknown traffic (BR_FLOOD) and learning the
mac addresses from these packets (BR_LEARNING).
If the above functionality is disabled by turning off these
flags, the port requires static configuration in the form
of static FDB entries to function properly.

This patch adds functionality to keep track of all ports
capable of automatic discovery.  This will later be used
to control promiscuity settings.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:06:33 -04:00
Vlad Yasevich
63c3a622dd bridge: Turn flag change macro into a function.
Turn the flag change macro into a function to allow
easier updates and to reduce space.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 17:06:32 -04:00
Timo Teräs
22fb22eaeb ipv4: ip_tunnels: disable cache for nbma gre tunnels
The connected check fails to check for ip_gre nbma mode tunnels
properly. ip_gre creates temporary tnl_params with daddr specified
to pass-in the actual target on per-packet basis from neighbor
layer. Detect these tunnels by inspecting the actual tunnel
configuration.

Minimal test case:
 ip route add 192.168.1.1/32 via 10.0.0.1
 ip route add 192.168.1.2/32 via 10.0.0.2
 ip tunnel add nbma0 mode gre key 1 tos c0
 ip addr add 172.17.0.0/16 dev nbma0
 ip link set nbma0 up
 ip neigh add 172.17.0.1 lladdr 192.168.1.1 dev nbma0
 ip neigh add 172.17.0.2 lladdr 192.168.1.2 dev nbma0
 ping 172.17.0.1
 ping 172.17.0.2

The second ping should be going to 192.168.1.2 and head to 10.0.0.2;
but cached gre tunnel level route is used and it's actually going
to 192.168.1.1 via 10.0.0.1.

The lladdr's need to go to separate dst for the bug to trigger.
Test case uses separate route entries, but this can also happen
when the route entry is same: if there is a nexthop exception or
the GRE tunnel is IPsec'ed in which case the dst points to xfrm
bundle unique to the gre lladdr.

Fixes: 7d442fab0a ("ipv4: Cache dst in tunnels")
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 16:58:41 -04:00
Duan Jiong
ee30ef4d45 ip_tunnel: don't add tunnel twice
When using command "ip tunnel add" to add a tunnel, the tunnel will be added twice,
through ip_tunnel_create() and ip_tunnel_update().

Because the second is unnecessary, we can just break after adding the tunnel
through ip_tunnel_create().

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 16:57:44 -04:00
Fabian Godehardt
d1c0b471b3 net/dsa/dsa.c: increment chip_index during of_node handling on dsa_of_probe()
Adding more than one chip on device-tree currently causes the probing
routine to always use the first chip's data pointer.

Signed-off-by: Fabian Godehardt <fg@emlix.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 16:56:33 -04:00
Lorenzo Colitti
2e47b29195 net: ipv6: make "ip -6 route get mark xyz" work.
Currently, "ip -6 route get mark xyz" ignores the mark passed in
by userspace. Make it honour the mark, just like IPv4 does.

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-16 16:50:30 -04:00
Monam Agarwal
944df8ae84 net/openvswitch: Use RCU_INIT_POINTER(x, NULL) in vport-gre.c
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

The rcu_assign_pointer() ensures that the initialization of a structure
is carried out before storing a pointer to that structure.
And in the case of the NULL pointer, there is no structure to initialize.
So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)
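
The ordering argument can be illustrated with a userspace analogy built on C11 atomics. This is only a sketch: the kernel macros are not used here, and struct config, publish_config() and clear_config() are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct config {
	int value;
};

_Atomic(struct config *) current_config;

/*
 * Publishing a new object: the release store guarantees that the
 * initialisation of *c is visible before the pointer itself. This is
 * the role rcu_assign_pointer() plays.
 */
void publish_config(struct config *c, int value)
{
	c->value = value;
	atomic_store_explicit(&current_config, c, memory_order_release);
}

/*
 * Clearing the pointer: there is no structure whose initialisation
 * must be ordered first, so a relaxed store is sufficient. This is
 * the case RCU_INIT_POINTER(p, NULL) covers.
 */
void clear_config(void)
{
	atomic_store_explicit(&current_config, (struct config *)NULL,
			      memory_order_relaxed);
}
```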

Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-05-16 13:40:29 -07:00
Jarno Rajahalme
88d73f6c41 openvswitch: Use TCP flags in the flow key for stats.
We already extract the TCP flags for the key, might as well use that
for stats.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-05-16 13:40:29 -07:00
Jarno Rajahalme
d92ab13558 openvswitch: Fix output of SCTP mask.
The 'output' argument of the ovs_nla_put_flow() is the one from which
the bits are written to the netlink attributes.  For SCTP we
accidentally used the bits from the 'swkey' instead.  This caused the
mask attributes to include the bits from the actual flow key instead
of the mask.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-05-16 13:40:29 -07:00
Jarno Rajahalme
63e7959c4b openvswitch: Per NUMA node flow stats.
Keep kernel flow stats for each NUMA node rather than each (logical)
CPU.  This avoids using the per-CPU allocator and removes most of the
kernel-side OVS locking overhead otherwise on the top of perf reports
and allows OVS to scale better with higher number of threads.

With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
rate doubles on a server with two hyper-threaded physical CPUs (16
logical cores each) compared to the current OVS master.  Tested with
non-trivial flow table with a TCP port match rule forcing all new
connections with unique port numbers to OVS userspace.  The IP
addresses are still wildcarded, so the kernel flows are not considered
as exact match 5-tuple flows.  This type of flows can be expected to
appear in large numbers as the result of more effective wildcarding
made possible by improvements in OVS userspace flow classifier.

Perf results for this test (master):

Events: 305K cycles
+   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
+   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
+   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
+   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
+   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
...

And after this patch:

Events: 356K cycles
+   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
+   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
+   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
+   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
+   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
+   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
+   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
+   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
+   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
+   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
+   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
+   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
+   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
+   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
...

There is a small increase in kernel spinlock overhead due to the same
spinlock being shared between multiple cores of the same physical CPU,
but that is barely visible in the netperf TCP_CRR test performance
(maybe ~1% performance drop, hard to tell exactly due to variance in
the test results), when testing for kernel module throughput (with no
userspace activity, handful of kernel flows).

On flow setup, a single stats instance is allocated (for the NUMA node
0).  As CPUs from multiple NUMA nodes start updating stats, new
NUMA-node specific stats instances are allocated.  This allocation on
the packet processing code path is made to never block or look for
emergency memory pools, minimizing the allocation latency.  If the
allocation fails, the existing preallocated stats instance is used.
Also, if only CPUs from one NUMA-node are updating the preallocated
stats instance, no additional stats instances are allocated.  This
eliminates the need to pre-allocate stats instances that will not be
used, also relieving the stats reader from the burden of reading stats
that are never used.
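
The allocation policy described above can be sketched in plain C as follows. This is a single-threaded model with hypothetical names and an assumed node count; it leaves out the locking and the non-blocking allocation flags that the real packet path needs.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_NODES 4  /* assumed number of NUMA nodes */

struct flow_stats {
	uint64_t packets;
	uint64_t bytes;
};

struct flow {
	struct flow_stats *stats[MAX_NODES];  /* stats[0] is preallocated */
};

/* Flow setup: allocate only the node 0 instance. */
int flow_init(struct flow *f)
{
	int i;

	for (i = 0; i < MAX_NODES; i++)
		f->stats[i] = NULL;
	f->stats[0] = calloc(1, sizeof(*f->stats[0]));
	return f->stats[0] ? 0 : -1;
}

/* Packet path: node must be in the range 0..MAX_NODES-1. */
void flow_stats_update(struct flow *f, int node, uint64_t len)
{
	struct flow_stats *s = f->stats[node];

	if (!s) {
		/* First update from this node: try to allocate lazily. */
		s = calloc(1, sizeof(*s));
		if (s)
			f->stats[node] = s;
		else
			s = f->stats[0];  /* fall back to the preallocated instance */
	}
	s->packets++;
	s->bytes += len;
}

/* Reader: sum only the instances that were actually allocated. */
void flow_stats_read(const struct flow *f, struct flow_stats *total)
{
	int i;

	total->packets = 0;
	total->bytes = 0;
	for (i = 0; i < MAX_NODES; i++) {
		if (!f->stats[i])
			continue;
		total->packets += f->stats[i]->packets;
		total->bytes += f->stats[i]->bytes;
	}
}

void flow_free(struct flow *f)
{
	int i;

	for (i = 0; i < MAX_NODES; i++) {
		free(f->stats[i]);
		f->stats[i] = NULL;
	}
}
```

A flow that is only ever touched from node 0 keeps a single instance, so the reader skips the remaining slots without reading any counters.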

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-05-16 13:40:29 -07:00
Jarno Rajahalme
23dabf88ab openvswitch: Remove 5-tuple optimization.
The 5-tuple optimization becomes unnecessary with a later per-NUMA
node stats patch.  Remove it first to make the changes easier to
grasp.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-05-16 13:40:29 -07:00