Commit Graph

31582 Commits

Author SHA1 Message Date
Dan Carpenter
08d4d217df rxrpc: out of bound read in debug code
Smatch complains because we are using an untrusted index into the
rxrpc_acks[] array.  It's just a read and it's only in the debug code,
but it's simple enough to add a check and fix it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 17:02:52 -08:00
Yegor Yefremov
2fa053a0a2 8021q: update description
Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add
some beautifications like replacing 'ethernet' with 'Ethernet' and
removing unneeded spaces.

Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 17:01:25 -08:00
Hannes Frederic Sowa
82b276cd2b ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts
Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:59:19 -08:00
FX Le Bail
446fab5933 ipv6: enable anycast addresses as source addresses in ICMPv6 error messages
- Uses ipv6_anycast_destination() in icmp6_send().

Suggested-by: Bill Fink <billfink@mindspring.com>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:53:23 -08:00
Peter Pan(潘卫平)
4d83e17730 tcp: delete redundant calls of tcp_mtup_init()
As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen
sock, we can delete the redundant calls of tcp_mtup_init() in
tcp_{v4,v6}_syn_recv_sock().

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:52:31 -08:00
Daniel Borkmann
f0d4eb29d1 packet: fix a couple of cppcheck warnings
Doesn't bring much, but also doesn't hurt us to fix 'em:

1) In tpacket_rcv() flush dcache page we can restirct the scope
   for start and end and remove one layer of indent.

2) In tpacket_destruct_skb() we can restirct the scope for ph.

3) In alloc_one_pg_vec_page() we can remove the NULL assignment
   and change spacing a bit.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:51:42 -08:00
Duan Jiong
967680e027 ipv4: remove the useless argument from ip_tunnel_hash()
Since commit c544193214("GRE: Refactor GRE tunneling code")
introduced function ip_tunnel_hash(), the argument itn is no
longer in use, so remove it.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:48:18 -08:00
Sabrina Dubroca
f14fe8a848 net: remove unnecessary initializations in net_dev_init
softnet_data is already set to 0, no need to use memset or initialize
specific fields to 0 or NULL afterwards.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 16:44:45 -08:00
WANG Cong
6e6a50c254 net_sched: act: export tcf_hash_search() instead of tcf_hash_lookup()
So that we will not expose struct tcf_common to modules.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
WANG Cong
c779f7af99 net_sched: act: fetch hinfo from a->ops->hinfo
Every action ops has a pointer to hash info, so we don't need to
hard-code it in each module.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 14:43:16 -08:00
Linus Torvalds
8e50966072 GPIO tree bulk changes for v3.14
A big set this merge window, as we have much going on in
 this subsystem. Major changes this time:
 
 - Some core improvements and cleanups to the new GPIO
   descriptor API. This seems to be working now so we can
   start the exodus to this API, moving gradually away from
   the global GPIO numberspace.
 
 - Incremental improvements to the ACPI GPIO core, and move
   the few GPIO ACPI clients we have to the GPIO descriptor
   API right *now* before we go any further. We actually
   managed to contain this *before* we started to litter
   the kernel with yet another hackish global numberspace for
   the ACPI GPIOs, which is a big win.
 
 - The RFkill GPIO driver and all platforms using it have
   been migrated to use the GPIO descriptors rather than
   fixed number assignments. Tegra machine has been migrated
   as part of this.
 
 - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x.
   Those should be really good examples of how I expect a
   nice GPIO driver to look these days.
 
 - Do away with custom GPIO implementations on a major
   part of the ARM machines: ks8695, lpc32xx, mv78xx0.
   Make a first step towards the same in the horribly
   convoluted Samsung S3C include forest. We expect to
   continue to clean this up as we move forward.
 
 - Flag GPIO lines used for IRQ on adnp, bcm-kona, em,
   intel-mid and lynxpoint.
   This makes the GPIOlib core aware that a certain GPIO line
   is used for IRQs and can then enforce some semantics such
   as disallowing a GPIO line marked as in use for IRQ to be
   switched to output mode.
 
 - Drop all use of irq_set_chip_and_handler_name().
   The name provided in these cases were just unhelpful
   tags like "mux" or "demux".
 
 - Extend the MCP23s08 driver to handle interrupts.
 
 - Minor incremental improvements for rcar, lynxpoint, em
   74x164 and msm drivers.
 
 - Some non-urgent bug fixes here and there, duplicate
   #includes and that usual kind of cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS3i/MAAoJEEEQszewGV1zVB8P/Rjzgx8To0gQPn49M4u/A1Mk
 mAzpUoKa05ILTKBm/bpWPYZPpg9PDqUxOYPsIDEAkc70BKMPTXxrYiE+LSfIzwaJ
 a8IRwOzNL7Iwc+zPNS/GrmRJyxymb4lmMD/fypk/YaumZ6j4Hbo+9R8Zct9gbZ5Q
 ZbKtz6kLhbkbNCc71bVMgk6yacSBx1ak8Xpd12HlW85NgOCoBj7/DI1Lb61x1ImY
 NYpSpmtfGGTkQLtBl5dTLefZOvL1dKSct9TMOsA2jzNqf3zA1YA6XOxPGHK/qtjq
 3s9cN1sIVF/g7sm1+qoKXe0OTQrXHT7SX8BH9/tb3MiKO8ItactlQUJlYNR3WFSN
 zm1PNe5zWr+GWzV0iUrqoMN4XX8nThiFDOxZpOwBTZcUD6qtDFIZp41M3qxwFTbJ
 hCtSQ8gUO1Ce+xtOQYYOwEkRS7FZa1Z+p/lendTFuGDh6DcXy97SrKkTktM4Q98B
 LhqrwUzCdES0ecNDi2+P5y4Fc7M0cMMn9SnFvbSBObLB89TF9uzMIn8jUBCZMvrM
 eAeZlRBYk8F+6F12higaWqZyiBKIEubXo/Z8T0L2KEDm/z/ddJvhQgBKvWlf3rqi
 RToD446rda+RhFBnxLZ3mTui5nZ2WyKTOqhVqeBuriJhE/cTUaQHUBUrbOwx20kE
 Xb9mQ2n3GRk2157n1CLY
 =lW2i
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO tree bulk changes from Linus Walleij:
 "A big set this merge window, as we have much going on in this
  subsystem.  The changes to other subsystems (notably a slew of ARM
  machines as I am doing away with their custom APIs) have all been
  ACKed to the extent possible.

  Major changes this time:

   - Some core improvements and cleanups to the new GPIO descriptor API.
     This seems to be working now so we can start the exodus to this
     API, moving gradually away from the global GPIO numberspace.

   - Incremental improvements to the ACPI GPIO core, and move the few
     GPIO ACPI clients we have to the GPIO descriptor API right *now*
     before we go any further.  We actually managed to contain this
     *before* we started to litter the kernel with yet another hackish
     global numberspace for the ACPI GPIOs, which is a big win.

   - The RFkill GPIO driver and all platforms using it have been
     migrated to use the GPIO descriptors rather than fixed number
     assignments.  Tegra machine has been migrated as part of this.

   - New drivers for MOXA ART, Xtensa GPIO32 and SMSC SCH311x.  Those
     should be really good examples of how I expect a nice GPIO driver
     to look these days.

   - Do away with custom GPIO implementations on a major part of the ARM
     machines: ks8695, lpc32xx, mv78xx0.  Make a first step towards the
     same in the horribly convoluted Samsung S3C include forest.  We
     expect to continue to clean this up as we move forward.

   - Flag GPIO lines used for IRQ on adnp, bcm-kona, em, intel-mid and
     lynxpoint.

     This makes the GPIOlib core aware that a certain GPIO line is used
     for IRQs and can then enforce some semantics such as disallowing a
     GPIO line marked as in use for IRQ to be switched to output mode.

   - Drop all use of irq_set_chip_and_handler_name().  The name provided
     in these cases were just unhelpful tags like "mux" or "demux".

   - Extend the MCP23s08 driver to handle interrupts.

   - Minor incremental improvements for rcar, lynxpoint, em 74x164 and
     msm drivers.

   - Some non-urgent bug fixes here and there, duplicate #includes and
     that usual kind of cleanups"

Fix up broken Kconfig file manually to make this all compile.

* tag 'gpio-v3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (71 commits)
  gpio: mcp23s08: fix casting caused build warning
  gpio: mcp23s08: depend on OF_GPIO
  gpio: mcp23s08: Add irq functionality for i2c chips
  ARM: S5P[v210|c100|64x0]: Fix build error
  gpio: pxa: clamp gpio get value to [0,1]
  ARM: s3c24xx: explicit dependency on <plat/gpio-cfg.h>
  ARM: S3C[24|64]xx: move includes back under <mach/> scope
  Documentation / ACPI: update to GPIO descriptor API
  gpio / ACPI: get rid of acpi_gpio.h
  gpio / ACPI: register to ACPI events automatically
  mmc: sdhci-acpi: convert to use GPIO descriptor API
  ARM: s3c24xx: fix build error
  gpio: f7188x: set can_sleep attribute
  gpio: samsung: Update documentation
  gpio: samsung: Remove hardware.h inclusion
  gpio: xtensa: depend on HAVE_XTENSA_GPIO32
  gpio: clps711x: Enable driver compilation with COMPILE_TEST
  gpio: clps711x: Use of_match_ptr()
  net: rfkill: gpio: convert to descriptor-based GPIO interface
  leds: s3c24xx: Fix build failure
  ...
2014-01-21 10:09:12 -08:00
Linus Torvalds
a0fa1dd3cd Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:

 - Add the initial implementation of SCHED_DEADLINE support: a real-time
   scheduling policy where tasks that meet their deadlines and
   periodically execute their instances in less than their runtime quota
   see real-time scheduling and won't miss any of their deadlines.
   Tasks that go over their quota get delayed (Available to privileged
   users for now)

 - Clean up and fix preempt_enable_no_resched() abuse all around the
   tree

 - Do sched_clock() performance optimizations on x86 and elsewhere

 - Fix and improve auto-NUMA balancing

 - Fix and clean up the idle loop

 - Apply various cleanups and fixes

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
  sched: Fix __sched_setscheduler() nice test
  sched: Move SCHED_RESET_ON_FORK into attr::sched_flags
  sched: Fix up attr::sched_priority warning
  sched: Fix up scheduler syscall LTP fails
  sched: Preserve the nice level over sched_setscheduler() and sched_setparam() calls
  sched/core: Fix htmldocs warnings
  sched/deadline: No need to check p if dl_se is valid
  sched/deadline: Remove unused variables
  sched/deadline: Fix sparse static warnings
  m68k: Fix build warning in mac_via.h
  sched, thermal: Clean up preempt_enable_no_resched() abuse
  sched, net: Fixup busy_loop_us_clock()
  sched, net: Clean up preempt_enable_no_resched() abuse
  sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
  sched/preempt, locking: Rework local_bh_{dis,en}able()
  sched/clock, x86: Avoid a runtime condition in native_sched_clock()
  sched/clock: Fix up clear_sched_clock_stable()
  sched/clock, x86: Use a static_key for sched_clock_stable
  sched/clock: Remove local_irq_disable() from the clocks
  sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs
  ...
2014-01-20 10:42:08 -08:00
Weilong Chen
82ef3d5d5f net: fix "queues" uevent between network namespaces
When I create a new namespace with 'ip netns add net0', or add/remove
new links in a namespace with 'ip link add/delete type veth', rx/tx
queues events can be got in all namespaces. That is because rx/tx queue
ktypes do not have namespace support, and their kobj parents are setted to
NULL. This patch is to fix it.

Reported-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Libo Chen <chenlibo@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 20:02:02 -08:00
WANG Cong
671314a5ab net_sched: act: remove capab from struct tc_action_ops
It is not actually implemented.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:58:07 -08:00
Jason Wang
9d08dd3d32 net: document accel_priv parameter for __dev_queue_xmit()
To silent "make htmldocs" warning.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:55:51 -08:00
Hannes Frederic Sowa
602582ca7a ipv6: optimize link local address search
ipv6_link_dev_addr sorts newly added addresses by scope in
ifp->addr_list. Smaller scope addresses are added to the tail of the
list. Use this fact to iterate in reverse over addr_list and break out
as soon as a higher scoped one showes up, so we can spare some cycles
on machines with lot's of addresses.

The ordering of the addresses is not relevant and we are more likely to
get the eui64 generated address with this change anyway.

Suggested-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:55:50 -08:00
Hannes Frederic Sowa
4b261c75a9 ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams
We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.

Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 19:53:18 -08:00
Yang Yingliang
a6e2fe17eb sch_netem: replace magic numbers with enumerate
Replace some magic numbers which describe states of 4-state model
loss generator with enumerate.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:17:34 -08:00
Florent Fourcot
6444f72b4b ipv6: add flowlabel_consistency sysctl
With the introduction of IPV6_FL_F_REFLECT, there is no guarantee of
flow label unicity. This patch introduces a new sysctl to protect the old
behaviour, enable by default.

Changelog of V3:
 * rename ip6_flowlabel_consistency to flowlabel_consistency
 * use net_info_ratelimited()
 * checkpatch cleanups

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
46e5f40176 ipv6: add a flag to get the flow label used remotly
This information is already available via IPV6_FLOWINFO
of IPV6_2292PKTOPTIONS, and them a filtering to get the flow label
information. But it is probably logical and easier for users to add this
here, and to control both sent/received flow label values with the
IPV6_FLOWLABEL_MGR option.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Florent Fourcot
df3687ffc6 ipv6: add the IPV6_FL_F_REFLECT flag to IPV6_FL_A_GET
With this option, the socket will reply with the flow label value read
on received packets.

The goal is to have a connection with the same flow label in both
direction of the communication.

Changelog of V4:
 * Do not erase the flow label on the listening socket. Use pktopts to
 store the received value

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-19 17:12:31 -08:00
Eric Dumazet
3acfa1e73c ipv4: be friend with drop monitor
Replace some dev_kfree_skb() with kfree_skb() calls when
we drop one skb, this might help bug tracking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:08:02 -08:00
Steffen Hurrle
342dfc306f net: add build-time checks for msg->msg_name size
This is a follow-up patch to f3d3342602 ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.

Signed-off-by: Steffen Hurrle <steffen@hurrle.net>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 23:04:16 -08:00
Michal Sekletar
ea02f9411d net: introduce SO_BPF_EXTENSIONS
For user space packet capturing libraries such as libpcap, there's
currently only one way to check which BPF extensions are supported
by the kernel, that is, commit aa1113d9f8 ("net: filter: return
-EINVAL if BPF_S_ANC* operation is not supported"). For querying all
extensions at once this might be rather inconvenient.

Therefore, this patch introduces a new option which can be used as
an argument for getsockopt(), and allows one to obtain information
about which BPF extensions are supported by the current kernel.

As David Miller suggests, we do not need to define any bits right
now and status quo can just return 0 in order to state that this
versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
additions to BPF extensions need to add their bits to the
bpf_tell_extensions() function, as documented in the comment.

Signed-off-by: Michal Sekletar <msekleta@redhat.com>
Cc: David Miller <davem@davemloft.net>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 19:08:58 -08:00
David S. Miller
4180442058 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	net/ipv4/tcp_metrics.c

Overlapping changes between the "don't create two tcp metrics objects
with the same key" race fix in net and the addition of the destination
address in the lookup key in net-next.

Minor overlapping changes in bnx2x driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-18 00:55:41 -08:00
Stephen Warren
c91514972b Bluetooth: remove direct compilation of 6lowpan_iphc.c
It's now built as a separate utility module, and enabling BT selects
that module in Kconfig. This fixes:

net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): multiple definition of `__ksymtab_lowpan_process_data'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_process_data+0x0): first defined here
net/ieee802154/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): multiple definition of `__ksymtab_lowpan_header_compress'
net/bluetooth/built-in.o:(___ksymtab_gpl+lowpan_header_compress+0x0): first defined here
net/ieee802154/built-in.o: In function `lowpan_header_compress':
net/ieee802154/6lowpan_iphc.c:606: multiple definition of `lowpan_header_compress'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:606: first defined here
net/ieee802154/built-in.o: In function `lowpan_process_data':
net/ieee802154/6lowpan_iphc.c:344: multiple definition of `lowpan_process_data'
net/bluetooth/built-in.o:/home/swarren/shared/git_wa/kernel/kernel.git/net/bluetooth/../ieee802154/6lowpan_iphc.c:344: first defined here
make[1]: *** [net/built-in.o] Error 1

(this change probably simply wasn't "git add"d to a53d34c346)

Fixes: a53d34c346 ("net: move 6lowpan compression code to separate module")
Fixes: 18722c2470 ("Bluetooth: Enable 6LoWPAN support for BT LE devices")
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 19:13:49 -08:00
sfeldma@cumulusnetworks.com
1d3ee88ae0 bonding: add netlink attributes to slave link dev
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest.  Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.

Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes.  Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:51:58 -08:00
Eric Dumazet
6c7e7610ff ipv4: fix a dst leak in tunnels
This patch :

1) Remove a dst leak if DST_NOCACHE was set on dst
   Fix this by holding a reference only if dst really cached.

2) Remove a lockdep warning in __tunnel_dst_set()
    This was reported by Cong Wang.

3) Remove usage of a spinlock where xchg() is enough

4) Remove some spurious inline keywords.
   Let compiler decide for us.

Fixes: 7d442fab0a ("ipv4: Cache dst in tunnels")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:36:39 -08:00
Flavio Leitner
6a7cc41872 ipv6: send Change Status Report after DAD is completed
The RFC 3810 defines two type of messages for multicast
listeners. The "Current State Report" message, as the name
implies, refreshes the *current* state to the querier.
Since the querier sends Query messages periodically, there
is no need to retransmit the report.

On the other hand, any change should be reported immediately
using "State Change Report" messages. Since it's an event
triggered by a change and that it can be affected by packet
loss, the rfc states it should be retransmitted [RobVar] times
to make sure routers will receive timely.

Currently, we are sending "Current State Reports" after
DAD is completed.  Before that, we send messages using
unspecified address (::) which should be silently discarded
by routers.

This patch changes to send "State Change Report" messages
after DAD is completed fixing the behavior to be RFC compliant
and also to pass TAHI IPv6 testsuite.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:12:29 -08:00
Hannes Frederic Sowa
11ffff752c ipv6: simplify detection of first operational link-local address on interface
In commit 1ec047eb47 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.

Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.

Fixes: 1ec047eb47 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:10:01 -08:00
Christoph Paasch
77f99ad16a tcp: metrics: Avoid duplicate entries with the same destination-IP
Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.

This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.

Fixes: 51c5d0c4b1 (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:05:34 -08:00
Florent Fourcot
1d13a96c74 ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT
This patch is following the commit b903d324be (ipv6: tcp: fix TCLASS
value in ACK messages sent from TIME_WAIT).

For the same reason than tclass, we have to store the flow label in the
inet_timewait_sock to provide consistency of flow label on the last ACK.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:56:33 -08:00
Gerald Schaefer
c196403b79 net: rds: fix per-cpu helper usage
commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.

Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 17:52:22 -08:00
John W. Linville
7916a07557 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-17 14:43:17 -05:00
Michael Dalton
a953be53ce net-sysfs: add support for device-specific rx queue sysfs attributes
Extend existing support for netdevice receive queue sysfs attributes to
permit a device-specific attribute group. Initial use case for this
support will be to allow the virtio-net device to export per-receive
queue mergeable receive buffer size.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Michael Dalton
097b4f19e5 net: allow > 0 order atomic page alloc in skb_page_frag_refill
skb_page_frag_refill currently permits only order-0 page allocs
unless GFP_WAIT is used. Change skb_page_frag_refill to attempt
higher-order page allocations whether or not GFP_WAIT is used. If
memory cannot be allocated, the allocator will fall back to
successively smaller page allocs (down to order-0 page allocs).

This change brings skb_page_frag_refill in line with the existing
page allocation strategy employed by netdev_alloc_frag, which attempts
higher-order page allocations whether or not GFP_WAIT is set, falling
back to successively lower-order page allocations on failure. Part
of migration of virtio-net to per-receive queue page frag allocators.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Wei Yongjun
722e47d792 net_sched: fix error return code in fw_change_attrs()
The error code was not set if change indev fail, so the error
condition wasn't reflected in the return value. Fix to return a
negative error code from this error handling case instead of 0.

Fixes: 2519a602c2 ('net_sched: optimize tcf_match_indev()')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:12:03 -08:00
Ying Xue
9bbb4ecc68 tipc: standardize recvmsg routine
Standardize the behaviour of waiting for events in TIPC recvmsg()
so that all variables of socket or port structures are protected
within socket lock, allowing the process of calling recvmsg() to
be woken up at appropriate time.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
391a6dd1da tipc: standardize sendmsg routine of connected socket
Standardize the behaviour of waiting for events in TIPC send_packet()
so that all variables of socket or port structures are protected within
socket lock, allowing the process of calling sendmsg() to be woken up
at appropriate time.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
3f40504f7e tipc: standardize sendmsg routine of connectionless socket
Comparing the behaviour of how to wait for events in TIPC sendmsg()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, sk_sleep()
and tport->congested variables associated with socket are exposed
without socket lock protection while wait_event_interruptible_timeout()
accesses them. So standardizing it with similar implementation
in other stacks can help us correct these errors which the process
of calling sendmsg() cannot be woken up event if an expected event
arrive at socket or improperly woken up although the wake condition
doesn't match.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
6398e23cdb tipc: standardize accept routine
Comparing the behaviour of how to wait for events in TIPC accept()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. As sk_sleep() and
sk->sk_receive_queue variables associated with socket are not
protected by socket lock, the process of calling accept() may be
woken up improperly or sometimes cannot be woken up at all. After
standardizing it with inet_csk_wait_for_connect routine, we can
get benefits including: avoiding 'thundering herd' phenomenon,
adding a timeout mechanism for accept(), coping with a pending
signal, and having sk_sleep() and sk->sk_receive_queue being
always protected within socket lock scope and so on.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
Ying Xue
78eb3a5379 tipc: standardize connect routine
Comparing the behaviour of how to wait for events in TIPC connect()
with other stacks, the TIPC implementation might be perceived as
different, and sometimes even incorrect. For instance, as both
sock->state and sk_sleep() are directly fed to
wait_event_interruptible_timeout() as its arguments, and socket lock
has to be released before we call wait_event_interruptible_timeout(),
the two variables associated with socket are exposed out of socket
lock protection, thereby probably getting stale values so that the
process of calling connect() cannot be woken up exactly even if
correct event arrives or it is woken up improperly even if the wake
condition is not satisfied in practice. Therefore, standardizing its
behaviour with sk_stream_wait_connect routine can avoid these risks.

Additionally the implementation of connect routine is simplified as a
whole, allowing it to return correct values in all different cases.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 19:10:34 -08:00
wangweidong
abfce3ef58 sctp: remove the unnecessary assignment
When go the right path, the status is 0, no need to assign it again.
So just remove the assignment.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:31:49 -08:00
WANG Cong
6c80563c2f net_sched: act: pick a different type for act_xt
In tcf_register_action() we check either ->type or ->kind to see if
there is an existing action registered, but ipt action registers two
actions with same type but different kinds. They should have different
types too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:24:11 -08:00
David S. Miller
7dff08bbda Included change:
- properly format already existing kerneldoc
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS1xdYAAoJEEKTMo6mOh1VNv0P/R73udErvBaHTUIAVSJ3pH8/
 rQ673H+/ZDi6zJQNlNe5UHLu5/IrK/ARCz5Ldl71lNpNHjRNGloSmjIUsdOZY01q
 D8ZR9YdWxpVcTCNSKxGnDrhVyShM3bN70h2KuYWv01/IsTGJKvkkhnG0SXqJqKYm
 XBRrEyVnyn52mQM7BhWSkCrX+wSb8OyEtie4Gdn8LjKJ0i2LYeQZ7vnXSR4aORXt
 pqYe7O+cwWZoj7Vpqea8Ye7srt6BEATEGwuezcQYH+FvaIEepORHWsqqZm6z/o91
 wyjjoPcFVyKoEUASdrkC0Hq9eqoaRBVo3yZ/AWlFJ06O/eVhTNKIBN6HykmPJ+vW
 9QESaHjZ8Gackz6ehU86h7mu6Mtrbfk73BZfG23SVL6cZltXazE539v0Pv8Ge+P/
 2NKdDjRp1vCHViRLNCwX20VmZFg7bHVDHgYV0gBH/VgZAihhRhSQk9LEEhk1fbij
 UQjBMtRnWOoxNHi311Khsxqm+0X82/mBN2mw5VdV5S9ZnBXbbCNRAuxKlZWiZNGV
 MJ2oPUrzs4RIlhKuCc1Ckjr2n7xtbU/j05ASZ4iJgbDkulkGEniDBL+K1bOFIPJl
 Kb0T4F/0ucz2IcbeDijG8Xe+5XZmaFNbDQ0ooBiHrOAEVM9vqsiQ76P5uetWPnWr
 Gop6D4HE4fCGjWcONp1u
 =1KKs
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- properly format already existing kerneldoc

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:22:58 -08:00
WANG Cong
fb1d598d48 net_sched: act: use tcf_hash_release() in net/sched/act_police.c
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:21:55 -08:00
David S. Miller
8c12ec7411 Included change:
- properly compute the batman-adv header overhead. Such
   result is later used to initialize the hard_header_len
   member of the soft-interface netdev object
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS1xUQAAoJEEKTMo6mOh1VStwP/RsjepUZEgjIXs/SrKC0Pkdk
 UCnznIyn8b7sLIQokimNESDxADbTphyIBjMZC2XI87udVIRHD3v1NlHWRUzWYvqq
 KvYqxuQTfbRnHPwzWiuOzQb9tGJxF+cG4CX8E4ydKqm8LmrNgMX+hL3gq50t5FHU
 jtFOiTcqjrZTPcXsdtOyoH9eYDv+9TTqgeetCF4iOZ7G+W/3d4h/EGAaWiCmSjvP
 4vdxRFokCdr0OkU/a790SbtHNMWcjCxFCtRSoK9cZFrRjCE+ersOg1g0vWA100Vi
 O0yRkZWqODnTu12skhcA2aojFENiT9CFFvZENMpfUxJKnSortFebpKXo+vyW19gC
 H6uYR1/KLk00pFOzuKu27hQBvARiKwwkuubeSNthas9jadDSO5ZgZGrDeE4U3GNZ
 KIkHunf2HVS2zpylRdnb9A9B/mkO+iW5y6rVtDdXTkrzyhmqTpJPeJqi8j1S10Zl
 oyoJIpS+wN7Bslf3fo0f8Pag7cXT28GD4/iPmCkJDpc1GGz7A/HJF/Vpx06L8BWB
 pl9TFkxZutBOpCjsFCJ+DvRnmgjolGVqyyyDhizeBVrKMvDW/oJ1PHR6Tx5S3S35
 QG3ZNmWIaiZ73bJxzJAY5KxNoehGbPzITEBdTlpo2AMuP1+XFadDF+lquqmeAJSQ
 TAI0Paz0VP/RRXxhksKZ
 =opzN
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Included change:
- properly compute the batman-adv header overhead. Such
  result is later used to initialize the hard_header_len
  member of the soft-interface netdev object

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:16:43 -08:00
Veaceslav Falico
1d486bfb66 net: add NETDEV_PRECHANGEMTU to notify before mtu change happens
Currently, if a device changes its mtu, first the change happens (invloving
all the side effects), and after that the NETDEV_CHANGEMTU is sent so that
other devices can catch up with the new mtu. However, if they return
NOTIFY_BAD, then the change is reverted and error returned.

This is a really long and costy operation (sometimes). To fix this, add
NETDEV_PRECHANGEMTU notification which is called prior to any change
actually happening, and if any callee returns NOTIFY_BAD - the change is
aborted. This way we're skipping all the playing with apply/revert the mtu.

CC: "David S. Miller" <davem@davemloft.net>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:15:41 -08:00
Tom Herbert
0b4cec8c2e net: Check skb->rxhash in gro_receive
When initializing a gro_list for a packet, first check the rxhash of
the incoming skb against that of the skb's in the list. This should be
a very strong inidicator of whether the flow is going to be matched,
and potentially allows a lot of other checks to be short circuited.
Use skb_hash_raw so that we don't force the hash to be calculated.

Tested by running netperf 200 TCP_STREAMs between two machines with
GRO, HW rxhash, and 1G. Saw no performance degration, slight reduction
of time in dev_gro_receive.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:22:54 -08:00
Daniel Borkmann
b013840810 packet: use percpu mmap tx frame pending refcount
In PF_PACKET's packet mmap(), we can avoid using one atomic_inc()
and one atomic_dec() call in skb destructor and use a percpu
reference count instead in order to determine if packets are
still pending to be sent out. Micro-benchmark with [1] that has
been slightly modified (that is, protcol = 0 in socket(2) and
bind(2)), example on a rather crappy testing machine; I expect
it to scale and have even better results on bigger machines:

./packet_mm_tx -s7000 -m7200 -z700000 em1, avg over 2500 runs:

With patch:    4,022,015 cyc
Without patch: 4,812,994 cyc

time ./packet_mm_tx -s64 -c10000000 em1 > /dev/null, stable:

With patch:
  real         1m32.241s
  user         0m0.287s
  sys          1m29.316s

Without patch:
  real         1m38.386s
  user         0m0.265s
  sys          1m35.572s

In function tpacket_snd(), it is okay to use packet_read_pending()
since in fast-path we short-circuit the condition already with
ph != NULL, since we have next frames to process. In case we have
MSG_DONTWAIT, we also do not execute this path as need_wait is
false here anyway, and in case of _no_ MSG_DONTWAIT flag, it is
okay to call a packet_read_pending(), because when we ever reach
that path, we're done processing outgoing frames anyway and only
look if there are skbs still outstanding to be orphaned. We can
stay lockless in this percpu counter since it's acceptable when we
reach this path for the sum to be imprecise first, but we'll level
out at 0 after all pending frames have reached the skb destructor
eventually through tx reclaim. When people pin a tx process to
particular CPUs, we expect overflows to happen in the reference
counter as on one CPU we expect heavy increase; and distributed
through ksoftirqd on all CPUs a decrease, for example. As
David Laight points out, since the C language doesn't define the
result of signed int overflow (i.e. rather than wrap, it is
allowed to saturate as a possible outcome), we have to use
unsigned int as reference count. The sum over all CPUs when tx
is complete will result in 0 again.

The BUG_ON() in tpacket_destruct_skb() we can remove as well. It
can _only_ be set from inside tpacket_snd() path and we made sure
to increase tx_ring.pending in any case before we called po->xmit(skb).
So testing for tx_ring.pending == 0 is not too useful. Instead, it
would rather have been useful to test if lower layers didn't orphan
the skb so that we're missing ring slots being put back to
TP_STATUS_AVAILABLE. But such a bug will be caught in user space
already as we end up realizing that we do not have any
TP_STATUS_AVAILABLE slots left anymore. Therefore, we're all set.

Btw, in case of RX_RING path, we do not make use of the pending
member, therefore we also don't need to use up any percpu memory
here. Also note that __alloc_percpu() already returns a zero-filled
percpu area, so initialization is done already.

  [1] http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:17:12 -08:00
Daniel Borkmann
87a2fd286a packet: don't unconditionally schedule() in case of MSG_DONTWAIT
In tpacket_snd(), when we've discovered a first frame that is
not in status TP_STATUS_SEND_REQUEST, and return a NULL buffer,
we exit the send routine in case of MSG_DONTWAIT, since we've
finished traversing the mmaped send ring buffer and don't care
about pending frames.

While doing so, we still unconditionally call an expensive
schedule() in the packet_current_frame() "error" path, which
is unnecessary in this case since it's enough to just quit
the function.

Also, in case MSG_DONTWAIT is not set, we should rather test
for need_resched() first and do schedule() only if necessary
since meanwhile pending frames could already have finished
processing and called skb destructor.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:17:11 -08:00
Daniel Borkmann
902fefb82e packet: improve socket create/bind latency in some cases
Most people acquire PF_PACKET sockets with a protocol argument in
the socket call, e.g. libpcap does so with htons(ETH_P_ALL) for
all its sockets. Most likely, at some point in time a subsequent
bind() call will follow, e.g. in libpcap with ...

  memset(&sll, 0, sizeof(sll));
  sll.sll_family          = AF_PACKET;
  sll.sll_ifindex         = ifindex;
  sll.sll_protocol        = htons(ETH_P_ALL);

... as arguments. What happens in the kernel is that already
in socket() syscall, we install a proto hook via register_prot_hook()
if our protocol argument is != 0. Yet, in bind() we're almost
doing the same work by doing a unregister_prot_hook() with an
expensive synchronize_net() call in case during socket() the proto
was != 0, plus follow-up register_prot_hook() with a bound device
to it this time, in order to limit traffic we get.

In the case when the protocol and user supplied device index (== 0)
does not change from socket() to bind(), we can spare us doing
the same work twice. Similarly for re-binding to the same device
and protocol. For these scenarios, we can decrease create/bind
latency from ~7447us (sock-bind-2 case) to ~89us (sock-bind-1 case)
with this patch.

Alternatively, for the first case, if people care, they should
simply create their sockets with proto == 0 argument and define
the protocol during bind() as this saves a call to synchronize_net()
as well (sock-bind-3 case).

In all other cases, we're tied to user space behaviour we must not
change, also since a bind() is not strictly required. Thus, we need
the synchronize_net() to make sure no asynchronous packet processing
paths still refer to the previous elements of po->prot_hook.

In case of mmap()ed sockets, the workflow that includes bind() is
socket() -> setsockopt(<ring>) -> bind(). In that case, a pair of
{__unregister, register}_prot_hook is being called from setsockopt()
in order to install the new protocol receive handler. Thus, when
we call bind and can skip a re-hook, we have already previously
installed the new handler. For fanout, this is handled different
entirely, so we should be good.

Timings on an i7-3520M machine:

  * sock-bind-1:   89 us
  * sock-bind-2: 7447 us
  * sock-bind-3:   75 us

sock-bind-1:
  socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=all(0),
           pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

sock-bind-2:
  socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP)) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
           pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

sock-bind-3:
  socket(PF_PACKET, SOCK_RAW, 0) = 3
  bind(3, {sa_family=AF_PACKET, proto=htons(ETH_P_IP), if=lo(1),
           pkttype=PACKET_HOST, addr(0)={0, }, 20) = 0

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:17:11 -08:00
Paul Gortmaker
cf17228372 net/ipv4: don't use module_init in non-modular gre_offload
Recent commit 438e38fadc
("gre_offload: statically build GRE offloading support") added
new module_init/module_exit calls to the gre_offload.c file.

The file is obj-y and can't be anything other than built-in.
Currently it can never be built modular, so using module_init
as an alias for __initcall can be somewhat misleading.

Fix this up now, so that we can relocate module_init from
init.h into module.h in the future.  If we don't do this, we'd
have to add module.h to obviously non-modular code, and that
would be a worse thing.  We also make the inclusion explicit.

Note that direct use of __initcall is discouraged, vs. one
of the priority categorized subgroups.  As __initcall gets
mapped onto device_initcall, our use of device_initcall
directly in this change means that the runtime impact is
zero -- it will remain at level 6 in initcall ordering.

As for the module_exit, rather than replace it with __exitcall,
we simply remove it, since it appears only UML does anything
with those, and even for UML, there is no relevant cleanup
to be done here.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 16:08:27 -08:00
Eric Dumazet
0864c15883 net: eth_type_trans() should use skb_header_pointer()
eth_type_trans() can read uninitialized memory as drivers
do not necessarily pull more than 14 bytes in skb->head before
calling it.

As David suggested, we can use skb_header_pointer() to
fix this without breaking some drivers that might not expect
eth_type_trans() pulling 2 additional bytes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 15:30:31 -08:00
David S. Miller
5ff1dd2416 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
This small batch contains several Netfilter fixes for your net-next
tree, more specifically:

* Fix compilation warning in nft_ct in NF_CONNTRACK_MARK is not set,
  from Kristian Evensen.

* Add dependency to IPV6 for NF_TABLES_INET. This one has been reported
  by the several robots that are testing .config combinations, from Paul
  Gortmaker.

* Fix default base chain policy setting in nf_tables, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:44:54 -08:00
Jiri Pirko
89740ca74f neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions.
When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
not initialized yet. This might cause confusion for the people who use
NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
usage in ndo_neigh_setup.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 11:31:58 -08:00
Eric Dumazet
aee636c480 bpf: do not use reciprocal divide
At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:02:08 -08:00
Thomas Haller
5b84efecb7 ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE
Refactor the deletion/update of prefix routes when removing an
address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address
present with this flag, to not cleanup the route. Instead, assume
that userspace is taking care of this route.

Also perform the same cleanup, when userspace changes an existing address
to add NOPREFIXROUTE (to an address that didn't have this flag). This is
done because when the address was added, a prefix route was created for it.
Since the user now wants to handle this route by himself, we cleanup this
route.

This cleanup of the route is not totally robust. There is no guarantee,
that the route we are about to delete was really the one added by the
kernel. This behavior does not change by the patch, and in practice it
should work just fine.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:00:40 -08:00
Thomas Haller
761aac737e ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
When adding/modifying an IPv6 address, the userspace application needs
a way to suppress adding a prefix route. This is for example relevant
together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf
generated addresses, but depending on on-link, no route for the
prefix should be added.

Signed-off-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 17:00:40 -08:00
David S. Miller
6631c5cea8 Revert "batman-adv: drop dependency against CRC16"
This reverts commit 12afc36e38.

The dependency is actually still necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 16:54:14 -08:00
wangweidong
0ea5e4df7b sctp: create helper function to enable|disable sackdelay
add sctp_spp_sackdelay_{enable|disable} helper function for
avoiding code duplication.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 16:47:48 -08:00
Li RongQing
d76ed22b22 ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:53:18 -08:00
Dmitry Eremin-Solenikov
a53d34c346 net: move 6lowpan compression code to separate module
IEEE 802.15.4 and Bluetooth networking stacks share 6lowpan compression
code. Instead of introducing Makefile/Kconfig hacks, build this code as
a separate module referenced from both ieee802154 and bluetooth modules.

This fixes the following build error observed in some kernel
configurations:

net/built-in.o: In function `header_create': 6lowpan.c:(.text+0x166149): undefined reference to `lowpan_header_compress'
net/built-in.o: In function `bt_6lowpan_recv': (.text+0x166b3c): undefined reference to `lowpan_process_data'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Dmitry Eremin-Solenikov <dmitry_eremin@mentor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:36:38 -08:00
Veaceslav Falico
5bb025fae5 net: rename sysfs symlinks on device name change
Currently, we don't rename the upper/lower_ifc symlinks in
/sys/class/net/*/ , which might result stale/duplicate links/names.

Fix this by adding netdev_adjacent_rename_links(dev, oldname) which renames
all the upper/lower interface's links to dev from the upper/lower_oldname
to the new name.

We don't need a rollback because only we control these symlinks and if we
fail to rename them - sysfs will anyway complain.

Reported-by: Ding Tianhong <dingtianhong@huawei.com>
CC: Ding Tianhong <dingtianhong@huawei.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:16:20 -08:00
Veaceslav Falico
3ee3270756 net: add sysfs helpers for netdev_adjacent logic
They clean up the code a bit and can be used further.

CC: Ding Tianhong <dingtianhong@huawei.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:16:19 -08:00
Simon Wunderlich
1b371d1307 batman-adv: use consistent kerneldoc style
Reported-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-16 00:16:00 +01:00
Marek Lindner
1df0cbd509 batman-adv: fix batman-adv header overhead calculation
Batman-adv prepends a full ethernet header in addition to its own
header. This has to be reflected in the MTU calculation, especially
since the value is used to set dev->hard_header_len.

Introduced by 411d6ed93a
("batman-adv: consider network coding overhead when calculating required mtu")

Reported-by: cmsv <cmsv@wirelesspt.net>
Reported-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-15 23:54:20 +01:00
Jiri Pirko
3977458c9c neigh: split lines for NEIGH_VAR_SET so they are not too long
introduced by:
commit 1f9248e560
"neigh: convert parms to an array"

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 14:46:00 -08:00
Kristian Evensen
847c8e2959 netfilter: nft_ct: fix compilation warning if NF_CONNTRACK_MARK is not set
net/netfilter/nft_ct.c: In function 'nft_ct_set_eval':
net/netfilter/nft_ct.c:136:6: warning: unused variable 'value' [-Wunused-variable]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-15 11:00:14 +01:00
Eric Dumazet
b53c733600 tcp: do not export tcp_gso_segment() and tcp_gro_receive()
tcp_gso_segment() and tcp_gro_receive() no longer need to be
exported. IPv4 and IPv6 offloads are statically linked.

Note that tcp_gro_complete() is still used by bnx2x, unfortunately.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:53:48 -08:00
Ying Xue
7f2b8562c2 net: nl80211: __dev_get_by_index instead of dev_get_by_index to find interface
As __cfg80211_rdev_from_attrs(), nl80211_dump_wiphy_parse() and
nl80211_set_wiphy() are all under rtnl_lock protection,
__dev_get_by_index() instead of dev_get_by_index() should be used
to find interface handler in them allowing us to avoid to change
interface reference counter.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
5af28de353 can: use __dev_get_by_index instead of dev_get_by_index to find interface
As cgw_create_job() is always under rtnl_lock protection,
__dev_get_by_index() instead of dev_get_by_index() should be used to
find interface handler in it having us avoid to change interface
reference counter.

Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
a74e942694 caif: __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that chnl_net_open() is under
rtnl_lock protection as __dev_open() is protected by rtnl_lock.
So if __dev_get_by_index() instead of dev_get_by_index() is used
to find interface handler in it, this would help us avoid to change
interface reference counter.

__dev_open()
  chnl_net_open()

Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
16b77695ed batman-adv: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that batadv_is_on_batman_iface()
is always under rtnl_lock protection as call_netdevice_notifier()
is protected by rtnl_lock. So if __dev_get_by_index() rather than
dev_get_by_index() is used to find interface handler in it, this
would help us avoid to change interface reference counter.

call_netdevice_notifier()
  batadv_hard_if_event()
    batadv_hardif_add_interface()
      batadv_is_valid_iface()
        batadv_is_on_batman_iface()

Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:47 -08:00
Ying Xue
d4c5fba2f6 decnet: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chain we can identify that dn_cache_getroute() is
protected under rtnl_lock. So if we use __dev_get_by_index() instead
of dev_get_by_index() to find interface handlers in it, this would help
us avoid to change interface reference counter.

rtnetlink_rcv()
  rtnl_lock()
    netlink_rcv_skb()
      dn_cache_getroute()
  rtnl_unlock()

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:46 -08:00
Ying Xue
d9ac62be57 dcb: use __dev_get_by_name instead of dev_get_by_name to find interface
The following call chain indicates that dcb_doit() is protected
under rtnl_lock. So if we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handlers in it, this would
help us avoid to change interface reference counter.

rtnetlink_rcv()
  rtnl_lock()
  netlink_rcv_skb()
    dcb_doit()
  rtnl_unlock()

Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:50:46 -08:00
FX Le Bail
ec35b61ea5 IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.

Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
Dan Carpenter
0e864b21e5 sctp: remove a redundant NULL check
It confuses Smatch when we check "sinit" for NULL and then non-NULL and
that causes a false positive warning later.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
stephen hemminger
963a185539 tipc: spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
stephen hemminger
db9c7c3943 ipv6: addrconf spelling fixes
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 18:18:22 -08:00
Hannes Frederic Sowa
95f4a45de1 net: avoid reference counter overflows on fib_rules in multicast forwarding
Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.

This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.

Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.

Fixes: f0ad0860d0 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:37:25 -08:00
Geert Uytterhoeven
88bfe6ea00 net: Spelling s/transmition/transmission/
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 17:11:26 -08:00
Christian Engelmayer
267d29a69c ieee802154: Fix memory leak in ieee802154_add_iface()
Fix a memory leak in the ieee802154_add_iface() error handling path.
Detected by Coverity: CID 710490.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:40:56 -08:00
Aruna-Hewapathirane
63862b5bef net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:15:25 -08:00
Hannes Frederic Sowa
825edac4e7 ipv6: copy traffic class from ping request to reply
Suggested-by: Simon Schneider <simon-schneider@gmx.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:08:52 -08:00
WANG Cong
72c1d3bdd5 ipv4: register igmp_notifier even when !CONFIG_PROC_FS
We still need this notifier even when we don't config
PROC_FS.

It should be rare to have a kernel without PROC_FS,
so just for completeness.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 15:03:33 -08:00
Ben Hutchings
ae78dbfa40 net: Add trace events for all receive entry points, exposing more skb fields
The existing net/netif_rx and net/netif_receive_skb trace events
provide little information about the skb, nor do they indicate how it
entered the stack.

Add trace events at entry of each of the exported functions, including
most fields that are likely to be interesting for debugging driver
datapath behaviour.  Split netif_rx() and netif_receive_skb() so that
internal calls are not traced.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:46:02 -08:00
Ben Hutchings
d87d04a785 net: Add net_dev_start_xmit trace event, exposing more skb fields
The existing net/net_dev_xmit trace event provides little information
about the skb that has been passed to the driver, and it is not
simple to add more since the skb may already have been freed at
the point the event is emitted.

Add a separate trace event before the skb is passed to the driver,
including most fields that are likely to be interesting for debugging
driver datapath behaviour.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:46:02 -08:00
Ben Hutchings
20567661a1 net: Fix indentation in dev_hard_start_xmit()
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:45:42 -08:00
David S. Miller
0a379e21c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-01-14 14:42:42 -08:00
Paul Durrant
ed1f50c3a7 net: add skb_checksum_setup
This patch adds a function to set up the partial checksum offset for IP
packets (and optionally re-calculate the pseudo-header checksum) into the
core network code.
The implementation was previously private and duplicated between xen-netback
and xen-netfront, however it is not xen-specific and is potentially useful
to any network driver.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Veaceslav Falico <vfalico@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14 14:24:19 -08:00
Paul Gortmaker
419331d8ff netfilter: Add dependency on IPV6 for NF_TABLES_INET
Commit 1d49144c0a ("netfilter: nf_tables: add "inet" table for
IPv4/IPv6") allows creation of non-IPV6 enabled .config files that
will fail to configure/link as follows:

warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
warning: (NF_TABLES_INET) selects NF_TABLES_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && NF_TABLES)
net/built-in.o: In function `nft_reject_eval':
nft_reject.c:(.text+0x651e8): undefined reference to `nf_ip6_checksum'
nft_reject.c:(.text+0x65270): undefined reference to `ip6_route_output'
nft_reject.c:(.text+0x656c4): undefined reference to `ip6_dst_hoplimit'
make: *** [vmlinux] Error 1

Since the feature is to allow for a mixed IPV4 and IPV6 table, it
seems sensible to make it depend on IPV6.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-14 16:33:00 +01:00
Ilya Dryomov
f2be82b005 libceph: fix preallocation check in get_reply()
The check that makes sure that we have enough memory allocated to read
in the entire header of the message in question is currently busted.
It compares front_len of the incoming message with iov_len field of
ceph_msg::front structure, which is used primarily to indicate the
amount of data already read in, and not the size of the allocated
buffer.  Under certain conditions (e.g. a short read from a socket
followed by that socket's shutdown and owning ceph_connection reset)
this results in a warning similar to

[85688.975866] libceph: get_reply front 198 > preallocated 122 (4#0)

and, through another bug, leads to forever hung tasks and forced
reboots.  Fix this by comparing front_len with front_alloc_len field of
struct ceph_msg, which stores the actual size of the buffer.

Fixes: http://tracker.ceph.com/issues/5425

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-14 11:27:47 +02:00
Ilya Dryomov
3f0a4ac55f libceph: rename front to front_len in get_reply()
Rename front local variable to front_len in get_reply() to make its
purpose more clear.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-14 11:27:42 +02:00
Ilya Dryomov
3cea4c3071 libceph: rename ceph_msg::front_max to front_alloc_len
Rename front_max field of struct ceph_msg to front_alloc_len to make
its purpose more clear.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2014-01-14 11:27:26 +02:00
WANG Cong
b86f81cca9 bridge: move br_net_exit() to br.c
And it can become static.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:42:39 -08:00
David S. Miller
aef2b45fe4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
	net/xfrm/xfrm_policy.c

Steffen Klassert says:

====================
This pull request has a merge conflict between commits be7928d20b
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and
da7c224b1b ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.

The version from net-next can be used, like it is done in linux-next.

1) Checkpatch cleanups, from Weilong Chen.

2) Fix lockdep complaints when pktgen is used with IPsec,
   from Fan Du.

3) Update pktgen to allow any combination of IPsec transport/tunnel mode
   and AH/ESP/IPcomp type, from Fan Du.

4) Make pktgen_dst_metrics static, Fengguang Wu.

5) Compile fix for pktgen when CONFIG_XFRM is not set,
   from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 23:14:25 -08:00
Neal Cardwell
70315d22d3 inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.

This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.

Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.

This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 22:35:46 -08:00
David S. Miller
853dc21bfe Included changes:
- drop dependency against CRC16
 - move to new release version
 - add size check at compile time for packet structs
 - update copyright years in every file
 - implement new bonding/interface alternation feature
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS0p53AAoJEEKTMo6mOh1VZ7sP/RDgh5PMXC5l1LNTG9gtsV5Z
 zzqs+kEfeTCivcMtONXhBhuli7wlW3eZehscB/Hn6VOKf40Ktwb0tNjrJZ+OMEHC
 hwR7i56ucNOKA5L1cswWwprIY3tV3dK+tI/Y7Oc0d/HAkhU2j3wHWdzCdUMa2yBp
 PurZZrRFXrqcKtIKP+AK1DkkQ2TyUZVNB6dZDf1AifMxhcfFf6Vxg1JGj6AKgvKF
 zf9q8SC5x33qGfvx4VML1b0JNChAwt01PecY/Eo154W3eOWHNXh9UxQEQBRoChbJ
 A/hCoo/LWGFeyqZwEeWBIjR+nGi/5zbCg310FGTgjWZaJ+BBD/mjKAjDhtszX2sI
 Lf0TJ7ytfCn/qij0Xmqg3R5RnmstJpb+weZ7gMqk63o9I06RtvV6/x58tPB+7+Y1
 AJ5egjL0yUypn7LtFUL3z1S7Np6m+FC9KuH47Yc1FyXR8tSwDWYszdlAloqPeYVI
 CDMw73T+6vsWr2UPBnXqudy0BtG3XT8LFXAUC9GMkz0k8GY8RZK7MeUun9Sax+ra
 c7MC+yZYDhwKgNSiEw3J86n0ozsIGVCheAQGuenmdicQdirJa8KImj1CZCoGqWBk
 kWsyNhw68SVLZFSfCiKPjJjbWijpVfQKM/TSB30Cmj2L1hyWeEtBn0McHig6srdX
 hf1Xh4tFh1yGwMYdCcAq
 =BKos
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- drop dependency against CRC16
- move to new release version
- add size check at compile time for packet structs
- update copyright years in every file
- implement new bonding/interface alternation feature

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 21:50:27 -08:00
Eric Paris
4440e85481 audit: convert all sessionid declaration to unsigned int
Right now the sessionid value in the kernel is a combination of u32,
int, and unsigned int.  Just use unsigned int throughout.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
2014-01-13 22:31:46 -05:00
Veaceslav Falico
2315dc91a5 net: make dev_set_mtu() honor notification return code
Currently, after changing the MTU for a device, dev_set_mtu() calls
NETDEV_CHANGEMTU notification, however doesn't verify it's return code -
which can be NOTIFY_BAD - i.e. some of the net notifier blocks refused this
change, and continues nevertheless.

To fix this, verify the return code, and if it's an error - then revert the
MTU to the original one, notify again and pass the error code.

CC: Jiri Pirko <jiri@resnulli.us>
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 15:19:26 -08:00
stephen hemminger
6daaf0de2f sctp: make sctp_addto_chunk_fixed local
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 14:42:30 -08:00
stephen hemminger
b5d2b2858f l2tp: make local functions static
Avoid needless export of local functions

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 12:00:16 -08:00
Neal Cardwell
b884b1a46f gre_offload: simplify GRE header length calculation in gre_gso_segment()
Simplify the GRE header length calculation in gre_gso_segment().
Switch to an approach that is simpler, faster, and more general. The
new approach will continue to be correct even if we add support for
the optional variable-length routing info that may be present in a GRE
header.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:53:49 -08:00
WANG Cong
7eb8896df0 net_sched: act: remove struct tcf_act_hdr
It is not necessary at all.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong
a8701a6c7a net_sched: avoid casting void pointer
tp->root is a void* pointer, no need to cast it.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong
2519a602c2 net_sched: optimize tcf_match_indev()
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.

Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.

BTW, this will also save some bytes from the core struct of u32.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:15 -08:00
WANG Cong
832d1d5bfa net_sched: add struct net pointer to tcf_proto_ops->dump
It will be needed by the next patch.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
WANG Cong
a56e19538d net_sched: act: clean up notification functions
Refactor tcf_add_notify() and factor out tcf_del_notify().

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
WANG Cong
ddafd34f41 net_sched: act: move idx_gen into struct tcf_hashinfo
There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:50:14 -08:00
Luis R. Rodriguez
4f7b91404c cfg80211: make regulatory_hint() remove REGULATORY_CUSTOM_REG
The REGULATORY_CUSTOM_REG can be used during early init with
the goal of overriding the wiphy's default regulatory settings
in case the alpha2 of the device is not known. In the case that
the alpha2 becomes known lets avoid having drivers having to
clear the REGULATORY_CUSTOM_REG flag by doing it for them
when regulatory_hint() is used.

Cc: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-13 14:46:58 -05:00
Eric Dumazet
600adc18eb net: gro: change GRO overflow strategy
GRO layer has a limit of 8 flows being held in GRO list,
for performance reason.

When a packet comes for a flow not yet in the list,
and list is full, we immediately give it to upper
stacks, lowering aggregation performance.

With TSO auto sizing and FQ packet scheduler, this situation
happens more often.

This patch changes strategy to simply evict the oldest flow of
the list. This works better because of the nature of packet
trains for which GRO is efficient. This also has the effect
of lowering the GRO latency if many flows are competing.

Tested :

Used a 40Gbps NIC, with 4 RX queues, and 200 concurrent TCP_STREAM
netperf.

Before patch, aggregate rate is 11Gbps (while a single flow can reach
30Gbps)

After patch, line rate is reached.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:43:46 -08:00
John W. Linville
f13352519e Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-01-13 14:40:59 -05:00
Wei Yongjun
d10dbad2aa gre_offload: fix sparse non static symbol warning
Fixes the following sparse warning:

net/ipv4/gre_offload.c:253:5: warning:
 symbol 'gre_gro_complete' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:36:43 -08:00
John W. Linville
ec665facde This is the first NFC pull request for 3.14
It includes:
 
 * A new NFC driver for Marvell's 8897, and a few NCI fixes and
   improvements needed to support this chipset.
 
 * An LLCP fix for how we were setting the default MIU on a p2p link. If
   there is no explicit MIU extension announced at connection time, we
   must use the default one and not the one announced at LLCP link
   establishement time.
 
 * A pn544 EEPROM config update. Some of the currently EEPROM configured
   values are overwriting the firmware ones while other should not be set
   by the driver itself.
 
 * Some NFC digital stack fixes and improvements. Asynchronous functions
   are better documented, RF technologies and CRC functions are set upon
   PSL_REQ reception, and a few minor bugs are fixed.
 
 * Minor and miscelaneous pn533, mei_phy and port100 fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSzfOKAAoJEIqAPN1PVmxKODEP/i1tmx6bwSjuR0gMyvIkqcBJ
 1mM7BwdXlTKTvS/HaKTqaftS5S9Kj/IYSsHPjqRAJp3ipZdc39D8rR3jiyhWzKyD
 A/o6whTBTnyAgt8/enNp+h8S/Iq+E3itL/51KUOeeFIKSpGqqfcssZ1/3qhvoYZQ
 75zck2OPiEs8KBl1bCrrzK1kP4s8aEH6PepmXd7WS8njKe+dcyl3erw0IVN4WPfP
 FKFemvL/HP8+cUyshdiQGRiSw+TyD1VLaZinhyoeJxVRcXUjcodLwtCIATwqvu54
 2fMk1ccFineAQZGFfZGbtMAjHQLUeOpHHxFfdkW1g7P9IBp4zjtEiNOhNvPnKlR2
 p4g4R/vPdXxbQWjIoWzXI8qw/eFq8xIVC0ap37W/Y65532ParnXESAwk29BJ6770
 kqpHTjfZTUmW2POuvqhEKUKPPVp5nt0ArgfnjvHOS1wxcT885vWeu/YOxpOm9VdU
 rjFSBBaBDC43vGkCHn5szU9sEwu4O1/JFHElSToXsu+bRtS0tA3O62Kv732RZmbm
 1SCbZ63o1ivZr8Q37bY1NDW1/YdwUJMNbEb/t/wDLBqYx0vQcD0aDUWCwoACi2Du
 FbUElc975E1ChvM7VfV7uqFN0Pc++M1IEHLcM2BiXmSIGjziFY02FFDkqHiea/tC
 xlYfYrrUoIj0v/u7q1t4
 =ydBy
 -----END PGP SIGNATURE-----

Merge tag 'nfc-next-3.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz <sameo@linux.intel.com> says:

"This is the first NFC pull request for 3.14

It includes:

* A new NFC driver for Marvell's 8897, and a few NCI fixes and
  improvements needed to support this chipset.

* An LLCP fix for how we were setting the default MIU on a p2p link. If
  there is no explicit MIU extension announced at connection time, we
  must use the default one and not the one announced at LLCP link
  establishement time.

* A pn544 EEPROM config update. Some of the currently EEPROM configured
  values are overwriting the firmware ones while other should not be set
  by the driver itself.

* Some NFC digital stack fixes and improvements. Asynchronous functions
  are better documented, RF technologies and CRC functions are set upon
  PSL_REQ reception, and a few minor bugs are fixed.

* Minor and miscelaneous pn533, mei_phy and port100 fixes."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-13 14:36:42 -05:00
Hannes Frederic Sowa
8ed1dc44d3 ipv4: introduce hardened ip_no_pmtu_disc mode
This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
to be honored by protocols which do more stringent validation on the
ICMP's packet payload. This knob is useful for people who e.g. want to
run an unmodified DNS server in a namespace where they need to use pmtu
for TCP connections (as they are used for zone transfers or fallback
for requests) but don't want to use possibly spoofed UDP pmtu information.

Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
if the returned packet is in the window or if the association is valid.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:55 -08:00
Hannes Frederic Sowa
0954cf9c61 ipv6: introduce ip6_dst_mtu_forward and protect forwarding path with it
In the IPv6 forwarding path we are only concerend about the outgoing
interface MTU, but also respect locked MTUs on routes. Tunnel provider
or IPSEC already have to recheck and if needed send PtB notifications
to the sending host in case the data does not fit into the packet with
added headers (we only know the final header sizes there, while also
using path MTU information).

The reason for this change is, that path MTU information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv6 forwarding path to
wrongfully emit Packet-too-Big errors and drop IPv6 packets.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:54 -08:00
Hannes Frederic Sowa
f87c10a8aa ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing
While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.

We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.

I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.

Because someone might have written a software which does probe
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.

The reason for this change is, that path mtus information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.

Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:22:54 -08:00
Terry Lam
6c76a07a71 HHF qdisc: fix jiffies-time conversion.
This is to be compatible with the use of "get_time" (i.e. default
time unit in us) in iproute2 patch for HHF as requested by Stephen.

Signed-off-by: Terry Lam <vtlam@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-13 11:20:39 -08:00
Peter Zijlstra
1774e9f3e5 sched, net: Clean up preempt_enable_no_resched() abuse
The only valid use of preempt_enable_no_resched() is if the very next
line is schedule() or if we know preemption cannot actually be enabled
by that statement due to known more preempt_count 'refs'.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: rjw@rjwysocki.net
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 17:39:04 +01:00
Antonio Quartulli
12afc36e38 batman-adv: drop dependency against CRC16
The crc16 functionality is not used anymore, therefore
we can safely remove the dependency in the Kbuild file.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:21 +01:00
Simon Wunderlich
8d90d775ca batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:20 +01:00
Simon Wunderlich
e19f9759ed batman-adv: update copyright years for 2014
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:19 +01:00
Simon Wunderlich
031ace8d05 batman-adv: add build checks for packet sizes
With unrolling the batadv_header into the respective structures, the
offsetof checks are now useless. Instead, add build checks for all
packet types which go over the wire to avoid problems with wrong sizes
or compatibility issues on some architectures which don't use every day.

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:19 +01:00
Antonio Quartulli
82ab33214e batman-adv: remove returns at the end of void functions
Return at the end of void functions is not needed.

Since most of the void functions in the code do not do so,
make all the others consistent by removing the useless
returns. Actually all the functions to be "fixed" are in
network-coding.h only.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-12 14:41:18 +01:00
Simon Wunderlich
cb1c92ec37 batman-adv: add debugfs support to view multiif tables
Show tables for the multi interface operation. Originator tables
are added per hard interface.

This patch also changes the API by adding the interface to the
bat_orig_print() parameters.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:16 +01:00
Simon Wunderlich
5bc7c1eb44 batman-adv: add debugfs structure for information per interface
To show information per interface, add a debugfs hardif structure
similar to the system in sysfs. Hard interface folders will be created
in "$debugfs/batman-adv/". Files are not yet added.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:15 +01:00
Simon Wunderlich
f3b3d90189 batman-adv: add bonding again
With the new interface alternating, the first hop may send packets
in a round robin fashion to it's neighbors because it has multiple
valid routes built by the multi interface optimization. This patch
enables the feature if bonding is selected. Note that unlike the
bonding implemented before, this version is much simpler and may
even enable multi path routing to a certain degree.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:15 +01:00
Simon Wunderlich
ef0a937f7a batman-adv: consider outgoing interface in OGM sending
The current OGM sending an aggregation functionality decides on
which interfaces a packet should be sent when it parses the forward
packet struct. However, with the network wide multi interface
optimization the outgoing interface is decided by the OGM processing
function.

This is reflected by moving the decision in the OGM processing function
and add the outgoing interface in the forwarding packet struct. This
practically implies that an OGM may be added multiple times (once per
outgoing interface), and this also affects aggregation which needs to
consider the outgoing interface as well.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:14 +01:00
Simon Wunderlich
c039876892 batman-adv: add WiFi penalty
If the same interface is used for sending and receiving, there might be
throughput degradation on half-duplex interfaces such as WiFi. Add a
penalty if the same interface is used to reflect this problem in the
metric. At the same time, change the hop penalty from 30 to 15 so there
will be no change for single wifi mesh network. the effective hop
penalty will stay at 30 due to the new wifi penalty for these networks.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:13 +01:00
Simon Wunderlich
7351a4822d batman-adv: split out router from orig_node
For the network wide multi interface optimization there are different
routers for each outgoing interface (outgoing from the OGM perspective,
incoming for payload traffic). To reflect this, change the router and
associated data to a list of routers.

While at it, rename batadv_orig_node_get_router() to
batadv_orig_router_get() to follow the new naming scheme.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:11 +01:00
Simon Wunderlich
89652331c0 batman-adv: split tq information in neigh_node struct
For the network wide multi interface optimization it is required to save
metrics per outgoing interface in one neighbor. Therefore a new type is
introduced to keep interface-specific information. This also requires
some changes in access and list management.

The compare and equiv_or_better API calls are changed to take the
outgoing interface into consideration.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:10 +01:00
Simon Wunderlich
f6c8b71173 batman-adv: remove bonding and interface alternating
Remove bonding and interface alternating code - it will be replaced
by a new, network-wide multi interface optimization which enables
both bonding and interface alternating in a better way.

Keep the sysfs and find router function though, this will be needed
later.

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-12 14:41:10 +01:00
David S. Miller
45593c2bd2 Included changes:
- substitute FSF address with URL
 - deselect current bat-GW when GW-client mode gets deactivated
 - send every DHCP packet using bat-unicast messages when GW-client mode is
   enabled
 - implement the Extended Isolation mechanism (it is an enhancement of the
   already existing batman-AP-isolation). This mechanism allows the user to drop
   packets exchanged by selected clients by using netfilter marks.
 - fix typ0 in header guard
 - minor code cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJSzk9EAAoJEEKTMo6mOh1VT9UP/0Mdcg7VM7oKSUEkUszAAQsW
 HwYVxFo89bwxMVauv4qAnmC6J4mV1IeciXFnpTeon8Bqr7isRDz8gCpDV/6m9AZp
 Rh3PFEWkJEE7xZy2sSOyn2cZrgP/Wd/zYTxac+XAf0I+cjSYo40vGGc1/9/EN7to
 lo1A4ru+BQJvQkt500a859Z5PAAsVXolYtLqJcxD0eGDbzR1kHTQmUDJEEkNwzUP
 55vLu1KsSbYxw/T4A8ABKwCvkGRTJhgKmKKvwymeH9PHc5ODAZeInw1HTupKwIOQ
 W+WJxksJ0oBEuZB7y2NVXBRyPC2bF3D10C/7yZlul0PEntmT8vWV/eeO+Lw59YS3
 rzFi+wpvdHwkjuBKpr+mc8lMPE0nWU31HqpFJP3y5IzsjL31kWT6sioWHxhY1zo9
 hvZpb2/F8BniSgT5o3vpMcfInBQefViXP6ELjyB5i6+z2Pf8TqPukNGxCEWpOF6O
 r8HUHlPjbwERohK8/x4LRA8F7VpNagvMJ8kSHRUeR1j5QfcpbqFj3xi5LEBciakT
 WHok0AJdNrFUBVuj2n9z0hHFTTGF4Yxqf61A/vHJkROEthqwvoqtTOX5L1c1ZC/f
 DV6q4m2mLWuUxuRLGVWlXoN2XK8min+Of4RABzicX47EUIjTiPHOh38oymQUXG7D
 FS7/kYo5ZMIJmJgNKp+4
 =6EjB
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- substitute FSF address with URL
- deselect current bat-GW when GW-client mode gets deactivated
- send every DHCP packet using bat-unicast messages when GW-client mode is
  enabled
- implement the Extended Isolation mechanism (it is an enhancement of the
  already existing batman-AP-isolation). This mechanism allows the user to drop
  packets exchanged by selected clients by using netfilter marks.
- fix typ0 in header guard
- minor code cleanups

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:59:34 -05:00
Christoph Paasch
3e7013ddf5 tcp: metrics: Allow selective get/del of tcp-metrics based on src IP
We want to be able to get/del tcp-metrics based on the src IP. This
patch adds the necessary parsing of the netlink attribute and if the
source address is set, it will match on this one too.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:38:18 -05:00
Christoph Paasch
bbf852b96e tcp: metrics: Delete all entries matching a certain destination
As we now can have multiple entries per destination-IP, the "ip
tcp_metrics delete address ADDRESS" command deletes all of them.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:38:18 -05:00
Christoph Paasch
8a59359cb8 tcp: metrics: New netlink attribute for src IP and dumped in netlink reply
This patch adds a new netlink attribute for the source-IP and appends it
to the netlink reply. Now, iproute2 can have access to the source-IP.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:38:18 -05:00
Christoph Paasch
a544302820 tcp: metrics: Add source-address to tcp-metrics
We add the source-address to the tcp-metrics, so that different metrics
will be used per source/destination-pair. We use the destination-hash to
store the metric inside the hash-table. That way, deleting and dumping
via "ip tcp_metrics" is easy.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:38:18 -05:00
Christoph Paasch
324fd55a19 tcp: metrics: rename tcpm_addr to tcpm_daddr
As we will add also the source-address, we rename all accesses to the
tcp-metrics address to use "daddr".

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 17:38:18 -05:00
David S. Miller
1a6c1e5bd2 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull these updates for the 3.14 stream!

For the mac80211 bits, Johannes says:

"Felix adds some helper functions for P2P NoA software tracking, Joe
fixes alignment (but as this apparently never caused issues I didn't
send it to 3.13), Kyeyoon/Jouni add QoS-mapping support (a Hotspot 2.0
feature), Weilong fixed a bunch of checkpatch errors and I get to play
fire-fighter or so and clean up other people's locking issues. I also
added nl80211 vendor-specific events, as we'd discussed at the wireless
summit."

For the iwlwifi bits, Emmanuel says:

"I have here a rework of the interrupt handling to meet RT kernel
requirements - basically we don't take any lock in the primary interrupt
handler. This gave me a good reason to clean things up a bit on the way.
There is also a fix of the QoS mapping along with a few workarounds for
hardware / firmware issues that are hard to hit.
Three fixes suggested by static analyzers, and other various stuff.
Most importantly, I update the Copyright note to include the new year."

For the bluetooth bits, Gustavo says:

"More patches to 3.14. The bulk of changes here is the 6LoWPAN support for
Bluetooth LE Devices. The commits that touches net/ieee802154/ are already
acked by David Miller. Other than that we have some RFCOMM fixes and
improvements plus fixes and clean ups all over the tree."

Beyond that, ath9k, brcmfmac, mwifiex, and wil6210 get their usual
level of attention.  The wl1251 driver gets a number of updates,
and there are a handful of other bits here and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 14:53:33 -05:00
David S. Miller
ef8570d859 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
This batch contains one single patch with the l2tp match
for xtables, from James Chapman.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 14:50:02 -05:00
Jason Wang
f663dd9aaf net: core: explicitly select a txq before doing l2 forwarding
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
  instead of lower device which misses the necessary txq synchronization for
  lower device such as txq stopping or frozen required by dev watchdog or
  control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
  watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
  when tso is disabled for lower device.

Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 13:23:08 -05:00
David S. Miller
c4d7099867 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
For the mac80211 bits, Johannes says:

"I have a fix from Javier for mac80211_hwsim when used with wmediumd
userspace, and a fix from Felix for buffering in AP mode."

For the NFC bits, Samuel says:

"This pull request only contains one fix for a regression introduced with
commit e29a9e2ae1. Without this fix, we can not establish a p2p link
in target mode. Only initiator mode works."

For the iwlwifi bits, Emmanuel says:

"It only includes new device IDs so it's not vital. If you have a pull
request to net.git anyway, I'd happy to have this in."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-10 13:21:22 -05:00
Pablo Neira Ayuso
8f46df184c netfilter: nf_tables: fix missing byteorder conversion in policy
When fetching the policy attribute, the byteorder conversion was
missing, breaking the chain policy setting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-10 18:26:13 +01:00
John W. Linville
235f939228 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
Conflicts:
	net/ieee802154/6lowpan.c
2014-01-10 10:59:40 -05:00
Johannes Berg
b77cf4f8e1 mac80211: handle MMPDUs at EOSP correctly
If a uAPSD service period ends with an MMPDU, we currently just
send that MMPDU, but it obviously won't get the EOSP bit set as
it doesn't have a QoS header. This contradicts the standard, so
add a QoS-nulldata frame after the MMPDU to properly terminate
the service period with a frame that has EOSP set.

Also fix a bug wrt. the TID for the MMPDU, it shouldn't be set
to 0 unconditionally but use the actual TID that was assigned.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-10 09:50:02 +01:00
Johannes Berg
03c8c06f2d mac80211: reset TX info flags when frame will be reprocessed
The temporary TX info flags need to be cleared if the frame will
be processed through the TX handlers again, otherwise it can get
messed up. This fixes a bug that happened when an aggregation
session was stopped while the station was sleeping - some frames
might get transmitted marked as aggregation erroneously without
this fix.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-10 09:43:34 +01:00
Johannes Berg
f9f760b488 mac80211: release multiple ACs in uAPSD, fix more-data bug
When a response for PS-Poll or a uAPSD trigger frame is sent, the
more-data bit should be set according to 802.11-2012 11.2.1.5 h),
meaning that it should indicate more data on the relevant ACs
(delivery-enabled or nondelivery-enabled for uAPSD or PS-Poll.)

In, for example, the following scenario:
 * 1 frame on VO queue (either in driver or in mac80211)
 * at least 1 frame on VI queue (in the driver)
 * both VO/VI are delivery-enabled
 * uAPSD trigger frame received

The more-data flag to the driver would not be set, even though
it should be.

While fixing this, I noticed that we should really release frames
from multiple ACs where there's data buffered in the driver for
the corresponding TIDs.

To address all this, restructure the code a bit to consider all
ACs if we only release driver frames or only buffered frames.
This also addresses the more-data bug described above as now the
TIDs will all be marked as released, so the driver will have to
check the number of frames.

While at it, clarify some code and comments and remove the found
variable, replacing it with the appropriate sw/hw release check.

Reported-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-10 09:43:34 +01:00
Johannes Berg
0a1cb80975 mac80211: fix PS-Poll driver release TID
Using ffs() for the PS-Poll release TID is wrong, it will cause
frames to be released in order 0 1 2 3 4 5 6 7 instead of the
correct 7 6 5 4 3 0 2 1. Fix this by adding a new function that
implements "highest priority TID" properly.

Reported-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-10 09:43:34 +01:00
Fan Du
6bae919003 {xfrm,pktgen} Fix compiling error when CONFIG_XFRM is not set
0-DAY kernel build testing backend reported below error:
All error/warnings:

   net/core/pktgen.c: In function 'pktgen_if_write':
>> >> net/core/pktgen.c:1487:10: error: 'struct pktgen_dev' has no member named 'spi'
>> >> net/core/pktgen.c:1488:43: error: 'struct pktgen_dev' has no member named 'spi'

Fix this by encapuslating the code with CONFIG_XFRM.

Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-10 07:46:24 +01:00
Hannes Frederic Sowa
07edd741c8 ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME
In the past the IFA_PERMANENT flag indicated, that the valid and preferred
lifetime where ignored. Since change fad8da3e08 ("ipv6 addrconf: fix
preferred lifetime state-changing behavior while valid_lft is infinity")
we honour at least the preferred lifetime on those addresses. As such
the valid lifetime gets recalculated and updated to 0.

If loopback address is added manually this problem does not occur.
Also if NetworkManager manages IPv6, those addresses will get added via
inet6_rtm_newaddr and thus will have a correct lifetime, too.

Reported-by: François-Xavier Le Bail <fx.lebail@yahoo.com>
Reported-by: Damien Wyart <damien.wyart@gmail.com>
Fixes: fad8da3e08 ("ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity")
Cc: Yasushi Asano <yasushi.asano@jp.fujitsu.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 23:07:47 -05:00
David S. Miller
751fcac19a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says:

====================
nf_tables updates for net-next

The following patchset contains the following nf_tables updates,
mostly updates from Patrick McHardy, they are:

* Add the "inet" table and filter chain type for this new netfilter
  family: NFPROTO_INET. This special table/chain allows IPv4 and IPv6
  rules, this should help to simplify the burden in the administration
  of dual stack firewalls. This also includes several patches to prepare
  the infrastructure for this new table and a new meta extension to
  match the layer 3 and 4 protocol numbers, from Patrick McHardy.

* Load both IPv4 and IPv6 conntrack modules in nft_ct if the rule is used
  in NFPROTO_INET, as we don't certainly know which one would be used,
  also from Patrick McHardy.

* Do not allow to delete a table that contains sets, otherwise these
  sets become orphan, from Patrick McHardy.

* Hold a reference to the corresponding nf_tables family module when
  creating a table of that family type, to avoid the module deletion
  when in use, from Patrick McHardy.

* Update chain counters before setting the chain policy to ensure that
  we don't leave the chain in inconsistent state in case of errors (aka.
  restore chain atomicity). This also fixes a possible leak if it fails
  to allocate the chain counters if no counters are passed to be restored,
  from Patrick McHardy.

* Don't check for overflows in the table counter if we are just renaming
  a chain, from Patrick McHardy.

* Replay the netlink request after dropping the nfnl lock to load the
  module that supports provides a chain type, from Patrick.

* Fix chain type module references, from Patrick.

* Several cleanups, function renames, constification and code
  refactorizations also from Patrick McHardy.

* Add support to set the connmark, this can be used to set it based on
  the meta mark (similar feature to -j CONNMARK --restore), from
  Kristian Evensen.

* A couple of fixes to the recently added meta/set support and nft_reject,
  and fix missing chain type unregistration if we fail to register our
  the family table/filter chain type, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 21:36:01 -05:00
Pablo Neira Ayuso
cf4dfa8539 netfilter: nf_tables: fix error path in the init functions
We have to unregister chain type if this fails to register netns.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 23:25:48 +01:00
James Chapman
74f77a6b2b netfilter: introduce l2tp match extension
Introduce an xtables add-on for matching L2TP packets. Supports L2TPv2
and L2TPv3 over IPv4 and IPv6. As well as filtering on L2TP tunnel-id
and session-id, the filtering decision can also include the L2TP
packet type (control or data), protocol version (2 or 3) and
encapsulation type (UDP or IP).

The most common use for this will likely be to filter L2TP data
packets of individual L2TP tunnels or sessions. While a u32 match can
be used, the L2TP protocol headers are such that field offsets differ
depending on bits set in the header, making rules for matching generic
L2TP connections cumbersome. This match extension takes care of all
that.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 21:36:39 +01:00
Wei Yongjun
d0eb1f7e66 ip_tunnel: fix sparse non static symbol warning
Fixes the following sparse warning:

net/ipv4/ip_tunnel.c:116:18: warning:
 symbol 'tunnel_dst_check' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 14:31:47 -05:00
Wei Yongjun
ece37c87ab openvswitch: Use kmem_cache_free() instead of kfree()
memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().

Fixes: e298e50570 ('openvswitch: Per cpu flow stats.')
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-09 14:26:39 -05:00
Patrick McHardy
3876d22dba netfilter: nf_tables: rename nft_do_chain_pktinfo() to nft_do_chain()
We don't encode argument types into function names and since besides
nft_do_chain() there are only AF-specific versions, there is no risk
of confusion.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:16 +01:00
Patrick McHardy
44a6f0df03 netfilter: nf_tables: prohibit deletion of a table with existing sets
We currently leak the set memory when deleting a table that still has
sets in it. Return EBUSY when attempting to delete a table with sets.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:16 +01:00
Patrick McHardy
7047f9d052 netfilter: nf_tables: take AF module reference when creating a table
The table refers to data of the AF module, so we need to make sure the
module isn't unloaded while the table exists.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:16 +01:00
Patrick McHardy
c5c1f975ad netfilter: nf_tables: perform flags validation before table allocation
Simplifies error handling. Additionally use the correct type u32 for the
host byte order flags value.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy
fa2c1de0bb netfilter: nf_tables: minor nf_chain_type cleanups
Minor nf_chain_type cleanups:

- reorder struct to plug a hoe
- rename struct module member to "owner" for consistency
- rename nf_hookfn array to "hooks" for consistency
- reorder initializers for better readability

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy
2a37d755b8 netfilter: nf_tables: constify chain type definitions and pointers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:15 +01:00
Patrick McHardy
93b0806f00 netfilter: nf_tables: replay request after dropping locks to load chain type
To avoid races, we need to replay to request after dropping the nfnl_mutex
to auto-load the chain type module.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:14 +01:00
Patrick McHardy
88ce65a71c netfilter: nf_tables: add missing module references to chain types
In some cases we neither take a reference to the AF info nor to the
chain type, allowing the module to be unloaded while in use.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:14 +01:00
Patrick McHardy
baae3e62f3 netfilter: nf_tables: fix chain type module reference handling
The chain type module reference handling makes no sense at all: we take
a reference immediately when the module is registered, preventing the
module from ever being unloaded.

Fix by taking a reference when we're actually creating a chain of the
chain type and release the reference when destroying the chain.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:14 +01:00
Patrick McHardy
758206760c netfilter: nf_tables: fix check for table overflow
The table use counter is only increased for new chains, so move the check
to the correct position.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:13 +01:00
Patrick McHardy
4401a86200 netfilter: nf_tables: restore chain change atomicity
Chain counter validation is performed after the chain policy has
potentially been changed. Move counter validation/setting before
changing of the chain policy to fix this.

Additionally fix a memory leak if chain counter allocation fails
for new chains, remove an unnecessary free_percpu() and move
counter allocation for new chains

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:13 +01:00
Patrick McHardy
57de2a0cd9 netfilter: nf_tables: split chain policy validation from actually setting it
Currently nf_tables_newchain() atomicity is broken because of having
validation of some netlink attributes performed after changing attributes
of the chain. The chain policy is (currently) fine, but split it up as
preparation for the following fixes and to avoid future mistakes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:17:13 +01:00
Pablo Neira Ayuso
b38895c577 netfilter: nft_meta: fix lack of validation of the input register
We have to validate that the input register is in the range of
allowed registers, otherwise we can take a incorrect register
value as input that may lead us to a crash.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 20:04:16 +01:00
Kristian Evensen
c4ede3d382 netfilter: nft_ct: Add support to set the connmark
This patch adds kernel support for setting properties of tracked
connections. Currently, only connmark is supported. One use-case
for this feature is to provide the same functionality as
-j CONNMARK --save-mark in iptables.

Some restructuring was needed to implement the set op. The new
structure follows that of nft_meta.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-09 19:07:44 +01:00
Ujjal Roy
f5aa0d21dd cfg80211: add sanity check for retry limit in wext-compat
Block setting the wrong values through iwconfig retry
command. Add sanity checking before sending the retry
limit to the driver.

Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-09 17:05:28 +01:00
John W. Linville
0f74d82d80 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2014-01-09 10:19:01 -05:00
Ilan Peer
bdfbec2d2d cfg80211: Add a function to get the number of supported channels
Add a utility function to get the number of channels supported by
the device, and update the places in the code that need this data.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
[replace another occurrence in libertas, fix kernel-doc, fix bugs]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-09 14:24:24 +01:00
David S. Miller
54b553e2c1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains three Netfilter updates, they are:

* Fix wrong usage of skb_header_pointer in the DCCP protocol helper that
  has been there for quite some time. It was resulting in copying the dccp
  header to a pointer allocated in the stack. Fortunately, this pointer
  provides room for the dccp header is 4 bytes long, so no crashes have been
  reported so far. From Daniel Borkmann.

* Use format string to print in the invocation of nf_log_packet(), again
  in the DCCP helper. Also from Daniel Borkmann.

* Revert "netfilter: avoid get_random_bytes call" as prandom32 does not
  guarantee enough entropy when being calling this at boot time, that may
  happen when reloading the rule.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-08 15:04:56 -05:00
Antonio Quartulli
42cb0bef01 batman-adv: set the isolation mark in the skb if needed
If a broadcast packet is coming from a client marked as
isolated, then mark the skb using the isolation mark so
that netfilter (or any other application) can recognise
them.

The mark is written in the skb based on the mask value:
only bits set in the mask are substitued by those in the
mark value

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:46 +01:00
Antonio Quartulli
eceb22ae0b batman-adv: create helper function to get AP isolation status
The AP isolation status may be evaluated in different spots.
Create an helper function to avoid code duplication.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:45 +01:00
Antonio Quartulli
2d2fcc2a3f batman-adv: extend the ap_isolation mechanism
Change the AP isolation mechanism to not only "isolate" WIFI
clients but also all those marked with the more generic
"isolation flag" (BATADV_TT_CLIENT_ISOLA).

The result is that when AP isolation is on any unicast
packet originated by an "isolated" client and directed to
another "isolated" client is dropped at the source node.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:44 +01:00
Antonio Quartulli
dd24ddb265 batman-adv: print the new BATADV_TT_CLIENT_ISOLA flag
Print the new BATADV_TT_CLIENT_ISOLA flag properly in the
Local and Global Translation Table output.

The character 'I' is used in the flags column to indicate
that the entry is marked as isolated.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:44 +01:00
Antonio Quartulli
9464d07188 batman-adv: mark a local client as isolated when needed
A client sending packets which mark matches the value
configured via sysfs has to be identified as isolated using
the TT_CLIENT_ISOLA flag.

The match is mask based, meaning that only bits set in the
mask are compared with those in the mark value.

If the configured mask is equal to 0 no operation is
performed.

Such flag is then advertised within the classic client
announcement mechanism.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:43 +01:00
Antonio Quartulli
c42edfe382 batman-adv: add isolation_mark sysfs attribute
This attribute can be used to set and read the value and the
mask of the skb mark which will be used to classify the
source non-mesh client as ISOLATED. In this way a client can
be advertised as such and the mark can potentially be
restored at the receiving node before delivering the skb.

This can be helpful for creating network wide netfilter
policies.

This sysfs file expects a string of the shape "$mark/$mask".
Where $mark has to be a 32-bit number in any base, while
$mask must be a 32bit mask expressed in hex base. Only bits
in $mark covered by the bitmask are really stored.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:42 +01:00
Antonio Quartulli
6c413b1c22 batman-adv: send every DHCP packet as bat-unicast
In different situations it is possible that the DHCP server
or client uses broadcast Ethernet frames to send messages
to each other. The GW component in batman-adv takes care of
using bat-unicast packets to bring broadcast DHCP
Discover/Requests to the "best" server.

On the way back the DHCP server usually sends unicasts,
but upon client request it may decide to use broadcasts as
well.

This patch improves the GW component so that it now snoops
and sends as unicast all the DHCP packets, no matter if they
were generated by a DHCP server or client.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:42 +01:00
Antonio Quartulli
36484f84d5 batman-adv: remove parenthesis from return statements
Remove parenthesis around return expression as suggested by
checkpatch.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:41 +01:00
Antonio Quartulli
4e820e72db batman-adv: rename gw_deselect() to gw_reselect()
The function batadv_gw_deselect() is actually not deselecting
anything. It is just informing the GW code to perform a
re-election procedure when possible.
The current gateway is not being touched at all and therefore
the name of this function is rather misleading.

Rename it to batadv_gw_reselect() to batadv_gw_reselect()
to make its behaviour easier to grasp.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:41 +01:00
Antonio Quartulli
f316318157 batman-adv: deselect current GW on client mode switch off
When switching from gw_mode client to either off or server
the current selected gateway has to be deselected.
In this way when client mode is enabled again a gateway
re-election is forced and a GW_ADD event is consequently
sent.

The current behaviour instead is to keep the current gateway
leading to no GW_ADD event when gw_mode client is selected
for a second time

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-08 20:49:40 +01:00
Antonio Quartulli
ebf38fb7ab batman-adv: remove FSF address from GPL disclaimer
As suggested by checkpatch, remove all the references to the
FSF address since the kernel already has one reference in
its documentation.

In this way it is easier to update it in case of future
changes.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:39 +01:00
Antonio Quartulli
3fba7325bb batman-adv: don't switch byte order too often if not needed
If possible, operations like ntohs/ntohl should not be
performed too often. Use a variable to locally store the
converted value and then use it.

Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-01-08 20:49:39 +01:00
Antonio Quartulli
a48bcacdb3 batman-adv: properly rename define in distributed arp table header file
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-01-08 20:49:38 +01:00
John W. Linville
300e5fd160 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-01-08 13:44:29 -05:00
John W. Linville
2eff7c791a This is the first NFC fixes pull request for 3.13.
It only contains one fix for a regression introduced with commit
 e29a9e2ae1. Without this fix, we can not establish a p2p link in
 target mode. Only initiator mode works.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSywyaAAoJEIqAPN1PVmxK3eMP/2XKZa2qzcMZfWZBLEMXQHce
 UO36BqZF0nre0ZUZQmaXzM5L0PM/UNxvhf2DsLx3s54/Tk/o0wHYSM/7GfX58dkX
 YAYSbG3viQM9L1dRhgfGQwRxXUcd8M85fLUH9SdJeUzCIgxXEDFnlpkeaKPE+UaM
 PgCHRXbW+cDje4DSO5JXVzIRFzsCtAwgd5bx6u/5bM5PLzxGwHJHkHe+lZ9b4Hbe
 ZLsqbU0HRy0rB8hmpro6NcrIoNHEXYupdd1gwslb88jdA+BDOUAZj7htcBBcC9Lw
 7D8RQTI6YNDsOzcLJyWmmfqTKd1j3RWPcTSuWkAGTvy04VOhGEc1LcgOOVLvhsLZ
 Hw412d0lYOiIwtjeIwS1etY42+f7tPOHOuhWFO3EQX0/1fIQ2H9V18DkM2qFPCWT
 L5GwV70YWbYfeRF1i3kGWKN15qLh9/toxB6rgE8eM2vCCGJXbt52yz9oSNM3QvZf
 2rHnzAHCEKGv4xS7oHhJSnkwH3Nd7WR6gjWatbfswrjtaDU2PKlsL703Uxk1gIPz
 o3itmJv/Ej1j6TqUpUz2Vs/sr4dltCDiD7tiaiv8zprHn0LZcNyp1/E+p31pMHBP
 8IW3BXxpBh3S8gpJ932LVAYe40ymQ0m1ZhssddbH1CC90jfs6m/HfaUC+GsgBX/x
 0iaxEH1jwwRIZC9p1Qso
 =YkKJ
 -----END PGP SIGNATURE-----

Merge tag 'nfc-fixes-3.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-fixes

Samuel Ortiz <sameo@linux.intel.com> says:

"This is the first NFC fixes pull request for 3.13.

It only contains one fix for a regression introduced with commit
e29a9e2ae1. Without this fix, we can not establish a p2p link in
target mode. Only initiator mode works."

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-01-08 13:36:17 -05:00
Ying Xue
da7c224b1b net: xfrm: xfrm_policy: silence compiler warning
Fix below compiler warning:

net/xfrm/xfrm_policy.c:1644:12: warning: ‘xfrm_dst_alloc_copy’ defined but not used [-Wunused-function]

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 22:45:26 -05:00
Jon Paul Maloy
581465fa28 tipc: make link start event synchronous
When a link is created we delay the start event by launching it
to be executed later in a tasklet. As we hold all the
necessary locks at the moment of creation, and there is no risk
of deadlock or contention, this delay serves no purpose in the
current code.

We remove this obsolete indirection step, and the associated function
link_start(). At the same time, we rename the function tipc_link_stop()
to the more appropriate tipc_link_purge_queues().

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:44:26 -05:00
Ying Xue
f9a2c80b8b tipc: introduce new spinlock to protect struct link_req
Currently, only 'bearer_lock' is used to protect struct link_req in
the function disc_timeout(). This is unsafe, since the member fields
'num_nodes' and 'timer_intv' might be accessed by below three different
threads simultaneously, none of them grabbing bearer_lock in the
critical region:

link_activate()
  tipc_bearer_add_dest()
    tipc_disc_add_dest()
      req->num_nodes++;

tipc_link_reset()
  tipc_bearer_remove_dest()
    tipc_disc_remove_dest()
      req->num_nodes--
      disc_update()
        read req->num_nodes
	write req->timer_intv

disc_timeout()
  read req->num_nodes
  read/write req->timer_intv

Without lock protection, the only symptom of a race is that discovery
messages occasionally may not be sent out. This is not fatal, since such
messages are best-effort anyway. On the other hand, since discovery
messages are not time critical, adding a protecting lock brings no
serious overhead either. So we add a new, dedicated spinlock in
order to guarantee absolute data consistency in link_req objects.
This also helps reduce the overall role of the bearer_lock, which
we want to remove completely in a later commit series.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:44:25 -05:00
Jon Paul Maloy
b9d4c33935 tipc: remove 'has_redundant_link' flag from STATE link protocol messages
The flag 'has_redundant_link' is defined only in RESET and ACTIVATE
protocol messages. Due to an ambiguity in the protocol specification it
is currently also transferred in STATE messages. Its value is used to
initialize a link state variable, 'permit_changeover', which is used
to inhibit futile link failover attempts when it is known that the
peer node has no working links at the moment, although the local node
may still think it has one.

The fact that 'has_redundant_link' incorrectly is read from STATE
messages has the effect that 'permit_changeover' sometimes gets a wrong
value, and permanently blocks any links from being re-established. Such
failures can only occur in in dual-link systems, and are extremely rare.
This bug seems to have always been present in the code.

Furthermore, since commit b4b5610223
("tipc: Ensure both nodes recognize loss of contact between them"),
the 'permit_changeover' field serves no purpose any more. The task of
enforcing 'lost contact' cycles at both peer endpoints is now taken
by a new mechanism, using the flags WAIT_NODE_DOWN and WAIT_PEER_DOWN
in struct tipc_node to abort unnecessary failover attempts.

We therefore remove the 'has_redundant_link' flag from STATE messages,
as well as the now redundant 'permit_changeover' variable.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:44:25 -05:00
Jon Paul Maloy
170b3927b4 tipc: rename functions related to link failover and improve comments
The functionality related to link addition and failover is unnecessarily
hard to understand and maintain. We try to improve this by renaming
some of the functions, at the same time adding or improving the
explanatory comments around them. Names such as "tipc_rcv()" etc. also
align better with what is used in other networking components.

The changes in this commit are purely cosmetic, no functional changes
are made.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:44:25 -05:00
David S. Miller
a04c0e2c0d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains two patches:

* fix the IRC NAT helper which was broken when adding (incomplete) IPv6
  support, from Daniel Borkmann.

* Refine the previous bugtrap that Jesper added to catch problems for the
  usage of the sequence adjustment extension in IPVs in Dec 16th, it may
  spam messages in case of finding a real bug.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:38:17 -05:00
Daniel Borkmann
be7928d20b net: xfrm: xfrm_policy: fix inline not at beginning of declaration
Fix three warnings related to:

  net/xfrm/xfrm_policy.c:1644:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
  net/xfrm/xfrm_policy.c:1656:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]
  net/xfrm/xfrm_policy.c:1668:1: warning: 'inline' is not at beginning of declaration [-Wold-style-declaration]

Just removing the inline keyword is sufficient as the compiler will
decide on its own about inlining or not.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 18:34:00 -05:00
Patrick McHardy
9638f33ecf netfilter: nft_ct: load both IPv4 and IPv6 conntrack modules for NFPROTO_INET
The ct expression can currently not be used in the inet family since
we don't have a conntrack module for NFPROTO_INET, so
nf_ct_l3proto_try_module_get() fails. Add some manual handling to
load the modules for both NFPROTO_IPV4 and NFPROTO_IPV6 if the
ct expression is used in the inet family.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:32 +01:00
Patrick McHardy
4566bf2706 netfilter: nft_meta: add l4proto support
For L3-proto independant rules we need to get at the L4 protocol value
directly. Add it to the nft_pktinfo struct and use the meta expression
to retrieve it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:31 +01:00
Patrick McHardy
124edfa9e0 netfilter: nf_tables: add nfproto support to meta expression
Needed by multi-family tables to distinguish IPv4 and IPv6 packets.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:30 +01:00
Patrick McHardy
1d49144c0a netfilter: nf_tables: add "inet" table for IPv4/IPv6
This patch adds a new table family and a new filter chain that you can
use to attach IPv4 and IPv6 rules. This should help to simplify
rule-set maintainance in dual-stack setups.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:57:25 +01:00
Patrick McHardy
115a60b173 netfilter: nf_tables: add support for multi family tables
Add support to register chains to multiple hooks for different address
families for mixed IPv4/IPv6 tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2014-01-07 23:55:46 +01:00
Patrick McHardy
c9484874e7 netfilter: nf_tables: add hook ops to struct nft_pktinfo
Multi-family tables need the AF from the hook ops. Add a pointer to the
hook ops and replace usage of the hooknum member in struct nft_pktinfo.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Patrick McHardy
3b088c4bc0 netfilter: nf_tables: make chain types override the default AF functions
Currently the AF-specific hook functions override the chain-type specific
hook functions. That doesn't make too much sense since the chain types
are a special case of the AF-specific hooks.

Make the AF-specific hook functions the default and make the optional
chain type hooks override them.

As a side effect, the necessary code restructuring reduces the code size,
f.i. in case of nf_tables_ipv4.o:

  nf_tables_ipv4_init_net   |  -24
  nft_do_chain_ipv4         | -113
 2 functions changed, 137 bytes removed, diff: -137

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Pablo Neira Ayuso
688d18636f netfilter: nft_reject: fix compilation warning if NF_TABLES_IPV6 is disabled
net/netfilter/nft_reject.c: In function 'nft_reject_eval':
net/netfilter/nft_reject.c:37:14: warning: unused variable 'net' [-Wunused-variable]

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-07 23:50:43 +01:00
Jerry Chu
bf5a755f5e net-gre-gro: Add GRE support to the GRO stack
This patch built on top of Commit 299603e837
("net-gro: Prepare GRO stack for the upcoming tunneling support") to add
the support of the standard GRE (RFC1701/RFC2784/RFC2890) to the GRO
stack. It also serves as an example for supporting other encapsulation
protocols in the GRO stack in the future.

The patch supports version 0 and all the flags (key, csum, seq#) but
will flush any pkt with the S (seq#) flag. This is because the S flag
is not support by GSO, and a GRO pkt may end up in the forwarding path,
thus requiring GSO support to break it up correctly.

Currently the "packet_offload" structure only contains L3 (ETH_P_IP/
ETH_P_IPV6) GRO offload support so the encapped pkts are limited to
IP pkts (i.e., w/o L2 hdr). But support for other protocol type can
be easily added, so is the support for GRE variations like NVGRE.

The patch also support csum offload. Specifically if the csum flag is on
and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE),
the code will take advantage of the csum computed by the h/w when
validating the GRE csum.

Note that commit 60769a5dcd "ipv4: gre:
add GRO capability" already introduces GRO capability to IPv4 GRE
tunnels, using the gro_cells infrastructure. But GRO is done after
GRE hdr has been removed (i.e., decapped). The following patch applies
GRO when pkts first come in (before hitting the GRE tunnel code). There
is some performance advantage for applying GRO as early as possible.
Also this approach is transparent to other subsystem like Open vSwitch
where GRE decap is handled outside of the IP stack hence making it
harder for the gro_cells stuff to apply. On the other hand, some NICs
are still not capable of hashing on the inner hdr of a GRE pkt (RSS).
In that case the GRO processing of pkts from the same remote host will
all happen on the same CPU and the performance may be suboptimal.

I'm including some rough preliminary performance numbers below. Note
that the performance will be highly dependent on traffic load, mix as
usual. Moreover it also depends on NIC offload features hence the
following is by no means a comprehesive study. Local testing and tuning
will be needed to decide the best setting.

All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs.
(super_netperf 50 -H 192.168.1.18 -l 30)

An IP GRE tunnel with only the key flag on (e.g., ip tunnel add gre1
mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123)
is configured.

The GRO support for pkts AFTER decap are controlled through the device
feature of the GRE device (e.g., ethtool -K gre1 gro on/off).

1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
thruput: 9.16Gbps
CPU utilization: 19%

1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
thruput: 5.9Gbps
CPU utilization: 15%

1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 12-13%

1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
thruput: 9.26Gbps
CPU utilization: 10%

The following tests were performed on a different NIC that is capable of
csum offload. I.e., the h/w is capable of computing IP payload csum
(CHECKSUM_COMPLETE).

2.1 ethtool -K gre1 gro on (hence will use gro_cells)

2.1.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 8.53Gbps
CPU utilization: 9%

2.1.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 8.97Gbps
CPU utilization: 7-8%

2.1.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 8.83Gbps
CPU utilization: 5-6%

2.1.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.98Gbps
CPU utilization: 5%

2.2 ethtool -K gre1 gro off

2.2.1 ethtool -K eth0 gro off; csum offload disabled
thruput: 5.93Gbps
CPU utilization: 9%

2.2.2 ethtool -K eth0 gro off; csum offload enabled
thruput: 5.62Gbps
CPU utilization: 8%

2.2.3 ethtool -K eth0 gro on; csum offload disabled
thruput: 7.69Gbps
CPU utilization: 8%

2.2.4 ethtool -K eth0 gro on; csum offload enabled
thruput: 8.96Gbps
CPU utilization: 5-6%

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 16:21:31 -05:00
Benjamin Poirier
cdb3f4a31b net: Do not enable tx-nocache-copy by default
There are many cases where this feature does not improve performance or even
reduces it.

For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
mean±stddev over 10 runs of 10s.

1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
copy from user on transmit"
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)

200x netperf -r 1400,1
tx-nocache-copy off
        692000±1000 tps
        50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
tx-nocache-copy on
        693000±1000 tps
        50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7

200x netperf -r 14000,14000
tx-nocache-copy off
        86450±80 tps
        50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
tx-nocache-copy on
        86110±60 tps
        50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20

2) single stream throughput tests
tx-nocache-copy leads to higher service demand

                        throughput  cpu0        cpu1        demand
                        (Gb/s)      (Gcycle)    (Gcycle)    (cycle/B)

nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004

nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

tx-nocache-copy off     9401±5      5.83±0.03   5.0±0.1     0.923±0.007
tx-nocache-copy on      9404±2      5.74±0.03   5.523±0.009 0.958±0.002

As a second example, here are some results from Eric Dumazet with latest
net-next.
tx-nocache-copy also leads to higher service demand

(cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)

lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       4282.648396 task-clock                #    0.423 CPUs utilized
             9,348 context-switches          #    0.002 M/sec
                88 CPU-migrations            #    0.021 K/sec
               355 page-faults               #    0.083 K/sec
    11,812,797,651 cycles                    #    2.758 GHz                     [82.79%]
     9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle    [82.54%]
     4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle    [67.33%]
     6,053,172,792 instructions              #    0.51  insns per cycle
                                             #    1.49  stalled cycles per insn [83.64%]
       597,275,583 branches                  #  139.464 M/sec                   [83.70%]
         8,960,541 branch-misses             #    1.50% of all branches         [83.65%]

      10.128990264 seconds time elapsed

lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       2847.375441 task-clock                #    0.281 CPUs utilized
            11,632 context-switches          #    0.004 M/sec
                49 CPU-migrations            #    0.017 K/sec
               354 page-faults               #    0.124 K/sec
     7,646,889,749 cycles                    #    2.686 GHz                     [83.34%]
     6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle    [83.31%]
     1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle    [66.55%]
     2,079,702,453 instructions              #    0.27  insns per cycle
                                             #    2.94  stalled cycles per insn [83.22%]
       363,773,213 branches                  #  127.757 M/sec                   [83.29%]
         4,242,732 branch-misses             #    1.17% of all branches         [83.51%]

      10.128449949 seconds time elapsed

CC: Tom Herbert <therbert@google.com>
Signed-off-by: Benjamin Poirier <bpoirier@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 16:20:19 -05:00
Erik Hugne
732256b933 tipc: correctly unlink packets from deferred packet queue
When we pull a received packet from a link's 'deferred packets' queue
for processing, its 'next' pointer is not cleared, and still refers to
the next packet in that queue, if any. This is incorrect, but caused
no harm before commit 40ba3cdf54 ("tipc:
message reassembly using fragment chain") was introduced. After that
commit, it may sometimes lead to the following oops:

general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: tipc
CPU: 4 PID: 0 Comm: swapper/4 Tainted: G        W 3.13.0-rc2+ #6
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
task: ffff880017af4880 ti: ffff880017aee000 task.ti: ffff880017aee000
RIP: 0010:[<ffffffff81710694>]  [<ffffffff81710694>] skb_try_coalesce+0x44/0x3d0
RSP: 0018:ffff880016603a78  EFLAGS: 00010212
RAX: 6b6b6b6bd6d6d6d6 RBX: ffff880013106ac0 RCX: ffff880016603ad0
RDX: ffff880016603ad7 RSI: ffff88001223ed00 RDI: ffff880013106ac0
RBP: ffff880016603ab8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88001223ed00
R13: ffff880016603ad0 R14: 000000000000058c R15: ffff880012297650
FS:  0000000000000000(0000) GS:ffff880016600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000805b000 CR3: 0000000011f5d000 CR4: 00000000000006e0
Stack:
 ffff880016603a88 ffffffff810a38ed ffff880016603aa8 ffff88001223ed00
 0000000000000001 ffff880012297648 ffff880016603b68 ffff880012297650
 ffff880016603b08 ffffffffa0006c51 ffff880016603b08 00ffffffa00005fc
Call Trace:
 <IRQ>
 [<ffffffff810a38ed>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffa0006c51>] tipc_link_recv_fragment+0xd1/0x1b0 [tipc]
 [<ffffffffa0007214>] tipc_recv_msg+0x4e4/0x920 [tipc]
 [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc]
 [<ffffffffa000177c>] tipc_l2_rcv_msg+0xcc/0x250 [tipc]
 [<ffffffffa00016f0>] ? tipc_l2_rcv_msg+0x40/0x250 [tipc]
 [<ffffffff8171e65b>] __netif_receive_skb_core+0x80b/0xd00
 [<ffffffff8171df94>] ? __netif_receive_skb_core+0x144/0xd00
 [<ffffffff8171eb76>] __netif_receive_skb+0x26/0x70
 [<ffffffff8171ed6d>] netif_receive_skb+0x2d/0x200
 [<ffffffff8171fe70>] napi_gro_receive+0xb0/0x130
 [<ffffffff815647c2>] e1000_clean_rx_irq+0x2c2/0x530
 [<ffffffff81565986>] e1000_clean+0x266/0x9c0
 [<ffffffff81985f7b>] ? notifier_call_chain+0x2b/0x160
 [<ffffffff8171f971>] net_rx_action+0x141/0x310
 [<ffffffff81051c1b>] __do_softirq+0xeb/0x480
 [<ffffffff819817bb>] ? _raw_spin_unlock+0x2b/0x40
 [<ffffffff810b8c42>] ? handle_fasteoi_irq+0x72/0x100
 [<ffffffff81052346>] irq_exit+0x96/0xc0
 [<ffffffff8198cbc3>] do_IRQ+0x63/0xe0
 [<ffffffff81981def>] common_interrupt+0x6f/0x6f
 <EOI>

This happens when the last fragment of a message has passed through the
the receiving link's 'deferred packets' queue, and at least one other
packet was added to that queue while it was there. After the fragment
chain with the complete message has been successfully delivered to the
receiving socket, it is released. Since 'next' pointer of the last
fragment in the released chain now is non-NULL, we get the crash shown
above.

We fix this by clearing the 'next' pointer of all received packets,
including those being pulled from the 'deferred' queue, before they
undergo any further processing.

Fixes: 40ba3cdf54 ("tipc: message reassembly using fragment chain")
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 16:15:24 -05:00
J. Bruce Fields
bba0f88bf7 minor svcauth_gss.c cleanup 2014-01-07 16:01:16 -05:00
Jiri Pirko
dfd1582d1e ipv4: loopback device: ignore value changes after device is upped
When lo is brought up, new ifa is created. Then, devconf and neigh values
bitfield should be set so later changes of default values would not
affect lo values.

Note that the same behaviour is in ipv6. Also note that this is likely
not an issue in many distros (for example Fedora 19) because userspace
sets address to lo manually before bringing it up.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 15:55:17 -05:00
FX Le Bail
509aba3b0d IPv6: add the option to use anycast addresses as source addresses in echo reply
This change allows to follow a recommandation of RFC4942.

- Add "anycast_src_echo_reply" sysctl to control the use of anycast addresses
  as source addresses for ICMPv6 echo reply. This sysctl is false by default
  to preserve existing behavior.
- Add inline check ipv6_anycast_destination().
- Use them in icmpv6_echo_reply().

Reference:
RFC4942 - IPv6 Transition/Coexistence Security Considerations
   (http://tools.ietf.org/html/rfc4942#section-2.1.6)

2.1.6. Anycast Traffic Identification and Security

   [...]
   To avoid exposing knowledge about the internal structure of the
   network, it is recommended that anycast servers now take advantage of
   the ability to return responses with the anycast address as the
   source address if possible.

Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 15:51:39 -05:00
Li RongQing
657e5d1965 ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c
initialise pcpu_tstats.syncp to kill the calltrace
[   11.973950] Call Trace:
[   11.973950]  [<819bbaff>] dump_stack+0x48/0x60
[   11.973950]  [<819bbaff>] dump_stack+0x48/0x60
[   11.973950]  [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10
[   11.973950]  [<81078dcf>] __lock_acquire.isra.22+0x1bf/0xc10
[   11.973950]  [<81079fa7>] lock_acquire+0x77/0xa0
[   11.973950]  [<81079fa7>] lock_acquire+0x77/0xa0
[   11.973950]  [<817ca7ab>] ? dev_get_stats+0xcb/0x130
[   11.973950]  [<817ca7ab>] ? dev_get_stats+0xcb/0x130
[   11.973950]  [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230
[   11.973950]  [<8183862d>] ip_tunnel_get_stats64+0x6d/0x230
[   11.973950]  [<817ca7ab>] ? dev_get_stats+0xcb/0x130
[   11.973950]  [<817ca7ab>] ? dev_get_stats+0xcb/0x130
[   11.973950]  [<811cf8c1>] ? __nla_reserve+0x21/0xd0
[   11.973950]  [<811cf8c1>] ? __nla_reserve+0x21/0xd0
[   11.973950]  [<817ca7ab>] dev_get_stats+0xcb/0x130
[   11.973950]  [<817ca7ab>] dev_get_stats+0xcb/0x130
[   11.973950]  [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20
[   11.973950]  [<817d5409>] rtnl_fill_ifinfo+0x569/0xe20
[   11.973950]  [<810352e0>] ? kvm_clock_read+0x20/0x30
[   11.973950]  [<810352e0>] ? kvm_clock_read+0x20/0x30
[   11.973950]  [<81008e38>] ? sched_clock+0x8/0x10
[   11.973950]  [<81008e38>] ? sched_clock+0x8/0x10
[   11.973950]  [<8106ba45>] ? sched_clock_local+0x25/0x170
[   11.973950]  [<8106ba45>] ? sched_clock_local+0x25/0x170
[   11.973950]  [<810da6bd>] ? __kmalloc+0x3d/0x90
[   11.973950]  [<810da6bd>] ? __kmalloc+0x3d/0x90
[   11.973950]  [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70
[   11.973950]  [<817b8c10>] ? __kmalloc_reserve.isra.41+0x20/0x70
[   11.973950]  [<810da81a>] ? slob_alloc_node+0x2a/0x60
[   11.973950]  [<810da81a>] ? slob_alloc_node+0x2a/0x60
[   11.973950]  [<817b919a>] ? __alloc_skb+0x6a/0x2b0
[   11.973950]  [<817b919a>] ? __alloc_skb+0x6a/0x2b0
[   11.973950]  [<817d8795>] rtmsg_ifinfo+0x65/0xe0
[   11.973950]  [<817d8795>] rtmsg_ifinfo+0x65/0xe0
[   11.973950]  [<817cbd31>] register_netdevice+0x531/0x5a0
[   11.973950]  [<817cbd31>] register_netdevice+0x531/0x5a0
[   11.973950]  [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90
[   11.973950]  [<81892b87>] ? ip6_tnl_get_cap+0x27/0x90
[   11.973950]  [<817cbdb6>] register_netdev+0x16/0x30
[   11.973950]  [<817cbdb6>] register_netdev+0x16/0x30
[   11.973950]  [<81f574a6>] vti6_init_net+0x1c4/0x1d4
[   11.973950]  [<81f574a6>] vti6_init_net+0x1c4/0x1d4
[   11.973950]  [<81f573af>] ? vti6_init_net+0xcd/0x1d4
[   11.973950]  [<81f573af>] ? vti6_init_net+0xcd/0x1d4
[   11.973950]  [<817c16df>] ops_init.constprop.11+0x17f/0x1c0
[   11.973950]  [<817c16df>] ops_init.constprop.11+0x17f/0x1c0
[   11.973950]  [<817c1779>] register_pernet_operations.isra.9+0x59/0x90
[   11.973950]  [<817c1779>] register_pernet_operations.isra.9+0x59/0x90
[   11.973950]  [<817c18d1>] register_pernet_device+0x21/0x60
[   11.973950]  [<817c18d1>] register_pernet_device+0x21/0x60
[   11.973950]  [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4
[   11.973950]  [<81f574b6>] ? vti6_init_net+0x1d4/0x1d4
[   11.973950]  [<81f574c7>] vti6_tunnel_init+0x11/0x68
[   11.973950]  [<81f574c7>] vti6_tunnel_init+0x11/0x68
[   11.973950]  [<81f572a1>] ? mip6_init+0x73/0xb4
[   11.973950]  [<81f572a1>] ? mip6_init+0x73/0xb4
[   11.973950]  [<81f0cba4>] do_one_initcall+0xbb/0x15b
[   11.973950]  [<81f0cba4>] do_one_initcall+0xbb/0x15b
[   11.973950]  [<811a00d8>] ? sha_transform+0x528/0x1150
[   11.973950]  [<811a00d8>] ? sha_transform+0x528/0x1150
[   11.973950]  [<81f0c544>] ? repair_env_string+0x12/0x51
[   11.973950]  [<81f0c544>] ? repair_env_string+0x12/0x51
[   11.973950]  [<8105c30d>] ? parse_args+0x2ad/0x440
[   11.973950]  [<8105c30d>] ? parse_args+0x2ad/0x440
[   11.973950]  [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[   11.973950]  [<810546be>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[   11.973950]  [<81f0cd27>] kernel_init_freeable+0xe3/0x182
[   11.973950]  [<81f0cd27>] kernel_init_freeable+0xe3/0x182
[   11.973950]  [<81f0c532>] ? do_early_param+0x7a/0x7a
[   11.973950]  [<81f0c532>] ? do_early_param+0x7a/0x7a
[   11.973950]  [<819b5b1b>] kernel_init+0xb/0x100
[   11.973950]  [<819b5b1b>] kernel_init+0xb/0x100
[   11.973950]  [<819cebf7>] ret_from_kernel_thread+0x1b/0x28
[   11.973950]  [<819cebf7>] ret_from_kernel_thread+0x1b/0x28
[   11.973950]  [<819b5b10>] ? rest_init+0xc0/0xc0
[   11.973950]  [<819b5b10>] ? rest_init+0xc0/0xc0

Before 469bdcefdc ("ipv6: fix the use of pcpu_tstats in ip6_vti.c"),
the pcpu_tstats.syncp is not used to pretect the 64bit elements of
pcpu_tstats, so not appear this calltrace.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-07 14:12:46 -05:00
Thierry Escande
b711ad524b NFC: digital: Set rf tech and crc functions when receiving a PSL_REQ
This patch sets the correct rf tech value and crc functions in target
mode when receiving a PSL_REQ, as done when receiving an ATR_REQ.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 18:48:12 +01:00
Thierry Escande
48e1044515 NFC: digital: Set current target active on activate_target() call
The curr_protocol field of nfc_digital_dev structure used to determine
if a target is currently active was set too soon, immediately when a
target is found. This is not good since there is no other way than
deactivate_target() to reset curr_protocol and if activate_target() is
not called, the target remains active and it's not possible to put the
device in poll mode anymore.

With this patch curr_protocol is set when nfc core activates a target,
puts a device up, or when an ATR_REQ is received in target mode.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 18:48:12 +01:00
Heikki Krogerus
2111cad655 net: rfkill: gpio: convert to descriptor-based GPIO interface
Convert to the safer gpiod_* family of API functions.

Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2014-01-07 18:45:20 +01:00
Emmanuel Grumbach
349b196044 mac80211: allow to set smps mode to OFF in AP mode
In managed mode, we should not ask for OFF mode because the
power settings may still require DYNAMIC. In AP mode, this
should be allowed since the default settings is OFF and
AUTOMATIC is not allowed.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-07 16:25:49 +01:00
Johannes Berg
3c2723f503 mac80211: clean up prepare_for_handlers() return value
Using an int with 0/1 is not very common, make the function
return a bool instead with the same values (false/true).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-07 16:23:24 +01:00
Emmanuel Grumbach
40791942ec mac80211: simplify code in ieee80211_prepare_and_rx_handle
No need to assign the return value of prepare_for_handlers
to a variable if the only usage is to test it.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-07 16:22:15 +01:00
Emmanuel Grumbach
87ee475ef6 mac80211: clean up garbage in comment
Not clear how this landed here.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-07 16:21:56 +01:00
Masanari Iida
8faaaead62 treewide: fix comments and printk msgs
This patch fixed several typo in printk from various
part of kernel source.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-01-07 15:06:07 +01:00
Claudio Takahasi
e825eb1d7e Bluetooth: Fix 6loWPAN peer lookup
This patch fixes peer address lookup for 6loWPAN over Bluetooth Low
Energy links.

ADDR_LE_DEV_PUBLIC, and ADDR_LE_DEV_RANDOM are the values allowed for
"dst_type" field in the hci_conn struct for LE links.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2014-01-07 11:32:15 -02:00
Claudio Takahasi
b071a62099 Bluetooth: Fix setting Universal/Local bit
This patch fixes the Bluetooth Low Energy Address type checking when
setting Universal/Local bit for the 6loWPAN network device or for the
peer device connection.

ADDR_LE_DEV_PUBLIC or ADDR_LE_DEV_RANDOM are the values allowed for
"src_type" and "dst_type" in the hci_conn struct. The Bluetooth link
type can be obtainned reading the "type" field in the same struct.

Signed-off-by: Claudio Takahasi <claudio.takahasi@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2014-01-07 11:32:11 -02:00
Eric Dumazet
438e38fadc gre_offload: statically build GRE offloading support
GRO/GSO layers can be enabled on a node, even if said
node is only forwarding packets.

This patch permits GSO (and upcoming GRO) support for GRE
encapsulated packets, even if the host has no GRE tunnel setup.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 20:28:34 -05:00
David S. Miller
39b6b2992f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
[GIT net-next] Open vSwitch

Open vSwitch changes for net-next/3.14. Highlights are:
 * Performance improvements in the mechanism to get packets to userspace
   using memory mapped netlink and skb zero copy where appropriate.
 * Per-cpu flow stats in situations where flows are likely to be shared
   across CPUs. Standard flow stats are used in other situations to save
   memory and allocation time.
 * A handful of code cleanups and rationalization.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 19:48:38 -05:00
Amitkumar Karwar
22c15bf30b NFC: NCI: Add set_config API
This API can be used by drivers to send their custom
configuration using SET_CONFIG NCI command to the device.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Amitkumar Karwar
86e8586ed5 NFC: NCI: Add setup handler
Some drivers require special configuration while initializing.
This patch adds setup handler for this custom configuration.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Amitkumar Karwar
1907299867 NFC: NCI: Don't reverse local general bytes
Local general bytes returned by nfc_get_local_general_bytes()
are already in correct order. We don't need to reverse them.

Remove local_gb[] local array as it's not needed any more.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-07 01:32:40 +01:00
Stephen Hemminger
443cd88c8a ovs: make functions local
Several functions and datastructures could be local
Found with 'make namespacecheck'

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:54:39 -08:00
Thomas Graf
09c5e6054e openvswitch: Compute checksum in skb_gso_segment() if needed
The copy & csum optimization is no longer present with zerocopy
enabled. Compute the checksum in skb_gso_segment() directly by
dropping the HW CSUM capability from the features passed in.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:53:24 -08:00
Thomas Graf
bda56f143c openvswitch: Use skb_zerocopy() for upcall
Use of skb_zerocopy() can avoid the expensive call to memcpy()
when copying the packet data into the Netlink skb. Completes
checksum through skb_checksum_help() if not already done in
GSO segmentation.

Zerocopy is only performed if user space supported unaligned
Netlink messages. memory mapped netlink i/o is preferred over
zerocopy if it is set up.

Cost of upcall is significantly reduced from:
+   7.48%       vhost-8471  [k] memcpy
+   5.57%     ovs-vswitchd  [k] memcpy
+   2.81%       vhost-8471  [k] csum_partial_copy_generic

to:
+   5.72%     ovs-vswitchd  [k] memcpy
+   3.32%       vhost-5153  [k] memcpy
+   0.68%       vhost-5153  [k] skb_zerocopy

(megaflows disabled)

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:53:17 -08:00
Thomas Graf
8055a89cfa openvswitch: Pass datapath into userspace queue functions
Allows removing the net and dp_ifindex argument and simplify the
code.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:53:07 -08:00
Thomas Graf
44da5ae5fb openvswitch: Drop user features if old user space attempted to create datapath
Drop user features if an outdated user space instance that does not
understand the concept of user_features attempted to create a new
datapath.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:53:00 -08:00
Thomas Graf
43d4be9cb5 openvswitch: Allow user space to announce ability to accept unaligned Netlink messages
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:53 -08:00
Thomas Graf
af2806f8f9 net: Export skb_zerocopy() to zerocopy from one skb to another
Make the skb zerocopy logic written for nfnetlink queue available for
use by other modules.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:42 -08:00
Wei Yongjun
5f03f47c9c openvswitch: remove duplicated include from flow_table.c
Remove duplicated include.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:35 -08:00
Daniel Borkmann
11d6c461b3 net: ovs: use kfree_rcu instead of rcu_free_{sw_flow_mask_cb,acts_callback}
As we're only doing a kfree() anyway in the RCU callback, we can
simply use kfree_rcu, which does the same job, and remove the
function rcu_free_sw_flow_mask_cb() and rcu_free_acts_callback().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:30 -08:00
Pravin B Shelar
e298e50570 openvswitch: Per cpu flow stats.
With mega flow implementation ovs flow can be shared between
multiple CPUs which makes stats updates highly contended
operation. This patch uses per-CPU stats in cases where a flow
is likely to be shared (if there is a wildcard in the 5-tuple
and therefore likely to be spread by RSS). In other situations,
it uses the current strategy, saving memory and allocation time.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:24 -08:00
Thomas Graf
795449d8b8 openvswitch: Enable memory mapped Netlink i/o
Use memory mapped Netlink i/o for all unicast openvswitch
communication if a ring has been set up.

Benchmark
  * pktgen -> ovs internal port
  * 5M pkts, 5M flows
  * 4 threads, 8 cores

Before:
Result: OK: 67418743(c67108212+d310530) usec, 5000000 (9000byte,0frags)
  74163pps 5339Mb/sec (5339736000bps) errors: 0
	+   2.98%     ovs-vswitchd  [k] copy_user_generic_string
	+   2.49%     ovs-vswitchd  [k] memcpy
	+   1.84%       kpktgend_2  [k] memcpy
	+   1.81%       kpktgend_1  [k] memcpy
	+   1.81%       kpktgend_3  [k] memcpy
	+   1.78%       kpktgend_0  [k] memcpy

After:
Result: OK: 24229690(c24127165+d102524) usec, 5000000 (9000byte,0frags)
  206358pps 14857Mb/sec (14857776000bps) errors: 0
	+   2.80%     ovs-vswitchd  [k] memcpy
	+   1.31%       kpktgend_2  [k] memcpy
	+   1.23%       kpktgend_0  [k] memcpy
	+   1.09%       kpktgend_1  [k] memcpy
	+   1.04%       kpktgend_3  [k] memcpy
	+   0.96%     ovs-vswitchd  [k] copy_user_generic_string

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:12 -08:00
Thomas Graf
aae9f0e22c netlink: Avoid netlink mmap alloc if msg size exceeds frame size
An insufficent ring frame size configuration can lead to an
unnecessary skb allocation for every Netlink message. Check frame
size before taking the queue lock and allocating the skb and
re-check with lock to be safe.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:52:06 -08:00
Thomas Graf
bb9b18fb55 genl: Add genlmsg_new_unicast() for unicast message allocation
Allocates a new sk_buff large enough to cover the specified payload
plus required Netlink headers. Will check receiving socket for
memory mapped i/o capability and use it if enabled. Will fall back
to non-mapped skb if message size exceeds the frame size of the ring.

Signed-of-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:53 -08:00
Jesse Gross
663efa3696 openvswitch: Silence RCU lockdep checks from flow lookup.
Flow lookup can happen either in packet processing context or userspace
context but it was annotated as requiring RCU read lock to be held. This
also allows OVS mutex to be held without causing warnings.

Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
2014-01-06 15:51:48 -08:00
Andy Zhou
5bb506324d openvswitch: Change ovs_flow_tbl_lookup_xx() APIs
API changes only for code readability. No functional chnages.

This patch removes the underscored version. Added a new API
ovs_flow_tbl_lookup_stats() that returns the n_mask_hits.

Reported by: Ben Pfaff <blp@nicira.com>
Reviewed-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:41 -08:00
Ben Pfaff
8f49ce1135 openvswitch: Shrink sw_flow_mask by 8 bytes (64-bit) or 4 bytes (32-bit).
We won't normally have a ton of flow masks but using a size_t to store
values no bigger than sizeof(struct sw_flow_key) seems excessive.

This reduces sw_flow_key_range and sw_flow_mask by 4 bytes on 32-bit
systems.  On 64-bit systems it shrinks sw_flow_key_range by 12 bytes but
sw_flow_mask only by 8 bytes due to padding.

Compile tested only.

Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:27 -08:00
Ben Pfaff
d1211908b9 openvswitch: Correct comment.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2014-01-06 15:51:21 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Gianluca Anzolin
f86772af6a Bluetooth: Remove rfcomm_carrier_raised()
Remove the rfcomm_carrier_raised() definition as that function isn't
used anymore.

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-01-06 13:51:45 -08:00
Gianluca Anzolin
4a2fb3ecc7 Bluetooth: Always wait for a connection on RFCOMM open()
This patch fixes two regressions introduced with the recent rfcomm tty
rework.

The current code uses the carrier_raised() method to wait for the
bluetooth connection when a process opens the tty.

However processes may open the port with the O_NONBLOCK flag or set the
CLOCAL termios flag: in these cases the open() syscall returns
immediately without waiting for the bluetooth connection to
complete.

This behaviour confuses userspace which expects an established bluetooth
connection.

The patch restores the old behaviour by waiting for the connection in
rfcomm_dev_activate() and removes carrier_raised() from the tty_port ops.

As a side effect the new code also fixes the case in which the rfcomm
tty device is created with the flag RFCOMM_REUSE_DLC: the old code
didn't call device_move() and ModemManager skipped the detection
probe. Now device_move() is always called inside rfcomm_dev_activate().

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Reported-by: Andrey Vihrov <andrey.vihrov@gmail.com>
Reported-by: Beson Chow <blc+bluez@mail.vanade.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-01-06 13:51:45 -08:00
Gianluca Anzolin
e228b63390 Bluetooth: Move rfcomm_get_device() before rfcomm_dev_activate()
This is a preparatory patch which moves the rfcomm_get_device()
definition before rfcomm_dev_activate() where it will be used.

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-01-06 13:51:45 -08:00
Gianluca Anzolin
5b89924187 Bluetooth: Release RFCOMM port when the last user closes the TTY
This patch fixes a userspace regression introduced by the commit
29cd718b.

If the rfcomm device was created with the flag RFCOMM_RELEASE_ONHUP the
user space expects that the tty_port is released as soon as the last
process closes the tty.

The current code attempts to release the port in the function
rfcomm_dev_state_change(). However it won't get a reference to the
relevant tty to send a HUP: at that point the tty is already destroyed
and therefore NULL.

This patch fixes the regression by taking over the tty refcount in the
tty install method(). This way the tty_port is automatically released as
soon as the tty is destroyed.

As a consequence the check for RFCOMM_RELEASE_ONHUP flag in the hangup()
method is now redundant. Instead we have to be careful with the reference
counting in the rfcomm_release_dev() function.

Signed-off-by: Gianluca Anzolin <gianluca@sottospazio.it>
Reported-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-01-06 13:51:45 -08:00
Jamal Hadi Salim
805c1f4aed net_sched: act: action flushing missaccounting
action flushing missaccounting
Account only for deleted actions

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:46:56 -05:00
Jamal Hadi Salim
63acd6807c net_sched: Remove unnecessary checks for act->ops
Remove unnecessary checks for act->ops
(suggested by Eric Dumazet).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:46:32 -05:00
sfeldma@cumulusnetworks.com
fbf2671bb8 bridge: use DEVICE_ATTR_xx macros
Use DEVICE_ATTR_RO/RW macros to simplify bridge sysfs attribute definitions.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:40:46 -05:00
Curt Brune
fe0d692bbc bridge: use spin_lock_bh() in br_multicast_set_hash_max
br_multicast_set_hash_max() is called from process context in
net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.

br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
which can deadlock the CPU if a softirq that also tries to take the
same lock interrupts br_multicast_set_hash_max() while the lock is
held .  This can happen quite easily when any of the bridge multicast
timers expire, which try to take the same lock.

The fix here is to use spin_lock_bh(), preventing other softirqs from
executing on this CPU.

Steps to reproduce:

1. Create a bridge with several interfaces (I used 4).
2. Set the "multicast query interval" to a low number, like 2.
3. Enable the bridge as a multicast querier.
4. Repeatedly set the bridge hash_max parameter via sysfs.

  # brctl addbr br0
  # brctl addif br0 eth1 eth2 eth3 eth4
  # brctl setmcqi br0 2
  # brctl setmcquerier br0 1

  # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done

Signed-off-by: Curt Brune <curt@cumulusnetworks.com>
Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:39:47 -05:00
Eric Dumazet
996b175e39 tcp: out_of_order_queue do not use its lock
TCP out_of_order_queue lock is not used, as queue manipulation
happens with socket lock held and we therefore use the lockless
skb queue routines (as __skb_queue_head())

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to make this more consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:34:34 -05:00
Hannes Frederic Sowa
88ad31491e ipv6: don't install anycast address for /128 addresses on routers
It does not make sense to create an anycast address for an /128-prefix.
Suppress it.

As 32019e651c ("ipv6: Do not leave router anycast address for /127
prefixes.") shows we also may not leave them, because we could accidentally
remove an anycast address the user has allocated or got added via another
prefix.

Cc: François-Xavier Le Bail <fx.lebail@yahoo.com>
Cc: Thomas Haller <thaller@redhat.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 16:32:43 -05:00
Jeff Layton
0fdc26785d sunrpc: get rid of use_gssp_lock
We can achieve the same result with a cmpxchg(). This also fixes a
potential race in use_gss_proxy(). The value of sn->use_gss_proxy could
go from -1 to 1 just after we check it in use_gss_proxy() but before we
acquire the spinlock. The procfile write would end up returning success
but the value would flip to 0 soon afterward. With this method we not
only avoid locking but the first "setter" always wins.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-06 15:14:18 -05:00
Jeff Layton
a92e5eb110 sunrpc: fix potential race between setting use_gss_proxy and the upcall rpc_clnt
An nfsd thread can call use_gss_proxy and find it set to '1' but find
gssp_clnt still NULL, so that when it attempts the upcall the result
will be an unnecessary -EIO.

So, ensure that gssp_clnt is created first, and set the use_gss_proxy
variable only if that succeeds.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-06 15:14:17 -05:00
Jeff Layton
1654a04cd7 sunrpc: don't wait for write before allowing reads from use-gss-proxy file
It doesn't make much sense to make reads from this procfile hang. As
far as I can tell, only gssproxy itself will open this file and it
never reads from it. Change it to just give the present setting of
sn->use_gss_proxy without waiting for anything.

Note that we do not want to call use_gss_proxy() in this codepath
since an inopportune read of this file could cause it to be disabled
prematurely.

Cc: stable@vger.kernel.org
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-06 15:14:16 -05:00
Vijay Subramanian
d4b36210c2 net: pkt_sched: PIE AQM scheme
Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.

>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.

We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software.  "

Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions.  Naeem Khademi and Dave Taht independently contributed to ECN
support.

For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.

Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00

All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.

For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: Mythili Prabhu <mysuryan@cisco.com>
CC: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 15:13:01 -05:00
John W. Linville
a50f9d5ee0 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2014-01-06 14:19:18 -05:00
Thomas Pedersen
057d5f4ba1 mac80211: sync dtim_count to TSF
On starting a mesh or AP BSS, the interface dtim_count
countdown should match that of the driver TSF.

Signed-off-by: Thomas Pedersen <twpedersen@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 20:10:47 +01:00
John W. Linville
9d1cd503c7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless 2014-01-06 14:08:41 -05:00
Ujjal Roy
60a4fe0ae9 cfg80211: fix wext-compat for getting retry value
While getting the retry limit, wext-compat returns the value
without updating the flag for retry->flags is 0. Also in this
case, it updates long retry flag when short and long retry
value are unequal.

So, iwconfig never showing "Retry short limit" and showing
"Retry long limit" when both values are unequal.

Updated the flags and corrected the condition properly.

Signed-off-by: Ujjal Roy <royujjal@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 20:00:12 +01:00
David S. Miller
83111e7fe8 netfilter: Fix build failure in nfnetlink_queue_core.c.
net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1

Just a missing include of net/tcp_states.h

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:36:06 -05:00
David S. Miller
9aa28f2b71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>

====================
nftables updates for net-next

The following patchset contains nftables updates for your net-next tree,
they are:

* Add set operation to the meta expression by means of the select_ops()
  infrastructure, this allows us to set the packet mark among other things.
  From Arturo Borrero Gonzalez.

* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
  Borkmann.

* Add new queue expression to nf_tables. These comes with two previous patches
  to prepare this new feature, one to add mask in nf_tables_core to
  evaluate the queue verdict appropriately and another to refactor common
  code with xt_NFQUEUE, from Eric Leblond.

* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
  Eric Leblond.

* Add the reject expression to nf_tables, this adds the missing TCP RST
  support. It comes with an initial patch to refactor common code with
  xt_NFQUEUE, again from Eric Leblond.

* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
  Nazarewicz.

* Remove the nft_meta_target code, now that Arturo added the set operation
  to the meta expression, from me.

* Add help information for nf_tables to Kconfig, also from me.

* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
  available to other nf_tables objects, requested by Arturo, from me.

* Expose the table usage counter, so we can know how many chains are using
  this table without dumping the list of chains, from Tomasz Bursztyka.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 13:29:30 -05:00
Johan Hedberg
cb6ca8e1ed Bluetooth: Default to no security with L2CAP RAW sockets
L2CAP RAW sockets can be used for things which do not involve
establishing actual connection oriented L2CAP channels. One example of
such usage is the l2ping tool. The default security level for L2CAP
sockets is LOW, which implies that for SSP based connection
authentication is still requested (although with no MITM requirement),
which is not what we want (or need) for things like l2ping. Therefore,
default to one lower level, i.e. BT_SECURITY_SDP, for L2CAP RAW sockets
in order not to trigger unwanted authentication requests.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-01-06 09:26:23 -08:00
Johan Hedberg
8cef8f50d4 Bluetooth: Fix NULL pointer dereference when disconnecting
When disconnecting it is possible that the l2cap_conn pointer is already
NULL when bt_6lowpan_del_conn() is entered. Looking at l2cap_conn_del
also verifies this as there's a NULL check there too. This patch adds
the missing NULL check without which the following bug may occur:

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<c131e9c7>] bt_6lowpan_del_conn+0x19/0x12a
*pde = 00000000
Oops: 0000 [#1] SMP
CPU: 1 PID: 52 Comm: kworker/u5:1 Not tainted 3.12.0+ #196
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Workqueue: hci0 hci_rx_work
task: f6259b00 ti: f48c0000 task.ti: f48c0000
EIP: 0060:[<c131e9c7>] EFLAGS: 00010282 CPU: 1
EIP is at bt_6lowpan_del_conn+0x19/0x12a
EAX: 00000000 EBX: ef094e10 ECX: 00000000 EDX: 00000016
ESI: 00000000 EDI: f48c1e60 EBP: f48c1e50 ESP: f48c1e34
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 30c65000 CR4: 00000690
Stack:
 f4d38000 00000000 f4d38000 00000002 ef094e10 00000016 f48c1e60 f48c1e70
 c1316bed f48c1e84 c1316bed 00000000 00000001 ef094e10 f48c1e84 f48c1ed0
 c1303cc6 c1303c7b f31f331a c1303cc6 f6e7d1c0 f3f8ea16 f3f8f380 f4d38008
Call Trace:
 [<c1316bed>] l2cap_disconn_cfm+0x3f/0x5b
 [<c1316bed>] ? l2cap_disconn_cfm+0x3f/0x5b
 [<c1303cc6>] hci_event_packet+0x645/0x2117
 [<c1303c7b>] ? hci_event_packet+0x5fa/0x2117
 [<c1303cc6>] ? hci_event_packet+0x645/0x2117
 [<c12681bd>] ? __kfree_skb+0x65/0x68
 [<c12681eb>] ? kfree_skb+0x2b/0x2e
 [<c130d3fb>] ? hci_send_to_sock+0x18d/0x199
 [<c12fa327>] hci_rx_work+0xf9/0x295
 [<c12fa327>] ? hci_rx_work+0xf9/0x295
 [<c1036d25>] process_one_work+0x128/0x1df
 [<c1346a39>] ? _raw_spin_unlock_irq+0x8/0x12
 [<c1036d25>] ? process_one_work+0x128/0x1df
 [<c103713a>] worker_thread+0x127/0x1c4
 [<c1037013>] ? rescuer_thread+0x216/0x216
 [<c103aec6>] kthread+0x88/0x8d
 [<c1040000>] ? task_rq_lock+0x37/0x6e
 [<c13474b7>] ret_from_kernel_thread+0x1b/0x28
 [<c103ae3e>] ? __kthread_parkme+0x50/0x50
Code: 05 b8 f4 ff ff ff 8d 65 f4 5b 5e 5f 5d 8d 67 f8 5f c3 57 8d 7c 24 08 83 e4 f8 ff 77 fc 55 89 e5 57 56f
EIP: [<c131e9c7>] bt_6lowpan_del_conn+0x19/0x12a SS:ESP 0068:f48c1e34
CR2: 0000000000000000

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2014-01-06 09:26:23 -08:00
Chun-Yeow Yeoh
6b5895d93d mac80211: enable WME for peer mesh STA
Enable the WME for peer mesh STA so that the driver,
such as wcn36xx, will pick this up and enabling it in HW.

Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 17:43:06 +01:00
Daniel Borkmann
b22f5126a2 netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages
Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...

  struct dccp_hdr _dh, *dh;
  ...
  skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);

... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.

Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.

Fixes: 2bc780499a ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06 17:40:02 +01:00
Johannes Berg
5b0ec94f9c mac80211: fix memory leak in register_hw() error path
Move the internal scan request allocation below the last
sanity check in ieee80211_register_hw() to avoid leaking
memory if the sanity check actually triggers.

Reported-by: ZHAO Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 16:02:34 +01:00
Johannes Berg
ef04a29737 mac80211: handle station TX latency allocation errors
When the station's TX latency data structures need to be
allocated, handle failures properly and also free all the
structures if there are any other problems.

Move the allocation code up so that allocation failures
don't trigger rate control algorithm calls.

Reported-by: ZHAO Gang <gamerh2o@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 15:58:10 +01:00
Johannes Berg
4cd3c4ecfc mac80211: clean up netdev debugfs macros a bit
Clean up the file macros a bit and use that to remove the
unnecessary format function for the tkip MIC test file
that really is write-only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 15:47:15 +01:00
Jesper Dangaard Brouer
f2661adc0c netfilter: only warn once on wrong seqadj usage
Avoid potentially spamming the kernel log with WARN splash messages
when catching wrong usage of seqadj, by simply using WARN_ONCE.

This is a followup to commit db12cf2743 (netfilter: WARN about
wrong usage of sequence number adjustments)

Suggested-by: Flavio Leitner <fbl@redhat.com>
Suggested-by: Daniel Borkmann <dborkman@redhat.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06 14:23:17 +01:00
Daniel Borkmann
138aef7dca netfilter: nf_conntrack_dccp: use %s format string for buffer
Some invocations of nf_log_packet() use arg buffer directly instead of
"%s" format string with follow-up buffer pointer. Currently, these two
usages are not really critical, but we should fix this up nevertheless
so that we don't run into trouble if that changes one day.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06 14:19:16 +01:00
Daniel Borkmann
2690d97ade netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
Commit 5901b6be88 attempted to introduce IPv6 support into
IRC NAT helper. By doing so, the following code seemed to be removed
by accident:

  ip = ntohl(exp->master->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.ip);
  sprintf(buffer, "%u %u", ip, port);
  pr_debug("nf_nat_irc: inserting '%s' == %pI4, port %u\n", buffer, &ip, port);

This leads to the fact that buffer[] was left uninitialized and
contained some stack value. When we call nf_nat_mangle_tcp_packet(),
we call strlen(buffer) on excatly this uninitialized buffer. If we
are unlucky and the skb has enough tailroom, we overwrite resp. leak
contents with values that sit on our stack into the packet and send
that out to the receiver.

Since the rather informal DCC spec [1] does not seem to specify
IPv6 support right now, we log such occurences so that admins can
act accordingly, and drop the packet. I've looked into XChat source,
and IPv6 is not supported there: addresses are in u32 and print
via %u format string.

Therefore, restore old behaviour as in IPv4, use snprintf(). The
IRC helper does not support IPv6 by now. By this, we can safely use
strlen(buffer) in nf_nat_mangle_tcp_packet() and prevent a buffer
overflow. Also simplify some code as we now have ct variable anyway.

  [1] http://www.irchelp.org/irchelp/rfc/ctcpspec.html

Fixes: 5901b6be88 ("netfilter: nf_nat: support IPv6 in IRC NAT helper")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06 14:17:17 +01:00
Pablo Neira Ayuso
2a50d805e5 Revert "netfilter: avoid get_random_bytes calls"
This reverts commit a42b99a6e3.

Hannes Frederic Sowa reported some problems with this patch, more specifically
that prandom_u32() may not be ready at boot time, see:

http://marc.info/?l=linux-netdev&m=138896532403533&w=2

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-06 14:00:55 +01:00
Johannes Berg
e03ad6eade nl80211: move vendor/testmode event skb functions out of ifdef
The vendor/testmode event skb functions are needed outside
the ifdef for vendor-specific events, so move them out.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 12:09:09 +01:00
Johannes Berg
1b000789a4 mac80211: add tracing for ieee80211_sta_set_buffered
This is useful for debugging issues with drivers using this
function (erroneously), so add tracing for the API call.

Change-Id: Ice9d7eabb8fecbac188f0a741920d3488de700ec
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2014-01-06 12:09:01 +01:00
Fengguang Wu
5537a0557c pktgen_dst_metrics[] can be static
CC: Fan Du <fan.du@windriver.com>
CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-06 11:00:07 +01:00
Daniel Borkmann
a48d4bb0b0 net: netdev_kobject_init: annotate with __init
netdev_kobject_init() is only being called from __init context,
that is, net_dev_init(), so annotate it with __init as well, thus
the kernel can take this as a hint that the function is used only
during the initialization phase and free up used memory resources
after its invocation.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:27:54 -05:00
Daniel Borkmann
965801e1eb net: 6lowpan: fix lowpan_header_create non-compression memcpy call
In function lowpan_header_create(), we invoke the following code
construct:

  struct ipv6hdr *hdr;
  ...
  hdr = ipv6_hdr(skb);
  ...
  if (...)
    memcpy(hc06_ptr + 1, &hdr->flow_lbl[1], 2);
  else
    memcpy(hc06_ptr, &hdr, 4);

Where the else path of the condition, that is, non-compression
path, calls memcpy() with a pointer to struct ipv6hdr *hdr as
source, thus two levels of indirection. This cannot be correct,
and likely only one level of pointer was intended as source
buffer for memcpy() here.

Fixes: 44331fe2aa ("IEEE802.15.4: 6LoWPAN basic support")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Werner Almesberger <werner@almesberger.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:25:24 -05:00
David S. Miller
855404efae Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-05 20:18:50 -05:00
Amitkumar Karwar
fa9be5f009 NFC: NCI: Cancel cmd_timer in nci_close_device()
nci_close_device() sends nci reset command to the device.
If there is no response for this command, nci request timeout
occurs first and then cmd timeout happens. Because command
timer has started after sending the command.

We are immediately flushing command workqueue after nci
timeout. Later we will try to schedule cmd_work in command
timer which leads to a crash.

Cancel cmd_timer before flushing the workqueue to fix the
problem.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-05 23:20:15 +01:00
Weston Andros Adamson
6ff33b7dd0 sunrpc: Fix infinite loop in RPC state machine
When a task enters call_refreshresult with status 0 from call_refresh and
!rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting
or max number of retries.

Instead of trying forever, make use of the retry path that other errors use.

This only seems to be possible when the crrefresh callback is gss_refresh_null,
which only happens when destroying the context.

To reproduce:

1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific
   operations).

2) reboot - the client will be stuck and will need to be hard rebooted

BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46]
Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy
irq event stamp: 195724
hardirqs last  enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30
hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80
softirqs last  enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276
softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a
CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Workqueue: rpciod rpc_async_schedule [sunrpc]
task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000
RIP: 0010:[<ffffffffa0064fd4>]  [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc]
RSP: 0018:ffff880079003d18  EFLAGS: 00000246
RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007
RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900
RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7
R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e
R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900
FS:  0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0
Stack:
 ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830
 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260
 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000
Call Trace:
 [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1
 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffff81052974>] process_one_work+0x211/0x3a5
 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5
 [<ffffffff81052eeb>] worker_thread+0x134/0x202
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280
 [<ffffffff810584a0>] kthread+0xc9/0xd1
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0
 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61
Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00

And the output of "rpcdebug -m rpc -s all":

RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refresh (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0
RPC:    61 call_refreshresult (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refresh (status 0)
RPC:    61 call_refreshresult (status 0)
RPC:    61 refreshing RPCSEC_GSS cred ffff88007a413cf0

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org # 2.6.37+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-01-05 15:29:26 -05:00
stephen hemminger
eec73f1c96 tipc: remove unused code
Remove dead code;
       tipc_bearer_find_interface
       tipc_node_redundant_links

This may break out of tree version of TIPC if there still is one.
But that maybe a good thing :-)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:18:50 -05:00
stephen hemminger
9805696399 tipc: make local function static
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:18:50 -05:00
stephen hemminger
6aee49c558 dccp: make local variable static
Make DCCP module config variable static, only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:18:50 -05:00
stephen hemminger
fd34d627ae dccp: remove obsolete code
This function is defined but not used.
Remove it now, can be resurrected if ever needed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:18:49 -05:00
Li RongQing
8f84985fec net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:10:24 -05:00
Marcel Holtmann
f9f462faa0 Bluetooth: Add quirk for disabling Delete Stored Link Key command
Some controller pretend they support the Delete Stored Link Key command,
but in reality they really don't support it.

  < HCI Command: Delete Stored Link Key (0x03|0x0012) plen 7
      bdaddr 00:00:00:00:00:00 all 1
  > HCI Event: Command Complete (0x0e) plen 4
      Delete Stored Link Key (0x03|0x0012) ncmd 1
      status 0x11 deleted 0
      Error: Unsupported Feature or Parameter Value

Not correctly supporting this command causes the controller setup to
fail and will make a device not work. However sending the command for
controller that handle stored link keys is important. This quirk
allows a driver to disable the command if it knows that this command
handling is broken.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-01-04 20:10:40 +02:00
Thierry Escande
4f319e3251 NFC: digital: Use NFC_NFCID3_MAXSIZE from nfc.h
This removes the declaration of NFCID3 size in digital_dep.c and now
uses the one from nfc.h.

This also removes a faulty and unneeded call to max().

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:35:34 +01:00
Thierry Escande
67af1d7a0f NFC: digital: Fix incorrect use of ERR_PTR and PTR_ERR macros
It's bad to use these macros when not dealing with error code. this
patch changes calls to these macros with correct casts.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:35:34 +01:00
Samuel Ortiz
a434c24074 NFC: Only warn on SE discovery error
SE discovery errors are currently overwriting the dev_up() return error.
This is wrong for many reasons:

- We don't want to report an error if we actually brought the device up
  but it failed to discover SEs. By doing so we pretend we don't have an
  NFC functional device even we do. The only thing we could not do was
  checking for SEs availability. This is the false negative case.

- In some cases the actual device power up failed but the SE discovery
  succeeded. Userspace then believes the device is up while it's not.
  This is the false positive case.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:32:27 +01:00
Szymon Janc
11bfb1c4b9 NFC: llcp: Use default MIU if none was specified on connect
If MIUX is not present in CONNECT or CC use default MIU value (128)
instead of one announced durring link setup.

This was affecting Bluetooth handover with Android 4.3+ NCI stack.

Signed-off-by: Szymon Janc <szymon.janc@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:32:27 +01:00
Szymon Janc
43d53c29dd NFC: llcp: Fix possible memory leak while sending I frames
If sending was not completed due to low memory condition msg_data
was not free before returning from function.

Signed-off-by: Szymon Janc <szymon.janc@gmail.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:32:27 +01:00
Samuel Ortiz
249eb5bd74 NFC: Return driver failure upon unknown event reception
If the device is polling, this will trigger a netlink event to notify
userspace about the polling error.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:32:27 +01:00
Arron Wang
d31652a26b NFC: Fix target mode p2p link establishment
With commit e29a9e2ae1, we set the active_target pointer from
nfc_dep_link_is_up() in order to support the case where the target
detection and the DEP link setting are done atomically by the driver.
That can only happen in initiator mode, so we need to check for that
otherwise we fail to bring a p2p link in target mode.

Signed-off-by: Arron Wang <arron.wang@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2014-01-04 03:31:32 +01:00
stephen hemminger
5e419e68a6 llc: make lock static
The llc_sap_list_lock does not need to be global, only acquired
in core.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:56:48 -05:00
stephen hemminger
8f09898bf0 socket: cleanups
Namespace related cleaning

 * make cred_to_ucred static
 * remove unused sock_rmalloc function

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:55:58 -05:00
Tom Herbert
9a4aa9af44 ipv4: Use percpu Cache route in IP tunnels
percpu route cache eliminates share of dst refcnt between CPUs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:40:57 -05:00
Tom Herbert
7d442fab0a ipv4: Cache dst in tunnels
Avoid doing a route lookup on every packet being tunneled.

In ip_tunnel.c cache the route returned from ip_route_output if
the tunnel is "connected" so that all the rouitng parameters are
taken from tunnel parms for a packet. Specifically, not NBMA tunnel
and tos is from tunnel parms (not inner packet).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:38:45 -05:00
Neil Horman
f916ec9608 sctp: Add process name and pid to deprecation warnings
Recently I updated the sctp socket option deprecation warnings to be both a bit
more clear and ratelimited to prevent user processes from spamming the log file.
Ben Hutchings suggested that I add the process name and pid to these warnings so
that users can tell who is responsible for using the deprecated apis.  This
patch accomplishes that.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 19:36:46 -05:00
Pablo Neira Ayuso
c9c8e48597 netfilter: nf_tables: dump sets in all existing families
This patch allows you to dump all sets available in all of
the registered families. This allows you to use NFPROTO_UNSPEC
to dump all existing sets, similarly to other existing table,
chain and rule operations.

This patch is based on original patch from Arturo Borrero
González.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-04 00:23:11 +01:00
Kinglong Mee
7e55b59b2f SUNRPC/NFSD: Support a new option for ignoring the result of svc_register
NFSv4 clients can contact port 2049 directly instead of needing the
portmapper.

Therefore a failure to register to the portmapper when starting an
NFSv4-only server isn't really a problem.

But Gareth Williams reports that an attempt to start an NFSv4-only
server without starting portmap fails:

  #rpc.nfsd -N 2 -N 3
  rpc.nfsd: writing fd to kernel failed: errno 111 (Connection refused)
  rpc.nfsd: unable to set any sockets for nfsd

Add a flag to svc_version to tell the rpc layer it can safely ignore an
rpcbind failure in the NFSv4-only case.

Reported-by: Gareth Williams <gareth@garethwilliams.me.uk>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-01-03 18:18:49 -05:00
Daniel Borkmann
82a37132f3 netfilter: x_tables: lightweight process control group matching
It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

 1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

 2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

 3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:44 +01:00
Daniel Borkmann
86f8515f97 net: netprio: rename config to be more consistent with cgroup configs
While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:42 +01:00
Daniel Borkmann
fe1217c4f3 net: net_cls: move cgroupfs classid handling into core
Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

 [1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:41 +01:00
Eric Leblond
14abfa161d netfilter: xt_CT: fix error value in xt_ct_tg_check()
If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:39 +01:00
stephen hemminger
dcd93ed4cd netfilter: nf_conntrack: remove dead code
The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:37 +01:00
stephen hemminger
02eca9d2cc netfilter: ipset: remove unused code
Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:35 +01:00
Daniel Borkmann
34ce324019 netfilter: nf_nat: add full port randomization support
We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

  The best defense against the poisoning attacks is to properly
  deploy and validate DNSSEC; DNSSEC provides security not only
  against off-path attacker but even against MitM attacker. We hope
  that our results will help motivate administrators to adopt DNSSEC.
  However, full DNSSEC deployment make take significant time, and
  until that happens, we recommend short-term, non-cryptographic
  defenses. We recommend to support full port randomisation,
  according to practices recommended in [2], and to avoid
  per-destination sequential port allocation, which we show may be
  vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

 [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
 [2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 23:41:26 +01:00
Michal Nazarewicz
720e0dfa3a netfilter: nf_tables: remove unused variable in nf_tables_dump_set()
The nfmsg variable is not used (except in sizeof operator which does
not care about its value) between the first and second time it is
assigned the value.  Furthermore, nlmsg_data has no side effects, so
the assignment can be safely removed.

Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 22:51:14 +01:00
Daniel Borkmann
1466291790 netfilter: nf_tables: fix type in parsing in nf_tables_set_alloc_name()
In nf_tables_set_alloc_name(), we are trying to find a new, unused
name for our new set and interate through the list of present sets.
As far as I can see, we're using format string %d to parse already
present names in order to mark their presence in a bitmap, so that
we can later on find the first 0 in that map to assign the new set
name to. We should rather use a temporary variable of type int to
store the result of sscanf() to, and for making sanity checks on.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-03 22:50:59 +01:00
Fan Du
8101328b79 {pktgen, xfrm} Show spi value properly when ipsec turned on
If user run pktgen plus ipsec by using spi, show spi value
properly when cat /proc/net/pktgen/ethX

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:12 +01:00
Fan Du
c454997e68 {pktgen, xfrm} Introduce xfrm_state_lookup_byspi for pktgen
Introduce xfrm_state_lookup_byspi to find user specified by custom
from "pgset spi xxx". Using this scheme, any flow regardless its
saddr/daddr could be transform by SA specified with configurable
spi.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:12 +01:00
Fan Du
cf93d47ed4 {pktgen, xfrm} Construct skb dst for tunnel mode transformation
IPsec tunnel mode encapuslation needs to set outter ip header
with right protocol/ttl/id value with regard to skb->dst->child.

Looking up a rt in a standard way is absolutely wrong for every
packet transmission. In a simple way, construct a dst by setting
neccessary information to make tunnel mode encapuslation working.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:11 +01:00
Fan Du
de4aee7d69 {pktgen, xfrm} Using "pgset spi xxx" to spedifiy SA for a given flow
User could set specific SPI value to arm pktgen flow with IPsec
transformation, instead of looking up SA by sadr/daddr. The reaseon
to do so is because current state lookup scheme is both slow and, most
important of all, in fact pktgen doesn't need to match any SA state
addresses information, all it needs is the SA transfromation shell to
do the encapuslation.

And this option also provide user an alternative to using pktgen
test existing SA without creating new ones.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:11 +01:00
Fan Du
4ae770bf58 {pktgen, xfrm} Correct xfrm_state_lock usage in xfrm_stateonly_find
Acquiring xfrm_state_lock in process context is expected to turn BH off,
as this lock is also used in BH context, namely xfrm state timer handler.
Otherwise it surprises LOCKDEP with below messages.

[   81.422781] pktgen: Packet Generator for packet performance testing. Version: 2.74
[   81.725194]
[   81.725211] =========================================================
[   81.725212] [ INFO: possible irq lock inversion dependency detected ]
[   81.725215] 3.13.0-rc2+ #92 Not tainted
[   81.725216] ---------------------------------------------------------
[   81.725218] kpktgend_0/2780 just changed the state of lock:
[   81.725220]  (xfrm_state_lock){+.+...}, at: [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725231] but this lock was taken by another, SOFTIRQ-safe lock in the past:
[   81.725232]  (&(&x->lock)->rlock){+.-...}
[   81.725232]
[   81.725232] and interrupts could create inverse lock ordering between them.
[   81.725232]
[   81.725235]
[   81.725235] other info that might help us debug this:
[   81.725237]  Possible interrupt unsafe locking scenario:
[   81.725237]
[   81.725238]        CPU0                    CPU1
[   81.725240]        ----                    ----
[   81.725241]   lock(xfrm_state_lock);
[   81.725243]                                local_irq_disable();
[   81.725244]                                lock(&(&x->lock)->rlock);
[   81.725246]                                lock(xfrm_state_lock);
[   81.725248]   <Interrupt>
[   81.725249]     lock(&(&x->lock)->rlock);
[   81.725251]
[   81.725251]  *** DEADLOCK ***
[   81.725251]
[   81.725254] no locks held by kpktgend_0/2780.
[   81.725255]
[   81.725255] the shortest dependencies between 2nd lock and 1st lock:
[   81.725269]  -> (&(&x->lock)->rlock){+.-...} ops: 8 {
[   81.725274]     HARDIRQ-ON-W at:
[   81.725276]                       [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[   81.725282]                       [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725284]                       [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725289]                       [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   81.725292]                       [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   81.725300]                       [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   81.725303]                       [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   81.725305]                       [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   81.725308]                       [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   81.725313]                       [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   81.725316]                       [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   81.725329]                       [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   81.725333]                       [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[   81.725338]     IN-SOFTIRQ-W at:
[   81.725340]                       [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
[   81.725342]                       [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725344]                       [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725347]                       [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   81.725349]                       [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   81.725352]                       [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   81.725355]                       [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   81.725358]                       [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   81.725360]                       [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   81.725363]                       [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   81.725365]                       [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   81.725368]                       [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   81.725370]                       [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[   81.725373]     INITIAL USE at:
[   81.725375]                      [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[   81.725385]                      [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725388]                      [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725390]                      [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   81.725394]                      [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   81.725398]                      [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   81.725401]                      [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   81.725404]                      [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   81.725407]                      [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   81.725409]                      [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   81.725412]                      [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   81.725415]                      [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   81.725417]                      [<ffffffff8102e5b0>] start_secondary+0x190/0x1f0
[   81.725420]   }
[   81.725421]   ... key      at: [<ffffffff8295b9c8>] __key.46349+0x0/0x8
[   81.725445]   ... acquired at:
[   81.725446]    [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725449]    [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725452]    [<ffffffff816dc057>] __xfrm_state_delete+0x37/0x140
[   81.725454]    [<ffffffff816dc18c>] xfrm_state_delete+0x2c/0x50
[   81.725456]    [<ffffffff816dc277>] xfrm_state_flush+0xc7/0x1b0
[   81.725458]    [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[   81.725465]    [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[   81.725468]    [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[   81.725471]    [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[   81.725476]    [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[   81.725479]    [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[   81.725482]    [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[   81.725484]
[   81.725486] -> (xfrm_state_lock){+.+...} ops: 11 {
[   81.725490]    HARDIRQ-ON-W at:
[   81.725493]                     [<ffffffff8109a64b>] __lock_acquire+0x65b/0x1d70
[   81.725504]                     [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725507]                     [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[   81.725510]                     [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[   81.725513]                     [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[   81.725516]                     [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[   81.725519]                     [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[   81.725522]                     [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[   81.725525]                     [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[   81.725527]                     [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[   81.725530]                     [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[   81.725533]    SOFTIRQ-ON-W at:
[   81.725534]                     [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   81.725537]                     [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725539]                     [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725541]                     [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725544]                     [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[   81.725547]                     [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[   81.725550]                     [<ffffffff81078f84>] kthread+0xe4/0x100
[   81.725555]                     [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   81.725565]    INITIAL USE at:
[   81.725567]                    [<ffffffff8109a31a>] __lock_acquire+0x32a/0x1d70
[   81.725569]                    [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725572]                    [<ffffffff81774e4b>] _raw_spin_lock_bh+0x3b/0x70
[   81.725574]                    [<ffffffff816dc1df>] xfrm_state_flush+0x2f/0x1b0
[   81.725576]                    [<ffffffffa005f6cc>] pfkey_flush+0x7c/0x100 [af_key]
[   81.725580]                    [<ffffffffa005efb7>] pfkey_process+0x1c7/0x1f0 [af_key]
[   81.725583]                    [<ffffffffa005f139>] pfkey_sendmsg+0x159/0x260 [af_key]
[   81.725586]                    [<ffffffff8162c16f>] sock_sendmsg+0xaf/0xc0
[   81.725589]                    [<ffffffff8162c99c>] SYSC_sendto+0xfc/0x130
[   81.725594]                    [<ffffffff8162cf3e>] SyS_sendto+0xe/0x10
[   81.725597]                    [<ffffffff8177dd12>] system_call_fastpath+0x16/0x1b
[   81.725599]  }
[   81.725600]  ... key      at: [<ffffffff81cadef8>] xfrm_state_lock+0x18/0x50
[   81.725606]  ... acquired at:
[   81.725607]    [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[   81.725609]    [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[   81.725611]    [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   81.725614]    [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725616]    [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725627]    [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725629]    [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[   81.725632]    [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[   81.725635]    [<ffffffff81078f84>] kthread+0xe4/0x100
[   81.725637]    [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   81.725640]
[   81.725641]
[   81.725641] stack backtrace:
[   81.725645] CPU: 0 PID: 2780 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #92
[   81.725647] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[   81.725649]  ffffffff82537b80 ffff880018199988 ffffffff8176af37 0000000000000007
[   81.725652]  ffff8800181999f0 ffff8800181999d8 ffffffff81099358 ffffffff82537b80
[   81.725655]  ffffffff81a32def ffff8800181999f4 0000000000000000 ffff880002cbeaa8
[   81.725659] Call Trace:
[   81.725664]  [<ffffffff8176af37>] dump_stack+0x46/0x58
[   81.725667]  [<ffffffff81099358>] print_irq_inversion_bug.part.42+0x1e8/0x1f0
[   81.725670]  [<ffffffff810995c0>] check_usage_backwards+0x110/0x150
[   81.725672]  [<ffffffff81099e96>] mark_lock+0x196/0x2f0
[   81.725675]  [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
[   81.725685]  [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   81.725691]  [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[   81.725694]  [<ffffffff81089b38>] ? sched_clock_cpu+0xa8/0x120
[   81.725697]  [<ffffffff8109a31a>] ? __lock_acquire+0x32a/0x1d70
[   81.725699]  [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[   81.725702]  [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   81.725704]  [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[   81.725707]  [<ffffffff810899a5>] ? sched_clock_local+0x25/0x90
[   81.725710]  [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   81.725712]  [<ffffffff816dd751>] ? xfrm_stateonly_find+0x41/0x1f0
[   81.725715]  [<ffffffff810971ec>] ? lock_release_holdtime.part.26+0x1c/0x1a0
[   81.725717]  [<ffffffff816dd751>] xfrm_stateonly_find+0x41/0x1f0
[   81.725721]  [<ffffffffa008af03>] mod_cur_headers+0x793/0x7f0 [pktgen]
[   81.725724]  [<ffffffffa008bca2>] pktgen_thread_worker+0xd42/0x1880 [pktgen]
[   81.725727]  [<ffffffffa008ba71>] ? pktgen_thread_worker+0xb11/0x1880 [pktgen]
[   81.725729]  [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
[   81.725733]  [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
[   81.725745]  [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
[   81.725751]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[   81.725753]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[   81.725757]  [<ffffffffa008af60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
[   81.725759]  [<ffffffff81078f84>] kthread+0xe4/0x100
[   81.725762]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
[   81.725765]  [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   81.725768]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:11 +01:00
Fan Du
6de9ace4ae {pktgen, xfrm} Add statistics counting when transforming
so /proc/net/xfrm_stat could give user clue about what's
wrong in this process.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:11 +01:00
Fan Du
0af0a4136b {pktgen, xfrm} Correct xfrm state lock usage when transforming
xfrm_state lock protects its state, i.e., VALID/DEAD and statistics,
not the transforming procedure, as both mode/type output functions
are reentrant.

Another issue is state lock can be used in BH context when state timer
alarmed, after transformation in pktgen, update state statistics acquiring
state lock should disabled BH context for a moment. Otherwise LOCKDEP
critisize this:

[   62.354339] pktgen: Packet Generator for packet performance testing. Version: 2.74
[   62.655444]
[   62.655448] =================================
[   62.655451] [ INFO: inconsistent lock state ]
[   62.655455] 3.13.0-rc2+ #70 Not tainted
[   62.655457] ---------------------------------
[   62.655459] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
[   62.655463] kpktgend_0/2764 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   62.655466]  (&(&x->lock)->rlock){+.?...}, at: [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen]
[   62.655479] {IN-SOFTIRQ-W} state was registered at:
[   62.655484]   [<ffffffff8109a61d>] __lock_acquire+0x62d/0x1d70
[   62.655492]   [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   62.655498]   [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   62.655505]   [<ffffffff816dc3a3>] xfrm_timer_handler+0x43/0x290
[   62.655511]   [<ffffffff81059437>] __tasklet_hrtimer_trampoline+0x17/0x40
[   62.655519]   [<ffffffff8105a1b7>] tasklet_hi_action+0xd7/0xf0
[   62.655523]   [<ffffffff81059ac6>] __do_softirq+0xe6/0x2d0
[   62.655526]   [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   62.655530]   [<ffffffff8177fd0a>] smp_apic_timer_interrupt+0x4a/0x60
[   62.655537]   [<ffffffff8177e96f>] apic_timer_interrupt+0x6f/0x80
[   62.655541]   [<ffffffff8100b7c6>] arch_cpu_idle+0x26/0x30
[   62.655547]   [<ffffffff810ace28>] cpu_startup_entry+0x88/0x2b0
[   62.655552]   [<ffffffff81761c3c>] rest_init+0xbc/0xd0
[   62.655557]   [<ffffffff81ea5e5e>] start_kernel+0x3c4/0x3d1
[   62.655583]   [<ffffffff81ea55a8>] x86_64_start_reservations+0x2a/0x2c
[   62.655588]   [<ffffffff81ea569f>] x86_64_start_kernel+0xf5/0xfc
[   62.655592] irq event stamp: 77
[   62.655594] hardirqs last  enabled at (77): [<ffffffff810ab7f2>] vprintk_emit+0x1b2/0x520
[   62.655597] hardirqs last disabled at (76): [<ffffffff810ab684>] vprintk_emit+0x44/0x520
[   62.655601] softirqs last  enabled at (22): [<ffffffff81059b57>] __do_softirq+0x177/0x2d0
[   62.655605] softirqs last disabled at (15): [<ffffffff8105a026>] irq_exit+0x96/0xc0
[   62.655609]
[   62.655609] other info that might help us debug this:
[   62.655613]  Possible unsafe locking scenario:
[   62.655613]
[   62.655616]        CPU0
[   62.655617]        ----
[   62.655618]   lock(&(&x->lock)->rlock);
[   62.655622]   <Interrupt>
[   62.655623]     lock(&(&x->lock)->rlock);
[   62.655626]
[   62.655626]  *** DEADLOCK ***
[   62.655626]
[   62.655629] no locks held by kpktgend_0/2764.
[   62.655631]
[   62.655631] stack backtrace:
[   62.655636] CPU: 0 PID: 2764 Comm: kpktgend_0 Not tainted 3.13.0-rc2+ #70
[   62.655638] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006
[   62.655642]  ffffffff8216b7b0 ffff88001be43ab8 ffffffff8176af37 0000000000000007
[   62.655652]  ffff88001c8d4fc0 ffff88001be43b18 ffffffff81766d78 0000000000000000
[   62.655663]  ffff880000000001 ffff880000000001 ffffffff8101025f ffff88001be43b18
[   62.655671] Call Trace:
[   62.655680]  [<ffffffff8176af37>] dump_stack+0x46/0x58
[   62.655685]  [<ffffffff81766d78>] print_usage_bug+0x1f1/0x202
[   62.655691]  [<ffffffff8101025f>] ? save_stack_trace+0x2f/0x50
[   62.655696]  [<ffffffff81099f8c>] mark_lock+0x28c/0x2f0
[   62.655700]  [<ffffffff810994b0>] ? check_usage_forwards+0x150/0x150
[   62.655704]  [<ffffffff8109a67a>] __lock_acquire+0x68a/0x1d70
[   62.655712]  [<ffffffff81115b09>] ? irq_work_queue+0x69/0xb0
[   62.655717]  [<ffffffff810ab7f2>] ? vprintk_emit+0x1b2/0x520
[   62.655722]  [<ffffffff8109cec5>] ? trace_hardirqs_on_caller+0x105/0x1d0
[   62.655730]  [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
[   62.655734]  [<ffffffff8109c3c7>] lock_acquire+0x97/0x130
[   62.655741]  [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
[   62.655745]  [<ffffffff81774af6>] _raw_spin_lock+0x36/0x70
[   62.655752]  [<ffffffffa00886f6>] ? pktgen_thread_worker+0x1796/0x1860 [pktgen]
[   62.655758]  [<ffffffffa00886f6>] pktgen_thread_worker+0x1796/0x1860 [pktgen]
[   62.655766]  [<ffffffffa0087a79>] ? pktgen_thread_worker+0xb19/0x1860 [pktgen]
[   62.655771]  [<ffffffff8109cf9d>] ? trace_hardirqs_on+0xd/0x10
[   62.655777]  [<ffffffff81775410>] ? _raw_spin_unlock_irq+0x30/0x40
[   62.655785]  [<ffffffff8151faa0>] ? e1000_clean+0x9d0/0x9d0
[   62.655791]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[   62.655795]  [<ffffffff81094310>] ? __init_waitqueue_head+0x60/0x60
[   62.655800]  [<ffffffffa0086f60>] ? mod_cur_headers+0x7f0/0x7f0 [pktgen]
[   62.655806]  [<ffffffff81078f84>] kthread+0xe4/0x100
[   62.655813]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170
[   62.655819]  [<ffffffff8177dc6c>] ret_from_fork+0x7c/0xb0
[   62.655824]  [<ffffffff81078ea0>] ? flush_kthread_worker+0x170/0x170

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-03 07:29:10 +01:00
David S. Miller
aca5f58f9b netpoll: Fix missing TXQ unlock and and OOPS.
The VLAN tag handling code in netpoll_send_skb_on_dev() has two problems.

1) It exits without unlocking the TXQ.

2) It then tries to queue a NULL skb to npinfo->txq.

Reported-by: Ahmed Tamrawi <atamrawi@iastate.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:50:52 -05:00
Li RongQing
469bdcefdc ipv6: fix the use of pcpu_tstats in ip6_vti.c
when read/write the 64bit data, the correct lock should be hold.
and we can use the generic vti6_get_stats to return stats, and
not define a new one in ip6_vti.c

Fixes: 87b6d218f3 ("tunnel: implement 64 bits statistics")
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:37:21 -05:00
Li RongQing
abb6013cca ipv6: fix the use of pcpu_tstats in ip6_tunnel
when read/write the 64bit data, the correct lock should be hold.

Fixes: 87b6d218f3 ("tunnel: implement 64 bits statistics")

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:37:21 -05:00
Yasushi Asano
fad8da3e08 ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity
Fixed a problem with setting the lifetime of an IPv6
address. When setting preferred_lft to a value not zero or
infinity, while valid_lft is infinity(0xffffffff) preferred
lifetime is set to forever and does not update. Therefore
preferred lifetime never becomes deprecated. valid lifetime
and preferred lifetime should be set independently, even if
valid lifetime is infinity, preferred lifetime must expire
correctly (meaning it must eventually become deprecated)

Signed-off-by: Yasushi Asano <yasushi.asano@jp.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:34:40 -05:00
Daniel Borkmann
4d231b76ee net: llc: fix use after free in llc_ui_recvmsg
While commit 30a584d944 fixes datagram interface in LLC, a use
after free bug has been introduced for SOCK_STREAM sockets that do
not make use of MSG_PEEK.

The flow is as follow ...

  if (!(flags & MSG_PEEK)) {
    ...
    sk_eat_skb(sk, skb, false);
    ...
  }
  ...
  if (used + offset < skb->len)
    continue;

... where sk_eat_skb() calls __kfree_skb(). Therefore, cache
original length and work on skb_len to check partial reads.

Fixes: 30a584d944 ("[LLX]: SOCK_DGRAM interface fixes")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:31:09 -05:00
Wei-Chun Chao
7a7ffbabf9 ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC
VM to VM GSO traffic is broken if it goes through VXLAN or GRE
tunnel and the physical NIC on the host supports hardware VXLAN/GRE
GSO offload (e.g. bnx2x and next-gen mlx4).

Two issues -
(VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
integrity check fails in udp4_ufo_fragment if inner protocol is
TCP. Also gso_segs is calculated incorrectly using skb->len that
includes tunnel header. Fix: robust check should only be applied
to the inner packet.

(VXLAN & GRE) Once GSO header integrity check passes, NULL segs
is returned and the original skb is sent to hardware. However the
tunnel header is already pulled. Fix: tunnel header needs to be
restored so that hardware can perform GSO properly on the original
packet.

Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:06:47 -05:00
Cong Wang
c1ddf295f5 net: revert "sched classifier: make cgroup table local"
This reverts commit de6fb288b1.
Otherwise we got:

net/sched/cls_cgroup.c:106:29: error: static declaration of ‘net_cls_subsys’ follows non-static declaration
 static struct cgroup_subsys net_cls_subsys = {
                             ^
In file included from include/linux/cgroup.h:654:0,
                 from net/sched/cls_cgroup.c:18:
include/linux/cgroup_subsys.h:35:29: note: previous declaration of ‘net_cls_subsys’ was here
 SUBSYS(net_cls)
                             ^
make[2]: *** [net/sched/cls_cgroup.o] Error 1

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:02:12 -05:00
Vlad Yasevich
619a60ee04 sctp: Remove outqueue empty state
The SCTP outqueue structure maintains a data chunks
that are pending transmission, the list of chunks that
are pending a retransmission and a length of data in
flight.  It also tries to keep the emtpy state so that
it can performe shutdown sequence or notify user.

The problem is that the empy state is inconsistently
tracked.  It is possible to completely drain the queue
without sending anything when using PR-SCTP.  In this
case, the empty state will not be correctly state as
report by Jamal Hadi Salim <jhs@mojatatu.com>.  This
can cause an association to be perminantly stuck in the
SHUTDOWN_PENDING state.

Additionally, SCTP is incredibly inefficient when setting
the empty state.  Even though all the data is availaible
in the outqueue structure, we ignore it and walk a list
of trasnports.

In the end, we can completely remove the extra empty
state and figure out if the queue is empty by looking
at 3 things:  length of pending data, length of in-flight
data, and exisiting of retransmit data.  All of these
are already in the strucutre.

Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 17:22:48 -05:00
stephen hemminger
de6fb288b1 sched classifier: make cgroup table local
Doesn't need to be global.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
stephen hemminger
9c75f4029c sched action: make local function static
No need to export functions only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
Weilong Chen
dd9b45598a ipv4: switch and case should be at the same indent
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
Weilong Chen
442c67f844 ipv4: spaces required around that '='
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:36 -05:00
Li RongQing
0c3584d589 ipv6: remove prune parameter for fib6_clean_all
since the prune parameter for fib6_clean_all always is 0, remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:35 -05:00
wangweidong
b055597697 tipc: make the code look more readable
In commit 3b8401fe9d ("tipc: kill unnecessary goto's") didn't make
the code look most readable, so fix it. This patch is cosmetic
and does not change the operation of TIPC in any way.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 03:30:35 -05:00
Weilong Chen
2f3ea9a95c xfrm: checkpatch erros with inline keyword position
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-02 07:48:51 +01:00
Weilong Chen
42054569f9 xfrm: fix checkpatch error
Fix that "else should follow close brace '}'".

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-02 07:48:50 +01:00
Weilong Chen
02d0892f98 xfrm: checkpatch erros with space prohibited
Fix checkpatch error "space prohibited xxx".

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-02 07:48:50 +01:00
Weilong Chen
3e94c2dcfd xfrm: checkpatch errors with foo * bar
This patch clean up some checkpatch errors like this:
ERROR: "foo * bar" should be "foo *bar"
ERROR: "(foo*)" should be "(foo *)"

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-02 07:48:49 +01:00
Weilong Chen
9b7a787d0d xfrm: checkpatch errors with space
This patch cleanup some space errors.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-01-02 07:48:48 +01:00
Salam Noureddine
56022a8fdd ipv4: arp: update neighbour address when a gratuitous arp is received and arp_accept is set
Gratuitous arp packets are useful in switchover scenarios to update
client arp tables as quickly as possible. Currently, the mac address
of a neighbour is only updated after a locktime period has elapsed
since the last update. In most use cases such delays are unacceptable
for network admins. Moreover, the "updated" field of the neighbour
stucture doesn't record the last time the address of a neighbour
changed but records any change that happens to the neighbour. This is
clearly a bug since locktime uses that field as meaning "addr_updated".
With this observation, I was able to perpetuate a stale address by
sending a stream of gratuitous arp packets spaced less than locktime
apart. With this change the address is updated when a gratuitous arp
is received and the arp_accept sysctl is set.

Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 00:08:38 -05:00
stephen hemminger
e82435341f ipv6: namespace cleanups
Running 'make namespacecheck' shows:
  net/ipv6/route.o
    ipv6_route_table_template
    rt6_bind_peer
  net/ipv6/icmp.o
    icmpv6_route_lookup
    ipv6_icmp_table_template

This addresses some of those warnings by:
 * make icmpv6_route_lookup static
 * move inline's out of ip6_route.h since only used into route.c
 * move rt6_bind_peer into route.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:46:09 -05:00
stephen hemminger
1d143d9f0c net: core functions cleanup
The following functions are not used outside of net/core/dev.c
and should be declared static.

  call_netdevice_notifiers_info
  __dev_remove_offload
  netdev_has_any_upper_dev
  __netdev_adjacent_dev_remove
  __netdev_adjacent_dev_link_lists
  __netdev_adjacent_dev_unlink_lists
  __netdev_adjacent_dev_unlink
  __netdev_adjacent_dev_link_neighbour
  __netdev_adjacent_dev_unlink_neighbour

And the following are never used and should be deleted
  netdev_lower_dev_get_private_rcu
  __netdev_find_adj_rcu

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:46:09 -05:00
stephen hemminger
2173f8d953 netlink: cleanup tap related functions
Cleanups in netlink_tap code
 * remove unused function netlink_clear_multicast_users
 * make local function static

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:43:36 -05:00
stephen hemminger
3678a9d863 netlink: cleanup rntl_af_register
The function __rtnl_af_register is never called outside this
code, and the return value is always 0.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 23:42:19 -05:00
Li RongQing
c3ac17cd6a ipv6: fix the use of pcpu_tstats in sit
when read/write the 64bit data, the correct lock should be hold.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-01 22:48:59 -05:00
John W. Linville
ad86c55bac Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2014-01-01 15:39:56 -05:00
Pablo Neira Ayuso
d497c63527 netfilter: add help information to new nf_tables Kconfig options
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-01-01 18:37:10 +01:00
Zhi Yong Wu
21eb218989 net, sch: fix the typo in register_qdisc()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 16:44:10 -05:00
Zhi Yong Wu
855abcf066 net, rps: fix the comment of net_rps_action_and_irq_enable()
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 16:44:10 -05:00
David S. Miller
2205369a31 vlan: Fix header ops passthru when doing TX VLAN offload.
When the vlan code detects that the real device can do TX VLAN offloads
in hardware, it tries to arrange for the real device's header_ops to
be invoked directly.

But it does so illegally, by simply hooking the real device's
header_ops up to the VLAN device.

This doesn't work because we will end up invoking a set of header_ops
routines which expect a device type which matches the real device, but
will see a VLAN device instead.

Fix this by providing a pass-thru set of header_ops which will arrange
to pass the proper real device instead.

To facilitate this add a dev_rebuild_header().  There are
implementations which provide a ->cache and ->create but not a
->rebuild (f.e. PLIP).  So we need a helper function just like
dev_hard_header() to avoid crashes.

Use this helper in the one existing place where the
header_ops->rebuild was being invoked, the neighbour code.

With lots of help from Florian Westphal.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 16:23:35 -05:00
wangweidong
0438816efd sctp: move skb_dst_set() a bit downwards in sctp_packet_transmit()
skb_dst_set will use dst, if dst is NULL although is not a problem,
then goto the 'no_route' and free nskb, so do the skb_dst_set is pointless.
so move the skb_dst_set after dst check.
Remove the unnecessary initialization as well.

v2: fix the subject line because it would confuse people,
    as pointed out by Daniel.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31 14:31:44 -05:00