Commit Graph

46431 Commits

Author SHA1 Message Date
David S. Miller
f6d4c71332 linux-can-fixes-for-4.12-20170609
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCgAxFiEE4bay/IylYqM/npjQHv7KIOw4HPYFAlk6mLwTHG1rbEBwZW5n
 dXRyb25peC5kZQAKCRAe/sog7Dgc9vx5CADTRLfMxgmmGsLcM8nO0sKn/7X/o4Hy
 qprZy1Nu4J49kG/93pumIvWkcG9fzZ4hbDy+4DiEmL/fIumbvg56aaW90fyvW9Cl
 HzL3S7Id8Db6Qwfr4Wd7r1IRYn08+om4ukO09GdskmMpS4wOr68+3WhutGGu1mEX
 ZS2sTE/dKnozZqUw6agcwXJmAJWFbYBaVcF+x2GT07ako0jYXF29zqe/GlpmzVBJ
 Wmib9DO2BeDm/QpNIot/I3Se86xS0AIH6ds++YRJX6k9dlFb4T2Liw6UysPSVCz6
 QkAT3SiTu3TKoT1TXEyMQEQSpnrFtpMowdTk/s04bKibVIpTTskZm44c
 =DNKW
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-fixes-for-4.12-20170609' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2017-06-09

this is a pull request of 6 patches for net/master.

There's a patch by Stephane Grosjean that fixes an uninitialized symbol warning
in the peak_canfd driver. A patch by Johan Hovold to fix the product-id
endianness in an error message in the the peak_usb driver. A patch by Oliver
Hartkopp to enable CAN FD for virtual CAN devices by default. Three patches by
me, one makes the helper function can_change_state() robust to be called with
cf == NULL. The next patch fixes a memory leak in the gs_usb driver. And the
last one fixes a lockdep splat by properly initialize the per-net
can_rcvlists_lock spin_lock.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09 15:41:57 -04:00
Johannes Berg
c7a61cba71 mac80211: free netdev on dev_alloc_name() error
The change to remove free_netdev() from ieee80211_if_free()
erroneously didn't add the necessary free_netdev() for when
ieee80211_if_free() is called directly in one place, rather
than as the priv_destructor. Add the missing call.

Fixes: cf124db566 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09 15:40:15 -04:00
ashwanth@codeaurora.org
773fc8f6e8 net: rps: send out pending IPI's on CPU hotplug
IPI's from the victim cpu are not handled in dev_cpu_callback.
So these pending IPI's would be sent to the remote cpu only when
NET_RX is scheduled on the victim cpu and since this trigger is
unpredictable it would result in packet latencies on the remote cpu.

This patch add support to send the pending ipi's of victim cpu.

Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09 15:35:01 -04:00
Krister Johansen
f186ce61bb Fix an intermittent pr_emerg warning about lo becoming free.
It looks like this:

Message from syslogd@flamingo at Apr 26 00:45:00 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 4

They seem to coincide with net namespace teardown.

The message is emitted by netdev_wait_allrefs().

Forced a kdump in netdev_run_todo, but found that the refcount on the lo
device was already 0 at the time we got to the panic.

Used bcc to check the blocking in netdev_run_todo.  The only places
where we're off cpu there are in the rcu_barrier() and msleep() calls.
That behavior is expected.  The msleep time coincides with the amount of
time we spend waiting for the refcount to reach zero; the rcu_barrier()
wait times are not excessive.

After looking through the list of callbacks that the netdevice notifiers
invoke in this path, it appears that the dst_dev_event is the most
interesting.  The dst_ifdown path places a hold on the loopback_dev as
part of releasing the dev associated with the original dst cache entry.
Most of our notifier callbacks are straight-forward, but this one a)
looks complex, and b) places a hold on the network interface in
question.

I constructed a new bcc script that watches various events in the
liftime of a dst cache entry.  Note that dst_ifdown will take a hold on
the loopback device until the invalidated dst entry gets freed.

[      __dst_free] on DST: ffff883ccabb7900 IF tap1008300eth0 invoked at 1282115677036183
    __dst_free
    rcu_nocb_kthread
    kthread
    ret_from_fork
Acked-by: Eric Dumazet <edumazet@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09 12:27:28 -04:00
Mateusz Jurczyk
defbcf2dec af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_UNIX socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.

Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-09 10:10:24 -04:00
Marc Kleine-Budde
74b7b49088 can: af_can: namespace support: fix lockdep splat: properly initialize spin_lock
This patch uses spin_lock_init() instead of __SPIN_LOCK_UNLOCKED() to
initialize the per namespace net->can.can_rcvlists_lock lock to fix this
lockdep warning:

| INFO: trying to register non-static key.
| the code is fine but needs lockdep annotation.
| turning off the locking correctness validator.
| CPU: 0 PID: 186 Comm: candump Not tainted 4.12.0-rc3+ #47
| Hardware name: Marvell Kirkwood (Flattened Device Tree)
| [<c0016644>] (unwind_backtrace) from [<c00139a8>] (show_stack+0x18/0x1c)
| [<c00139a8>] (show_stack) from [<c0058c8c>] (register_lock_class+0x1e4/0x55c)
| [<c0058c8c>] (register_lock_class) from [<c005bdfc>] (__lock_acquire+0x148/0x1990)
| [<c005bdfc>] (__lock_acquire) from [<c005deec>] (lock_acquire+0x174/0x210)
| [<c005deec>] (lock_acquire) from [<c04a6780>] (_raw_spin_lock+0x50/0x88)
| [<c04a6780>] (_raw_spin_lock) from [<bf02116c>] (can_rx_register+0x94/0x15c [can])
| [<bf02116c>] (can_rx_register [can]) from [<bf02a868>] (raw_enable_filters+0x60/0xc0 [can_raw])
| [<bf02a868>] (raw_enable_filters [can_raw]) from [<bf02ac14>] (raw_enable_allfilters+0x2c/0xa0 [can_raw])
| [<bf02ac14>] (raw_enable_allfilters [can_raw]) from [<bf02ad38>] (raw_bind+0xb0/0x250 [can_raw])
| [<bf02ad38>] (raw_bind [can_raw]) from [<c03b5fb8>] (SyS_bind+0x70/0xac)
| [<c03b5fb8>] (SyS_bind) from [<c000f8c0>] (ret_fast_syscall+0x0/0x1c)

Cc: Mario Kicherer <dev@kicherer.org>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2017-06-09 11:39:23 +02:00
Arnd Bergmann
0db47e3d32 ila_xlat: add missing hash secret initialization
While discussing the possible merits of clang warning about unused initialized
functions, I found one function that was clearly meant to be called but
never actually is.

__ila_hash_secret_init() initializes the hash value for the ila locator,
apparently this is intended to prevent hash collision attacks, but this ends
up being a read-only zero constant since there is no caller. I could find
no indication of why it was never called, the earliest patch submission
for the module already was like this. If my interpretation is right, we
certainly want to backport the patch to stable kernels as well.

I considered adding it to the ila_xlat_init callback, but for best effect
the random data is read as late as possible, just before it is first used.
The underlying net_get_random_once() is already highly optimized to avoid
overhead when called frequently.

Fixes: 7f00feaf10 ("ila: Add generic ILA translation facility")
Cc: stable@vger.kernel.org
Link: https://www.spinics.net/lists/kernel/msg2527243.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 15:36:56 -04:00
David Ahern
8397ed36b7 net: ipv6: Release route when device is unregistering
Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:

$ ifdown bond2    # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2

Steps to reproduce:
    echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
    ip link add dev bond12 type bond
    ip link add dev bond13 type bond
    ip addr add 2001:db8:2::0/64 dev bond12
    ip addr add 2001:db8:3::0/64 dev bond13
    ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
    ip link del dev bond12
    ip link del dev bond13

The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.

Fixes: a1a22c1206 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 11:12:39 -04:00
Mintz, Yuval
0eed9cf584 net: Zero ifla_vf_info in rtnl_fill_vfinfo()
Some of the structure's fields are not initialized by the
rtnetlink. If driver doesn't set those in ndo_get_vf_config(),
they'd leak memory to user.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
CC: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 10:58:02 -04:00
Mateusz Jurczyk
dd0da17b20 decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 10:51:22 -04:00
David S. Miller
c164772dd3 Revert "decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb"
This reverts commit 85eac2ba35.

There is an updated version of this fix which we should
use instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 10:50:18 -04:00
Mateusz Jurczyk
85eac2ba35 decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
Verify that the length of the socket buffer is sufficient to cover the
entire nlh->nlmsg_len field before accessing that field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < sizeof(*nlh) expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 10:38:54 -04:00
David S. Miller
cf124db566 net: Fix inconsistent teardown and release of private netdev state.
Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:53:24 -04:00
Alexander Potapenko
c28294b941 net: don't call strlen on non-terminated string in dev_set_alias()
KMSAN reported a use of uninitialized memory in dev_set_alias(),
which was caused by calling strlcpy() (which in turn called strlen())
on the user-supplied non-terminated string.

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:58:45 -04:00
Linus Torvalds
b29794ec95 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Made TCP congestion control documentation match current reality,
    from Anmol Sarma.

 2) Various build warning and failure fixes from Arnd Bergmann.

 3) Fix SKB list leak in ipv6_gso_segment().

 4) Use after free in ravb driver, from Eugeniu Rosca.

 5) Don't use udp_poll() in ping protocol driver, from Eric Dumazet.

 6) Don't crash in PCI error recovery of cxgb4 driver, from Guilherme
    Piccoli.

 7) _SRC_NAT_DONE_BIT needs to be cleared using atomics, from Liping
    Zhang.

 8) Use after free in vxlan deletion, from Mark Bloch.

 9) Fix ordering of NAPI poll enabled in ethoc driver, from Max
    Filippov.

10) Fix stmmac hangs with TSO, from Niklas Cassel.

11) Fix crash in CALIPSO ipv6, from Richard Haines.

12) Clear nh_flags properly on mpls link up. From Roopa Prabhu.

13) Fix regression in sk_err socket error queue handling, noticed by
    ping applications. From Soheil Hassas Yeganeh.

14) Update mlx4/mlx5 MAINTAINERS information.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (78 commits)
  net: stmmac: fix a broken u32 less than zero check
  net: stmmac: fix completely hung TX when using TSO
  net: ethoc: enable NAPI before poll may be scheduled
  net: bridge: fix a null pointer dereference in br_afspec
  ravb: Fix use-after-free on `ifconfig eth0 down`
  net/ipv6: Fix CALIPSO causing GPF with datagram support
  net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
  Revert "sit: reload iphdr in ipip6_rcv"
  i40e/i40evf: proper update of the page_offset field
  i40e: Fix state flags for bit set and clean operations of PF
  iwlwifi: fix host command memory leaks
  iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265
  iwlwifi: mvm: clear new beacon command template struct
  iwlwifi: mvm: don't fail when removing a key from an inexisting sta
  iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3
  iwlwifi: mvm: fix firmware debug restart recording
  iwlwifi: tt: move ucode_loaded check under mutex
  iwlwifi: mvm: support ibss in dqa mode
  iwlwifi: mvm: Fix command queue number on d0i3 flow
  iwlwifi: mvm: rs: start using LQ command color
  ...
2017-06-06 14:30:17 -07:00
Nikolay Aleksandrov
1020ce3108 net: bridge: fix a null pointer dereference in br_afspec
We might call br_afspec() with p == NULL which is a valid use case if
the action is on the bridge device itself, but the bridge tunnel code
dereferences the p pointer without checking, so check if p is null
first.

Reported-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Fixes: efa5356b0d ("bridge: per vlan dst_metadata netlink support")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 16:05:31 -04:00
Richard Haines
e3ebdb20fd net/ipv6: Fix CALIPSO causing GPF with datagram support
When using CALIPSO with IPPROTO_UDP it is possible to trigger a GPF as the
IP header may have moved.

Also update the payload length after adding the CALIPSO option.

Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 15:18:20 -04:00
David S. Miller
f4eb17e1ef Revert "sit: reload iphdr in ipip6_rcv"
This reverts commit b699d00358.

As per Eric Dumazet, the pskb_may_pull() is a NOP in this
particular case, so the 'iph' reload is unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 11:34:06 -04:00
Haishuang Yan
6044bd4a7d devlink: fix potential memort leak
We must free allocated skb when genlmsg_put() return fails.

Fixes: 1555d204e7 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-05 11:24:28 -04:00
Haishuang Yan
b699d00358 sit: reload iphdr in ipip6_rcv
Since iptunnel_pull_header() can call pskb_may_pull(),
we must reload any pointer that was related to skb->head.

Fixes: a09a4c8dd1 ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 23:04:31 -04:00
Eric Dumazet
77d4b1d369 net: ping: do not abuse udp_poll()
Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 22:56:55 -04:00
Florian Fainelli
b07ac98946 net: dsa: Fix stale cpu_switch reference after unbind then bind
Commit 9520ed8fb8 ("net: dsa: use cpu_switch instead of ds[0]")
replaced the use of dst->ds[0] with dst->cpu_switch since that is
functionally equivalent, however, we can now run into an use after free
scenario after unbinding then rebinding the switch driver.

The use after free happens because we do correctly initialize
dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
unbind the driver: dsa_dst_unapply() is called, and we rebind again.
dst->cpu_switch now points to a freed "ds" structure, and so when we
finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.

To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
which guarantees that we always correctly re-assign dst->cpu_switch in
dsa_cpu_parse().

Fixes: 9520ed8fb8 ("net: dsa: use cpu_switch instead of ds[0]")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 22:55:17 -04:00
David S. Miller
e3e86b5119 ipv6: Fix leak in ipv6_gso_segment().
If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af3 ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:41:10 -04:00
Soheil Hassas Yeganeh
38b257938a sock: reset sk_err when the error queue is empty
Prior to f5f99309fa (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.

Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets.

Set sk_err to the ICMP error when there is an ICMP packet
at the head of the error queue.

Fixes: f5f99309fa (sock: do not set sk_err in sock_dequeue_err_skb)
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Tested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 20:01:53 -04:00
Liam McBirnie
5f733ee68f ip6_tunnel: fix traffic class routing for tunnels
ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.

Commit 0e9a709560 ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.

The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.

Fixes: 0e9a709560 ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 19:49:33 -04:00
Linus Torvalds
125f42b0e2 NFS client bugfixes for Linux 4.12
Bugfixes include:
 
 - Fix a typo in commit e092693443 that breaks copy offload
 - Fix the connect error propagation in xs_tcp_setup_socket()
 - Fix a lock leak in nfs40_walk_client_list
 - Verify that pNFS requests lie within the offset range of the layout segment.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZM0YBAAoJEGcL54qWCgDyLUwQALaPEVp00UMdDR0in7MIFKsO
 2mgi7pOyn6po3EjxKbGtjAbL4nSlVxdaFpCIGg47YXrl9/95Zjjmyke+iwRdnMsa
 ZPyXwfhVRa80fxbOogAverNCnCptoHoG7EzdWuCTcOOxMxR3Ixs7wVJXrs+7ig+r
 IdvIAyTsiDYuP5yVp5KkmJCtLGc0Ze20rb7VgdQJfdiLibWvfYCLZ9CgfAQkdAMU
 RIlbT0/BG13XDqwh/C2V1vLge0VfpT5p8qbIb/kFyQ0ZJUUiicGGGjp3u/yj0aG9
 ljldI34WmQpsy+nCNN4dEgsF461ECvWLwRZnnpN9nv7VurUBpJNUqHLnubvDbzhh
 w8QX54ceEWuQAjg96keNuYOhoG53Omle2/Cm+nmiJOmShJbJ0yh4OcB9DYe0gdYa
 5YXbKRjPvf/HfdE7PPpvbPG2E211zfvkLdHnFxswggWyGrh23kqlWrpcHpZomGNW
 GbJLfIfhyEfBjCPdNJT3Tzvewo2LkcTNLb+3mJhkxOegkdops8vGYA9G2mba3Daj
 1HWl1yFAdzlEf2H1Cb8Y2ZrJKHAmaYBKBkKZYUeAcr6EtoxNqnNMP+PEDcVIzPKg
 6Jq7DiYwYksK+XDWK9G4QBguKKGLvYtv0MIA3QDX+bBGLo+eFYxc2iaaJefYNdkK
 +vdLHclg/YpepLg+Ui21
 =P2bm
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Bugfixes include:

   - Fix a typo in commit e092693443 ("NFS append COMMIT after
     synchronous COPY") that breaks copy offload

   - Fix the connect error propagation in xs_tcp_setup_socket()

   - Fix a lock leak in nfs40_walk_client_list

   - Verify that pNFS requests lie within the offset range of the layout
     segment"

* tag 'nfs-for-4.12-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: Mark unnecessarily extern functions as static
  SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
  NFSv4.0: Fix a lock leak in nfs40_walk_client_list
  pnfs: Fix the check for requests in range of layout segment
  xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup()
  pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
  NFS fix COMMIT after COPY
2017-06-04 11:56:53 -07:00
Yuchung Cheng
44abafc4cc tcp: disallow cwnd undo when switching congestion control
When the sender switches its congestion control during loss
recovery, if the recovery is spurious then it may incorrectly
revert cwnd and ssthresh to the older values set by a previous
congestion control. Consider a congestion control (like BBR)
that does not use ssthresh and keeps it infinite: the connection
may incorrectly revert cwnd to an infinite value when switching
from BBR to another congestion control.

This patch fixes it by disallowing such cwnd undo operation
upon switching congestion control.  Note that undo_marker
is not reset s.t. the packets that were incorrectly marked
lost would be corrected. We only avoid undoing the cwnd in
tcp_undo_cwnd_reduction().

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02 14:18:13 -04:00
Ben Hutchings
6e80ac5cc9 ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
xfrm6_find_1stfragopt() may now return an error code and we must
not treat it as a length.

Fixes: 2423496af3 ("ipv6: Prevent overrun when parsing v6 header options")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Craig Gallek <kraig@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02 13:57:27 -04:00
David S. Miller
13fb6c2c7f Just two fixes:
* fix the per-CPU drop counters to not be added to the
    rx_packets counter, but really the drop counter
  * fix TX aggregation start/stop callback races by setting
    bits instead of allocating and queueing an skb
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlkxbE0ACgkQa3t4Rpy0
 AB0vlA/8DJ/EccfrzUulYgz7N6e3eZnLV0PL5HG4QLFhS+iL71EkGMgX66PrTpJf
 0J308+VuNI/B0n48NF3NOUIg57yiF8i/VulxqR1FNYXdOfLXc5gc9Ca4oSOOlzHr
 z+CrCm2Z4GLwAZrketrUKuQBoPL8UXY3UKp4OzQoOCk50UujszInlRzXqlLdUHIE
 If+lg7O+Uq9udGb0WjH845H/GkjEiy5+4rM64pCkmu+rcPhb9uXbC9JI3b3SRu2j
 VeXl0ShaEEGA971JdncQ20x91rpadItJgnCm0bJ+zNwxZT5JakXW+ZUJGn2EKEqw
 hPvlvMgBzeAeLsCaRiQJspVdNHlgUa1nNTmn2n7R7+qn6LXuI7tZcj4UdOsn/Sfa
 eHTHc5irAiyp3ow6MAM+HgjH4/UHMbQg6HQMitVAGFO8Lluy1F1hIijP2amO0/It
 rHSINcDMi0Crn2rn+2tsYlU6pSzSJFS3kg+yfooK+C+pNl+Td0vH6n6EScvsKttG
 X6iAykbhPpjS/TrZg4RAPkFNqa7yooXXpvoIX1xtjUFRd1xUm6IE3O/6wN/l4X34
 QVyIQolw0LgWvoh3N3YZw7f9OFdc2AImnTU7XmHo6jYiZn4Y7+pO4x4OnAocw4aX
 SNXmqVwui7EjwGycDMohtOfFavTHC7KjoLRGBKONZXi7ZqqEKtI=
 =ZgHD
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two fixes:
 * fix the per-CPU drop counters to not be added to the
   rx_packets counter, but really the drop counter
 * fix TX aggregation start/stop callback races by setting
   bits instead of allocating and queueing an skb
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02 10:37:11 -04:00
Florian Fainelli
ac2629a479 net: dsa: Move dsa_switch_{suspend,resume} out of legacy.c
dsa_switch_suspend() and dsa_switch_resume() are functions that belong in
net/dsa/dsa.c and are not part of the legacy platform support code.

Fixes: a6a71f19fe ("net: dsa: isolate legacy code")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-02 10:31:16 -04:00
Johannes Berg
e165bc02a0 mac80211: fix dropped counter in multiqueue RX
In the commit enabling per-CPU station statistics, I inadvertedly
copy-pasted some code to update rx_packets and forgot to change it
to update rx_dropped_misc. Fix that.

This addresses https://bugzilla.kernel.org/show_bug.cgi?id=195953.

Fixes: c9c5962b56 ("mac80211: enable collecting station statistics per-CPU")
Reported-by: Petru-Florin Mihancea <petrum@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-06-01 21:26:03 +02:00
Nikolay Aleksandrov
aeb073241f net: bridge: start hello timer only if device is up
When the transition of NO_STP -> KERNEL_STP was fixed by always calling
mod_timer in br_stp_start, it introduced a new regression which causes
the timer to be armed even when the bridge is down, and since we stop
the timers in its ndo_stop() function, they never get disabled if the
device is destroyed before it's upped.

To reproduce:
$ while :; do ip l add br0 type bridge hello_time 100; brctl stp br0 on;
ip l del br0; done;

CC: Xin Long <lucien.xin@gmail.com>
CC: Ivan Vecera <cera@cera.cz>
CC: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 6d18c732b9 ("bridge: start hello_timer when enabling KERNEL_STP in br_stp_start")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 12:28:31 -04:00
Nicolas Dichtel
7212462fa6 netlink: don't send unknown nsid
The NETLINK_F_LISTEN_ALL_NSID otion enables to listen all netns that have a
nsid assigned into the netns where the netlink socket is opened.
The nsid is sent as metadata to userland, but the existence of this nsid is
checked only for netns that are different from the socket netns. Thus, if
no nsid is assigned to the socket netns, NETNSA_NSID_NOT_ASSIGNED is
reported to the userland. This value is confusing and useless.
After this patch, only valid nsid are sent to userland.

Reported-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 11:49:39 -04:00
Roopa Prabhu
c2e8471d98 mpls: fix clearing of dead nh_flags on link up
recent fixes to use WRITE_ONCE for nh_flags on link up,
accidently ended up leaving the deadflags on a nh. This patch
fixes the WRITE_ONCE to use freshly evaluated nh_flags.

Fixes: 39eb8cd175 ("net: mpls: rt_nhn_alive and nh_flags should be accessed using READ_ONCE")
Reported-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 14:48:24 -04:00
Douglas Caetano dos Santos
15e5651525 tcp: reinitialize MTU probing when setting MSS in a TCP repair
MTU probing initialization occurred only at connect() and at SYN or
SYN-ACK reception, but the former sets MSS to either the default or the
user set value (through TCP_MAXSEG sockopt) and the latter never happens
with repaired sockets.

The result was that, with MTU probing enabled and unless TCP_MAXSEG
sockopt was used before connect(), probing would be stuck at
tcp_base_mss value until tcp_probe_interval seconds have passed.

Signed-off-by: Douglas Caetano dos Santos <douglascs@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 12:28:59 -04:00
NeilBrown
6ea44adce9 SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
If you attempt a TCP mount from an host that is unreachable in a way
that triggers an immediate error from kernel_connect(), that error
does not propagate up, instead EAGAIN is reported.

This results in call_connect_status receiving the wrong error.

A case that it easy to demonstrate is to attempt to mount from an
address that results in ENETUNREACH, but first deleting any default
route.
Without this patch, the mount.nfs process is persistently runnable
and is hard to kill.  With this patch it exits as it should.

The problem is caused by the fact that xs_tcp_force_close() eventually
calls
      xprt_wake_pending_tasks(xprt, -EAGAIN);
which causes an error return of -EAGAIN.  so when xs_tcp_setup_sock()
calls
      xprt_wake_pending_tasks(xprt, status);
the status is ignored.

Fixes: 4efdd92c92 ("SUNRPC: Remove TCP client connection reset hack")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-31 12:26:44 -04:00
Johannes Berg
7a7c0a6438 mac80211: fix TX aggregation start/stop callback race
When starting or stopping an aggregation session, one of the steps
is that the driver calls back to mac80211 that the start/stop can
proceed. This is handled by queueing up a fake SKB and processing
it from the normal iface/sdata work. Since this isn't flushed when
disassociating, the following race is possible:

 * associate
 * start aggregation session
 * driver callback
 * disassociate
 * associate again to the same AP
 * callback processing runs, leading to a WARN_ON() that
   the TID hadn't requested aggregation

If the second association isn't to the same AP, there would only
be a message printed ("Could not find station: <addr>"), but the
same race could happen.

Fix this by not going the whole detour with a fake SKB etc. but
simply looking up the aggregation session in the driver callback,
marking it with a START_CB/STOP_CB bit and then scheduling the
regular aggregation work that will now process these bits as well.
This also simplifies the code and gets rid of the whole problem
with allocation failures of said skb, which could have left the
session in limbo.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-05-30 09:08:40 +02:00
David S. Miller
468b0df61a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Conntrack SCTP CRC32c checksum mangling may operate on non-linear
   skbuff, patch from Davide Caratti.

2) nf_tables rb-tree set backend does not handle element re-addition
   after deletion in the same transaction, leading to infinite loop.

3) Atomically unclear the IPS_SRC_NAT_DONE_BIT on nat module removal,
   from Liping Zhang.

4) Conntrack hashtable resizing while ctnetlink dump is progress leads
   to a dead reference to released objects in the lists, also from
   Liping.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-29 23:16:54 -04:00
Linus Torvalds
6741d51699 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
    Borkmann.

 2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
    Caratti.

 3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
    driver, from Jesper Brouer.

 4) Defer slave->link updates until bonding is ready to do a full commit
    to the new settings, from Nithin Sujir.

 5) Properly reference count ipv4 FIB metrics to avoid use after free
    situations, from Eric Dumazet and several others including Cong Wang
    and Julian Anastasov.

 6) Fix races in llc_ui_bind(), from Lin Zhang.

 7) Fix regression of ESP UDP encapsulation for TCP packets, from
    Steffen Klassert.

 8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.

 9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
    Dawson.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  ipv4: add reference counting to metrics
  net: ethernet: ax88796: don't call free_irq without request_irq first
  ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
  sctp: fix ICMP processing if skb is non-linear
  net: llc: add lock_sock in llc_ui_bind to avoid a race condition
  bonding: Don't update slave->link until ready to commit
  test_bpf: Add a couple of tests for BPF_JSGE.
  bpf: add various verifier test cases
  bpf: fix wrong exposure of map_flags into fdinfo for lpm
  bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
  bpf: properly reset caller saved regs after helper call and ld_abs/ind
  bpf: fix incorrect pruning decision when alignment must be tracked
  arp: fixed -Wuninitialized compiler warning
  tcp: avoid fastopen API to be used on AF_UNSPEC
  net: move somaxconn init from sysctl code
  net: fix potential null pointer dereference
  geneve: fix fill_info when using collect_metadata
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  ...
2017-05-26 13:51:01 -07:00
Eric Dumazet
3fb07daff8 ipv4: add reference counting to metrics
Andrey Konovalov reported crashes in ipv4_mtu()

I could reproduce the issue with KASAN kernels, between
10.246.7.151 and 10.246.7.152 :

1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &

2) At the same time run following loop :
while :
do
 ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
 ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done

Cong Wang attempted to add back rt->fi in commit
82486aa6f1 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.

Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)

I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.

Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.

Fixes: 2860583fe8 ("ipv4: Kill rt->fi")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 14:57:07 -04:00
Peter Dawson
0e9a709560 ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
This fix addresses two problems in the way the DSCP field is formulated
 on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661

1) The IPv6 tunneling code was manipulating the DSCP field of the
 encapsulating packet using the 32b flowlabel. Since the flowlabel is
 only the lower 20b it was incorrect to assume that the upper 12b
 containing the DSCP and ECN fields would remain intact when formulating
 the encapsulating header. This fix handles the 'inherit' and
 'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.

2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
 incorrect and resulted in the DSCP value always being set to 0.

Commit 90427ef5d2 ("ipv6: fix flow labels when the traffic class
 is non-0") caused the regression by masking out the flowlabel
 which exposed the incorrect handling of the DSCP portion of the
 flowlabel in ip6_tunnel and ip6_gre.

Fixes: 90427ef5d2 ("ipv6: fix flow labels when the traffic class is non-0")
Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 14:54:39 -04:00
Davide Caratti
804ec7ebe8 sctp: fix ICMP processing if skb is non-linear
sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.

v2:
- don't use skb_header_pointer() to read the transport header, since
  icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 14:40:46 -04:00
linzhang
0908cf4dfe net: llc: add lock_sock in llc_ui_bind to avoid a race condition
There is a race condition in llc_ui_bind if two or more processes/threads
try to bind a same socket.

If more processes/threads bind a same socket success that will lead to
two problems, one is this action is not what we expected, another is
will lead to kernel in unstable status or oops(in my simple test case,
cause llc2.ko can't unload).

The current code is test SOCK_ZAPPED bit to avoid a process to
bind a same socket twice but that is can't avoid more processes/threads
try to bind a same socket at the same time.

So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.

Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-26 14:20:29 -04:00
Linus Torvalds
80941b2aeb A bunch of make W=1 and static checker fixups, a RECONNECT_SEQ
messenger patch from Zheng and Luis' fallocate fix.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJZKD/7AAoJEEp/3jgCEfOLu6sIAKEvmDAZxkRiIV9HF36+0jLO
 947jIV22sb4FjngcMs0eBdFD4IJrL8QPq1UVYjIyHtnJN4Tbp9VDPfjyWArhr7+k
 hjfTcgnTStmwFy1bUXSq7xNusg9qm0Mw5zpY1DJLCdvkwIU0yrN9zusTlIQvlV5G
 Kg4Mzvc3EaL/VgUcsGI2lKuVlMt95wb5u1YGt5AG9FjLv1BTBhpX+3/swtvmtzy3
 ZpxyujS4YH+RBpHr9AI/+5IJ2xumZB0C6hzOoa/DAyGzjUH7MQJEuD8hjXqMOWQy
 L1wqZo7gXrIk3NSEjxrCb7/mE0S915jkKyHjoJbUxBhy1zEZmri9AfEwe9isb0M=
 =enjn
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client

Pul ceph fixes from Ilya Dryomov:
 "A bunch of make W=1 and static checker fixups, a RECONNECT_SEQ
  messenger patch from Zheng and Luis' fallocate fix"

* tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client:
  ceph: check that the new inode size is within limits in ceph_fallocate()
  libceph: cleanup old messages according to reconnect seq
  libceph: NULL deref on crush_decode() error path
  libceph: fix error handling in process_one_ticket()
  libceph: validate blob_struct_v in process_one_ticket()
  libceph: drop version variable from ceph_monmap_decode()
  libceph: make ceph_msg_data_advance() return void
  libceph: use kbasename() and kill ceph_file_part()
2017-05-26 09:35:22 -07:00
Daniel Borkmann
41703a7310 bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.

Fixes: 36bbef52c7 ("bpf: direct packet write and access for helpers for clsact progs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:44:28 -04:00
Ihar Hrachyshka
5990baaa6d arp: fixed -Wuninitialized compiler warning
Commit 7d472a59c0 ("arp: always override
existing neigh entries with gratuitous ARP") introduced a compiler
warning:

net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
this function [-Wmaybe-uninitialized]

While the code logic seems to be correct and doesn't allow the variable
to be used uninitialized, and the warning is not consistently
reproducible, it's still worth fixing it for other people not to waste
time looking at the warning in case it pops up in the build environment.
Yes, compiler is probably at fault, but we will need to accommodate.

Fixes: 7d472a59c0 ("arp: always override existing neigh entries with gratuitous ARP")
Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:38:20 -04:00
Wei Wang
ba615f6752 tcp: avoid fastopen API to be used on AF_UNSPEC
Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.

One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A:                            Thread B:
------------------------------------------------------------------------
sendto()
 - tcp_sendmsg()
     - sk_stream_memory_free() = 0
         - goto wait_for_sndbuf
	     - sk_stream_wait_memory()
	        - sk_wait_event() // sleep
          |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
	  |                           - tcp_sendmsg()
	  |                              - tcp_sendmsg_fastopen()
	  |                                 - __inet_stream_connect()
	  |                                    - tcp_disconnect() //because of AF_UNSPEC
	  |                                       - tcp_transmit_skb()// send RST
	  |                                    - return 0; // no reconnect!
	  |                           - sk_stream_wait_connect()
	  |                                 - sock_error()
	  |                                    - xchg(&sk->sk_err, 0)
	  |                                    - return -ECONNRESET
	- ... // wake up, see sk->sk_err == 0
    - skb_entail() on TCP_CLOSE socket

If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.

When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.

Fixes: cf60af03ca ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:30:34 -04:00
Roman Kapl
7c3f1875c6 net: move somaxconn init from sysctl code
The default value for somaxconn is set in sysctl_core_net_init(), but this
function is not called when kernel is configured without CONFIG_SYSCTL.

This results in the kernel not being able to accept TCP connections,
because the backlog has zero size. Usually, the user ends up with:
"TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
If SYN cookies are not enabled the connection is rejected.

Before ef547f2ac1 (tcp: remove max_qlen_log), the effects were less
severe, because the backlog was always at least eight slots long.

Signed-off-by: Roman Kapl <roman.kapl@sysgo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-25 13:12:17 -04:00
David S. Miller
029c58178b Just two fixes this time:
* fix the scheduled scan "BUG: scheduling while atomic"
  * check mesh address extension flags more strictly
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEExu3sM/nZ1eRSfR9Ha3t4Rpy0AB0FAlkkLTwACgkQa3t4Rpy0
 AB3i4hAAmq9+ZqKcHMtObQL0xnzkm8MEHoqL/ed1VOkjyvAnhw5sDMsvN1KaqjY3
 XwSVQBkJcyLVTixEjJuD6oKKKSiX6IGWmaawsuSxpgNZRuYv4DpZHx8zvO+a6228
 RwExpn1l7ziQS/wvSiKCvvK/SsPB/KGtp5YijpoXeUsWzxUOoJaJGNb6EsyoChPY
 tLLboT65aPvDEp3ZHX540XzxR1yJbM9Dwm4cecfjgF8b/jPYAgohul6hRHoO+l5u
 O9yW5FO4Z58IhfLqYD1o5Vp2sV5ZODP166pkIQjLs7oDKe4OZSmAMDVrz9TTM33p
 4mBMAfIE0G4g0F8Oc+n4xnHury4fiB4OEYIg5SEh6JGfDFWfJtKGlwS2HuyLAh0d
 hLzOjCZ1h2yAfxZrNN34chtVmpTXZOiQfDaGxqE+xjsFwWCY8usN0BmZOl0j/mRo
 pfAoYnFOj0Co2av9qIjOG1trys1iZn9BrocZXUAqaAoLADhLeHbYv6y1bAv0riuw
 xkDkCIoVfdrT9ZvaVlIZA+itKQb6lvDQ+WO4TR5KjjPu8FAHVpwvi8dZW/h12+S8
 VE9A1CjfC0ePrFrMcN3uMZmqK3I5QmKhg60JE22zPtkTLnSeTnSq3pzWBjWGZMC7
 Aht5HiEoA4rLTYgOR2Eq6tsXu7j4/VEr5b5iwniZCWpu0EQQXN8=
 =H9PL
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just two fixes this time:
 * fix the scheduled scan "BUG: scheduling while atomic"
 * check mesh address extension flags more strictly
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 15:31:39 -04:00
Alexander Potapenko
0ff50e83b5 net: rtnetlink: bail out from rtnl_fdb_dump() on parse error
rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
to contents of |ifm| being uninitialized because nlh->nlmsglen was too
small to accommodate |ifm|. The uninitialized data may affect some
branches and result in unwanted effects, although kernel data doesn't
seem to leak to the userspace directly.

The bug has been detected with KMSAN and syzkaller.

For the record, here is the KMSAN report:

==================================================================
BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:16
 dump_stack+0x143/0x1b0 lib/dump_stack.c:52
 kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
 __kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
 rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
 netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
 __netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
 netlink_dump_start ./include/linux/netlink.h:165
 rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
 netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
 rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
 netlink_unicast_kernel net/netlink/af_netlink.c:1272
 netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
 netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg net/socket.c:643
 ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
 __sys_sendmsg net/socket.c:2031
 SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
 SyS_sendmsg+0x87/0xb0 net/socket.c:2038
 do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
 entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
RIP: 0033:0x401300
RSP: 002b:00007ffc3b0e6d58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401300
RDX: 0000000000000000 RSI: 00007ffc3b0e6d80 RDI: 0000000000000003
RBP: 00007ffc3b0e6e00 R08: 000000000000000b R09: 0000000000000004
R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
R13: 00000000004065a0 R14: 0000000000406630 R15: 0000000000000000
origin: 000000008fe00056
 save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
 kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
 kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
 slab_alloc_node mm/slub.c:2743
 __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
 __kmalloc_reserve net/core/skbuff.c:138
 __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
 alloc_skb ./include/linux/skbuff.h:933
 netlink_alloc_large_skb net/netlink/af_netlink.c:1144
 netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
 sock_sendmsg_nosec net/socket.c:633
 sock_sendmsg net/socket.c:643
 ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
 __sys_sendmsg net/socket.c:2031
 SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
 SyS_sendmsg+0x87/0xb0 net/socket.c:2038
 do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
 return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
==================================================================

and the reproducer:

==================================================================
  #include <sys/socket.h>
  #include <net/if_arp.h>
  #include <linux/netlink.h>
  #include <stdint.h>

  int main()
  {
    int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    char nlmsg_buf[32];
    memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
    struct nlmsghdr *nlmsg = nlmsg_buf;
    nlmsg->nlmsg_len = 0x11;
    nlmsg->nlmsg_type = 0x1e; // RTM_NEWROUTE = RTM_BASE + 0x0e
    // type = 0x0e = 1110b
    // kind = 2
    nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
    nlmsg->nlmsg_seq = 0;
    nlmsg->nlmsg_pid = 0;
    nlmsg_buf[16] = (char)7;
    struct iovec iov;
    iov.iov_base = nlmsg_buf;
    iov.iov_len = 17;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    sendmsg(sock, &msg, 0);
    return 0;
  }
==================================================================

Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-24 15:27:16 -04:00